July 25, 2021
There's not exactly one specific source that covers everything you might want to know while writing a Ruby native extension. Rather, there are a few different sources that each have different specialties, and you can usually figure out what you need to know through some combination of them. Thus, this is mainly a list of resources with short summaries of what you might be able to look up in them. I also give a couple answers to miscellaneous questions I had to really hunt for, and some of my own tips on using C++ or Rust in place of C if you're interested in that, as well as a bit on writing JRuby extensions.
If you've never written a native extension at all and need to get your bearings, two very useful starting points are the "Gems with Extensions" guide from RubyGems and the "Example - Creating the dbm Extension" tutorial from the Ruby documentation. With any luck this article will be useful to you afterwards, when you find yourself with specific points of confusion.
"Creating Extension Libraries for Ruby" is part of the official Ruby documentation and is usually the best place to look first. It covers the relationship between Ruby types and the C types supplied by the Ruby header (including conversion functions), the C APIs of some of the most ubiquitous Ruby types like Array and String, how to use Ruby's features from C (like defining classes and methods), and various strategies for passing information between Ruby and C. It also includes a short example describing the creation of an extension from start to finish, as well as six appendices: an overview of the Ruby interpreter source files, a terse but decently fleshed-out API reference, a link to the MakeMakefile documentation for writing your extconf.rb, two appendices with advice about working with the garbage collection features, and some information about getting along with Ruby's new Ractor class for thread-safe parallelism.
This might seem like everything you would want to know, but this document can be a little light on details sometimes. Thankfully, if it doesn't answer your question, there're a few other places you can look.
This is a particularly thorough unofficial guide to the Ruby-C interface. Beyond just giving a tour of the C API, it also has information on embedding a Ruby interpreter and a couple of nicely-detailed longish-form examples. Much of the reference information can be found in the official Ruby docs somewhere, but it does a good job of gathering related information together concisely, and some of the information is the product of level-headed header diving on the author's part. Makes a nice supplement to "Creating Extension Libraries for Ruby".
This is a huge list of functions and macros in Ruby's C API from the official documentation; all have their type signatures documented and many also have text documentation. Lots of the functions mentioned in it aren't described in "Creating Extension Libraries for Ruby", so if you're trying to figure out how to work with a function that's missing from that document, it's likely to be here. As a general rule, also, even the functions that do get a mention there are described more comprehensively here, so I usually turn to this page first if I want to know about a specific function. The text descriptions are in Japanese, but I imagine they would be amenable to machine translation if needed since they're pretty cut-and-dried. Note that functions which don't begin with rb_ are not made available to native extensions through ruby.h, at least as a general rule.
It's unfortunate, but none of the pages above fully documents everything in the Ruby C API. For some situations, you just have to read the headers to figure out how to do what you want to do. It might feel spooky to depend on something that isn't in the above docs, but most things are pretty reliable and future-proof as long as they're brought in by ruby.h and aren't marked as deprecated. I know that sounds ominous, but the deprecation warnings stick around for a long time; the Ruby devs do worry about breaking backwards compatibility as many of Ruby's most popular gems include a significant amount of C code, and it's not uncommon for gems to depend on functions that are minimally documented. Here're a few headers in particular you might find interesting off-the-cuff:
include/ruby/internal/globals.h: Symbols for stuff in the Ruby core, like the standard data types and exceptions and things.
include/ruby/internal/special_consts.h: Useful stuff related to immediate values, like the Q* constants (Qfalse etc.) and the *_P macros for checks on immediate value type (NIL_P() and so on). Note that if you want to check if something is an immediate value in general, you should use SPECIAL_CONST_P() instead of IMMEDIATE_P() as IMMEDIATE_P() returns false for Qnil and Qfalse (aside from their definitions in that header, see doc/ChangeLog-1.9.3:59198).
The header in your Ruby installation's include directory (~/.rbenv/versions/3.0.2/include/ in my case right now) under ruby-3.0.0/x86-64-linux/rb_mjit_min_header-3.0.2.h: Your code is unlikely to actually build with this header included unless designed around that, just in case you were curious, as it wasn't designed for general use. It does include a huge swath of the Ruby API in one place, though, making it interesting to browse through despite its density.
The C API tests can be very helpful to consult if you have a question that the API docs don't answer satisfactorily. The C portions of the implementations of the base classes are also very useful to consult as an extension author to get a sense of how everything works together (look for <class name>{.c,.h} files in the root directory of the repo), as are the contents of the ext directory, which contains the native extensions for the standard library gems. Aside from reading the source code directly, you've probably noticed that the English-language ruby-doc.org documentation lets you view the source for every method that comes with the language, including those implemented in C. If you want to know if there's a special way to call a certain method on an instance of one of the core Ruby classes or the like via its C API, the fastest way to figure that out is sometimes to look up the method on ruby-doc.org and view its source there. Be aware that some parts of the Ruby core depend on things that aren't made available to extension developers.
This is a detailed, book-length exposition of the behavior and implementation of the Ruby interpreter from a C perspective, written by the fun-to-read Minero Aoki, author of some of the Ruby standard library classes and such (his original Japanese-language book is here). Its main shortcoming is its venerable age—it's contemporaneous with Ruby 1.7.3, meaning it was current around 2003–4 or so. Despite this, much of the material is still broadly useful, especially if you have some sense of how Ruby has changed over time and can read between the lines a little. If you need information on communicating with the interpreter and "Creating Extension Libraries for Ruby" doesn't answer your question, this book might do the trick, with a bit of luck.
If you've already looked in the linked sources and haven't found the answer to your question yet, perhaps one of these will help.
If you want to talk to a module, class, constant, etc. that's already in scope on the Ruby side, the technique differs based on whether or not the value in question is part of the Ruby core. If it is, you can use one of the symbols in include/ruby/internal/globals.h. If it's not, you can use VALUE rb_const_get(VALUE klass, ID name), which is comparable to klass::name in Ruby. Note that, unlike Module#const_get on the Ruby side, rb_const_get() looks up name as a single constant and does not split it on "::", so passing rb_intern("MyModule::MyClass") will not get you MyClass; instead, look up MyModule first and pass the result as klass in a second call. For values defined at the top level, use rb_const_get(rb_cObject, ID name), i.e. Object::name, which is equivalent to plain name. Putting it all together, if you had a class Pigeon in an Animals module at the top level, you could retrieve it via rb_const_get(rb_const_get(rb_cObject, rb_intern("Animals")), rb_intern("Pigeon")), or more compactly via rb_path2class("Animals::Pigeon"), which does accept a "::"-separated path.
There are also functions rb_const_get_at(VALUE klass, ID name), which looks up name in klass only (i.e. it ignores its ancestors and the top level), and rb_const_get_from(VALUE klass, ID name), which does check klass's ancestors but not the top level. These are obviously rather niche: in the Ruby 3.0.2 source, rb_const_get_at() is used in a few places to fish out internal helper classes like Racc::Parser and things that should only be defined at the top level like SCRIPT_LINES__ and such, and rb_const_get_from() is only used as part of larger constant lookup routines in the definition of Module#const_defined? and Module#const_get.
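The effect of searching or skipping a class's ancestors can be tried out from the Ruby side with Module#const_get and Module#const_defined?, whose optional second argument switches ancestor lookup off. This is only a sketch of the Ruby-level behavior, not of the C functions themselves, and the correspondence to rb_const_get_at() is an approximation on my part. Base, Derived, LIMIT and constant_lookup_demo are hypothetical names made up for the illustration; Animals and Pigeon follow the example above.

```ruby
# Hypothetical classes for illustration only.
module Animals
  class Pigeon; end
end

class Base
  LIMIT = 10
end

class Derived < Base; end

# Collects the results of a few lookups so they can be inspected together.
def constant_lookup_demo
  {
    # Given a String, Module#const_get accepts a "::"-qualified path.
    by_path: Object.const_get("Animals::Pigeon"),
    # By default the receiver's ancestors are searched as well.
    inherited: Derived.const_get(:LIMIT),
    defined_with_ancestors: Derived.const_defined?(:LIMIT),
    # Passing false restricts the search to the receiver itself.
    defined_on_receiver_only: Derived.const_defined?(:LIMIT, false)
  }
end
```

Here LIMIT is visible through Derived only because Base is one of its ancestors, so the last entry comes back false while the one before it is true.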
Note that if you specifically want to create a new instance of a core class, there are special-case functions/macros you can use for many of them that make this quick and easy. Here is a list of all of them as of Ruby 3.0.2. The header file names there are mainly just for the sake of clarity, as these should generally be available just from including ruby.h. If you want to generate a list like this in your own environment, here's a script.
If you want to have a C function be called when the program exits, you can use void rb_set_end_proc(void (*func)(VALUE), VALUE data). Like Kernel#at_exit, this adds the function to a list of functions which are called in the reverse of the order in which they were added. As you can see, the first argument should be the address of a function with signature void func(VALUE data), i.e. the function must take a single VALUE argument even if it does nothing with it, although it's fine if you pass Qnil for rb_set_end_proc()'s second argument in that case.
Alternatively, you could theoretically use void ruby_vm_at_exit(void(*func)(ruby_vm_t *)), which runs at the very tail end of the Ruby VM in question's shutdown process and takes a pointer to the dead VM. You have to include ruby/vm.h to use this, and most of the Ruby API won't work from here, although you could conceivably inspect the VM's state using the functions in internal/vm.h or the like if you're working on Ruby itself. Note that, as with Kernel#at_exit, the registered function in both of these cases will not be called if the Ruby process is abruptly terminated via SIGKILL or Kernel#exit! or the like, although it will be called in the case of SIGTERM or an unhandled exception or what have you.
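Since rb_set_end_proc() is described here as sharing its ordering and termination behavior with Kernel#at_exit, that behavior can be observed from plain Ruby. The sketch below only exercises Kernel#at_exit, not the C function. It runs a small made-up child script in a separate interpreter, once to completion and once ending in Kernel#exit!; CHILD_SCRIPT and the helper names are hypothetical.

```ruby
require "open3"
require "rbconfig"

# Child script: two handlers registered in order, then normal completion.
# $stdout.sync makes the child write each line immediately instead of buffering.
CHILD_SCRIPT = <<~'RUBY'
  $stdout.sync = true
  at_exit { puts "registered first" }
  at_exit { puts "registered second" }
  puts "main body done"
RUBY

# Runs the given source in a fresh interpreter and returns its output lines.
def child_output(source)
  output, _status = Open3.capture2(RbConfig.ruby, "-e", source)
  output.lines.map(&:chomp)
end

# The child finishes normally, so both handlers run.
def normal_exit_lines
  child_output(CHILD_SCRIPT)
end

# The child ends with Kernel#exit!, which skips exit handlers.
def hard_exit_lines
  child_output(CHILD_SCRIPT + "exit!\n")
end
```

Under these assumptions, normal_exit_lines reports the second handler before the first, and hard_exit_lines contains only the line printed by the main body.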
If you like, you can also write your extension in C++, Rust, or any other language that supports the use of C linkage. If you're using JRuby, you can't use C native extensions, but you can write native extensions in Java, or another language that runs on the JVM.
Using C++ in place of C has official support and is very easy. All you really need to do is declare your Ruby entry point function extern "C". You can safely include ruby.h from C++ code, and you don't necessarily need to change anything about your extconf.rb as MakeMakefile is C++-aware.
There is a native extension test project called cxxanyargs in the Ruby source repository at this time, which is testing correct ANYARGS behavior in C++ (as you might expect :P). It's a full native extension with an extconf.rb and so on, and is of a very nice size and level of complexity to be used as a reference for writing your own C++ native extensions.
Something worth noting that doesn't seem to be documented anywhere is that as of Ruby 2.7 MakeMakefile has a class method [](name) which takes the name of a programming language and returns an extension of MakeMakefile more thoroughly customized for that language; the only language that it has built-in support for right now is C++. In other words, you can get this C++-oriented version of MakeMakefile via MakeMakefile['C++']. MakeMakefile by default supports the compilation of both C and C++ files, so you don't necessarily need to use this, but the MakeMakefile['C++'] version of MakeMakefile is modified such that #try_link and #try_compile and other such checks will use the platform's C++ compiler instead of the C compiler. This can be useful to verify things about the C++ environment on the current platform; the cxxanyargs native extension mentioned above does this. Since this feature is undocumented right now, it may change in the near future; the commit in which it was added is marked as [EXPERIMENTAL], although it's stayed put for nearly two years now.
For Rust, I get the impression that the easiest approach at the moment is to use the rutie crate. It's pretty easy to get set up: a little bit more work than a traditional native extension but not by much, as I got the tutorial working in a few minutes. One thing to note is that it doesn't hook into the native extension pathway on its own, so if you want to distribute a gem using it you'll need to write a skeleton extconf.rb and Makefile to trigger cargo's build process during gem install. You can take a look at how the hypothesis-specs gem does this for an example.
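As a rough idea of what such a skeleton might look like, the sketch below generates the text of a Makefile whose default target simply shells out to cargo. It is not taken from hypothesis-specs or rutie: the helper name cargo_makefile, the manifest path and the choice of targets are all assumptions of mine, and a real gem would still need to arrange for the compiled library to end up somewhere it can be loaded from.

```ruby
# Hypothetical helper: builds the text of a Makefile that delegates to cargo.
# manifest_path is assumed to point at the crate's Cargo.toml.
def cargo_makefile(manifest_path)
  # The "\t" escapes produce the tab characters that make requires
  # at the start of recipe lines.
  <<~MAKEFILE
    all:
    \tcargo build --release --manifest-path #{manifest_path}

    clean:
    \tcargo clean --manifest-path #{manifest_path}

    install:
    \t@true

    .PHONY: all clean install
  MAKEFILE
end

# In an extconf.rb the result would be written next to the script, e.g.
#   File.write("Makefile", cargo_makefile(File.expand_path("../Cargo.toml", __dir__)))
```

The install target is left as a no-op here on the assumption that the Rust build places its output where the gem expects it; adjust it if your layout needs a copy step.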
As a side note, that's the most popular gem making use of rutie at the time of writing with ~13k downloads and the team describes it as in early alpha, which gives you some sense of how widely-used rutie is right now. If you want to make heavy-duty use of rutie it's possible that you'll need to spend some time contributing to it to help iron out what kinks it may have.
There was also a crate called Helix for writing native extensions, but the project was abandoned in Oct. 2020, so you may need to use old versions of Rust and Ruby etc. if you want to make use of it. Of course, you can also just use the Rust FFI, although you'll need to get creative with your extconf.rb, Rakefile, etc. to have your gem get along with Cargo. Be warned if you go the raw FFI route that talking back-and-forth with C in Rust has the potential to be a painful undertaking, especially if you're new to Rust (and/or C).
The JRuby wiki has a Maven-based tutorial and example project for doing this. Their wiki also links to this blog post, which is rather old by now but might still be useful. You could also look at how Puma does it, as they have a Java-based native extension in place for JRuby compatibility; this might be an especially useful example if you want to offer a C native extension for CRuby and a Java native extension for JRuby side-by-side in the same gem.
♦️♦️♦️