Ruby native extension reference materials

July 25, 2021

No single source covers everything you might want to know while writing a Ruby native extension. Rather, there are a few different sources that each have different specialties, and you can usually figure out what you need to know through some combination of them. Thus, this is mainly a list of resources with short summaries of what you might be able to look up in them. I also give a couple of answers to miscellaneous questions I had to really hunt for, and some of my own tips on using C++ or Rust in place of C if you're interested in that, as well as a bit on writing JRuby extensions.

If you've never written a native extension at all and need to get your bearings, two very useful starting points are the "Gems with Extensions" guide from RubyGems and the "Example - Creating the dbm Extension" tutorial from the Ruby documentation. With any luck this article will be useful to you afterwards, when you find yourself with specific points of confusion.

"Creating Extension Libraries for Ruby"

This article is part of the official Ruby documentation and is usually the best place to look first. It covers the relationship between Ruby types and the C types supplied by the Ruby header (including conversion functions), the C APIs of some of the most ubiquitous Ruby types like Array and String, how to use Ruby's features from C (like defining classes and methods), and various strategies for passing information between Ruby and C. It also includes a short example describing the creation of an extension from start to finish, as well as six appendices: an overview of the Ruby interpreter source files, a terse but decently fleshed-out API reference, a link to the MakeMakefile documentation for writing your extconf.rb, two appendices with advice about working with the garbage collection features, and some information about getting along with Ruby's new Ractor class for thread-safe parallelism.

This might seem like everything you would want to know, but this document can be a little light on details sometimes. Thankfully, if it doesn't answer your question, there're a few other places you can look.

"The Definitive Guide to Ruby's C API"

This is a particularly thorough unofficial guide to the Ruby-C interface. Beyond just giving a tour of the C API, it also has information on embedding a Ruby interpreter and a couple of nicely-detailed longish-form examples. Much of the reference information can be found in the official Ruby docs somewhere, but it does a good job of gathering related information together concisely, and some of the information is the product of level-headed header diving on the author's part. Makes a nice supplement to "Creating Extension Libraries for Ruby".

"関数一覧 (Function catalog)"

This is a huge list of functions and macros in Ruby's C API from the official documentation; all have their type signatures documented and many also have text documentation. Lots of the functions mentioned in it aren't described in "Creating Extension Libraries for Ruby", so if you're trying to figure out how to work with a function that's missing from that document, it's likely to be here. As a general rule, also, even the functions that do get a mention there are described more comprehensively here, so I usually turn to this page first if I want to know about a specific function. The text descriptions are in Japanese, but I imagine they would be amenable to machine translation if needed since they're pretty cut-and-dried. Note that functions which don't begin with rb_ are not made available to native extensions through ruby.h, at least as a general rule.

The Ruby C headers

It's unfortunate, but none of the pages above fully documents everything in the Ruby C API. For some situations, you just have to read the headers to figure out how to do what you want to do. It might feel spooky to depend on something that isn't in the above docs, but most things are pretty reliable and future-proof as long as they're brought in by ruby.h and aren't marked as deprecated. I know that sounds ominous, but the deprecation warnings stick around for a long time; the Ruby devs do worry about breaking backwards compatibility as many of Ruby's most popular gems include a significant amount of C code, and it's not uncommon for gems to depend on functions that are minimally documented. Here're a few headers in particular you might find interesting off-the-cuff:

include/ruby/internal/globals.h: Symbols for stuff in the Ruby core, like the standard data types and exceptions and things.
include/ruby/internal/special_consts.h: Useful stuff related to immediate values, like the Q* constants (Qfalse etc.) and the *_P macros for checks on immediate value type (NIL_P() and so on). Note that if you want to check if something is an immediate value in general, you should use SPECIAL_CONST_P() instead of IMMEDIATE_P() as IMMEDIATE_P() returns false for Qnil and Qfalse (aside from their definitions in that header, see doc/ChangeLog-1.9.3:59198).
The MJIT header: This is the header that Ruby's MJIT includes in the C code it generates. The reason I mention it is mainly as a curiosity (unless you're hacking away on the MJIT, of course). It's actually platform-specific and generated at build time when you compile Ruby, so if you want to do anything with it by hand you'll have to go hunting for it. For me it's in Ruby's include directory (~/.rbenv/versions/3.0.2/include/ in my case right now) under ruby-3.0.0/x86-64-linux/rb_mjit_min_header-3.0.2.h. Your code is unlikely to actually build with this header included unless designed around that, just in case you were curious, as it wasn't designed for general use. It does include a huge swath of the Ruby API in one place, though, making it interesting to browse through despite its density.

The rest of the Ruby source

The C API tests can be very helpful to consult if you have a question that the API docs don't answer satisfactorily. The C portions of the implementations of the base classes are also very useful to consult as an extension author to get a sense of how everything works together (look for <class name>{.c,.h} files in the root directory of the repo), as are the contents of the ext directory, which contains the native extensions for the standard library gems. Aside from reading the source code directly, you've probably noticed that the English-language ruby-doc.org documentation lets you view the source for every method that comes with the language, including those implemented in C. If you want to know if there's a special way to call a certain method on an instance of one of the core Ruby classes or the like via its C API, the fastest way to figure that out is sometimes to look up the method on ruby-doc.org and view its source there. Be aware that some parts of the Ruby core depend on things that aren't made available to extension developers.

Ruby Hacking Guide

This is a detailed, book-length exposition of the behavior and implementation of the Ruby interpreter from a C perspective, written by the fun-to-read Minero Aoki, author of some of the Ruby standard library classes and such (his original Japanese-language book is here). Its main shortcoming is its venerable age—it's contemporaneous with Ruby 1.7.3, meaning it was current around 2003–4 or so. Despite this, much of the material is still broadly useful, especially if you have some sense of how Ruby has changed over time and can read between the lines a little. If you need information on communicating with the interpreter and "Creating Extension Libraries for Ruby" doesn't answer your question, this book might do the trick, with a bit of luck.

A couple things that aren't that clear from any of these

If you've already looked in the linked sources and haven't found the answer to your question yet, perhaps one of these will help.

Talking to things already in scope in Ruby

If you want to talk to a module, class, constant, etc. that's already in scope on the Ruby side, the technique differs based on whether or not the value in question is part of the Ruby core. If it is, you can use one of the symbols in include/ruby/internal/globals.h. If it's not, you can use VALUE rb_const_get(VALUE klass, ID name), which is comparable to klass::name in Ruby. For values defined at the top level, use rb_const_get(rb_cObject, ID name), i.e. Object::name, which is equivalent to plain name. Note that name has to be a single constant name: as far as I can tell from variable.c, rb_const_get() looks the ID up as-is and does not split it on "::", so passing rb_intern("MyModule::MyClass") will not get you MyClass back. For a nested constant, chain the calls one segment at a time, or use VALUE rb_path2class(const char *path), which does accept a "::"-separated path but only resolves classes and modules. Putting it all together, if you had a class Pigeon in an Animals module at the top level, you could retrieve it via rb_const_get(rb_const_get(rb_cObject, rb_intern("Animals")), rb_intern("Pigeon")) or via rb_path2class("Animals::Pigeon").
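Because rb_const_get() is the C counterpart of klass::name, it can be handy to try a lookup out on the Ruby side before writing the C. Here's a minimal sketch of that idea. Module#const_get is used purely as a Ruby-level stand-in for the C call, and Animals::Pigeon is the hypothetical class from the paragraph above, defined inline so the snippet runs on its own.

```ruby
# Hypothetical module and class, standing in for code defined elsewhere in Ruby.
module Animals
  class Pigeon
  end
end

# Resolving one segment at a time corresponds to chaining rb_const_get()
# calls in C, starting from rb_cObject for the top level.
animals = Object.const_get(:Animals)
pigeon  = animals.const_get(:Pigeon)

# Given a String, Module#const_get also accepts a full "::"-separated path.
by_path = Object.const_get("Animals::Pigeon")

puts pigeon.equal?(by_path) # prints "true"
```

If the lookup raises NameError here, the equivalent C lookup is not going to find the constant either, which makes this a cheap sanity check.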

There are also functions rb_const_get_at(VALUE klass, ID name), which looks up name in klass only (i.e. it ignores its ancestors and the top level), and rb_const_get_from(VALUE klass, ID name), which does check klass's ancestors but not the top level. These are obviously rather niche—in the Ruby 3.0.2 source, rb_const_get_at() is used in a few places to fish out internal helper classes like Racc::Parser and things that should only be defined at the top level like SCRIPT_LINES__ and such, and rb_const_get_from() is only used as part of larger constant lookup routines in the definition of Module#const_defined? and Module#const_get.
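The difference between searching the ancestors and searching klass alone is also easy to see from Ruby. This is a rough sketch, assuming that passing false for the inherit argument of Module#const_get and Module#const_defined? is a fair Ruby-level analogue of rb_const_get_at(). Base, Derived and LIMIT are made-up names.

```ruby
# Hypothetical classes for illustration.
class Base
  LIMIT = 10
end

class Derived < Base
end

# With the default inherit = true the ancestors are searched, so the
# constant defined on Base is visible through Derived.
puts Derived.const_get(:LIMIT) # prints "10"

# With inherit = false only the receiver's own constants are considered.
puts Derived.const_defined?(:LIMIT, false) # prints "false"
puts Base.const_defined?(:LIMIT, false)    # prints "true"
```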

Note that if you specifically want to create a new instance of a core class, there are special-case functions/macros you can use for many of them that make this quick and easy. Here is a list of all of them as of Ruby 3.0.2. The header file names there are mainly just for the sake of clarity, as these should generally be available just from including ruby.h. If you want to generate a list like this in your own environment, here's a script.

Hooking into exit

If you want to have a C function be called when the program exits, you can use void rb_set_end_proc(void (*func)(VALUE), VALUE data). Like Kernel#at_exit, this adds the function to a list of functions which are called in reverse of the order in which they were added. As you can see, the first argument should be the address of a function with signature void func(VALUE data), i.e. the function must take a single VALUE argument even if it does nothing with it, although it's fine if you pass Qnil for rb_set_end_proc()'s second argument in that case.

Alternatively, you could theoretically use void ruby_vm_at_exit(void(*func)(ruby_vm_t *)), which runs at the very tail end of the Ruby VM in question's shutdown process and takes a pointer to the dead VM. You have to include ruby/vm.h to use this, and most of the Ruby API won't work from here, although you could conceivably inspect the VM's state using the functions in internal/vm.h or the like if you're working on Ruby itself. Note that, as with Kernel#at_exit, the registered function in both of these cases will not be called if the Ruby process is abruptly terminated via SIGKILL or Kernel#exit! or the like, although it will be called in the case of SIGTERM or an unhandled exception or what have you.
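Since both of the paragraphs above lean on Kernel#at_exit as the point of comparison, here's a small sketch of the Ruby-side behavior: handlers run in reverse order of registration and are skipped by Kernel#exit!. It only exercises Kernel#at_exit, not rb_set_end_proc() itself, and run_child is a hypothetical helper that runs a snippet in a fresh interpreter so that its exit handlers actually fire.

```ruby
require "rbconfig"

# Run a snippet in a fresh interpreter and return everything it wrote to stdout.
def run_child(script)
  IO.popen([RbConfig.ruby, "-e", script], &:read)
end

# Handlers run in reverse order of registration.
reverse_order = run_child(<<~'RUBY')
  at_exit { puts "registered first" }
  at_exit { puts "registered second" }
RUBY

# Kernel#exit! terminates immediately and skips the handlers entirely.
skipped = run_child(<<~'RUBY')
  at_exit { puts "never printed" }
  exit!
RUBY

p reverse_order # prints "registered second\nregistered first\n"
p skipped       # prints ""
```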

Using languages other than C

If you like, you can also write your extension in C++, Rust, or any other language that supports the use of C linkage. If you're using JRuby, you can't use C native extensions, but you can write native extensions in Java, or another language that runs on the JVM.

C++

Using C++ in place of C has official support and is very easy. All you really need to do is declare your Ruby entry point function extern "C". You can safely include ruby.h from C++ code, and you don't necessarily need to change anything about your extconf.rb as MakeMakefile is C++-aware.

There is a native extension test project called cxxanyargs in the Ruby source repository at this time, which is testing correct ANYARGS behavior in C++ (as you might expect :P). It's a full native extension with an extconf.rb and so on, and is of a very nice size and level of complexity to be used as a reference for writing your own C++ native extensions.

Something worth noting that doesn't seem to be documented anywhere is that as of Ruby 2.7 MakeMakefile has a class method [](name) which takes the name of a programming language and returns an extension of MakeMakefile more thoroughly customized for that language; the only language that it has built-in support for right now is C++. In other words, you can get this C++-oriented version of MakeMakefile via MakeMakefile['C++']. MakeMakefile by default supports the compilation of both C and C++ files, so you don't necessarily need to use this, but the MakeMakefile['C++'] version of MakeMakefile is modified such that #try_link and #try_compile and other such checks will use the platform's C++ compiler instead of the C compiler. This can be useful to verify things about the C++ environment on the current platform; the cxxanyargs native extension mentioned above does this. Since this feature is undocumented right now, it may change in the near future; the commit in which it was added is marked as [EXPERIMENTAL], although it's stayed put for nearly two years now.
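To give a sense of what that can look like in practice, here's a sketch of an extconf.rb that probes the C++ toolchain through MakeMakefile['C++'] before generating the Makefile. The extension name pigeon/pigeon and the probe source are placeholders. Since the feature is undocumented, treat the exact behavior as an assumption to check against the mkmf.rb of the Ruby version you're targeting.

```ruby
# extconf.rb
require "mkmf"

# Experimental, undocumented API (Ruby 2.7 and later): a variant of
# MakeMakefile whose try_* checks use the C++ compiler.
cxx = MakeMakefile["C++"]

# Placeholder probe: a tiny translation unit that only a C++ compiler accepts.
probe = <<~CPP
  #include <string>
  int main() { std::string s("probe"); return s.empty() ? 1 : 0; }
CPP

abort "a working C++ compiler is required" unless cxx.try_compile(probe)

# Plain MakeMakefile already knows how to build .cpp sources.
create_makefile("pigeon/pigeon")
```

Running this needs the Ruby development headers and a C++ compiler, the same as any other extconf.rb.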

Rust

For Rust, I get the impression that the easiest approach at the moment is to use the rutie crate. It's pretty easy to get set up—a little bit more work than a traditional native extension but not by much, as I got the tutorial working in a few minutes. One thing to note is that it doesn't hook into the native extension pathway on its own, so if you want to distribute a gem using it you'll need to write a skeleton extconf.rb and Makefile to trigger cargo's build process during gem install. You can take a look at how the hypothesis-specs gem does this for an example.

As a side note, that's the most popular gem making use of rutie at the time of writing with ~13k downloads and the team describes it as in early alpha, which gives you some sense of how widely-used rutie is right now. If you want to make heavy-duty use of rutie it's possible that you'll need to spend some time contributing to it to help iron out what kinks it may have.

There was also a crate called Helix for writing native extensions, but the project was abandoned in Oct. 2020, so you may need to use old versions of Rust and Ruby etc. if you want to make use of it. Of course, you can also just use the Rust FFI, although you'll need to get creative with your extconf.rb, Rakefile, etc. to have your gem get along with Cargo. Be warned if you go the raw FFI route that talking back-and-forth with C in Rust has the potential to be a painful undertaking, especially if you're new to Rust (and/or C).
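For a rough idea of what a skeleton extconf.rb that hands the build off to cargo could look like, here's a sketch. It is not taken from hypothesis-specs or rutie. The helper names, the relative crate path and the no-op install target are all assumptions, and it relies on RubyGems running make and make install once extconf.rb has finished.

```ruby
# extconf.rb

# True if a cargo executable can be run from the current PATH.
def cargo_available?
  system("cargo", "--version", out: File::NULL, err: File::NULL) == true
end

# Contents of a stub Makefile that delegates the real work to cargo.
# crate_dir is the directory containing Cargo.toml, relative to this file.
def cargo_makefile(crate_dir)
  manifest = File.join(crate_dir, "Cargo.toml")
  <<~MAKEFILE
    all:
    \tcargo build --release --manifest-path #{manifest}

    clean:
    \tcargo clean --manifest-path #{manifest}

    install:
    \t@echo "nothing to install; the library stays in the crate's target directory"
  MAKEFILE
end

# A real extconf.rb would finish along these lines:
#   abort "cargo is required to build this gem" unless cargo_available?
#   File.write("Makefile", cargo_makefile("../.."))
puts cargo_makefile("../..")
```

Keeping the Makefile text in a function that returns a string makes it easy to check the generated rules without touching the filesystem or needing cargo installed.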

Java / JRuby

The JRuby wiki has a Maven-based tutorial and example project for doing this. Their wiki also links to this blog post, which is rather old by now but might still be useful. You could also look at how Puma does it, as they have a Java-based native extension in place for JRuby compatibility; this might be an especially useful example if you want to offer a C native extension for CRuby and a Java native extension for JRuby side-by-side in the same gem.

♦️♦️♦️