I’ve been wanting to start exploring the MRI Ruby source code for quite some time now, but didn’t quite know where to start. I wanted to devise a way of figuring out what parts of the Ruby source code were important, so if I were to run a “Hello, world!” program I could see exactly what the Ruby VM had to do to execute that.
Prerequisite Knowledge: It would help if you’ve used GDB before, but it should be self explanatory if you haven’t. If you’ve never used Ruby this post might give you tips on how to explore language code bases, but other than that it’ll probably be uninteresting. It’ll also help you if you have a vague idea of what a “symbol table” is in a binary file.
This is going to be quite meandering. I’m basically just going to be picking interesting things in the differences between certain running programs and looking at the C source to figure out what’s happening. If that sounds like your cup of tea, let’s do it! :)
What I was looking to find out was the path of execution Ruby followed when
interpreting and running a simple
puts "Hello, world!". I wanted a list of C
functions, in the order they were called, from start to finish.
ltrace, dtrace, strace
The suite of
*trace tools didn’t quite fit my purposes.
ltrace only traces
calls into dynamic libraries (through some really clever symbol stubbing, might
look more into how that works and write a post on it at some point).
probably would be able to do what I want but I didn’t have the patience to learn
it inside out (plus I wanted to check the difference between Ruby 1.9 and Ruby
2.0, which I wouldn’t be able to do without modifying the 1.9 source code).
strace only traces system calls and, while that’s useful, I’m specifically
interested in the C calls from inside the Ruby code base.
After a fair bit of Googling, I came across a
that explained how to do exactly what I wanted using a really simple
script. This is an example of what such a script would look like:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
The idea here is to set breakpoints on functions that you’re interested in and print out the latest item in the backtrace whenever the breakpoint is hit, then just continue. Easy peasy.
The slight complication is that I’m interested in all of the C functions I can
get my hands on. We can check how many functions Ruby has by using the
$ nm `which ruby` nm: /home/sam/.rbenv/shims/ruby: File format not recognized
Ahh yes. Balls. I’m using
rbenv to manage multiple Ruby installs on one
machine, so the
ruby in my path isn’t actually a compiled executable file,
it’s a bash script that sets up some environment variables and then calls some
other executable to find which version of Ruby it needs to hit.
rbenv developers are good people and offer an
command that finds the real executable for us:
$ nm `rbenv which ruby` Lots and lots of symbols...
Let’s apply some filtering. Each line that
nm outputs has the following
<memory address> <symbol type> <symbol name>
We’re only interested in functions, which according to the
nm man pages is a
symbol type of “t”. So let’s try and get a better count:
$ nm `rbenv which ruby` | grep " t " | wc -l 4546
That’s using Ruby 2.0.0-p247. There’s quite a large difference between 2.x and 1.9.x, check this out:
$ rbenv local 1.9.3-p448 $ nm `rbenv which ruby` | grep " t " | wc -l 3825
Not sure what conclusions we can draw from that, but it’s interesting nonetheless.
Auto generating the gdb script
At this point I’d hope nobody would have any intention of hand-crafting
thousands of lines of
gdb script, so let’s hack together something that
generates it for us:
1 2 3 4 5 6 7 8 9 10 11 12 13
Who needs error checking? This beauty will output the necessary script block for each and every function inside the Ruby executable.
Gluing it all together
First, we have to generate our gdb script. We’ll do that separately:
$ ruby gdbscript.rb `rbenv which ruby` > script.gdb
Next, we need to invoke gdb with our script and the necessary Ruby program. I
did it as a one-liner and wrapped it around the
time program to see how long
it would take. It ended up looking like this:
$ time gdb --batch -x script.gdb --args `rbenv which ruby` -e "puts 'hello'"
Now, I had no idea how long this would take. I certainly didn’t think it would take this long for 1.9.3:
real 339m14.492s user 196m39.789s sys 147m3.939s
Unfortunately, I didn’t make a note of how long 2.0 took (I fell asleep). It took all night and a lot of the next day, though. Easily over twice as long as 1.9.3, which completely dashes all hopes of this being a remotely useful way of figuring out what Ruby is doing under the hood. However, we’ve collected some potentially fun data, so let’s soldier on.
Analysing the output
I ran this 4 times. 2 different Ruby programs over both of the Ruby versions,
2.0 and 1.9.3. The two Ruby programs were the simple
puts 'hello' and, for
exit 42. I saved all of the output to files, check out
how big they are (keep in mind this is one function call per line):
$ wc -l * 697049 1.9.3_exit42.out 696583 1.9.3_hello.out 1991463 2.0.0_exit42.out 1990666 2.0.0_hello.out
Firstly, it’s a lot of lines regardless. Secondly, look at the difference between Ruby 1.9.3 and Ruby 2.0.0! There’s almost 3 times more lines for Ruby 2.0. You generally find that Ruby 2.0 starts up a lot faster than Ruby 1.9, though… I wonder what’s going on.
So what does the output look like?
Here’s some example output:
$ head -n 20 rtrace-1.9.3-exit42.out #0 0x00007ffff7bc5990 in frame_dummy () from /usr/lib/libpthread.so.0 #0 0x00007ffff7bc5910 in register_tm_clones () from /usr/lib/libpthread.so.0 #0 0x0000000000414d20 in frame_dummy () #0 0x0000000000414cc0 in register_tm_clones () #0 main (argc=3, argv=0x7fffffffe678) at main.c:31 #0 fill_standard_fds () at ruby.c:1842 #0 0x0000000000589100 in fstat () #0 0x0000000000589100 in fstat () #0 0x0000000000589100 in fstat () #0 get_stack (addr=0x7fffffffe548, size=0x7fffffffe550) at thread_pthread.c:487 #0 get_stack (addr=0x7fffffffe448, size=0x7fffffffe450) at thread_pthread.c:487 #0 Init_BareVM () at vm.c:2224 #0 vm_init2 (vm=0x846f20) at vm.c:1670 #0 rb_objspace_alloc () at gc.c:432 #0 Init_native_thread () at thread_pthread.c:436 #0 native_thread_init (th=0x847520) at thread_pthread.c:450 #0 native_cond_initialize (cond=0x8475b0, flags=1) at thread_pthread.c:242 #0 ruby_thread_set_native (th=0x847520) at thread_pthread.c:428 #0 native_mutex_initialize (lock=0x83e6c0 <signal_thread_list_lock>) at thread_pthread.c:219 #0 mutex_debug (msg=0x5b33f4 "init", lock=0x83e6c0 <signal_thread_list_lock>) at thread_pthread.c:178
The first few bits have memory addresses rather than symbol names and I’m not 100% sure why that is. You don’t see it in the rest of the file, though, so I assume it’s some voodoo that’s done during the bootstrapping of the process. Explanations welcome!
Working with the data
Let’s do something obvious to begin with: which C functions get called the most:
$ cat rtrace-1.9.3-exit42.out | grep "#0 " | cut -d " " -f 3 | sort | uniq -c | sort -nr | head 43031 parser_nextc 22583 us_ascii_mbc_enc_len 22529 parser_tokadd 21272 vm_malloc_fixup 21197 vm_malloc_prepare 20378 insn_len 19841 vm_xmalloc 16910 rb_enc_dummy_p 15822 parser_tokadd_mbchar 14751 compile_data_alloc
The file name should be self explanatory. I’m grepping for lines beginning with “#0 ” because those are the backtrace lines (there are a few lines in there that aren’t backtrace lines, when threads die and whatnot). Then I’m grabbing the 3rd space separated field, sorting it, counting it, sorting it again and grabbing the first 10.
No real surprises… Parsing/lexing/memory/encoding stuff. Let’s compare it against Ruby 2.0:
$ cat rtrace-2.0.0-exit42.out | grep "#0 " | cut -d " " -f 3 | sort | uniq -c | sort -nr | head 122836 parser_nextc 72578 markable_object_p 67072 GET_THREAD 64400 parser_tokadd 62980 mbc_enc_len 61260 gc_mark_ptr 56964 gc_mark 52222 insn_len 43518 parser_tokadd_mbchar 37379 compile_data_alloc
Mooooostly the same… Some differences, though. I compiled both versions of Ruby specially to include maximum debugging information, which is why we’re seeing MACRO-style calls in here too. Let’s dig around some of the more interesting looking differences.
Spelunking in a C code base
If you’ve not really rummaged around inside a C code base before, it seems really, really scary. There are tools that can help, though!
I’m a vim user, and no self-respecting vim user that does any C code leaves the
warm confines of their bed without having “cscope” installed. I use
this plugin, which makes some really
nice mappings into useful cscope commands. For example, if I want to find the
definition of a symbol (which is what I do most often when exploring a new code
base), I can type
<leader>fg and it will open a split window with potential
definitions of that symbol.
It does other really cool stuff, too, like finding functions that call a
specific function, or finding files that
#include the current file and so on.
:help cscope will tell you way more than I can.
To GET_THREAD or not to GET_THREAD
GET_THREAD is never called in the 1.9 examples. It did get
changed a fair bit between 1.9 and 2.0, though, look at the 1.9 version:
1 2 3 4 5 6 7 8 9 10 11 12
OPT_CALL_CFUNC_WITHOUT_FRAME symbol defined, because the Ruby 2.0
version gets inlined, the two version of this do exactly the same thing.
From what I can tell from the patch notes, this is part of an optimisation that is disabled by default that delays the creation of a “frame” (stack frame? not sure) for C function calls from Ruby. I tried to dig into it but my unfamiliarity with the Ruby code base is holding me back a bit.
I did check to make sure I had macro information compiled into my 1.9 version
and it is there, there are other macros listed in my outputs, so this doesn’t
seem like me fucking up… Bit weird that 2.0 makes such heavy use of
GET_THREAD and 1.9 doesn’t. Anyone got any ideas? :)
In the Ruby 2.0
GET_THREAD snippet from above there’s an intriguing looking
UNLIKELY. There’s a corresponding
LIKELY macro as well, and
they expand to the following in both 1.9 and 2.0:
__builtin_expect is a GCC specific thing. The documentation says the
You may use `__builtin_expect` to provide the compiler with branch prediction information. In general, you should prefer to use actual profile feedback for this (-fprofile-arcs), as programmers are notoriously bad at predicting how their programs actually perform. However, there are applications in which this data is hard to collect.
So, branch prediction information. I tried to write a small, demonstrative example of when using this builtin made a difference to the generated assembly code but every time the output was exactly the same regardless of whether I used likely or unlikely. According to this post you have to write something suitably non-trivial in order to make sure the compiler doesn’t just optimise your code without the need for the prediction data.
However, even in the blog post’s example code, compiling with and without the
__builtin_expect only changes 1 line of the generated assembly.
The only thing that change is doing is converting between a 32bit signed integer
to a 64bit signed integer (because
__builtin_expect returns a long).
is a command that moves a value between a 32bit register and a 64bit register,
doing a sign extension if it
The above difference only appears with
-O0 specified, any other optimisation
level just returns assembler that is exactly the same. I used the following
commands to generate the assembler I compared against:
$ gcc -O3 -S -DEXPECT_RESULT=1 builtin_expect_test.c -o expect.s $ gcc -O3 -S -DDONT_EXPECT builtin_expect_test.c -o dont_expect.s $ diff expect.s dont_expect.s
No difference. It didn’t even matter when changing what the expected result was
supposed to be. Which sort of questions some of guy’s conclusions, but maybe
there are differences between the versions of GCC we were using or perhaps I did
something wrong? Either way, I can’t seem to find an example that clearly shows
__builtin_expect is doing with the information you’re giving it.
GC in 2.0 but not in 1.9
Waaay up the article, when we took a look at the 10 most called functions for each ruby version, you may notice that there’s nothing said about garbage collection for 1.9, but the 2.0 list has 3 GC related functions in it! Why is this?
According to a slide deck by Joe Damato, the Ruby garbage collector runs when Ruby has run out of space on its own heap to allocate objects. Seeing as we’re not really doing any object allocations (unless using literals counts (it probably does)), my guess is that loading the standard library is where the problem lies.
Has the standard library significantly increased in size? I’m using David
sloccount tool to find out. Results:
$ sloccount lib/ ruby: 80731 (100.00%)
$ sloccount lib/ ruby: 104673 (100.00%)
That’s a fairly large difference, maybe it would account for the extra GC calls? It’s really hard to know for sure. I’m not sure if it would account for so many more lines generated for the Ruby 2.0 runs over the 1.9 runs…
To be completely fair for 1.9, it does do GC stuff, it’s just not in the top 10. If we look at the top 20 functions called for 1.9:
$ cat rtrace-1.9.3-exit42.out | grep "#0 " | cut -d " " -f 3 | sort | uniq -c | sort -nr | head -n 20 43031 parser_nextc 22583 us_ascii_mbc_enc_len 22529 parser_tokadd 21272 vm_malloc_fixup 21197 vm_malloc_prepare 20378 insn_len 19841 vm_xmalloc 16910 rb_enc_dummy_p 15822 parser_tokadd_mbchar 14751 compile_data_alloc 13382 gc_mark 13009 enc_set_index 11493 verify_list 9784 mbc_enc_len 9473 search_nonascii 8563 ADD_ELEM 8133 check_int 7597 coderange_scan 7592 ruby_sip_hash24 7525 yylex
We can see
gc_mark in there, but not a whole lot else. Without digging into
the inner workings of Ruby’s GC and the differences between 1.9 and 2.0 (which
is quite a lot, I hear) I won’t be able to know why this is the case. If anyone
feels like giving an explanation or linking to something that explains this, I
would be happy to read it :)
The data is really interesting and there’s probably a tonne more stuff you could do with it. If anyone wants a copy of the data, I’ll happily try and get it to you. It’s a few hundred megabytes and this blog is in a GitHub repo and hosted on Heroku, so I’ll avoid committing it directly unless I absolutely have to!
So yeah. This is part of what my spare time has been spent doing recently, woo! It’s a lot of fun, I totally recommend just dicking around with this kind of stuff. I always find I run into something interesting in one way or another.