I started working on an x86-based pure-Ruby virtual machine! Not a virtual machine to run Ruby, a virtual machine built in Ruby that executes something that looks a little bit like x86 but isn’t. Why?
This is a write up of where I’m at so far.
First, some background into why I’m doing this.
A friend linked me to a brilliant PDF by Agner Fog called “The microarchitecture of Intel, AMD and VIA CPUs”. It’s an optimization guide for assembly programmers and compiler makers, two things I’ve been interested in for some time. It’s a very detailed guide on things such as branch prediction, out-of-order execution, superscalar architectures and all of the other wonderful things that intensely intelligent people have dreamed up to make processors faster over the years.
I got part-way through the branch prediction section and started thinking it would be really fun to attempt to implement branch prediction algorithms myself. To make it even more interesting, I could benchmark them against each other and maybe try and come up with something new! But to do this, I needed a set of test data… Something like a list of key-value pairs with the key as the branch target and the value of whether or not the branch was taken. You probably need more information than that to implement branch prediction, but it’s a start.
There must be loads of this data lying around, right? Surely people have dumped branch target and hit/miss info before for this exact purpose, right?
Well, if they have it certainly isn’t easy to find. I found a few papers on branch prediction and they all seem to use virtual machines with lots of hooks in them to test their algorithms. This seemed a bit complicated to me, the recommended virtual machine SimpleScalar looked like it had a high learning curve and required its own set of binutils to be compiled. I’ll pass on that for the time being.
It’s at this point that I thought it would be really fun to build my own VM! There was a project in University to do just that but I never really got around to it. This seemed like a perfect opportunity to pick the idea back up and build something cool.
Here’s a link to the Assam source code. I’ve linked to a specific commit that will always be relevant to this blog post.
Currently it supports a very limited instruction set. I’ve mainly focussed on laying out the code and binary data in a sensible way (which I probably haven’t achieved because I know nothing about laying out a binary file). The code itself is quite modular and well commented where it needs to be.
Sample Assam assembly file
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
And the output of dumping the registers at the end:
1 2 3 4 5 6 7 8 9 10
As you can see, I’ve cheated. The assembler works as a Ruby DSL. However, the processor doesn’t know anything about the DSL and would work just the same if you passed it some valid binary code that it understands. The binary code that it does understand is entirely of my own invention and probably isn’t the best way of doing things, but it suits what I’m trying to currently achieve fairly well. If you want to learn more, here is the source code for the Assam assembler. It’s heavily commented because it’s overly complicated. Sorry about that.
Let’s take a deeper look at what makes Assam tick.
So I needed a good way to represent RAM in Ruby. Strings are just big byte arrays, right? I could just use one of those, couldn’t I? Yes, yes I could. Here’s the Memory class that is the backbone of everything that needs to be stored somewhere in Assam:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
The only obvious downside to doing it this way that I could figure out was that you needed to create a string that represents all of memory, and if you want to represent a lot of memory then you need to use a lot of memory. This processor has no concept of virtual memory yet but it’s something I’d love to implement :D
The Utils.pack_for and Utils.unpack_for methods
Because we need to store everything in terms of binary strings, the
String#unpack methods are endlessly helpful. However, depending on the
size of what you want to store/retrieve and whether you want to do a signed or
unsigned operation, you need to switch between a lot of different pack/unpack
Q> = 8 byte, unsigned, big endian value l> = 4 byte, signed, big endian value
I made sure that everything was big endian. Relying on native endianness would have probably caused problems further down the line.
Utils#unpack_for are actually exactly the same
code, the name just changes to be descriptive about what the code is doing. I
probably could condense the code more or just use one method but it hasn’t
bothered me enough to change it yet. The methods ensure that I am always using
the right pack/unpack code all over my code base, and it also makes it very easy
to switch between signed and unsigned store/retrieves.
The processor is register based, rather than stack based. It’s trying to mimic the x86 architecture as much as it possibly can. All of the registers get mapped onto a Memory object that is separate from RAM, and only 256 bytes in size. This is supposed to mimic a processor’s on-board memory.
In the DSL, registers are represented as instance variables. The Register class also overrides some arithmetic operations. The reason for this is to facilitate the different modes of addressing memory. For example, all of the following are valid memory expressions in Assam asm:
1 2 3 4 5 6
And it achieves this by overriding the + and * operators on the Register class, storing the result in a MemoryExpression class. If you’re curious, the code for that is here. It’s a slightly messy process and could probably be done better, but it works.
This is one part of the VM that I’m not particularly happy with. It doesn’t do a good job of checking that arguments are the correct size. The assembler purists may have noticed that I’m trying to copy NASM syntax with my DSL and I’ve avoided size postfixes in my instructions. The actual reason for it is because I wanted to avoid doing destination size checking (trying to store 32bit values in 16bit registers for example). Also, my assembler knowledge is very, very basic. So I don’t know what is and is not valid in edge cases.
All of this aside, the code for the Assam instruction set is in this file. It’s another DSL that makes specifying new instructions relatively simple. It’s tightly coupled to the Assembler, which makes sense seeing as the assembler has to create binaries that correspond to these instructions.
Each instruction needs to know how many arguments it takes and what the size of those arguments is. I’m not happy about this… I think it would be much nicer if you could specify what values an instruction could have or, better yet, put metadata directly in the generated binary that tell the processor how many bytes to read to get the arguments for this operation. Having it directly encoded into the instruction set isn’t a nice idea I don’t think.
You’ll notice in the instruction set file that there’s a lot of reference to a
MemoryLocation class. This class is a wrapper around any value that is not a
direct value. It provides two methods,
write and it accepts an opts
hash whose only current purpose is to tell it whether this is a signed operation
Here’s the code for MemoryLocation:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
Dead simple. The
MemoryLocation.from method is a helper for creating a
MemoryLocation object from a Register object. It just literally indexes into
either the RAM or the register memory without the instruction set implementer
having to know where a value is coming from or how large it is (though they can
inspect that if they want to (and they should!)).
Running a program
Earlier we saw a sample of assembling and running a program under the Assam virtual machine, but what’s actually happening under the hood? Sadly, it’s nothing clever.
Assam has 65k of RAM and it loads your program into memory location 0x1000, sets the instruction pointer to that location and then starts reading the machine instructions. “65k?” I hear you ask, “but your registers have the ‘e’ prefix!”. Yeah. About that… They’re 32bit registers and you can use 32bit values with them but the address size is only 16 bits. Implementing a virtual memory system is on the TODO list.
There’s a very simple and very unrealistic interrupt system in place that mimics the Linux syscall interface (well, just sys_exit at the moment), so that programs actually have a way of exiting. I guess this is the entry point for creating an operating system that runs on Assam…
Binary file formats
I spent a lot of time reading the ELF specification and understanding what all of the parts of it meant in order for my virtual machine to support a real binary file format. Sadly, I’m yet to get to a point where I understand it well enough to implement it. It’s definitely on the TODO list, though.
One of the problems is that the ELF format contains a lot of things in it that I don’t need at the moment, such as information for dynamically linking libraries at runtime and relocating executables when they’re loaded into memory (the latter is probably more important than the former but at the moment I wouldn’t want to use them).
Wait… wasn’t this about branch prediction?
Yes! It was. So the idea was to have a nice system that lets you hook into Assam to get runtime information. This is also something that’s on the TODO list, but I don’t think it’s all that far off. I got carried away with all of the internals and lost sight of the initial goal a bit, but it’s no big deal. I think this project could be really educational once it gets to a point where it resembles a real virtual machine with a real instruction set. Currently, however, that’s a long way off.
You can’t really use Assam very easily currently. It would be nice to eventually distribute it as a gem and maybe produce some kind of PDF that details the rationale behind why parts of it are implemented the way they are (partly what this blog post is being written to facilitate).
It really should support ELF, or some other binary format. It doesn’t matter what binary data you put in there, so I wouldn’t need to implement an x86 disassembler or anything (though it would be a fun project!). I could just throw my own binary format in there. Supporting relocation and linking is a must, though.
Building a real interrupt system. I know very, very little about interrupts other than the Linux kernel uses them as part of its syscall interface. What else are they used for? I have no idea and links to good information on this topic would be highly appreciated.
IO. How the fuck do drivers work? How do I IO? I have no idea. Looking into and implementing it would be pretty essential if it were to be a “real” VM.
It’s going to be a long road.