It’s quite easy to get scared by large systems of things that seem like magic. The Internet is pretty remarkable. It’s probably among the largest and most complicated systems ever designed by human beings. All those hundreds of web pages we view each month arrive at our computers unscathed a large percentage of the time. Some of them will have crossed the Pacific! That small percentage of the time we have a page “hang” while loading is often resolved by simply hitting the refresh button. Ace, right?
But how do those pages actually get to us? In this post (and maybe more, I might split them out, we’ll see how it goes) I want to show how these things work with practical examples and real commands that you can run to inspect what’s going on.
Prerequisite knowledge: Basic Ruby should do it. If you’ve built a simple web application with Sinatra or Rails or something similar, you should be able to follow along without much issue.
NOTE: I’m a Mac user. Some of the examples may be Mac specific. I will try, where possible, to give equivalent commands you can run on a Linux machine but I may miss something out. If you spot something that doesn’t work on your platform, get in touch. Windows users: I’m sorry.
Humble beginnings: the Socket.
Our most basic building block is going to be the Socket. Originating in BSD back in the 80s, sockets provide a very simple API for interprocess communication (IPC) between processes that may or may not be on the same machine. The idea is that a socket allows one process to send data to another process. The data can be whatever you want it to be and you can send as much of it as you want. There are different ways of doing this that give you different performance characteristics and guarantees.
If you’re reading around the Internet about sockets, you may see references to things called “UNIX domain sockets”. These follow the same API but are used only for IPC between processes on the same machine (“local” IPC) and don’t access or require a network connection. Instead, they are represented by a special kind of file on the filesystem. They’re unimportant to what we’re going to discuss in the rest of the post but it’s worth knowing about them to save confusion when looking things up for yourself.
Right, straight to business! Here’s a trivial example of socket-based communication in a single process:
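Pieced together from the walkthrough that follows, the example looks roughly like this. Treat it as a sketch rather than the exact listing: the port (7777), `listen(1)` and the "Hello, world!" message come from the text, while connecting to 127.0.0.1 and using `read` to receive the message are assumptions.

```ruby
require "socket"

# "Server" end: create a TCP/IPv4 socket, bind it and start listening.
server = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM)
server.bind(Addrinfo.tcp("0.0.0.0", 7777))
server.listen(1)

# "Client" end: a second socket in the same process connects to the first.
client = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM)
client.connect(Addrinfo.tcp("127.0.0.1", 7777))

# Pick the queued connection up on the server side and say hello.
connection, client_address = server.accept
connection.write("Hello, world!")
connection.close

# Receive the message on the client side and print it.
message = client.read
puts message

# Clean up after ourselves.
client.close
server.close
```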
Yikes! Look at all those strange constants and strings and things. I went all low-level APIs on you from the get-go, sorry about that. Let’s break it down line by line and figure out what all of that means.
Creating a socket
Remember earlier when I said that sockets are all about sending data from one place to another and there are a variety of protocols for doing that? You need to specify what protocols you want to use when you create your socket.
Have you ever seen the acronym “TCP/IP” floating around? It stands for Transmission Control Protocol over Internet Protocol and it’s probably one of the most important bits of technology in the modern Internet era. These protocols are what allow computers to talk to each other in a reliable manner using simple, unique addresses.
The “TCP” bit of TCP/IP ensures that messages we send to another computer actually arrive there and arrive in the right order. Yup, that stuff isn’t guaranteed by default. The ordering is important because if you send a lot of data, it doesn’t all go there in one big block. It gets broken up into small “packets” and reassembled at the other end. TCP is quite a complicated beast (this book is 1,600 pages long!) but for now you can just assume it works and it’s the reason all of your bytes arrive safe and sound.
The “IP” bit of TCP/IP is how we identify machines and route traffic to them. If you’ve never seen an IP address, check this out:
$ ping google.com
PING google.com (220.127.116.11): 56 data bytes
64 bytes from 18.104.22.168: icmp_seq=0 ttl=58 time=17.826 ms
Note for Mac users: You might have to run /sbin/ping instead of just ping. Give ping a try first, though.
The address in parentheses in that output is the IP address of “google.com”. You may get a different value, and that’s fine. Google have a lot of computers and they’ll do their best to connect you to one that’s physically close to you in order to reduce latency (the time it takes for packets to physically travel to and from their destination). Computers aren’t too fussed on fancy human-readable names like “google.com”, though, so they ask a service called the Domain Name System (DNS) to translate “google.com” into something sensible like an IP address.
At the highest level of abstraction, you can think of DNS as a giant lookup table of domain names (the things you type into your browser’s address bar) to IP addresses. In reality, it’s quite a large and distributed system of many lookup tables in many geographical locations. If you’ve ever bought a domain and been annoyed by how long it takes for the name to “propagate” across DNS, the reason is that it’s so widely distributed.
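You can do the same name-to-address translation from Ruby. This is a minimal sketch using the system resolver via `Addrinfo.getaddrinfo`; the helper name `ipv4_addresses` is made up for illustration, and the resolver consults local sources (such as your hosts file) as well as DNS, so it is not a pure DNS query.

```ruby
require "socket"

# Resolve a host name to its IPv4 addresses using the system resolver.
# (ipv4_addresses is a hypothetical helper, not part of Ruby.)
def ipv4_addresses(hostname)
  Addrinfo.getaddrinfo(hostname, nil, Socket::AF_INET, Socket::SOCK_STREAM)
          .map(&:ip_address)
          .uniq
end

puts ipv4_addresses("localhost").inspect
# With a network connection, try ipv4_addresses("google.com") as well.
```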
The layers below TCP/IP know how to ship your data to the right place based on the IP address. Again, for the time being you can ignore those layers and assume that they Just Work. We’ll go into some of the gory detail a little later.
The Socket::AF_INET part above means we want an IPv4 socket: Internet Protocol version 4. This means that we are identifying machines with IP addresses that are 32 bits long. You may have read somewhere that we’re running out of IPv4 addresses. Yup, we are.
What does this mean in practice? It means that fairly soon we won’t be able to add more machines to the Internet. That’s roughly as bad as it sounds. What are we going to do about it? The current plan is to use IPv6. Same principle as IPv4 except addresses are 128 bits long. That’ll learn us for thinking we would never use 4.3 billion addresses, now we have 2^128 (roughly 3.4 × 10^38) addresses. No more problem \o/
If you wanted to use an IPv6 socket in Ruby you would instead specify Socket::AF_INET6, but we won’t do that for the time being. Most Internet traffic is still IPv4 and supporting both protocols is a little tricky (but not impossible!) so we’ll skip past it for now.
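If you want to see where those address counts come from, the arithmetic is just powers of two. A quick sketch (the variable names are only for illustration):

```ruby
ipv4_bits = 32
ipv6_bits = 128

# Each extra bit doubles the number of possible addresses.
ipv4_count = 2**ipv4_bits
ipv6_count = 2**ipv6_bits

puts "IPv4: #{ipv4_count} possible addresses"
puts "IPv6: #{ipv6_count} possible addresses"
puts "IPv6 has 2**#{ipv6_bits - ipv4_bits} times as many"
```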
Oh yeah! The “AF” part stands for “address family”. The Internet Protocol isn’t the only way to address machines, it’s just the most popular. It’s also the only one that’s really relevant to web programming, so we’ll stick with it.
What’s up with the dots and numbers notation for IP addresses?
We’ve seen a few IP addresses now and you’ll have noticed a pattern with how they look. It’s called dotted-decimal notation and it exists primarily to make addresses easier to remember. For example, which one of the following would you be more likely to remember tomorrow morning:
I know which I’d choose. Each of the four sections of an IP address represents 8 bits, so its value can range from 0 to 255. It’s always good fun watching films with hackers in them: the IP addresses are never valid, despite some valid IP addresses being reserved for documentation and examples. I’ve always thought programmers would have more respect for a film that knew a standard or two. Or you could have a chuckle and use the IP address of the NSA.
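To make the relationship between the dots and the underlying 32 bits concrete, here is a small sketch using Ruby's `IPAddr` from the standard library. The address 192.168.0.18 is just an example value.

```ruby
require "ipaddr"

address = IPAddr.new("192.168.0.18")

as_integer = address.to_i                       # the 32-bit number the dots hide
as_binary  = as_integer.to_s(2).rjust(32, "0")  # the same value written as 32 bits
octets     = address.to_s.split(".").map(&:to_i)

puts as_integer
puts as_binary
puts octets.inspect

# Rebuild the integer by hand: each octet is one byte of the 32-bit value.
rebuilt = octets.reduce(0) { |sum, octet| (sum << 8) | octet }
puts rebuilt == as_integer
```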
I keep seeing PF_INET on the interwebz, what gives?
I will quote the master on this one. Beej:
In some documentation, you'll see mention of a mystical "PF_INET". This is a weird etherial beast that is rarely seen in nature, but I might as well clarify it a bit here. Once a long time ago, it was thought that maybe a address family (what the "AF" in "AF_INET" stands for) might support several protocols that were referenced by their protocol family (what the "PF" in "PF_INET" stands for). That didn't happen. Oh well. So the correct thing to do is to use AF_INET in your struct sockaddr_in and PF_INET in your call to socket(). But practically speaking, you can use AF_INET everywhere. And, since that's what W. Richard Stevens does in his book, that's what I'll do here.
You may also notice what I’m writing here has a lot of overlap with his well-known Guide to Network Programming except that he focuses on the actual C APIs and probably knows more about it than I do. I wholeheartedly encourage the enthusiastic reader to see if they can follow his guide after reading this post!
The TCP bit! This is us telling the socket that we want all of our bytes to arrive at the destination in the correct order. Believe it or not, this isn’t always what you want. Or, rather, you aren’t always willing to pay the overhead for these guarantees. TCP does a lot of admin work under the hood which is necessary but can slow down communication. Some applications, such as audio and video streaming, aren’t too fussed about losing a few bytes here or there in the name of speed, so they instead opt to use the User Datagram Protocol (UDP).
UDP takes all of the reliability and consistency guarantees of TCP, crumples them up and throws them out of the window. It’s often called the “fire and forget” protocol. TCP asks each party to acknowledge (ACK) receipt of each packet as it arrives and resends packets if no ACK is received. UDP is the irresponsible little brother who just assumes everything will be fine and the Internet Protocol will get everything there.
Everything won’t be fine. Hope is not a strategy. Assume that you should use TCP unless you know 100% that you need to use UDP.
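For comparison, here is roughly what "fire and forget" looks like in Ruby. This is a sketch over the loopback interface, where datagrams practically always arrive; the one-second `IO.select` timeout is there because UDP itself makes no promise that they will.

```ruby
require "socket"

# Receiver: a UDP socket bound to the loopback interface.
receiver = UDPSocket.new
receiver.bind("127.0.0.1", 0) # port 0 lets the OS pick a free port
port = receiver.addr[1]

# Sender: no connect, no handshake, no acknowledgement. Just send.
sender = UDPSocket.new
sender.send("Hello, world!", 0, "127.0.0.1", port)

# Wait up to a second for the datagram to show up.
datagram = nil
if IO.select([receiver], nil, nil, 1)
  datagram, _sender_info = receiver.recvfrom(1024)
end
puts datagram.inspect

sender.close
receiver.close
```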
Binding a socket to an address
A socket is pretty useless on its own. In order to fulfil any dreams it may have, it needs to sign up to communicate via an address on the local machine. This is called “binding” to an address and is achieved like so:
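A minimal sketch of binding on its own. The post's example binds to port 7777; this one asks for port 0 instead, which tells the operating system to pick any free port, so it can't clash with something already running on your machine.

```ruby
require "socket"

socket = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM)

# Sign the socket up to an address (host + port) on the local machine.
socket.bind(Addrinfo.tcp("0.0.0.0", 0))

# Ask the socket which address it ended up with.
bound_to = socket.local_address
puts "Bound to #{bound_to.ip_address}:#{bound_to.ip_port}"

socket.close
```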
In Ruby, the bind method takes an argument of type Addrinfo (anything else will cause it to raise an EAFNOSUPPORT error). In C, all of this addressing is taken care of via a data structure called a sockaddr. The Addrinfo class exists as a nice abstraction for Rubyists that don’t want to worry about all the crap they need to set correctly in a sockaddr (which is all of them).
Interestingly, you see how we call the tcp method of Addrinfo? That’s kinda just a formality in this case. You could just as well use the udp method of Addrinfo, it would make no difference. It’s an oddity of the Ruby Sockets API: the bind method makes no use of the protocol information stored in an Addrinfo object, it only cares about the host and port being correctly coerced into a sockaddr. For the truly curious and brave, you can try and follow the C code. Start here.
Okay, okay, I hand-waved past that last part. Here, I’ll prove it to you. Open up an irb session and try this:
$ irb -r socket
irb(main):001:0> tcp = Addrinfo.tcp("0.0.0.0", 7777)
=> #<Addrinfo: 0.0.0.0:7777 TCP>
irb(main):002:0> udp = Addrinfo.udp("0.0.0.0", 7777)
=> #<Addrinfo: 0.0.0.0:7777 UDP>
irb(main):003:0> tcp.to_sockaddr == udp.to_sockaddr
=> true
The to_sockaddr part is all that bind cares about. Any other information that may be part of an Addrinfo object is irrelevant to it.
Now, the meat of this section: what do those numbers mean? The “0.0.0.0” and “7777”? They are, respectively, a host IP address and a port number.
Despite the vast range of IPv4 addresses we could choose from, there are very few valid ones when we’re binding a socket. In general you only really have two choices: “127.0.0.1” or “0.0.0.0”. But Sam! What about “localhost”? Yeah, that just translates to “127.0.0.1” under the hood. Check out the hosts file on your machine. It contains a list of domain name to IP mappings that are local to your machine. You can even add your own! I tend to name my virtual machines in this file.
0.0.0.0 vs 127.0.0.1
Both of these addresses mean “your machine”, but they mean it in different ways. Your computer has a number of different “network interfaces”. You can view them by running the following command:
$ ifconfig
NOTE: This command ships by default on Mac but depending on your Linux distro it may not be there. I run Arch Linux and had to install the net-tools package to get it. If in doubt, Google is your friend.
A couple of notable ones you might pick out are “eth0” and “wlan0”. If you’re on a Mac you might see “en0” instead of “wlan0”. Because I can’t think of a succinct way of explaining a network interface on my own, I’ll steal from Wikipedia:
In computing, a network interface is a system's (software and/or hardware) interface between two pieces of equipment or protocol layers in a computer network. A network interface will usually have some form of network address. This may consist of a node Id and a port number or may be a unique node Id in its own right. Network interfaces provide standardized functions such as passing messages, connecting and disconnecting, etc.
It doesn’t take a genius, given the above description and the names “eth0” and “wlan0”, to guess that the former is an Ethernet port and the latter a wireless LAN connection. On Macs, “en0” appears to be my wireless connection. I don’t know why.
Using the address “0.0.0.0” when binding a socket tells the operating system that you’re interested in listening to all active network interfaces for traffic. In practical terms, this means that the outside world can talk to your program (provided your firewall and ISP are cool with it (they usually aren’t (it usually isn’t a problem unless you’re doing crazy things))).
Conversely, “127.0.0.1” is the address of what’s called the “loopback” interface. You’ll see it listed in ifconfig as “lo0” or something similar. It’s used when you want to do socket programming but only to processes on the same machine. The data you send will never hit your network card or leave your computer at all.
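Ruby can list the same addresses that ifconfig shows. A small sketch using `Socket.ip_address_list`, splitting the IPv4 addresses into loopback and everything else; what it prints depends entirely on your machine.

```ruby
require "socket"

# Every IPv4 address currently assigned to an interface on this machine.
ipv4 = Socket.ip_address_list.select(&:ipv4?)

loopback = ipv4.select(&:ipv4_loopback?)
others   = ipv4.reject(&:ipv4_loopback?)

puts "Loopback (only reachable from this machine): #{loopback.map(&:ip_address).inspect}"
puts "Other interfaces (also covered by 0.0.0.0): #{others.map(&:ip_address).inspect}"
```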
Ports aren’t actual, physical things. They exist only in the operating system, and there are a maximum of 65535 of them. Why? Because standards (this will be a recurring theme): a port number is a 16-bit field in the TCP and UDP headers. Ports are a way of mapping network communications to running processes. This is why binding to a port is important. You’re telling the operating system to give you all of the traffic that gets sent to that port.
When I visit a web page, I don’t specify a port. Why not?
An excellent question! It’s all to do with the standardisation of port numbers. Have you ever heard of IANA? Of course you haven’t, who has? They are the Internet Assigned Numbers Authority and they deal with all of this boring standardisation lark so that things can actually work instead of being a mess of differing and incorrect opinions.
Behold! The full list of standard, assigned port numbers and the protocols that use them. Don’t worry if most of the acronyms mean nothing to you, they mean nothing to me as well. The one we’re interested in here is the “http” port, which just so happens to be 80. Your web browser knows that if you don’t specifically give it a port you probably mean port 80. Or, more recently, port 443. I’ll leave figuring that one out as an exercise to the reader.
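Ruby's `URI` library applies the same default as your browser, which makes for a quick demonstration. The URLs here are only examples.

```ruby
require "uri"

plain = URI.parse("http://example.com/")
dev   = URI.parse("http://localhost:3000/")

puts plain.port # no port in the URL, so the scheme's default is used
puts dev.port   # an explicit port always wins
```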
You’ll probably hear speak of “privileged ports” if you start working in anything network based. These are the first 1024 ports, and require root access to bind to. The reason is for security. If you’re connecting to a privileged port, there’s a certain guarantee that someone who knows what they’re doing has set it up. You can’t just have any old bean running a service on those ports!
Hello… Is there anybody in there?
Now that we have our very own port, it’s about time we started accepting traffic on it. This is achieved with the listen method. It takes a single argument: the size of the connection buffer. Somebody could connect while we’re servicing someone else, but we don’t want to turn them away. Instead, we tell the operating system to put them in a queue and we’ll get around to them soon. 1 is probably not a great value for this, but the example is purely illustrative. We’ll be more sensible in later posts.
We have the “server” end of the connection set up; these two lines set up the “client” side of the connection. Very similar to what we’ve already seen but instead of calling bind we call connect. If you call connect on a host and port that isn’t listening, you’ll get a “connection refused” error. Once connect returns, we will have made a request to the target server and the operating system will have added us to its connection queue (the thing we specified the size of in the call to listen). The next step is to pick that connection up on the server side, which we accomplish with accept.
We get back a Socket object and an Addrinfo object. The socket allows us to listen to what the client has to say and reply to it, the address tells us who we’re talking to. In our example, we don’t much care what the client has to say, we just want to say hello to it and then send it on its way:
My, oh my. That looks a lot like working with files, don’tcha think? The reason for that is that sockets sort of are files. The concept is pretty much the same. They open, they close, bytes go in, bytes come out. For this reason, they follow the file API very closely. There are subtle differences, though. With normal files on disk, you can read arbitrary parts of them with a method called seek. You cannot seek a socket. That wouldn’t make sense without the ability to alter the passage of time. If you manage that, submit a patch.
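You can see both the similarity and the difference for yourself. This sketch uses `Socket.pair` to get two already-connected UNIX domain sockets in one process (so it assumes a Unix-like system), writes and reads like a file, then tries to seek. On the systems I know of the failure is `Errno::ESPIPE` ("Illegal seek"), but the sketch rescues the broader `SystemCallError` to be safe.

```ruby
require "socket"

# Two connected sockets in one process, handy for experiments.
left, right = Socket.pair(:UNIX, :STREAM)

left.write("Hello, world!") # bytes go in...
left.close
received = right.read       # ...bytes come out, just like a file
puts received

seek_error = nil
begin
  right.seek(0)             # ...but there is no rewinding a socket
rescue SystemCallError => e
  seek_error = e
end
puts seek_error.class

right.close
```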
To round all of this off, we receive the message and print it out for the world to see:
Of course, let’s not forget to be good citizens and clean up after ourselves:
Q: I’ve run the script but I get an “Address already in use” error. Wut?
You’re probably running the script for the second time, right?
This is one of the ugly parts of network development. When your application closes the socket it has bound to, the operating system doesn’t get rid of the socket immediately. If you’re running a real server and you suddenly disappear without warning, the client won’t be aware of that straight away. For this reason, the operating system keeps listening on that port and if any clients try and connect or send data, it tells them that the connection is dead and they should stop using it.
During this time, the operating system won’t let you reuse the port unless you really really really really want to. This is because you could be picking up connections that weren’t meant for you, which would lead to lots of confusion. If you want to throw caution to the wind, though, you would do it like this:
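A sketch of the idea, assuming the same port as the earlier example (7777): set the `SO_REUSEADDR` socket option before calling bind. The `getsockopt` call at the end is only there to show that the option took effect.

```ruby
require "socket"

server = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM)

# Must be set before bind: allow binding even while the operating system is
# still holding on to old connections for this port from a previous run.
server.setsockopt(Socket::SOL_SOCKET, Socket::SO_REUSEADDR, true)
server.bind(Addrinfo.tcp("0.0.0.0", 7777))
server.listen(1)

reuse_enabled = server.getsockopt(Socket::SOL_SOCKET, Socket::SO_REUSEADDR).bool
puts "SO_REUSEADDR enabled: #{reuse_enabled}"

server.close
```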
And we probably will for the sake of quicker development. It’s worth avoiding this practice in real systems, though, for the reasons detailed here.
Communicating across processes
This is all well and good, but we’ve been missing the point in the name of simplicity. I won’t stand for it any more! Let’s do this IPC style.
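Here is a minimal sketch of the same exchange between two real processes. To keep it runnable from a single file it uses `fork` (so it assumes a Unix-like system): the parent process plays the server and the child plays the client. The comments mark which half would live in server.rb and which in client.rb. It also uses port 0 to grab any free port; two separate files would need a fixed port they both agree on, such as 7777.

```ruby
require "socket"

# --- what would live in server.rb ---
server = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM)
server.bind(Addrinfo.tcp("127.0.0.1", 0)) # port 0: let the OS pick a free port
server.listen(1)
port = server.local_address.ip_port

# --- what would live in client.rb, run here in a child process ---
client_pid = fork do
  server.close # the child only needs its own client socket
  client = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM)
  client.connect(Addrinfo.tcp("127.0.0.1", port))
  puts client.read
  client.close
end

# --- back in the server process ---
connection, _client_address = server.accept # sleeps until the child connects
connection.write("Hello, world!")
connection.close
server.close

# Wait for the client process to finish and note whether it exited cleanly.
Process.wait(client_pid)
client_succeeded = $?.success?
```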
Nothing has been added or removed, the code has just been separated out into two files. To see this in action, open up two terminals and in the first terminal run:
$ ruby server.rb
This will run the server. You’ll notice that nothing happens but the program doesn’t finish. That’s to be expected.
In the other terminal run:
$ ruby client.rb
What happens next is that the client connects to the server, the server sends it the string “Hello, world!” and the client prints it out. Then both processes exit. Neat, right?
Q: Why does server.rb hang until client.rb is run?
Networking is fundamentally unpredictable. You can’t know when clients are going to connect to you. Because of this, lots of networking calls will “block” until something significant happens. The line of code in our example that does that is the one with the accept method in it. accept will put the process to sleep until a client connects, at which point the process gets woken up and accept returns the appropriate connection information.
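If you would rather not sleep forever, one option is to ask the operating system whether a connection is waiting before calling accept. This is a sketch using `IO.select` with a timeout; the 0.2 second value and the use of port 0 (any free port) are arbitrary choices for illustration.

```ruby
require "socket"

server = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM)
server.bind(Addrinfo.tcp("127.0.0.1", 0)) # port 0: any free port
server.listen(1)

# Nobody has connected, so a plain server.accept would sleep here forever.
# IO.select waits for a limited time instead and returns nil on timeout.
ready = IO.select([server], nil, nil, 0.2)
puts ready.nil? ? "No client yet" : "Client waiting"

# Once a client connects, the same check succeeds.
client = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM)
client.connect(server.local_address)
ready_after_connect = IO.select([server], nil, nil, 0.2)
connection, _address = server.accept # returns straight away now

[connection, client, server].each(&:close)
```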
Communicating across machines
Taking this example to its logical extreme, you probably want to communicate across machines. I understand, that’s the entire point of networking, but this part is not as clear-cut as the rest of this post has been. Router configurations differ, firewalls can be pushy, things just might not work as you would like them to because reasons. As a compromise, I’m going to explain how to communicate between computers inside the same local network. This means two computers connected to the same router. It will demonstrate that you can in fact communicate between computers with the exact same code without going into the pain of router configuration and getting pissed off with the imperfect world we live in.
But first, we need some background knowledge…
Finding the right IP address
A key thing to understand in this section is that networks are node and edge graphs. Your personal laptop connects to your router, and your router can then connect to other things, and those other things to even more other things and so on. Getting from your router to the website you’re looking for involves “hopping” from one location to another until you eventually find what you’re looking for. You can visualise it with the traceroute command:
$ traceroute xkcd.com
traceroute to xkcd.com (188.8.131.52), 64 hops max, 52 byte packets
 1  192.168.0.1 (192.168.0.1)  1.496 ms  1.329 ms  0.926 ms
 2  * * *
 3  02780898.bb.sky.com (184.108.40.206)  32.936 ms  31.230 ms  31.942 ms
 4  ae-1.r00.londen01.uk.bb.gin.ntt.net (220.127.116.11)  26.338 ms  26.777 ms  26.356 ms
 5  ae-6.r02.londen03.uk.bb.gin.ntt.net (18.104.22.168)  107.436 ms  107.745 ms  105.124 ms
 6  * * *
 7  ae-4.r22.nycmny01.us.bb.gin.ntt.net (22.214.171.124)  105.298 ms  142.151 ms  114.880 ms
 8  ae-2.r05.nycmny01.us.bb.gin.ntt.net (126.96.36.199)  115.527 ms  107.279 ms  108.187 ms
 9  ae-0.internap.nycmny01.us.bb.gin.ntt.net (188.8.131.52)  106.645 ms  106.486 ms  98.353 ms
10  border4.pc2-bbnet2.ext1.nym.pnap.net (184.108.40.206)  204.241 ms
    border4.pc1-bbnet1.ext1.nym.pnap.net (220.127.116.11)  106.519 ms
    border4.pc2-bbnet2.ext1.nym.pnap.net (18.104.22.168)  97.778 ms
11  inapvoxcust-1661.ext1.nym.net (22.214.171.124)  113.426 ms
    inapvoxcust-1662.border4.ext1.nym.pnap.net (126.96.36.199)  116.321 ms  122.317 ms
12  188.8.131.52 (184.108.40.206)  113.839 ms
    0.te1-2.tsr1.lga11.us.voxel.net (220.127.116.11)  97.906 ms
    18.104.22.168 (22.214.171.124)  108.450 ms
NOTE: traceroute is available by default on Macs, but maybe not on Linux. In Arch Linux, I had to install the traceroute package to get the command.
Understanding traceroute output
What even does that mean? traceroute cleverly utilises a field in network communication called Time To Live (TTL). Every packet you send somewhere has a TTL associated with it. Whenever that packet meets a hop, the hop will decrement the TTL. When the TTL gets to 0, the hop doesn’t forward the packet on. Instead, it sends diagnostic information to wherever the packet came from. This information includes where the connection got to before running out of TTL.
traceroute sends a small amount of data with successively larger TTLs. This is what the number down the left hand side means. By doing this, it can make a note of every step of the journey for your packet.
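The TTL is something you can set yourself on a socket. The sketch below only shows that half of the trick: stepping the TTL up one at a time on a UDP socket via the `IP_TTL` option. It sends nothing, and it is not a working traceroute, since a real one also has to read the diagnostic replies that come back, which typically needs elevated privileges.

```ruby
require "socket"

probe = UDPSocket.new

default_ttl = probe.getsockopt(Socket::IPPROTO_IP, Socket::IP_TTL).int
puts "Default TTL: #{default_ttl}"

# traceroute-style: the first probe may cross 1 hop, the second 2, and so on.
ttls_used = (1..3).map do |ttl|
  probe.setsockopt(Socket::IPPROTO_IP, Socket::IP_TTL, ttl)
  probe.getsockopt(Socket::IPPROTO_IP, Socket::IP_TTL).int
end
puts ttls_used.inspect

probe.close
```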
You’ll notice that numbers 2 and 6 are just stars. This means that no data came back for that TTL. This could be because the hop didn’t respond in time or it just didn’t bother to send you diagnostic information back. Like I said, imperfect world.
You’ll also notice that some TTLs have multiple entries. This is rather advanced networky stuff that I don’t fully understand, but the basic idea is that there may be multiple paths to the same destination and this is what traceroute is showing you.
The first hop is also quite interesting. You may have seen the address 192.168.0.1 before. By convention, this is the address that home routers use to address themselves on a local network. Addresses starting with 192.168 tend to refer to things inside your network. You can easily find out what IP address your machine has by running ifconfig and looking at the active connections:
$ ifconfig
en0: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500
    ether 3c:15:c2:bb:ed:7c
    inet6 fe80::3e15:c2ff:febb:ed7c%en0 prefixlen 64 scopeid 0x4
    inet6 fd0c:d42b:9aef::3e15:c2ff:febb:ed7c prefixlen 64 autoconf
    inet6 fd0c:d42b:9aef::a8f7:981c:1fe3:be50 prefixlen 64 autoconf temporary
    inet 192.168.0.18 netmask 0xffffff00 broadcast 192.168.0.255
    nd6 options=1<PERFORMNUD>
    media: autoselect
    status: active
I cut down the output to my only active network interface. The line we’re interested in is:
inet 192.168.0.18 netmask 0xffffff00 broadcast 192.168.0.255
This tells us our address on the local network is 192.168.0.18. This will be important later.
Machines decide where to hop to next by using “routing tables”. If you want to view the routing tables on your machine, you can run the following command:
$ netstat -nr
NOTE: Again, this command is there by default on Mac but not on Arch. net-tools strikes again.
The routing tables for your local machine are very thin. Your computer doesn’t know a whole lot about the world, it leaves the problem of routing traffic to your router (clue was in the name, really). You’ll notice entries in the output that are familiar to you, such as 127.0.0.1. You’ll also notice that your local IP address is mapped to 127.0.0.1. Why do you think that is?
The Internet is a constantly changing beast, how does your router keep up with it? The truth is that it doesn’t. This is why it’s useful to have the graph structure. If your router is told to try and find 126.96.36.199, it doesn’t have a clue. It does know, however, that its mate in London knows a thing or two about the 126.x.x.x IP address range, so it might send the request over there and trust its friend to route the address appropriately. This is the hopping we’ve been referring to.
Network Address Translation
It may have occurred to you earlier that if most private home networks assign addresses that start with 192.168, that means that millions of computers all over the world will have those addresses, so how does traffic get routed to them correctly?
That’s an excellent question, and the answer lies in three letters: NAT. Network Address Translation. Different routers may implement this differently but the high level concept is that every connection made that goes from inside the local network to outside the network gets rewritten so that it seems as if it was the router itself that made the connection. The router will have been assigned a more globally reachable IP address by your Internet Service Provider. The router then holds a lookup table of what responses need to go to which IP addresses / ports inside the network when they eventually return.
Bringing all of this together
Now that we have a reasonably good idea of what moving parts are involved, the modified code that should work across machines is actually trivial.
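As a rough sketch of the two changes involved: the server binds to 0.0.0.0 so that other machines can reach it, and it prints an address for the client to use. The `lan_address` helper is hypothetical and simply takes the first private IPv4 address it finds, falling back to loopback. So that it runs on a single machine, the client half connects from the same process; on a real second machine it would use the printed address and an agreed, fixed port instead of the OS-chosen one.

```ruby
require "socket"

# Best guess at this machine's address on the local network
# (hypothetical helper; falls back to loopback if nothing private is found).
def lan_address
  candidate = Socket.ip_address_list.find(&:ipv4_private?)
  candidate ? candidate.ip_address : "127.0.0.1"
end

# Server half: bind to all interfaces so other machines can connect.
server = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM)
server.bind(Addrinfo.tcp("0.0.0.0", 0)) # port 0: any free port
server.listen(1)
port = server.local_address.ip_port
puts "Listening on #{lan_address}:#{port}"

# Client half: would normally run on another machine with the address above.
client = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM)
client.connect(Addrinfo.tcp(lan_address, port))

connection, client_address = server.accept
connection.write("Hello, world!")
connection.close

reply = client.read
puts "#{reply} (the server saw the client as #{client_address.ip_address})"

client.close
server.close
```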
If you run server.rb on one machine, note down the IP address it outputs, put that address into client.rb on another machine and run it, you’ll get the exact same result as we did way back in the first example of the script.
Wait a sec. Servers aren’t supposed to die after one connection. What gives?
You’re absolutely correct. My bad. Here’s a fix for that:
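The essence of the fix is wrapping accept in a loop. In this sketch the loop runs in a thread purely so that the same script can also play the client three times and then stop; in a real server.rb the loop would sit at the top level and run until you kill it. Port 0 (any free port) and a backlog of 5 are arbitrary choices for illustration.

```ruby
require "socket"

server = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM)
server.bind(Addrinfo.tcp("127.0.0.1", 0)) # port 0: any free port
server.listen(5)
address = server.local_address

# The fix: keep accepting connections instead of exiting after the first.
server_thread = Thread.new do
  loop do
    connection, _client_address = server.accept
    connection.write("Hello, world!")
    connection.close
  end
end

# Stand-in for running client.rb three times.
replies = 3.times.map do
  client = Socket.new(Socket::AF_INET, Socket::SOCK_STREAM)
  client.connect(address)
  reply = client.read
  client.close
  reply
end
puts replies.inspect

# The in-process equivalent of killing the server by hand.
server_thread.kill
server_thread.join
server.close
```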
Now you can run client.rb as many times as you want and you’ll get the same reply from the server each time. You’ll have to kill the server manually when you want it to stop, though. You can do that using the usual ctrl + c key combo.
We’ve covered a lot of ground in this post. From the humble beginnings of sockets inside of the same process, we’ve worked all the way up to sending data from a process on one machine to a process on another machine in the same network. As an added bonus, if you managed to follow all of the information in this post you will have a pretty reasonable idea of how the data gets from one machine to the other as well!
I had planned on working up to a functioning web server in this post but it’s gone past 5000 words up to now, so I’ll break that part out into a second post.
Thanks for reading! :)