Context switching is the method an operating system employs to implement “multitasking”. It’s the practice of having multiple contexts of execution in your system, often called “processes”, and switching between them really, really quickly to create the illusion that multiple things are happening at the same time.
It’s also a good method for efficiently dealing with processes that need to wait for an IO request, such as a hard disk read, to return. While one process is waiting, another can execute.
Prerequisite Knowledge: There will be some C and assembler in this post, so some familiarity with those languages would be good. Also, knowledge of what processor registers are will help.
You may also want to read this post at least twice. Just saying.
Before we begin
First, I want to talk about the ideas behind context switching, then I want to look at the xv6 implementation of context switching, then I’d like to talk a little bit about hardware vs software context switching.
If you’re unfamiliar with xv6, it’s well worth checking out. It’s an educational reimplementation of the Unix V6 kernel with a very well written and well documented source base. I’ll be referring to it throughout this post, so I would check out the source code. Instructions to do that are on their website.
Context switching mechanisms are platform specific. Depending on what processor architecture you’re using, your context switching will look very different. I’m going to cover x86 (32 bit), as that’s the only one I know about in any detail.
The general idea is that your processor has a state, which is the current contents of all of its registers. Each process will contain a copy of the state that it’s in when it gets swapped off the processor, and when it later gets swapped back on that state will get restored. This allows processes to be paused and resumed at a later time.
Your processor doesn’t have any knowledge of processes, though. Not really. The processor only really executes one set of instructions at a time in a (mostly) linear fashion, so how do we convince it to jump between multiple processes? We need to have a list of processes, each containing processor state. We also need to have a means of jumping out of a running process and doing all of the logic necessary to swap it off the processor and swap its successor on.
What is a process?
From a high level, we all know that a process is a running program in the system. It contains some binary executable code, some data, it can be sent signals, it can spawn children and it can die. But what data makes this possible? Let’s take a look at what xv6 considers enough information to represent a process:
```c
struct proc {
  uint sz;                     // Size of process memory (bytes)
  pde_t* pgdir;                // Page table
  char *kstack;                // Bottom of kernel stack for this process
  enum procstate state;        // Process state
  int pid;                     // Process ID
  struct proc *parent;         // Parent process
  struct trapframe *tf;        // Trap frame for current syscall
  struct context *context;     // swtch() here to run process
  void *chan;                  // If non-zero, sleeping on chan
  int killed;                  // If non-zero, have been killed
  struct file *ofile[NOFILE];  // Open files
  struct inode *cwd;           // Current directory
  char name[16];               // Process name (debugging)
};
```
Lots of interesting things in there. A lot of them are pretty obvious, such as the process ID, the size of the process memory, the parent process and so on. Other things are a little less obvious. Why does it need a kernel stack, for example? And what’s a trap frame?
Every process in xv6 needs a kernel stack for handling system calls. If, for example, a process gets stopped midway through a system call and a new process starts, it would be a really bad idea if the new process started in the middle of a shared kernel stack. It would be equivalent to all user processes sharing one big stack, and because sharing stacks is a bad idea, each process has its own user and kernel stack.
Another reason it’s really useful to have a kernel stack for each user-mode process is that even if the user mangles its user-mode stack, the kernel stack will be intact and the operating system will be able to handle the failure gracefully.
Interrupts and Trap Frames
An integral part of the x86 architecture is its interrupt system. The processor can, at any time and for a variety of reasons, be interrupted and asked to do something else. Examples of types of interrupts include divide by zero, access to memory that doesn’t belong to you or is protected, IO requests returning, keyboard events and a special “timer” interrupt that happens periodically. This is how “pre-emptive process scheduling” is implemented. All of these things require multiple context switches. (Every single keypress you do requires at least 2 context switches!).
There is a piece of hardware inside your computer called a “programmable interval timer” (PIT), which can be told to generate an interrupt after a given amount of time. In xv6, it is told to fire 100 times per second (once every 10 milliseconds), and xv6 has pointed the PIT’s interrupt vector at its process scheduling code, which handles context switches.
The trap frame stored in the proc struct, however, is concerned only with system calls. xv6 implements its system calls in a similar way to Linux: a software interrupt number is reserved for system calls (number 64 in xv6, where Linux uses int 0x80, decimal 128). The user stores the system call number in the eax register and fires the interrupt; the processor looks up the handler for that interrupt, and the OS transitions into kernel mode.
Part of the kernel mode transition involves creating a “trap frame”, which looks like this (x86.h file in xv6):
```c
struct trapframe {
  // registers as pushed by pusha
  uint edi;
  uint esi;
  uint ebp;
  uint oesp;      // useless & ignored
  uint ebx;
  uint edx;
  uint ecx;
  uint eax;

  // rest of trap frame
  ushort gs;
  ushort padding1;
  ushort fs;
  ushort padding2;
  ushort es;
  ushort padding3;
  ushort ds;
  ushort padding4;
  uint trapno;

  // below here defined by x86 hardware
  uint err;
  uint eip;
  ushort cs;
  ushort padding5;
  uint eflags;

  // below here only when crossing rings, such as from user to kernel
  uint esp;
  ushort ss;
  ushort padding6;
};
```
This is built on the stack for all interrupts that happen and serves as a way for the operating system to tell the processor where to jump back to after the interrupt has been handled.
When xv6 handles a system call, the current process’s proc->tf is set to the interrupt trap frame, which contains the register set that will get restored to the processor when the system call returns. This way, system calls can interact with the register set and give return values back to user mode.
System call code in xv6, found in syscall.c:
```c
void
syscall(void)
{
  int num;

  num = proc->tf->eax;
  if(num > 0 && num < NELEM(syscalls) && syscalls[num]) {
    proc->tf->eax = syscalls[num]();
  } else {
    cprintf("%d %s: unknown sys call %d\n",
            proc->pid, proc->name, num);
    proc->tf->eax = -1;
  }
}
```
Disabling and re-enabling interrupts
Sometimes we don’t want the processor to be interrupted, because we may leave really important data structures in an inconsistent state. x86 offers two instructions to deal with this, cli and sti: clear interrupts and set interrupts respectively. These two instructions manipulate a bit (the IF flag) inside the eflags register to tell the processor whether or not we are currently willing to handle interrupts.
When we’ve cleared interrupts, we stop handling them but they do get queued up internally by the processor. When we restore interrupts, the queued up interrupts start getting processed again. This makes sense. You wouldn’t want to completely ignore them otherwise you would have processes waiting on IO results that have been thrown away.
In spinlock.c there are definitions for pushcli() and popcli(), which are a “matched” interface to the x86 cli and sti instructions. So it takes two calls to popcli() to restore interrupts after two calls to pushcli(). This makes it easier to handle the clearing of interrupts when multiple parts of the code base are calling cli and sti. Without them, you could end up with functions turning interrupts back on before other parts of the code base are ready for them.
If that’s still unclear, imagine the following code:
```c
void func2(void)
{
    cli();
    // ... touch some shared data structure ...
    sti();   // interrupts back on!
}

void func1(void)
{
    cli();
    func2();
    // func1 still believes interrupts are off here,
    // but func2's sti() has already re-enabled them.
    // ... do some important things ...
    sti();
}
```
There is a gigantic problem in the above code. func2() is calling sti(), so when it returns, func1() still expects interrupts to be off while it does some important things. To get around that problem, we would rewrite the above code like this:
```c
void func2(void)
{
    pushcli();
    // ... touch some shared data structure ...
    popcli();  // interrupts stay off: the outer pushcli() is still active
}

void func1(void)
{
    pushcli();
    func2();
    // interrupts are still off here, as func1 expects
    // ... do some important things ...
    popcli();  // now interrupts are actually re-enabled
}
```
This way, interrupts would only be turned back on when the
popcli() inside of
func1() has been called.
Detailed steps for context switching
Let’s envisage a scenario. Process 1 is running, process 2 is waiting to run.
The PIT throws an interrupt. This causes the processor to stop what it’s doing,
abandon process 1, create a trap frame and transition into kernel mode. When an interrupt is caught, there is a generic catch-all that sets up the trap frame and any trap information the processor has given us, and then calls the C function trap(). Here’s the assembler for that (trapasm.S):
```asm
#include "mmu.h"

  # vectors.S sends all traps here.
.globl alltraps
alltraps:
  # Build trap frame.
  pushl %ds
  pushl %es
  pushl %fs
  pushl %gs
  pushal

  # Set up data and per-cpu segments.
  movw $(SEG_KDATA<<3), %ax
  movw %ax, %ds
  movw %ax, %es
  movw $(SEG_KCPU<<3), %ax
  movw %ax, %fs
  movw %ax, %gs

  # Call trap(tf), where tf=%esp
  pushl %esp
  call trap
  addl $4, %esp

  # Return falls through to trapret...
.globl trapret
trapret:
  popal
  popl %gs
  popl %fs
  popl %es
  popl %ds
  addl $0x8, %esp  # trapno and errcode
  iret
```
The comment at the top of the above assembler snippet mentions vectors.S, but you probably won’t find that file inside the xv6 repo. Why? The reason is that x86 doesn’t make interrupt management easy: to handle all interrupts in a way that allows you to identify which interrupt happened, you need to set up 256 interrupt handlers in assembly. Not fun. xv6 generates them with a Perl script:
```perl
#!/usr/bin/perl -w

# Generate vectors.S, the trap/interrupt entry points.
# There has to be one entry point per interrupt number
# since otherwise there's no way to tell which
# interrupt number was used.

print "# generated by vectors.pl - do not edit\n";
print "# handlers\n";
print ".globl alltraps\n";
for(my $i = 0; $i < 256; $i++){
    print ".globl vector$i\n";
    print "vector$i:\n";
    if(!($i == 8 || ($i >= 10 && $i <= 14) || $i == 17)){
        print "  pushl \$0\n";
    }
    print "  pushl \$$i\n";
    print "  jmp alltraps\n";
}

print "\n# vector table\n";
print ".data\n";
print ".globl vectors\n";
print "vectors:\n";
for(my $i = 0; $i < 256; $i++){
    print "  .long vector$i\n";
}
```
The important bit to note is that every handler jumps into the alltraps label seen above, which passes control into the trap() function in C.
Continuing the example…
We’re now in process 1’s kernel stack in the trap handler and we need to find a
new process to run. Because this is a timer interrupt, the following bit of code
inside of the
trap() function in
trap.c will run:
```c
  // Force process to give up CPU on clock tick.
  // If interrupts were on while locks held, would need to check nlock.
  if(proc && proc->state == RUNNING && tf->trapno == T_IRQ0+IRQ_TIMER)
    yield();
```
Let’s dig into what yield() is doing (proc.c):
```c
// Give up the CPU for one scheduling round.
void
yield(void)
{
  acquire(&ptable.lock);
  proc->state = RUNNABLE;
  sched();
  release(&ptable.lock);
}
```
So first we’re acquiring a lock on the process table. This is to avoid race
conditions when multiple processors are running scheduling code. Then we set the
current process’s state to
RUNNABLE, and we call a function called sched():
```c
// Enter scheduler.  Must hold only ptable.lock
// and have changed proc->state.
void
sched(void)
{
  int intena;

  if(!holding(&ptable.lock))
    panic("sched ptable.lock");
  if(cpu->ncli != 1)
    panic("sched locks");
  if(proc->state == RUNNING)
    panic("sched running");
  if(readeflags()&FL_IF)
    panic("sched interruptible");
  intena = cpu->intena;
  swtch(&proc->context, cpu->scheduler);
  cpu->intena = intena;
}
```
The majority of sched() is concerned with making sure it’s safe to schedule a new process. First it checks that we’re holding the process table lock, then it checks to make sure we’re only one pushcli() level deep, then it makes sure the process being swapped off is not still running, then it checks that interrupts have been cleared. The last check feels a lot like a “just in case”, because in theory the cpu->ncli check should cover it.
If any of the above is true, we cannot safely schedule a new process and the kernel panics.
cpu->intena is a variable that stores whether or not interrupts were
enabled before a
pushcli call. The only place in the xv6 code that
cpu->intena seems to be used is in
spinlock.c but I can’t quite understand
what it’s doing. I don’t think it’s particularly important for this explanation,
though, and we can skip over it without much trouble.
The next important call inside of sched() is the call to swtch(). This is the meat and potatoes of the operation, where the actual context switching happens. It passes in a pointer to the current process’s context, so that the current registers can be saved, and it passes in the scheduler’s context to be switched to. It makes sense, in that case, that this part is implemented in assembly (swtch.S):
```asm
# Context switch
#
#   void swtch(struct context **old, struct context *new);
#
# Save current register context in old
# and then load register context from new.

.globl swtch
swtch:
  movl 4(%esp), %eax
  movl 8(%esp), %edx

  # Save old callee-save registers
  pushl %ebp
  pushl %ebx
  pushl %esi
  pushl %edi

  # Switch stacks
  movl %esp, (%eax)
  movl %edx, %esp

  # Load new callee-save registers
  popl %edi
  popl %esi
  popl %ebx
  popl %ebp
  ret
```
Tada! This is what we’ve been looking for. swtch() takes two context structs, which look like this (proc.h):
```c
// Saved registers for kernel context switches.
// Don't need to save all the segment registers (%cs, etc),
// because they are constant across kernel contexts.
// Don't need to save %eax, %ecx, %edx, because the
// x86 convention is that the caller has saved them.
// The layout of the context matches the layout of the stack in swtch.S
// at the "Switch stacks" comment.
struct context {
  uint edi;
  uint esi;
  uint ebx;
  uint ebp;
  uint eip;
};
```
It contains only the registers necessary for performing a context switch, and
you can see in
swtch() that all we need to do is push the current context
values, switch the stack pointer, then pop the context values from the new stack
into the appropriate registers. Magic.
The rest of the process is less concerned with context switching and more concerned with picking a new process to run, which is a completely different kettle of fish.
Hardware context switching
x86 offers a method for doing context switching in hardware, which sounds as if it would be faster than software methods but in practice this doesn’t turn out to be the case. (According to various reading around the web. I’ve not run any benchmarks on this personally).
The problem with hardware context switching is that it handles more registers than it needs to for modern OS implementations. For example, Linux uses what’s called a “flat memory model”, which means all of the segment bases are set to zero and never change.
You’re probably thinking “what the fuck are segment selectors?”. x86 uses a segmented memory model, which means that memory is split up into segments (historically up to 64KB each) and everything is accessed with a segment selector (%cs for the code segment, %ds for the data segment and so on) plus an offset. At least, that’s what people used to do. These days it is far preferable to give every segment a base of zero and access memory as if it were “flat”.
Whenever a segment selector is changed, the processor performs security checks on the new values to make sure they’re valid and a whole host of other stuff that I don’t fully understand. This is an overhead that is unnecessary and causes hardware context switching to be slower than software context switching in most cases.
Also, the x86-64 architecture has dropped support for hardware context switching according to the OSDev wiki.
So that’s context switching in about as much detail as I can really offer. xv6 is a really awesome code base and I’ve learnt a lot about relatively simple yet realistic kernel implementations without having to go through the pain of reading the Linux source code.
Once I’m at the stage where I fully understand xv6 and can tinker around with it confidently, I fully intend to move up to the Linux code base and start messing with that.
I think a natural follow-on topic from this post would be to talk about scheduling algorithms. I might do that, but it does feel a bit too much like being back in University. I dunno, I’ll see how I feel :)