Operating systems

System calls

The goal of this lab is to make you familiar with xv6, a kernel developed at MIT to learn operating systems. The second goal of this lab is that you understand how a system call works. To do this, you will implement a new synchronization mechanism in the kernel and explore the basic structures of structures of xv6.

Installation dependencies (∼ 10mn)

If you work on your laptop, you may have to install a few dependencies to do the xv6 labs. See Software to install.

Installation (∼ 20mn)

First, let's clone the xv6 repository:

git clone https://gitlab.inf.telecom-sudparis.eu/csc4508/xv6-riscv.git

Then, create a local branch named tp-syscall for your lab:

git checkout -b tp-syscall

Then, compile the kernel:

make

In this course, it is only possible to compile and run xv6 on a Linux machine. If you don't have a Linux, install one, or connect with ssh to a Linux machine.

Start xv6 with the following command:

make qemu

This command starts xv6 in the qemu emulator. When xv6 starts, it launches a simple shell allowing to execute some basic commands. The code of shell is located in the user/sh.c file.

To exit QEMU, type Ctrl-a x, i.e., hold Ctrl, type a, release Ctrl, type x. Ctrl-a is used to send a control sequence to control QEMU, for instance x to quit.

If you look at the source code, you can find the classic architecture of a system: the kernel image is called kernel (in the directory kernel/), and consists of the files indicated in the variable OBJS that you will find in the Makefile. Each of the basic command in the user space is implemented in a source file (cat.c, sh.c etc.) in the directory user. You can find the list of these commands, each prefixed with an underscore _, in the variable UPROGS (user commands) of the Makefile.

With WSL, XV6 may fail to start with the error message "No bootable hard drive". The problem comes from WSL bad handling of carriage return characters in Perl script. You can fix this by running:

sed -i -e 's/\r/\n/g' *.pl

It is also possible to run xv6 in debug mode for directly seeing the low-level kernel operation. To do this, run:

make qemu-gdb

This command launches xv6 in qemu, but the startup is interrupted just before the virtual machine starts and allows gdb to connect to it.

In another terminal, navigate to the root of your sources of xv6, then run:

gdb-multiarch

This is a version of gdb with support for different architectures, including RISC-V.

On another distribution, you may want to run riscv64-linux-gnu-gdb instead.

Check carefully the startup messages from gdb. You will see an error message similar to this one :

warning: File "/home/gael/xv6/.gdbinit" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load"

Copy and paste the line displayed just after the following line "To enable execution of this file add" in your gdb configuration file at ~/.gdbinit in order to allow the use of the .gdbinit script by gdb.

Once gdb starts correctly, type b main (break main) to instruct gdb to interrupt the execution of xv6 when it reaches the main function. Then, resume the execution of xv6 with the command c (continue), in other words, let xv6 finish booting. If all goes well, xv6 stops at the start of the main function.

Display the code of the main function with l (list). You can examine the code that initializes the system (the source code of this function can be found in main.c. Technically, this code is called from a trampoline in assembler which is located in entry.S and which takes care of installing the vital minimum allowing to switch to C. So the boot process for xv6 is the following:

QEMU makes the boot CPU jump at the start of the kernel, this is the assembler trampoline at the _entry label;
the assembler part correctly initializes the processor (we will explain how later) then jumps into the main function.

xv6 is a very simple kernel that handles all privilege modes (machine, supervisor and user). In the real life, with Linux for example, the boot process starts with the static (ROM) platform bootloader, then a flashable first-stage bootloader, then an SBI implementation that handles the machine mode, then a bootloader à-la GRUB (typically, U-Boot), and finally the kernel.

Welcome to an operating system kernel!

First system call (∼ 1h10)

The goal of this exercise is to add new system call to xv6. The system call we create is relatively artificial, but reproduces in a fairly realistic way the operation of the Linux futexes that you have studied in previous labs.

To start, we create a new command named ftest allowing us to test our new system call. This command, like any command, is executed in userland and therefore, ultimately, interacts with the kernel. For the moment, our command simply performs a simple display.

To do this, start by creating a ftest.c file in the directory user. In this file, write a function main which:

uses printf() to display "starting".
calls exit(RET_CODE) to exit the command with the given return code (e.g., 0).

The printf() and exit() functions are defined in the header file user/user.h. Before including it, you must also include kernel/types.h.

Now that we have our first command, we can compile and test it. To do this, add it in the Makefile, in its variable UPROGS. As exemplified by the existing commands, you must add ftest as $U/_ftest. Then, run xv6 (make qemu) and check that your command works by running ftest in the shell.

Congratulations, you know how to create new commands in xv6!

Now that our command is functional, we can add our two system calls. These are the calls:

int fwake(int slot, int val) : this function considers that the kernel has internal variables, called slots, that are numbered from 0 to 5. The fwake function assigns the value val to the slot ^th slot, then wakes up the processes that are waiting for the variable slot to change value (see next function);
int fwait(int slot, int val) : this function pause the process as long as the slot number slot is not val.

A typical usage of fwake/fwait is given here:

#include "kernel/types.h"
#include "user/user.h"

int main() {
    if(fork() == 0) {
        printf("%d - wait...\n", getpid());
        fwait(1, 42);
        printf("go go go!\n");
    } else {
        sleep(50);
        printf("%d - notify...\n", getpid());
        fwake(1, 42);
    }

    exit(0);
}

This program starts by creating a child process that waits until slot 1 is set to 42 (initially, a slot is set to 0). The father process sleeps for 50 ticks (around 5s, time is managed in a very approximate way in xv6) then sets slot 1 to 42, which wakes up the child process.

Start by replacing your code in ftest.c with this example code.

We now add the fwait() and fwake() functions in user/user.h. These functions simply call the appropriate functions in the kernel by making a system call. For that:

give a syscall number to our system calls by adding SYS_fwait (number 22) and SYS_fwake (number 23) in kernel/syscall.h;
then, add the user-side "implementation" of the syscalls: in user/usys.pl, add two entry's similar to the existing syscalls in this file;

the file user/usys.pl is a Perl script that generates user/usys.S, an assembly file with the actual syscall stubs
for the details, the generated assembly code does the following:

tell the assembler that the symbol we are defining is a global,
define a label with the name of the syscall (fwait, fwake),
load the number of the syscall (the macro SYS_fwait or SYS_wake you defined just before in kernel/syscall.h) into register a7,
execute the instruction ecall to trigger an environment call of the number defined by content of the register a7 (an environment call is a generalization of a syscall: one privilege level calls into the immediately more privileged level underneath, so in the case of a syscall, the user mode calls into the supervisor mode, i.e., the kernel),
return with ret;

then add the declarations of these user-side functions to the file user/user.h:

they are the signatures of the assembler functions generated by user/usys.pl in user/usys.S,
the signatures were given in the previous question: int fwait(int slot, int val) and int fwake(int slot, int val).

The test part in userland allowing to test the new system calls is now ready. Since you have not yet associated any code with these system calls in the kernel, calls must fail miserably with "unknown sys call 22" and "unknown sys call 23 " messages. Check that this is the case by compiling, by running xv6 (make qemu) and running your ftest command.

This question aims to understand step by step the sequence of events generated by calling fwait(1, 42) in your ftest command. Don't hesitate to ask your teachers when there are steps you can't get through to understand.

First, we take a look at what's going on in userland. To do this, open the file user/ftest.asm. This file contains the assembly code of your command ftest:

In that file, find the code corresponding to the call fwait(1, 42). You should see the setting up of arguments (li a1,42 and li a0, 1), and a call to fwait(): the instruction pair auipc ra,0x0; jalr 894(ra) is used to perform a jump relative to the current PC of 894 bytes. If you add the address (in hexadecimal) of the auipc instruction to the offset given to jalr (which might by different in your assembly code), you should get the address of the label fwait: written in the comment of the jalr instruction.
We can then follow the flow of execution by looking for this label (or its address in hexadecimal) in the file. You should see the assembly code for the fwait() function that you wrote in usys.S.

From the call to ecall, the RISC-V platform switches to supervisor mode, and we can follow the code flow:

the CPU continues its execution in the function uservec (in kernel/trampoline.S), that was installed as the trap vector when the kernel switched to the user process (the CPU knows it must jump there when a trap from userspace occurs);
this function saves the execution state of the user process to a memory area (the trapframe), then restores the kernel context before jumping to the C function usertrap (kernel/trap.c), two steps that are non trivial:
- A first major issue when handling traps in RISC-V, is that the supervisor code does not have any general-purpose register available (because they may all contain values for the user process). But a register is needed to hold the address of the trapframe where to store the execution state. The kernel uses the special register sscratch, that it sets before switching to the user process, to the address of the trapframe. Then, is swaps the contents of a0 and sscratch with csrw sscratch, a0, so the user process's value of a0 is saved into sscratch, and then loads the address of the trapframe to a0.
- A second major issue, is that the CPU jumps to the interrupt vector in supervisor mode, but keeps the page table of the process that trapped. But to handle the trap, the kernel code must restore the kernel's page table. It means extra care must be put into ensuring the code can execute both in the user process page table and in the kernel page table. This is the purpose of this "trampoline" code: it is mapped by the kernel at the same virtual addresses in all user processes, and in the kernel page table.
the usertrap() function checks the content of the register scause to determine the cause of the trap, if this is a syscall, it delegates to syscall() (kernel/syscall.c);
the syscall() generic function checks the syscall number in the register a7 (remember the li a7, SYS_fwait in userland), and calls the matching syscall implementation (or displays the error message you saw in the previous test if it does not exist);
it also sets the return value in the register a0 to the value returned by the syscall implement (or -1 because the call was unsuccessful);
finally, the execution flow unstacks the calls, goes through the function usertrapret to prepare the switch back to the user process, goes back to the return trampoline function userret (in kernel/trampoline.S) that restores the context of the user process, and eventually returns to the user process itself.

To navigate more easily in the sources of xv6, we invite you to use the ctags tool (sudo apt-get install exuberant-ctags) which allows you to index source code and perform research.

To generate an index, run the command make tags which invokes etags *.S *.c (you can also modify the Makefile to also index *.h files).

Then, the use of the index depends on your editor:

With emacs:

To find the definition of a function (or of a type), position the cursor on the type, then do M-. ("META dot", so the Esc key then the . key)
To go back, type M-,

With VIM:

To find the definition of a function or a type, position the cursor on the function, then type C-] (Ctrl and ])
To go back, type C-t

Now that our userland test infrastructure is in place, we can write the fwait and fwake system calls in the kernel. You're about to write your first lines of kernel code!

In the kernel/syscall.c file, modify the syscalls table to match respectively the system call number SYS_fwait to its kernel implementation sys_fwait(), and the system call number SYS_fwake to its kernel implementation sys_fwake(). Also, add the signatures of these functions just before the definition of the syscalls structure, as it is done for sys_uptime() for instance.

The signatures of the functions that implement the syscalls are different from the user interfaces of the syscalls. We will see next that arguments must be handled in a special way.

In proc.c, which takes care of managing the processes and who therefore owns the list of processes, add the functions sys_fwait() and sys_fwake(). Each of these functions must display a message using the function kernel printf then return 0. The printf on the kernel side behaves much like the printf function that you know, but it is no the same implementation, as it can only print on the console (no notion of stdout).

Check that the messages are displayed when you run the command ftest.

Congratulations, you now know how to create system calls in xv6 and you are able to test them!

Implementing system calls (∼ 1h)

We first take care of the sys_fwait() function. First, let's get the arguments of the system call. To do this, use the argint() function. You will find an example of its usage in sysproc.c (where other syscalls about processes are implemented). Print the values of the first two parameters of sys_fwait and verify that they are 1 and 42.

We now create the slots. In proc.c, add a variable int slots [] that you initialize with an array of 5 elements at 0.

As the xv6 kernel can run in parallel on multiple cores, you must acquire a lock before accessing slots. Indeed, it is quite possible that another core executes fwake while fwait is being executed, so the fwake would be missed and the process that called fwait would be stuck waiting.

To put the lock in place:

declare a variable struct spinlock slots_lock; in proc.c;
initialize this lock in the function procinit() like other existing locks;

Finally, acquire this lock in sys_wait() after extracting the arguments, then release it.

In order to wake up the processes that are waiting on a slot, we add an integer wait_slot field in the structure of a process. When this field is 1, the process is waiting for a slot to change value. Add this field to the proc structure in the file proc.h.

In sys_wait, you must be able to find the structure proc corresponding to the current process. For this you can call myproc(). Next, as long as slots[slot] is not equal to val (where slot and val are the arguments of sys_wait), it should :

set the wait_slot field of the current process to 1;
set the state of the current process to SLEEPING, which makes the process ineligible;
call sched() so that xv6 runs another process (xv6 chooses a process in RUNNABLE state and therefore ignore our process),
reset wait_slot to 0.

Be careful with the locks:

The operations on the proc structure must be done while holding the process's lock (including calling sched()).
The lock on slots should be released once the lock on the process is taken, and taken again before releasing the latter.

If you look at the code of sched(), you will see that this function considers that the current process must hold it lock. Magically enough, the sched() function never lets go of the lock! Technically, this function elects a new process, and it is this new process who owns the lock since it is this process that runs and who acquired it before falling asleep.

Test your program and verify that the child never runs again (no "go go go!" message). It's expected since the father never wakes him up.

Implement the sys_fwake() function. For this you must:

extract the slot and val arguments;
acquire the lock slots_lock;
assign the value val to slots[slot];
browse the process table (from proc[0] to proc[NPROC]), and for each of them, return it in the RUNNABLE state if it is in the SLEEPING state and its wait_slot variable is true (do not forget to take the process's lock);
release the lock slots_lock.

Check that your program now runs as expected.

Very important question for the future!

Compare the code of your functions sys_fwait()/sys_fwake() with the code of the functions sleep() and wakeup() in the same file proc.c.

You should notice that these functions generalize the principle of waiting. sleep pauses a process while waiting for an event on chan(), and wakeup() wakes up processes that are waiting for an event on chan. We could therefore have implemented our functions as follows:

struct spinlock slots_lock;

uint64
sys_fwait(void)
{
    int slot, val;

    argint(0, &slot);
    argint(1, &val);

    acquire(&slots_lock);

    while(slots[slot] != val)
        // Wait for a wakeup on &slots[slot].
        // sleep() releases the lock passed in argument, and takes it back before returning.
        sleep(&slots[slot], &slots_lock);

    release(&slots_lock);

    return 0;
}

uint64
sys_fwake(void)
{
    int slot, val;

    argint(0, &slot);
    argint(1, &val);

    acquire(&slots_lock);

    slots[slot] = val;

    // Wake up processes waiting on &slots[slot].
    wakeup(&slots[slot]);

    release(&slots_lock);

    return 0;
}

In order to see the code of the solution, execute the following sequence of instructions in a terminal:

git remote add futex-soluce https://gitlab.inf.telecom-sudparis.eu/csc4508/xv6-riscv-soluces/xv6-soluce-futex.git
git fetch futex-soluce
git checkout -b futex-soluce -t futex-soluce/master

And to see only the added code:

git diff origin/master