Operating systems

Interruptions, time management and scheduling

The goal of this lab is:

to make you handle the interruptions,
to make you understand time management in a kernel,
and show you how a scheduler works .

Implementation of a test program (∼ 45mn)

Before modifying the kernel, we first create a test program.

Start by getting the code from xv6:

git clone https://gitlab.inf.telecom-sudparis.eu/csc4508/xv6-riscv.git -b scheduler-lab
or git checkout scheduler-lab if you already have a local copy of the repository

You are now in the scheduler-lab branch which contains the source file user/stest.c. This file contains #pragma (i.e. directives for the compiler) to avoid that gcc performs optimizations (this is equivalent to using the -O0 option of gcc). These optimizations should be avoided because the code we are going to run in stest is so simple that gcc happens to entirely eliminate it if optimization are enabled.

Next, create a local branch named my-scheduler for your lab:

git checkout -b my-scheduler

Modify the Makefile so as to compile stest by adding $U/_stest to the UPROGS variable. Run xv6 (with make qemu) and check that you can run this new command. This command should end without doing anything.

Modify stest.c so that it creates 3 child processes with fork(). Each child should display PID starts then PID ends, where PID is the process PID, before exiting. Check that your program behaves correctly. If you see that xv6 starts mentionning zombies, don't worry: we will get rid of these evil zombies in the next question.

It is very much possible that most of your test program's output is scrambled and illegible. Don't worry! This is because xv6's serial console does not buffer the output line per line, and instead outputs characters as soon as possible.

If you are curious, do not hesitate to read the code of fork() in proc.c. This code is informative and shows how we can duplicate a process, that is, at high level, copy its memory and structure proc. Don't hesitate to ask your teachers if you want have more details on this code that we will not study more in this module.

In most systems, when a process terminates, it enters the state zombie. This intermediate state between life and death allows the parent process to be able to consult the return code of their child while preventing the child from being elected. As soon as the parent has retrieved this return code, the zombie is destroyed, i.e. the structure proc used by the zombie is definitely released.

In general, a user rarely sees zombies because:

a shell destroys its zombie children on its own,
when a zombie's parent dies, the zombie, like any other process, is adopted by an adopting process (process 1, a process named systemd or sometimes a process with another name specialized in this role). When adopting a zombie, the adopting process destroys it.

In xv6, the init process is the adopting process and it displays a message when it sees a zombie. Using the code from init.c as inspiration, make sure that stest destroys its zombie children.

wait() can take an argument: a pointer to an int where wait() will write the exit code the child. The argument can be 0 to skip this feature. Also, wait() returns immediately with -1 if the caller process does not have any child.

If you read the whole code of init, you can understand what this initial process does:

init creates file descriptors 0, 1, and 2, and associates them with the console. These are the input, output, and error streams of init. These open files being inherited during a fork (see the code of fork() in proc.c), you should understand how a parent and child share their input, error or output streams.
init then creates a shell, this is the one with which you can interact with xv6.

Modify stest so that each process (the parent and their three children) perform 3 loops of 2 << 28 iterations doing nothing. If you run your program, it should now take a few seconds to run.

When optimizing the code, gcc realizes that the loops has no effect, except on local variables. Since these variables are not used, i.e. they are not read, gcc thus manages to completely eliminate the code of the loops when it optimizes. It is for this reason that we disabled optimizations in this program.

Interruptions (∼ 30mn)

If you run stest, you should notice that, unexpectedly, xv6 runs two processes until their end, before running the other two entirely. It's normal: one of your teachers distracted forgot to put the time management code in xv6. As the Makefile is configured to use two cores, once a cores executes a process, the process is running entirely without giving control back to the system since it does not do any system calls, and there is no interrupt reception.

Let's start by reactivating the clock interrupts. xv6 does not use an absolute external source of time. It only keeps ticks, that are incremented exclusively by the CPU 0 when a timer interrupt is triggered on this CPU core. In addition, those ticks are used as jiffies, i.e., are global to the kernel (whereas ticks are usually local to each core).

In the timerinit() function of start.c (executed only by the CPU 0), add the code to configure the CLINT timer. To do this, copy the following code at the end of the function:

*(uint64*)CLINT_MTIMECMP(id) = *(uint64*)CLINT_MTIME + interval;
w_mie(r_mie() | MIE_MTIE);

In this code:

the first line sets the time when the CLINT must trigger a timer interruption, this time is defined as a delay (interval) after the current time (CLINT_MTIME);
the last line enables machine-mode timer interrupts, i.e., the RISC-V platform will make the local CPU jump to the interrupt vector defined in mtvec (close to the end of timerinit()) when a timer interrupt is triggered by the local CLINT.

Run xv6. If your code is correct you should regularly see a message telling you that xv6 is alive.

If you take a closer look at the way the CLINT is configured, you should notice that it is only a write into memory (by dereferencing pointers computed by the macros CLINT_XXX). The configuration registers of the CLINT are exposed as MMIO memory areas, so it is the CLINT that receives write requests, not the memory.

In the same function timerinit(), the interrupt vector to serve timer interrupts is written to mtvec, which is a machine mode register. Indeed, timer interrupts are served in machine mode, and translated to software interrupts into the supervisor mode (i.e., the kernel proper) by the code of the interrupt handler (in kernelvec.S).

We have set the tick to a relatively low frequency so as not to have too many displays of messages. Make sure to multiply the tick reception frequency by 100. Also remove the code showing that xv6 is alive in trap.c.

If you observe the few lines of code that are above the line you just deleted, you can observe the code that takes care of handling the jiffies:

only core 0 manages jiffies (the function clockintr is only called when the timer interrupt is served by CPU 0), and this variable is called tick in xv6 (the use of the name tick here is not ideal: usually this variable is called jiffies and the tick is local to a core);
you can also notice the call to wakeup() which wakes up the processes which wait on the variable jiffies. It is with this synchronization mechanism that xv6 implements the sleep() system call (see the code for sys_sleep in sysproc.c).

Note also, by rewinding the call chain of clockintr(), that the functions usertrap() and kerneltrap() are the entry points into xv6: depending whether the interruption or exception happen in a user process or in the kernel, the former or the latter is called by the RISC-V platform (because it was written in stvec, the register of the interrupt vector in supervisor mode). Remember as explained above, that timer interrupts are served in machine mode, but are translated to software interrupts to the supervisor mode, i.e., to the code set at stvec.

Scheduling (∼ 1h15)

In this exercise, you design a scheduler in xv6.

Modify the code of usertrap() and kerneltrap() so that the current process releases the processor each time the kernel receives a tick, using yield().

To do this, add the following code to usertrap (before the call to usertrapret):

if (which_dev == 2)
  yield();

and the following code to kerneltrap:

if (which_dev == 2 && myproc() != 0 && myproc()->state == RUNNING)
  yield();

This function allows you to let go of the processor, which has the effect of activating the scheduler (and your elect() function).

Explain the difference between the RUNNING state and the RUNNABLE state. These two constants are defined in proc.h and are used in the yield() method.

The RUNNABLE state means that the process is ready to run but that he is not elected. The RUNNING state means that the process runs on a core.

Knowing that xv6 can run on multiple cores, explain why accesses to the process state field (the one that indicates if a process is in the RUNNING state or RUNNABLE) must be protected. You are not asked to read the code, but to find an example with a pseudo-code.

The inconsistency could come from the fact that a process is elected on two cores simultaneously.

Explain how, from yield(), the code arrives to the scheduler() function.

The yield() function begins by acquiring the ptable.lock lock and setting the process state to RUNNABLE since the latter is no longer elected. Then yield() calls sched().

The sched() function starts by checking that the state of the kernel is consistent before calling the swtch() function. This magic function in assembler allows you to switch from the current context to a new context, i.e., from the kernel context running the scheduler to the elected process and vice-versa. In detail, the swtch() function saves the processor registers and restore those of the kernel context running the scheduler. In particular, it restores the stack pointer of the context. The program counter and the page table of the process are restored by the normal mechanism of the functions usertrap() and kerneltrap() that handled the interruption that ultimately led to the call to swtch().

You can see the kernel context running the scheduler function as the scheduler process (although it does not appear in the list of processes). This special process runs an infinite loop in the kernel which takes care of finding a process to elect (with elect()) and switches to this process using the swtch() function.

Look in the kernel where the scheduler() function is called. Explain.

Technically, scheduler is the last function called in mpmain, which is called by main. This main function is the entry point into the kernel on startup (see previous lab). We can therefore conclude that the scheduler() is simply the initial startup process.

You should notice that, despite our changes, the scheduler continues to execute entirely two processes before running the other two. By reading the code of elect(), explain why.

The elect() function chooses the first ready process of the array proc, which ultimately amounts to systematically electing the same process.

In order to see the code of the solution to the lab up to this point, execute the following sequence of instructions in a terminal:

git remote add scheduler-soluce ssh://git@gitlab.inf.telecom-sudparis.eu:2222/csc4508/xv6-riscv-soluces/xv6-soluce-scheduler.git
git fetch scheduler-soluce
git checkout -b scheduler-soluce -t scheduler-soluce/scheduler-lab

And to see only the added code:

git diff scheduler-lab

Modify the elect() function to avoid the famine problem that you observe (in other words to ensure that all processes are elected regularly). This question is long and should keep you occupied until the end of the lab. You have several solutions. For the most talented students, we advise to choose the variable priority policy and for other students, FIFO policy. For the super-human students, they can directly get into the PetitLinux policy:

[FIFO]: you can use the old parameter which indicates what was the last process elected, and move to the next one in the table. In Linux, this policy is called SCHED_RR (round-robin).
[Fixed priority]: you can add a static priority prio to a process (in the proc structure) that you will set manually (for example with prio = pid / 2); then apply a FIFO algorithm on the highest priority processes (solution starting to get closer to what we do in real time);
[LRU]: you can add a timestamp to the proc structure indicating which process was least recently elected. For that, you just have to understand what the tick variable is (this is a relatively realistic implementation);
[Variable priority]: you can add a prio variable to the proc structure. Then, make this variable increase with each tick for unelected processes and decreases sharply with each tick for the elected process indicated by the old variable. Linux's SCHED_OTHER policy is somehow similar to this solution but is fairer;
[PetitLinux]: you can add process type and apply the following policy:
- If old has the type FIFO, you keep it elected. With this policy, as soon as a process is elected, it has the guarantee to remain elected, unless it falls asleep. This is the policy you had at the beginning of this lab;
- Otherwise, if there is a process with the SCHED_FIFO attribute, you elect him;
- Otherwise, if the old process has the type SCHED_RR, you elect the SCHED_RR process by applying a FIFO policy;
- Otherwise, if there is a process of type SCHED_RR, you elect it;
- Otherwise, you apply a variable priority algorithm or LRU.

Since 2007, when a process is of type SCHED_OTHER, Linux actually uses CFS (Completly Fair Scheduler). You can find some a high level description here. Apart from this variation, the PetitLinux algorithm is exactly the one we have described. Note that the Linux scheduling algorithm is about 20,000 lines of code.

Partie 1 - Process

Part 2 - Memory management

Part 3 - I/O

Interruptions, time management and scheduling

Implementation of a test program (∼ 45mn)

Interruptions (∼ 30mn)

Scheduling (∼ 1h15)