CSC5101 – Advanced Programming of Multicore Architectures

Interruptions, time management and scheduling

The goal of this lab is:

to make you handle the interruptions,
to make you understand time management in a kernel,
and show you how a scheduler works .

Implementation of a test program (∼ 45mn)

Before modifying the kernel, we first create a test program.

Start by getting the code from xv6:

git clone https://gitlab.inf.telecom-sudparis.eu/csc4508/xv6-tsp.git -b scheduler-lab
or git checkout scheduler-lab if you already have a local copy of the repository

You are now in the scheduler-lab branch which contains the source file stest.c. This file contains #pragma (i.e. directives for the compiler) to avoid that gcc performs optimizations (this is equivalent to using the -O0 option of gcc). These optimizations should be avoided because the code we are going to run in stest is so simple that gcc happens to entirely eliminate it if optimization are enabled.

Next, create a local branch named my-scheduler for your lab:

git checkout -b my-scheduler

Modify the Makefile so as to compile stest by adding _stest to the UPROGS variable. Run xv6 (with make qemu) and check that you can run this new command. This command should end without doing anything.

If the stest program ends with the error

trap 14 err 5 on cpu X eip 0xffffffff
[...]

, it is likely that you forgot to call exit() to terminate the program.

Modify stest.c so that it creates 3 child processes with fork . Each child should display PID starts then PID ends , where PID is the process PID. Check that your program behaves correctly. If you see that xv6 starts mentionning zombies, don't worry: we will get rid of these evil zombies in the next question.

If you are curious, do not hesitate to read the code of fork in proc.c. This code is informative and shows how we can duplicate a process, that is, at high level, copy its memory and structure proc. Don't hesitate to ask your teachers if you want have more details on this code that we will not study more in this module.

In most systems, when a process terminates, it enters the state zombie. This intermediate state between life and death allows the parent process to be able to consult the return code of their child while preventing the child from being elected. As soon as the parent has retrived this return code, the zombie is destroyed, i.e. the structure proc used by the zombie is definitely released.

In general, a user rarely sees zombies because:

a shell destroys its zombie children on its own,
when a zombie's parent dies, the zombie, like any other process, is adopted by an adopting process (process 1, a process named systemd or sometimes a process with another name specialized in this role). When adopting a zombie, the adopting process destroys it.

In xv6, the init process is the adopting process and it displays a message when it sees a zombie. Using the code from init.c as inspiration, make sure that stest destroys its zombie children.

If you read the whole code of init, you can understand what this initial process does:

init creates file descriptors 0, 1, and 2, and associates them with the console. These are the input, output, and error streams of init. These open files being inherited during a fork (see the code of fork in proc.c), you should understand how a parent and child share their input, error or output streams.
init then creates a shell, this is the one with which you can interact with xv6.

Modify stest so that each process (the parent and their three children) perform 3 loops of 2 << 28 iterations doing nothing. If you run your program, it should now take a few seconds to run.

When optimizing the code, gcc realizes that the loops has no effect, except on local variables. Since these variables are not used, i.e. they are not read, gcc thus manages to completely eliminate the code of the loops when it optimizes. It is for this reason that we disabled optimizations in this program.

Interruptions (∼ 30mn)

If you run stest, you should notice that, unexpectedly, xv6 runs two processes until their end, before running the other two entirely. It's normal: one of your teachers distracted forgot to put the time management code in xv6. As the Makefile is configured to use two cores, once a cores executes a process, the process is running entirely without giving control back to the system since it does not any system calls and there is no interrupt reception.

Let's start by reactivating the clock interrupts. Technically, xv6 only uses a local time source to manage the tick on each CPU, and, instead of using an external source, also uses this local source to manage the jiffies.

In the lapicinit function of lapic.c, add the code to configure the local lapic clock. To do this, copy the following code into the function:

lapicw(TDCR, X1);
lapicw(TIMER, PERIODIC | (T_IRQ0 + IRQ_TIMER));
lapicw(TICR, 200*10000000);

In this code:

the second line asks to start the local clock in periodic mode, i.e. it requests to regularly receive an interrupt on the IDT number T_IRQ0 + IRQ_TIMER (note: it is not necessary to indicate which core should receive the interrupt since the LAPIC generates the interrupt on its local core);
the last line sets the tick frequency: every 200 * 10000000 / (TDCR * HZ) seconds, where HZ is the memory bus frequency, and TDCR is a divider configured at the first line.

Run xv6. If your code is correct you should regularly see a message telling you that xv6 is alive.

If you take a look at lapicw which takes care of sending data to the lapic, you should notice that this code only writes to memory. Technically, lapic points to an MMIO memory area, so it's the lapic which receives write requests, not memory.

We have set the tick to a relatively low frequency so as not to have too many displays of messages. Make sure to multiply the tick reception frequency by 200. Also remove the code showing that xv6 is alive in trap.c.

If you observe the few lines of code that are above the line you just deleted, you can observe the code that takes care of handling the jiffies:

only core 0 manages jiffies, and this variable is called tick in xv6 (the use of the name tick here is not ideal: usually this variable is called jiffies and the tick is local to a core);
you can also notice the call to wakeup which wakes up the processes which wait on the variable jiffies. It is with this synchronization mechanism that xv6 implements the call sleep system (see the code for sys_sleep in sysproc.c).

wakeup

jiffies

xv6

sleep

sys_sleep

sysproc.c

Note also that the trap function is the entry point of xv6: the start of the code manages the special case of a system call; the suite manages the other interrupts.

Scheduling (∼ 1h15)

In this exercise, you design a scheduler in xv6.

Modify the code of trap so that what the current process releases the processor each time the kernel receives a tick. To do this, add the following code if (myproc()) yield(); each time you receive tick(). This function allows you to let go of the processor, which has the effect of activating the scheduler (elect() function).

Explain the difference between the RUNNING state and the RUNNABLE state. These two constants are defined in proc.h and are used in the yield() method.

The RUNNABLE state means that the process is ready to run but that he is not elected. The RUNNING state means that the process runs on a core.

Knowing that xv6 can run on multiple cores, explain why accesses to the process state field (the one that indicates if a process is in the RUNNING state or RUNNABLE) must be protected. You are not asked to read the code, but to find an example with a pseudo-code.

The inconsistency could come from the fact that a process is elected on two cores simultaneously.

Explain how, from yield(), the code arrives to the scheduler() function.

The yield() function begins by acquiring the ptable.lock lock and setting the process state to RUNNABLE since the latter is no longer elected. Then yield() calls sched().

The sched function starts by checking that the state of the kernel is consistent before calling the swtch function. This magic function in assembler allows you to switch from the current process to the process named scheduler which we will described later. In detail, the swtch function saves the processor registers and restore those of the scheduler process. In particular, swtch saves the processor code pointer (which is found in sched) and activate that of the scheduler process (found in the scheduler function).

The scheduler function is the scheduler process. This special process runs an infinite loop in the kernel which takes care of finding a process to elect (with elect()) and pass to this process using the swtch function.

Look in the kernel where the scheduler function is called. Explain.

Technically, scheduler is the last function called in mpmain, which is called by main. This main function is the entry point into the kernel when start-up (see previous lab). We can therefore conclude that the scheduler is simply the initial start-up process.

You should notice that, despite our changes, the scheduler continues to execute entirely two processes before running the other two. By reading the code of elect(), explain why.

The elect function chooses the first ready process of the array proc, which ultimately amounts to systematically electing the same process.

Modify the elect() function to avoid the famine problem that you observe (in other words to ensure that all processes are elected regularly). This question is long and should keep you occupied until the end of the lab. You have several solutions. For the most talented students, we advise to choose the variable priority policy and for other students, FIFO policy. For the super-human students, they can directly get into the PetitLinux policy:

[FIFO]: you can use the old parameter which indicates what was the last process elected, and move to the next one in the table. In Linux, this policy is called SCHED_RR (round-robin).
[Fixed priority]: you can add a static priority prio to a process (in the proc structure) that you will set manually (for example with prio = pid / 2); then apply a FIFO algorithm on the highest priority processes (solution starting to get closer to what we do in real time);
[LRU]: you can add a timestamp to the proc structure indicating which process was least recently elected. For that, you just have to understand what the tick variable is (this is a relatively realistic implementation);
[Variable priority]: you can add a prio variable to the proc structure. Then, make this variable increase with each tick for unelected processes and decreases sharply with each tick for the elected process indicated by the old variable. Linux's SCHED_OTHER policy is somehow similar to this solution but is fairer;
[PetitLinux]: you can add process type and apply the following policy:
- If old has the type FIFO, you keep it elected. With this policy, as soon as a process is elected, it has the guarantee to remain elected, unless it falls asleep. This is the policy you had at the beginning of this lab;
- Otherwise, if there is a process with the SCHED_FIFO attribute, you elect him;
- Otherwise, if the old process has the type SCHED_RR, you elect the SCHED_RR process by applying a FIFO policy;
- Otherwise, if there is a process of type SCHED_RR, you elect it;
- Otherwise, you apply a variable priority algorithm or LRU.

Since 2007, when a process is of type SCHED_OTHER, Linux actually uses CFS (Completly Fair Scheduler). You can find some a high level description here. Apart from this variation, the PetitLinux algorithm is exactly the one we have described. Note that the Linux scheduling algorithm is about 20,000 lines of code.

You can find a draft fix in the scheduler-soluce branch:

git checkout scheduler-soluce
git diff scheduler

In order to see the solution of a scheduler with variable priority, run the following sequence of instructions in a terminal:

git remote add scheduler-soluce https://gitlab.inf.telecom-sudparis.eu/csc4508/xv6-soluces/xv6-soluce-scheduler.git
git fetch scheduler-soluce
git checkout -b scheduler-soluce -t scheduler-soluce/scheduler-lab

And to see only the added code:

git diff origin/scheduler-lab