Operating systems

Exam 2023-2024

Organization

Duration : 3 h

Preparation

cd someDirectory
wget http://www-inf.telecom-sudparis.eu/COURS/CSC4508/Supports/CF/2023-2024/cf1/G21Xw7xVpe/Exam.tgz
tar xvfz Exam.tgz
mv Exam ${USER}_Exam
cd ${USER}_Exam

Returning your work

At the end of the exam, you have to post your work on moodle.

Please post a tgz file. You can generate it with:

cd someDirectory
tar cvfz ${USER}.tgz ${USER}_Exam
	    

Linear regression, part 1 (6 points)

The goal of this exercise is to start implementing an application that computes the linear regression of a set of points. This exercise focuses on loading the data from a file specified by the application user.

In the exercise2 directory, the linear_regression.c program defines function void process_file(char* input_file). This function should:

open the input file
check the file size
allocate an array points large enough to store the points
read the content of the file and populate the points array
call the linear_regression function

In this exercise, the points are defined as the struct point data structure that contains a couple of integers (x, y).

We provide several input files points_*.dat. Each of these file contains a list of points, eg:

$ cat points_16.dat
34686,1951
22975,1214
56505,276
-43892,-199
66898,2784
-49195,-146
-10531,-52
4591,-596
-35186,994
-25057,-1053
-44983,646
7034,-44
-17865,-115
8521,-189
-97132,-428
-82209,-209
      

In order to validate your programs, here are the expected results for each of the input files:

$ ./linear_regression  points_16.dat 
SX=-204840
SXX=34401327642
SY=4834
SXY=332371608

Equation of best fit is: y = 460.9570 + 0.0124x

$ ./linear_regression  points_500.dat 
SX=-3038903
SXX=4702554441727
SY=-36093
SXY=44007904529

Equation of best fit is: y = -15.3684 + 0.0093x

$ ./linear_regression  points_4096.dat 
SX=-1467404
SXX=18943288023082
SY=1294
SXY=189633112869

Equation of best fit is: y = 3.9023 + 0.0100x

$ ./linear_regression  points_65536.dat 
SX=-15538847
SXX=191128109337657
SY=-13092
SXY=1286633575125

Equation of best fit is: y = 1.3964 + 0.0067x
      

File handling

First, we need to open the input file provided by the user, and allocate a buffer large enough to store all the points. Modify the void process_file(char* input_file); function so that it open input_file in read-only mode. In case of an error, print an error message to stderr, and exit the function. Once the file is open, process_file calls file_size in order to get the number of bytes contained in the file. This number is then used for allocating the points buffer. Implement the size_t file_size(FILE*f) function.

Barème

Open/close: 1 point
gestion des erreurs: 1 point
Implémentation de file_size: 1 point

Once the points buffer is allocated, process_file calls int read_points(FILE* f, struct point* points, int nb_points_max). This function reads the points in the input file and stores then in the points array. This function returns the number of points that were read.

Implement the read_points function.

Barème:

Appel correct à fscanf: 1 point
Boucle avec condition d'arrêt: 1 point
Vérification qu'on ne dépasse pas du tableau: 1 point

Once this exercise is correctly implemented, the program should process the input file:

$ ./linear_regression  points_4096.dat 
SX=-1467404
SXX=18943288023082
SY=1294
SXY=189633112869

Equation of best fit is: y = 3.9023 + 0.0100x
      

linear_regression.c

Parallelizing an application (8 points)

For this exercise, move to the exercise3 directory.

The goal is now to parallelize the linear regression: one thread reads the input file, and a set of worker threads process the data. We will start will one worker thread, and will later use several worker threads.

This parallelization will be implemented in the following questions:

Question a: creating of one worker thread
Question b: making the main thread and the worker thread communicate
Question c: termination of the worker thread
Question d: using a pool of worker threads instead of one worker thread

process_file uses a pre-defined dataset (dataset_500), and it defines a struct regression named global_r whose role is to aggregate the worker thread(s) computation.

Each worker thread processes a set of points, and updates the global_r.

In this exercise, the program requires an input file, but it does not use it. Instead, we use the pre-defined dataset (dataset_500) so that you can do this exercise even if you did not implement the first exercise.

2 points

Modify the process_file function so that it creates a worker thread in charge of processing points. After, creating the worker thread process_file should wait for the termination of the thread.

process_file should allocate a struct worker_info, initializes it, and pass it to the worker thread.

The worker thread calls void linear_regression_worker(struct worker_info *wi) that will be in charge of communicating with the main thread, and processing points.

bareme

pthread_create: 1 point
pthread_join: 1 point

3 points

To make the threads communicate, we will use an anonymous pipe. The main thread will copy each of the points to the pipe, and the worker thread(s) will from the pipe, and process the points.

Modify process_file so that it creates a pipe before creating the worker thread. process_file should communicate the pipe file descriptor to the worker thread (eg, using the worker_info structure).

Then, process_file should iterate over the points, and sends them to the worker thread through the pipe.

Modify linear_regression_worker so that the function reads the points from the pipe, and process them by calling linear_regression_update.

Bareme

creation du pipe: 1 point
Utilisation du pipe: 2 points

1 point

Now, we want to make sure that the worker thread finished processing points before printing the results. To do that, we will pass a special value (a point whose x and y values are INT_MAX, defined in limits.h) through the pipe to notify the worker thread(s) that they can stop processing points. When a worker thread receives this notification, it exists its main loop, and updates the global_r variable.

Modify process_file and linear_regression_worker to implement the termination mechanism.

Make sure that the printed results correspond to the expected values, eg:

$ ./linear_regression  points_500.dat 
SX=-3038903
SXX=4702554441727
SY=-36093
SXY=44007904529

Equation of best fit is: y = -15.3684 + 0.0093x
	      

Bareme

Notification/Terminaison: 1 point

2 points

We now want to use several workers to process the points in parallel.

Modify the program so that process_file creates several worker threads (eg. 4 threads), and the worker threads read the points from the pipe, and process them.

Make sure to protect data structure from any concurrent modification by the worker threads.

Bareme

création/terminaison de plusieurs threads: 1 point
protection des structures de données: 1 point

linear_regression.c

XV6 - Number of child process (7 points)

In this exercise, you are asked for xv6 code. You don't have to attach all the sources to your archive (that would be too big for an email). Instead start by cloning xv6 outside from the directory where you did the other exercises with the following command:

$ git clone https://gitlab.inf.telecom-sudparis.eu/csc4508/xv6-riscv.git

At the end of the exam, generate a patch in the directory where you did the other exercises, run the following commands:

$ git add -N test_nb_child.c 
# add test_nb_child.c to the git repo
$ git diff origin/master > ${PATH_TO_CF}/exercise4/diff.patch
# create the patch with all your code changes
	  

Where ${PATH_TO_CF} is the path to the directory containing the code you did for the other exercises.

Adding a system call (4 points)

Add a system call int get_nb_child(int pid) in xv6 that returns the number of child process of process pid.

The implementation of this system call should:

Check that process pid is usable (ie. its state is different from UNUSED

UNUSED

-1

Browse the list of usable processes, count the number of process whose parent is pid, and return this value

You can implement this system call in proc.c.

Grading scale:

Setting the system call: 2 point
Check the process state: 0.5 point
Locking: 0.5 point
Count the number of parent process: 1 point

3 point

Create a test program test_nb_child.c. In this program, the main process P0 creates 4 child processes C0, C1, C2, C3. Each child process parent should be P0.

Then, each child process should call get_nb_child(P0) and print the return value (which should be 4, since P0 created C0, C1, C2, and C3).

Make sure that you add test_nb_child.c to your local git repository by running the following command:

$ git add -N test_nb_child.c

Grading scale:

Test program: 2 point
Modification of the Makefile: 1 point

Operating systems

Partie 1 - Process

Part 2 - Memory management

Part 3 - I/O

Exam 2023-2024

Organization

Preparation

Returning your work

Linear regression, part 1 (6 points)

File handling

Parallelizing an application (8 points)

2 points

3 points

1 point

2 points

XV6 - Number of child process (7 points)

Adding a system call (4 points)

3 point