Operating systems

Portail informatique

Exam 2023-2024

Organization

Duration : 3 h

Preparation

cd someDirectory wget http://www-inf.telecom-sudparis.eu/COURS/CSC4508/Supports/CF/2023-2024/cf1/G21Xw7xVpe/Exam.tgz tar xvfz Exam.tgz mv Exam ${USER}_Exam cd ${USER}_Exam

Returning your work

At the end of the exam, you have to post your work on moodle.

Please post a tgz file. You can generate it with:

cd someDirectory tar cvfz ${USER}.tgz ${USER}_Exam

Linear regression, part 1 (6 points)

The goal of this exercise is to start implementing an application that computes the linear regression of a set of points. This exercise focuses on loading the data from a file specified by the application user.

In the exercise2 directory, the linear_regression.c program defines function void process_file(char* input_file). This function should:

  • open the input file
  • check the file size
  • allocate an array points large enough to store the points
  • read the content of the file and populate the points array
  • call the linear_regression function

In this exercise, the points are defined as the struct point data structure that contains a couple of integers (x, y).

We provide several input files points_*.dat. Each of these file contains a list of points, eg:

$ cat points_16.dat 34686,1951 22975,1214 56505,276 -43892,-199 66898,2784 -49195,-146 -10531,-52 4591,-596 -35186,994 -25057,-1053 -44983,646 7034,-44 -17865,-115 8521,-189 -97132,-428 -82209,-209

In order to validate your programs, here are the expected results for each of the input files:

$ ./linear_regression points_16.dat SX=-204840 SXX=34401327642 SY=4834 SXY=332371608 Equation of best fit is: y = 460.9570 + 0.0124x $ ./linear_regression points_500.dat SX=-3038903 SXX=4702554441727 SY=-36093 SXY=44007904529 Equation of best fit is: y = -15.3684 + 0.0093x $ ./linear_regression points_4096.dat SX=-1467404 SXX=18943288023082 SY=1294 SXY=189633112869 Equation of best fit is: y = 3.9023 + 0.0100x $ ./linear_regression points_65536.dat SX=-15538847 SXX=191128109337657 SY=-13092 SXY=1286633575125 Equation of best fit is: y = 1.3964 + 0.0067x

File handling

First, we need to open the input file provided by the user, and allocate a buffer large enough to store all the points. Modify the void process_file(char* input_file); function so that it open input_file in read-only mode. In case of an error, print an error message to stderr, and exit the function. Once the file is open, process_file calls file_size in order to get the number of bytes contained in the file. This number is then used for allocating the points buffer. Implement the size_t file_size(FILE*f) function.

Once the points buffer is allocated, process_file calls int read_points(FILE* f, struct point* points, int nb_points_max). This function reads the points in the input file and stores then in the points array. This function returns the number of points that were read.

Implement the read_points function.

Once this exercise is correctly implemented, the program should process the input file:
$ ./linear_regression points_4096.dat SX=-1467404 SXX=18943288023082 SY=1294 SXY=189633112869 Equation of best fit is: y = 3.9023 + 0.0100x

Parallelizing an application (8 points)

For this exercise, move to the exercise3 directory.

The goal is now to parallelize the linear regression: one thread reads the input file, and a set of worker threads process the data. We will start will one worker thread, and will later use several worker threads.

This parallelization will be implemented in the following questions:

  • Question a: creating of one worker thread
  • Question b: making the main thread and the worker thread communicate
  • Question c: termination of the worker thread
  • Question d: using a pool of worker threads instead of one worker thread

process_file uses a pre-defined dataset (dataset_500), and it defines a struct regression named global_r whose role is to aggregate the worker thread(s) computation.

Each worker thread processes a set of points, and updates the global_r.

In this exercise, the program requires an input file, but it does not use it. Instead, we use the pre-defined dataset (dataset_500) so that you can do this exercise even if you did not implement the first exercise.

2 points

Modify the process_file function so that it creates a worker thread in charge of processing points. After, creating the worker thread process_file should wait for the termination of the thread.

process_file should allocate a struct worker_info, initializes it, and pass it to the worker thread.

The worker thread calls void linear_regression_worker(struct worker_info *wi) that will be in charge of communicating with the main thread, and processing points.

3 points

To make the threads communicate, we will use an anonymous pipe. The main thread will copy each of the points to the pipe, and the worker thread(s) will from the pipe, and process the points.

Modify process_file so that it creates a pipe before creating the worker thread. process_file should communicate the pipe file descriptor to the worker thread (eg, using the worker_info structure).

Then, process_file should iterate over the points, and sends them to the worker thread through the pipe.

Modify linear_regression_worker so that the function reads the points from the pipe, and process them by calling linear_regression_update.

1 point

Now, we want to make sure that the worker thread finished processing points before printing the results. To do that, we will pass a special value (a point whose x and y values are INT_MAX, defined in limits.h) through the pipe to notify the worker thread(s) that they can stop processing points. When a worker thread receives this notification, it exists its main loop, and updates the global_r variable.

Modify process_file and linear_regression_worker to implement the termination mechanism.

Make sure that the printed results correspond to the expected values, eg:
$ ./linear_regression points_500.dat SX=-3038903 SXX=4702554441727 SY=-36093 SXY=44007904529 Equation of best fit is: y = -15.3684 + 0.0093x

2 points

We now want to use several workers to process the points in parallel.

Modify the program so that process_file creates several worker threads (eg. 4 threads), and the worker threads read the points from the pipe, and process them.

Make sure to protect data structure from any concurrent modification by the worker threads.

XV6 - Number of child process (7 points)

In this exercise, you are asked for xv6 code. You don't have to attach all the sources to your archive (that would be too big for an email). Instead start by cloning xv6 outside from the directory where you did the other exercises with the following command:

$ git clone https://gitlab.inf.telecom-sudparis.eu/csc4508/xv6-riscv.git

At the end of the exam, generate a patch in the directory where you did the other exercises, run the following commands:

$ git add -N test_nb_child.c # add test_nb_child.c to the git repo $ git diff origin/master > ${PATH_TO_CF}/exercise4/diff.patch # create the patch with all your code changes

Where ${PATH_TO_CF} is the path to the directory containing the code you did for the other exercises.

Adding a system call (4 points)

Add a system call int get_nb_child(int pid) in xv6 that returns the number of child process of process pid.

The implementation of this system call should:

  • Check that process pid is usable (ie. its state is different from UNUSED
  • ). If the process is UNUSED, print an error message, and return -1.
  • Browse the list of usable processes, count the number of process whose parent is pid, and return this value
You can implement this system call in proc.c.

3 point

Create a test program test_nb_child.c. In this program, the main process P0 creates 4 child processes C0, C1, C2, C3. Each child process parent should be P0.

Then, each child process should call get_nb_child(P0) and print the return value (which should be 4, since P0 created C0, C1, C2, and C3).

Make sure that you add test_nb_child.c to your local git repository by running the following command:
$ git add -N test_nb_child.c