Exam 2023-2024
Organization
Duration: 3 h
Preparation
Returning your work
At the end of the exam, you must post your work on Moodle.
Please post a tgz file. You can generate it with:
Linear regression, part 1 (6 points)
The goal of this exercise is to start implementing an application that computes the linear regression of a set of points. This exercise focuses on loading the data from a file specified by the application user.
In the exercise2 directory, the linear_regression.c program defines the function void process_file(char* input_file). This function should:
- open the input file
- check the file size
- allocate an array points large enough to store the points
- read the content of the file and populate the points array
- call the linear_regression function
In this exercise, each point is represented by the struct point data structure, which contains a pair of integers (x, y).
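The size check and allocation steps listed above could be sketched as follows. This is only an illustration under a loud assumption: the file is taken to store raw binary struct point records, so the number of points is the file size divided by sizeof(struct point). The helper names file_size and alloc_points are hypothetical and do not come from the provided linear_regression.c. If the files are text, the file size would only give an upper bound on the number of points.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

struct point { int x; int y; };

/* Hypothetical helper: returns the size of the file in bytes, or -1 on error. */
long file_size(const char *path) {
  struct stat st;
  if (stat(path, &st) != 0)
    return -1;
  return (long)st.st_size;
}

/* Hypothetical helper: allocates an array large enough for every point
 * in the file and stores its capacity in *nb_points_max.
 * ASSUMPTION: the file contains raw struct point records. */
struct point *alloc_points(const char *path, int *nb_points_max) {
  long size = file_size(path);
  if (size < 0)
    return NULL;
  *nb_points_max = (int)(size / (long)sizeof(struct point));
  /* Allocate at least one element so that an empty file still
   * yields a valid (non-NULL) buffer. */
  size_t count = *nb_points_max > 0 ? (size_t)*nb_points_max : 1;
  return calloc(count, sizeof(struct point));
}
```

The caller is responsible for freeing the returned buffer once linear_regression has been called.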
We provide several input files points_*.dat. Each of these files contains a list of points, e.g.:
In order to validate your programs, here are the expected results for each of the input files:
File handling
Once the points buffer is allocated, process_file calls int read_points(FILE* f, struct point* points, int nb_points_max). This function reads the points from the input file, stores them in the points array, and returns the number of points that were read.
Implement the read_points function.
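A minimal sketch of read_points is given here, again assuming (this is not confirmed by the subject) that the file stores raw binary struct point records. With a text format, the fread call would be replaced by a loop around fscanf.

```c
#include <stdio.h>

struct point { int x; int y; };

/* Reads at most nb_points_max points from f into points.
 * ASSUMPTION: f contains raw struct point records.
 * Returns the number of complete points that were read. */
int read_points(FILE *f, struct point *points, int nb_points_max) {
  if (nb_points_max <= 0)
    return 0;
  /* fread returns the number of complete items read, which may be
   * smaller than nb_points_max if the end of file is reached. */
  size_t n = fread(points, sizeof(struct point), (size_t)nb_points_max, f);
  return (int)n;
}
```

Because fread counts complete items, a trailing partial record is ignored rather than reported as a point.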
Parallelizing an application (8 points)
For this exercise, move to the exercise3 directory.
The goal is now to parallelize the linear regression: one thread reads the input file, and a set of worker threads processes the data. We will start with one worker thread, and will later use several worker threads.
This parallelization will be implemented in the following questions:
- Question a: creating one worker thread
- Question b: making the main thread and the worker thread communicate
- Question c: termination of the worker thread
- Question d: using a pool of worker threads instead of one worker thread
process_file uses a pre-defined dataset (dataset_500), and it defines a struct regression named global_r whose role is to aggregate the results computed by the worker thread(s).
Each worker thread processes a set of points, and updates global_r.
Question a: creating one worker thread (2 points)
Modify the process_file function so that it creates a worker thread in charge of processing points. After creating the worker thread, process_file should wait for the termination of the thread.
process_file should allocate a struct worker_info, initialize it, and pass it to the worker thread.
The worker thread calls void linear_regression_worker(struct worker_info *wi), which is in charge of communicating with the main thread and processing points.
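One possible shape for the thread creation is sketched here. Since pthread_create expects a start routine of type void *(*)(void *), a small wrapper (worker_entry, a hypothetical name) forwards to linear_regression_worker. The fields of struct worker_info and the placeholder body of linear_regression_worker are assumptions made for illustration; the real definitions are in the provided sources.

```c
#include <pthread.h>
#include <stdlib.h>

/* ASSUMPTION: hypothetical fields, for illustration only. */
struct worker_info { int id; int done; };

/* Placeholder body: the real function processes points. */
void linear_regression_worker(struct worker_info *wi) {
  wi->done = 1;
}

/* Adapter matching the signature expected by pthread_create. */
static void *worker_entry(void *arg) {
  linear_regression_worker(arg);
  return NULL;
}

/* Creates one worker, waits for it, and returns 0 on success. */
int run_one_worker(void) {
  struct worker_info *wi = malloc(sizeof *wi);
  if (wi == NULL)
    return -1;
  wi->id = 0;
  wi->done = 0;

  pthread_t tid;
  if (pthread_create(&tid, NULL, worker_entry, wi) != 0) {
    free(wi);
    return -1;
  }
  /* Wait for the termination of the worker thread. */
  pthread_join(tid, NULL);

  int status = wi->done ? 0 : -1;
  free(wi);
  return status;
}
```

The worker_info structure is freed only after pthread_join, so the worker never accesses released memory.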
Question b: making the main thread and the worker thread communicate (3 points)
To make the threads communicate, we will use an anonymous pipe. The main thread will copy each of the points to the pipe, and the worker thread(s) will read the points from the pipe and process them.
Modify process_file so that it creates a pipe before creating the worker thread. process_file should communicate the pipe file descriptor to the worker thread (e.g., using the worker_info structure).
Then, process_file should iterate over the points and send them to the worker thread through the pipe.
Modify linear_regression_worker so that the function reads the points from the pipe and processes them by calling linear_regression_update.
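The exchange of points through the pipe could rely on two small helpers such as the ones sketched here. The names send_point and recv_point are hypothetical; the sketch simply writes and reads the raw bytes of a struct point, looping in case read or write transfers fewer bytes than requested.

```c
#include <unistd.h>

struct point { int x; int y; };

/* Hypothetical helper: writes one point to fd.
 * Returns 0 on success, -1 on error. */
int send_point(int fd, const struct point *p) {
  const char *buf = (const char *)p;
  size_t left = sizeof *p;
  while (left > 0) {
    ssize_t n = write(fd, buf, left);
    if (n <= 0)
      return -1;
    buf += n;
    left -= (size_t)n;
  }
  return 0;
}

/* Hypothetical helper: reads one point from fd.
 * Returns 1 if a point was read, 0 on end of file, -1 on error
 * (including an end of file in the middle of a point). */
int recv_point(int fd, struct point *p) {
  char *buf = (char *)p;
  size_t left = sizeof *p;
  while (left > 0) {
    ssize_t n = read(fd, buf, left);
    if (n < 0)
      return -1;
    if (n == 0)
      return left == sizeof *p ? 0 : -1;
    buf += n;
    left -= (size_t)n;
  }
  return 1;
}
```

In process_file, the write end would be used by the main thread and the read end stored in the worker_info structure; the worker would call linear_regression_update on each point returned by recv_point.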
Question c: termination of the worker thread (1 point)
Now, we want to make sure that the worker thread has finished processing points before printing the results. To do that, we will pass a special value (a point whose x and y values are INT_MAX, defined in limits.h) through the pipe to notify the worker thread(s) that they can stop processing points. When a worker thread receives this notification, it exits its main loop and updates the global_r variable.
Modify process_file and linear_regression_worker to implement the termination mechanism.
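The termination mechanism might look like the following sketch, with one worker. The fields of struct regression and struct worker_info are assumptions (the real ones are in the provided sources), the accumulation inside the loop stands in for linear_regression_update, and process_points is a hypothetical stand-in for the relevant part of process_file.

```c
#include <limits.h>
#include <pthread.h>
#include <unistd.h>

struct point { int x; int y; };
/* ASSUMPTION: hypothetical fields, for illustration only. */
struct regression { long n; long sum_x; long sum_y; };
struct worker_info { int read_fd; };

struct regression global_r;

static int is_stop(const struct point *p) {
  return p->x == INT_MAX && p->y == INT_MAX;
}

static void *worker(void *arg) {
  struct worker_info *wi = arg;
  struct regression local = {0, 0, 0};
  struct point p;
  /* Main loop: stops on the special point (or on end of file). */
  while (read(wi->read_fd, &p, sizeof p) == (ssize_t)sizeof p && !is_stop(&p)) {
    local.n++;
    local.sum_x += p.x;
    local.sum_y += p.y;
  }
  /* Update global_r once, after leaving the main loop. */
  global_r.n += local.n;
  global_r.sum_x += local.sum_x;
  global_r.sum_y += local.sum_y;
  return NULL;
}

/* Sends n points, then the special point, and waits for the worker. */
int process_points(const struct point *pts, int n) {
  int fds[2];
  if (pipe(fds) != 0)
    return -1;
  struct worker_info wi = { .read_fd = fds[0] };
  pthread_t tid;
  if (pthread_create(&tid, NULL, worker, &wi) != 0) {
    close(fds[0]);
    close(fds[1]);
    return -1;
  }
  int status = 0;
  for (int i = 0; i < n && status == 0; i++)
    if (write(fds[1], &pts[i], sizeof pts[i]) != (ssize_t)sizeof pts[i])
      status = -1;
  struct point stop = { INT_MAX, INT_MAX };
  if (write(fds[1], &stop, sizeof stop) != (ssize_t)sizeof stop)
    status = -1;
  close(fds[1]);           /* if a write failed, the worker still sees EOF */
  pthread_join(tid, NULL); /* global_r is complete after this point */
  close(fds[0]);
  return status;
}
```

Since the main thread reads global_r only after pthread_join returns, no additional synchronization is needed with a single worker.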
Question d: using a pool of worker threads instead of one worker thread (2 points)
We now want to use several workers to process the points in parallel.
Modify the program so that process_file creates several worker threads (e.g., 4 threads), and the worker threads read the points from the pipe and process them.
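With several workers, two points deserve attention: each worker must receive its own termination point, and concurrent updates of global_r must be protected. The sketch below illustrates one way to do this with a mutex; the mutex, the structure fields, and the name process_points_pool are assumptions, not part of the provided code. It also relies on each point being written with a single small write, which a pipe delivers without interleaving.

```c
#include <limits.h>
#include <pthread.h>
#include <unistd.h>

#define NB_WORKERS 4

struct point { int x; int y; };
/* ASSUMPTION: hypothetical fields, for illustration only. */
struct regression { long n; long sum_x; long sum_y; };
struct worker_info { int read_fd; };

struct regression global_r;
static pthread_mutex_t global_r_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
  struct worker_info *wi = arg;
  struct regression local = {0, 0, 0};
  struct point p;
  while (read(wi->read_fd, &p, sizeof p) == (ssize_t)sizeof p &&
         !(p.x == INT_MAX && p.y == INT_MAX)) {
    local.n++;
    local.sum_x += p.x;
    local.sum_y += p.y;
  }
  /* Several workers update global_r: serialize the updates. */
  pthread_mutex_lock(&global_r_lock);
  global_r.n += local.n;
  global_r.sum_x += local.sum_x;
  global_r.sum_y += local.sum_y;
  pthread_mutex_unlock(&global_r_lock);
  return NULL;
}

int process_points_pool(const struct point *pts, int n) {
  int fds[2];
  if (pipe(fds) != 0)
    return -1;
  struct worker_info wi = { .read_fd = fds[0] }; /* shared, read-only */
  pthread_t tids[NB_WORKERS];
  int created = 0;
  while (created < NB_WORKERS &&
         pthread_create(&tids[created], NULL, worker, &wi) == 0)
    created++;
  int status = (created == NB_WORKERS) ? 0 : -1;

  for (int i = 0; i < n && status == 0; i++)
    if (write(fds[1], &pts[i], sizeof pts[i]) != (ssize_t)sizeof pts[i])
      status = -1;
  /* One termination point per worker: each worker consumes exactly one. */
  struct point stop = { INT_MAX, INT_MAX };
  for (int i = 0; i < created; i++)
    if (write(fds[1], &stop, sizeof stop) != (ssize_t)sizeof stop)
      status = -1;
  close(fds[1]);
  for (int i = 0; i < created; i++)
    pthread_join(tids[i], NULL);
  close(fds[0]);
  return status;
}
```

Accumulating into a local structure and taking the lock once per worker keeps contention low compared with locking for every point.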
XV6 - Number of child processes (7 points)
In this exercise, you are asked to write xv6 code. You don't have to attach all the sources to your archive (that would be too big for an email). Instead, start by cloning xv6 outside the directory where you did the other exercises with the following command:
At the end of the exam, generate a patch in the directory where you did the other exercises by running the following commands:
Where ${PATH_TO_CF} is the path to the directory containing the code you wrote for the other exercises.
Adding a system call (4 points)
Add a system call int get_nb_child(int pid) in xv6 that returns the number of child processes of process pid.
The implementation of this system call should:
- Check that process pid is usable (i.e., its state is different from UNUSED). If the process is UNUSED, print an error message, and return -1.
- Browse the list of usable processes, count the number of processes whose parent is pid, and return this value
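The counting logic described above can be illustrated with a small user-space model. This is not kernel code: the process table, the struct proc fields, and the state names are simplified assumptions loosely modelled on xv6. In the kernel, the same loop would also need the appropriate locking around the process table, and the function would be wired in as a system call.

```c
#include <stddef.h>
#include <stdio.h>

#define NPROC 64

/* ASSUMPTION: simplified model of the xv6 process table. */
enum procstate { UNUSED, USED, SLEEPING, RUNNABLE, RUNNING, ZOMBIE };

struct proc {
  enum procstate state;
  int pid;
  struct proc *parent;
};

struct proc proc[NPROC]; /* zero-initialized: every entry starts UNUSED */

/* Returns the number of children of process pid, or -1 if no usable
 * process has this pid. */
int get_nb_child(int pid) {
  struct proc *target = NULL;
  for (struct proc *p = proc; p < &proc[NPROC]; p++) {
    if (p->state != UNUSED && p->pid == pid) {
      target = p;
      break;
    }
  }
  if (target == NULL) {
    printf("get_nb_child: process %d is unused\n", pid);
    return -1;
  }
  int count = 0;
  for (struct proc *p = proc; p < &proc[NPROC]; p++) {
    if (p->state != UNUSED && p->parent == target)
      count++;
  }
  return count;
}
```

Entries whose state is UNUSED are skipped in both loops, so a stale parent pointer left in a recycled slot is never counted.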
3 points
Create a test program test_nb_child.c. In this program, the main process P0 creates 4 child processes C0, C1, C2, C3. The parent of each child process should be P0.
Then, each child process should call get_nb_child(P0) and print the return value (which should be 4, since P0 created C0, C1, C2, and C3).
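One subtlety is that a child that calls get_nb_child(P0) immediately may run before P0 has created the other children, and would then see fewer than 4. The POSIX sketch below shows one possible way to make the children wait until all of them exist, using a pipe as a barrier. It is an illustration only: spawn_children is a hypothetical name, the xv6 user-level API differs slightly from POSIX, and the place where get_nb_child would be called is marked with a comment.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Creates nb_children children of the calling process (P0) and returns
 * the number of children that terminated successfully. */
int spawn_children(int nb_children) {
  int fds[2];
  if (pipe(fds) != 0)
    return -1;
  pid_t p0 = getpid();
  int created = 0;

  for (int i = 0; i < nb_children; i++) {
    pid_t pid = fork();
    if (pid < 0)
      break;
    if (pid == 0) {
      char c;
      close(fds[1]);
      /* Blocks until P0 closes the write end, that is, until all the
       * children have been created. */
      while (read(fds[0], &c, 1) > 0)
        ;
      /* In xv6, the child would call get_nb_child(P0) here and print
       * the return value. */
      _exit(getppid() == p0 ? 0 : 1);
    }
    created++;
  }

  /* All the children exist: release them. */
  close(fds[0]);
  close(fds[1]);

  int ok = 0;
  for (int i = 0; i < created; i++) {
    int status;
    if (wait(&status) > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0)
      ok++;
  }
  return ok;
}
```

Because P0 only starts waiting for its children after releasing them, all 4 children are still alive (and counted) when they query the number of children.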