Operating systems

Portail informatique

Lock analyzer

The goal of this lab is to create a tool to analyze the performance of a multithreaded application that uses locks. This tool will intercept the calls to the locking primitives (thanks to LD_PRELOAD) and, for each function call, save an event to a file at the beginning and at the end of the function call. The generated trace file can be analyzed post-mortem in order to detect a possible performance problem.

Preamble

To get started, retrive the base for today's lab:

  • git clone -b tp11_base https://gitlab.inf.telecom-sudparis.eu/csc4508/csc4508_facebook.git csc4508_tp11

The code provided in lock_analyzer/ contains several files:

  • liblock.h: defines a few data structures
  • cond.c: intercepts calls to functions pthread_cond_ * and calls the enter_function and leave_function functions
  • mutex.c: intercepts calls to functions pthread_mutex_* and calls the enter_function and leave_function functions
  • write_events.c: implements the enter_function and leave_function functions
  • read_trace.c: program that reads a trace generated by write_event
  • test: directory containing programs to test the tool

You can test the current implementation by running the benchmark application while LD_PRELOADing the library:

$ cd lock_analyzer $ make [...] $ cd test $ make [...] $ ./benchmark [...] Thread 0 did 100000 loops in 52346.000000 usec (0.523460 usec per loop) $ LD_PRELOAD=../liblock.so ./benchmark Entering mutex_lock Leaving mutex_lock Entering mutex_unlock Leaving mutex_unlock [...] Thread 0 did 100000 loops in 2475668.000000 usec (24.756680 usec per loop)

On Mac systems, instead of using LD_PRELOAD, you may have to use the DYLD_INSERT_LIBRARIES variable which works the same way. Moreover, you may have to add the compilation flag -force_flat_namespace when compiling the application.

Generation of a trace file

First, we want to count the number of calls to each of the intercepted functions. Edit the write_events.c file to define an array of integers containing the number of calls to each function. In enter_function, increment the array entry that corresponds to the called function. In __write_events_conclude, display the number of intercepted calls.

Measure the additional cost of the tool using the benchmark application.

We now want to generate a file allowing to trace the execution of the application. For each function entry/exit, a struct lock_event must be saved to a file.

Modify your implementation so that each time a function is called, an event is written to disk with a Unbuffered IO.

Measure the overhead of generating traces on the benchmark application.

Complete the read_trace.c program. This program must open the trace file passed in parameter and print the content of each event.

Modify your implementation of write_events.c to use buffered I/O and measure the overhead of the tool.

Modify your implementation so that events are saved in a pre-allocated buffer (for example, by preallocating a 64MB buffer during initialization). At the end of the application, write the buffer to the disk.

Measure the overhead of the tool.

A problem arises when the buffer becomes full. One solution is to stop recording events. Another solution is to empty the buffer to continue recording events.

Modify your tool so that it flushes the buffer when it becomes full.

Measure the overhead of the tool.

To avoid blocking the application when emptying the buffer, we can use a "double buffering" technique: two buffers A and B are allocated, and the events are recorded in buffer A. When A becomes full, we start an asynchronous write, and we continue writing to buffer B.

Measure the overhead of the tool.

The answers to this exercise are available in the tp11_corrige branch of the git repository.