CSC5101 – Advanced Programming of Multicore Architectures

Parallel and Distributed Systems track

Non-blocking algorithms

The lock free queue

Write a main function that creates N threads, where N is a parameter of the application. The main function has to wait for the termination of the threads before exiting.

Implements a queue of integer protected by a pthread_mutex_lock (enqueue and dequeue functions). Modify your application in order to enqueue K nodes and then dequeue K nodes in each thread, where K is a parameter of the application. Runs the application with N = 10 and K = 1'000'000 and verify that the queue is empty at the end of the run.

Implements a lock-free queue of integer by using the algorithm presented during the lecture. Don't forget to identify the atomic operations and to use the adequate C constructs. Runs the application with N = 10 and K = 1'000'000 and verify that the lock-free queue is also empty at the end of the run.
For the moment, don't try to free the nodes when they are removed from the queue. You will see that freeing the nodes complicate the algorithm later.
In order to simplify debugging, you can add a function to print the state of the queue.

By using clock_gettime in CLOCK_MONOTONIC mode, identify which implementation (lock-free or protected by a pthread_mutex_lock) gives the best result.

Now, try to free a node in the lock-free queue implementation of dequeue. If your code is correct, the application should crash lamentably with a segmentation fault. Try to understand why by taking look at the ABA problem. When you think that you understand, check with your professor that your explanation is correct.

optional and difficult

In order to avoid the ABA problem, we have to ensure that a node A is not reallocated when a thread uses it. For that purpose, we will use hazard pointers (Maged Michael, "Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects", TPDS 2004).

At high-level, when a thread reads a pointer X from a location L and wants to perform a CAS to change the value of L from X to another value, it has to ensure that X cannot be reallocated between the read and the CAS. For that purpose, the thread registers X in a thread-local variable called hazard while it uses X, and the algorithm has to ensure that X cannot be freed if one of the hazard variables points to X.

With the queue, when a thread reads the pointer X in head in dequeue, it writes thus X in hazard. Then, it re-reads head and if head is not equal to X, it restarts from the beginning. Finally, if the compare-and-swap succeed, instead of freeing the node, the thread writes 0 in its hazard pointer and adds X in a lock-free stack called free_nodes.

Similarly, in enqueue, the thread has also to ensure that the tail is not freed during one of the operations that modify tail. For this reason, similarly, when the thread reads tail, it saves its value in its hazard pointer, re-read tail in order to ensure that it was not yet potentially freed, and writes 0 in its hazard location after the CAS.

Finally, in enqueue, in order to allocate a node, a thread has now first to try to allocate the node from free_nodes before allocating a new node with malloc. In details:

  • the thread pops (with a lock-free algorithm) a pointer Z from the free_nodes stack,
  • If Z is null (i.e., the stack is empty), the thread allocates a node with malloc,
  • otherwise
    • if the thread finds the Z pointer in one of the hazard variables of the other threads, it means that Z is still used by another thread. In this case, it has to re-push Z and to use malloc to allocate a new node,
    • otherwise, the node is free and not used anymore, the thread can thus directly reuse it to enqueue a new value without calling malloc.

As an additional question, why a node can not appear twice in the free_nodes stack?

The radix tree (optional and easy)

Try to implement a lock free version of a radix tree. You can find a description of a radix tree here. In this exercise, consider that you cannot remove elements from the radix tree.

The linked list (optional and difficult)

Restart the first exercice, but by considering a linked list, and without trying to free the nodes. If you still have time and was able to implement hazard pointers for the queue, you can try to use hazard pointers to handle the ABA problem for linked lists (this question is difficult).