/* Semantics of fetch_and_add: return the old value of *obj and add value to it */
int fetch_and_add(int *obj, int value) { int old = *obj; *obj = old + value; return old; }
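A minimal sketch (not from the lecture) of the same operation done atomically with C11 <stdatomic.h>; the one-liner above only illustrates the semantics and is not itself atomic, and the counter/ticket names are illustrative.

#include <stdatomic.h>

atomic_int counter = 0;

int take_ticket(void) {
    /* atomically adds 1 to counter and returns the previous value */
    return atomic_fetch_add(&counter, 1);
}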
Memory Fence (memory barrier)
void atomic_thread_fence(memory_order order);
performs a memory synchronization
ensures that all past memory operations are visible to all threads, according to the memory order chosen (see the C11 memory model)
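As an illustration, a minimal sketch (assuming C11 <stdatomic.h>; the producer/consumer names and the value 42 are not from the lecture) of publishing data from one thread to another with a release/acquire fence pair:

#include <stdatomic.h>

int data;              /* ordinary (non-atomic) data          */
atomic_int ready = 0;  /* flag polled by the consumer         */

void producer(void) {
    data = 42;                                          /* plain store             */
    atomic_thread_fence(memory_order_release);          /* order: data before flag */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

void consumer(void) {
    while (atomic_load_explicit(&ready, memory_order_relaxed) == 0)
        ;                                               /* busy-wait on the flag   */
    atomic_thread_fence(memory_order_acquire);          /* order: flag before data */
    /* at this point data == 42 is guaranteed to be visible */
}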
Synchronization primitives
Properties to consider when choosing a synchronization primitive
Reactivity: time spent between the release of a lock and the unblocking of a thread waiting for this lock
Contention: memory traffic generated by threads waiting for a lock
Fairness and risk of starvation: if several threads are waiting for a lock, do they all have the same probability of acquiring it? Are some threads likely to wait indefinitely?
Busy-waiting synchronization
int pthread_spin_lock(pthread_spinlock_t *lock);
tests the value of the lock until it becomes free, then acquires the lock
int pthread_spin_unlock(pthread_spinlock_t *lock);
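A short usage sketch of this API (the shared counter is an illustrative example, error handling omitted):

#include <pthread.h>

pthread_spinlock_t lock;
long counter;

void counter_init(void) {
    pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);
}

void counter_inc(void) {
    pthread_spin_lock(&lock);      /* busy-waits until the lock becomes free */
    counter++;                     /* critical section                       */
    pthread_spin_unlock(&lock);
}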
Benefits
Simple to implement (with test_and_set; see the sketch after this list)
Reactivity
Disadvantages
Consumes CPU while waiting
Consumes memory bandwidth while waiting
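As mentioned above, a spinlock is simple to implement with test_and_set; here is a minimal sketch based on the C11 atomic_flag (an illustration, not the actual pthread implementation):

#include <stdatomic.h>

typedef struct { atomic_flag flag; } spinlock_t;   /* initialize with ATOMIC_FLAG_INIT */

void spin_lock(spinlock_t *l) {
    /* test_and_set returns the previous value: loop while it was already set */
    while (atomic_flag_test_and_set_explicit(&l->flag, memory_order_acquire))
        ;                          /* busy-wait: consumes CPU and memory bandwidth */
}

void spin_unlock(spinlock_t *l) {
    atomic_flag_clear_explicit(&l->flag, memory_order_release);
}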
Futex
Fast Userspace Mutex
System call that makes it possible to build synchronization mechanisms in userland
Allows waiting without monopolizing the CPU
A futex is made up of:
a value
a waiting list
Available operations (among others)
WAIT(int *addr, int value)
while(*addr == value) { sleep(); }
add the current thread to the waiting list
WAKE(int *addr, int value, int num)
*addr = value
wake up num threads waiting on addr
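On Linux these operations are reached through the raw futex(2) system call (glibc provides no futex() wrapper); a minimal sketch, with the caveat that the real FUTEX_WAKE does not itself write a new value to *addr, the caller updates the variable beforehand as in the mutex example below:

#define _GNU_SOURCE
#include <linux/futex.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Put the calling thread to sleep if *addr still equals `value` (checked atomically). */
static long futex_wait(int *addr, int value) {
    return syscall(SYS_futex, addr, FUTEX_WAIT, value, NULL, NULL, 0);
}

/* Wake up at most `num` threads sleeping on addr. */
static long futex_wake(int *addr, int num) {
    return syscall(SYS_futex, addr, FUTEX_WAKE, num, NULL, NULL, 0);
}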
Implementing a mutex using a futex
mutex: an integer with two possible values: 1 (unlocked) or 0 (locked)
mutex_lock(m):
  test and unset the mutex
  if the mutex was already 0 (locked), call FUTEX_WAIT
mutex_unlock(m):
  test and set the mutex (back to 1)
  call FUTEX_WAKE to wake up a thread from the waiting list
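A sketch of this mutex in C, following the convention above (1 = unlocked, 0 = locked); simplified and not production-ready (for instance it calls FUTEX_WAKE even when nobody is waiting):

#define _GNU_SOURCE
#include <linux/futex.h>
#include <stdatomic.h>
#include <sys/syscall.h>
#include <unistd.h>

static long sys_futex(atomic_int *addr, int op, int val) {
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

void mutex_lock(atomic_int *m) {
    /* "test and unset": swap in 0 and look at the previous value */
    while (atomic_exchange(m, 0) == 0) {
        /* it was already locked: sleep as long as the value is still 0 */
        sys_futex(m, FUTEX_WAIT, 0);
    }
}

void mutex_unlock(atomic_int *m) {
    atomic_store(m, 1);             /* "test and set": back to unlocked       */
    sys_futex(m, FUTEX_WAKE, 1);    /* wake one thread from the waiting list  */
}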
Deadlock
Situation in which at least two processes are each waiting for a non-shareable resource already allocated to the other
Necessary and sufficient conditions (Coffman, Elphick, and Shoshani 1971):
Resources accessed under mutual exclusion (non-shareable resources)
Hold and wait (processes keep the resources they have already acquired while waiting for others)
Non-preemption of resources
Circular chain of blocked processes
Strategies:
Prevention: acquisition of mutexes in the same order
Deadlock detection and resolution (e.g. with pthread_mutex_timedlock)
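A sketch of the detection strategy with pthread_mutex_timedlock (the mutex name and the one-second budget are illustrative assumptions):

#include <errno.h>
#include <pthread.h>
#include <time.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

int locked_work(void) {
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);   /* timedlock expects an absolute time */
    deadline.tv_sec += 1;                       /* give up after about one second     */

    if (pthread_mutex_timedlock(&m, &deadline) == ETIMEDOUT) {
        /* possible deadlock: release other resources, back off, retry later */
        return -1;
    }
    /* ... critical section ... */
    pthread_mutex_unlock(&m);
    return 0;
}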
Lock granularity
Coarse grain locking
A lock protects a large portion of the program
Advantage: easy to implement
Disadvantage: reduces parallelism
Fine grain locking
Each lock protects a small portion of the program
Advantage: different resources can be used in parallel (see the sketch after this list)
Disadvantages:
Complex to implement without bugs (e.g. deadlocks, memory corruption)
Overhead (locking comes at a cost)
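A sketch of fine-grained locking with one mutex per hash-table bucket (the data structure and names are illustrative, not from the lecture; the per-bucket mutexes are assumed to be initialized with pthread_mutex_init at table creation):

#include <pthread.h>

#define NBUCKETS 64

struct entry { struct entry *next; int key; };

struct bucket {
    pthread_mutex_t lock;          /* protects only this bucket's list */
    struct entry   *head;
};

struct table { struct bucket buckets[NBUCKETS]; };   /* coarse grain: one lock for the whole table */

void table_insert(struct table *t, unsigned hash, struct entry *e) {
    struct bucket *b = &t->buckets[hash % NBUCKETS];
    pthread_mutex_lock(&b->lock);  /* threads touching other buckets keep running in parallel */
    e->next = b->head;
    b->head = e;
    pthread_mutex_unlock(&b->lock);
}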
Scalability of a parallel system
Scalability = ability to reduce execution time when adding processing units
Sequential parts of a program reduce its scalability (Amdahl's law (Amdahl 1967); see the formula below)
In a parallel program, waiting for a lock introduces sequentiality
→ Locks can interfere with scalability
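For reference, the standard statement of Amdahl's law (not reproduced from the slides), where s is the sequential fraction of the program and N the number of processing units:

% Amdahl's law: speedup with N processing units, s = sequential fraction
S(N) = \frac{1}{s + \frac{1 - s}{N}},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{s}

For example, with s = 0.1 (10% sequential), the speedup can never exceed 10 regardless of the number of threads; time spent waiting for locks effectively adds to s.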
Bibliography
Amdahl, Gene M. 1967. “Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities.” In Proceedings of the April 18-20, 1967, Spring Joint Computer Conference, 483–85. ACM.
Coffman, Edward G., Melanie Elphick, and Arie Shoshani. 1971. “System Deadlocks.” ACM Computing Surveys (CSUR) 3 (2): 67–78.