François Trahay
1965 - 2005
⟹ Increased processor performance
Since 2005
Source: https://github.com/karlrupp/microprocessor-trend-data
At each stage, several circuits are used
→ One instruction is executed at each cycle
⟹ several instructions executed simultaneously!
Limitations of the superscalar:
There must be no dependency between instructions executed simultaneously.
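The dependency rule can be sketched as a small check on the registers each instruction reads and writes. This is a deliberately conservative model, not how any particular processor decides: the three instructions and the `Instr` / `can_issue_together` names are hypothetical, and real superscalar cores remove some of these conflicts with register renaming.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    text: str
    reads: frozenset   # registers/variables read by the instruction
    writes: frozenset  # registers/variables written by the instruction


def can_issue_together(first, second):
    """Return True if `second` has no dependency on `first`."""
    raw = first.writes & second.reads   # second needs a result of first
    war = first.reads & second.writes   # second overwrites an input of first
    waw = first.writes & second.writes  # both write the same location
    return not (raw or war or waw)


# Hypothetical instructions, chosen only for illustration.
mul = Instr("a = b * c", frozenset({"b", "c"}), frozenset({"a"}))
add = Instr("d = a + 1", frozenset({"a"}), frozenset({"d"}))
sub = Instr("e = f - g", frozenset({"f", "g"}), frozenset({"e"}))

print(can_issue_together(mul, add))  # False: add reads a, which mul writes
print(can_issue_together(mul, sub))  # True: no shared operands
```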
Example of non-parallelizable instructions
cmp a, 7 ; a > 7 ?
ble L1
mov c, b ; b = c
br L2
L1: mov d, b ; b = d
L2: ...
⟹ waste of time
0x12 loop:
...
0x50 inc eax
0x54 cmpl eax, 10000
0x5A jl loop
0x5C end_loop:
...
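To get a feel for why the `jl loop` branch above is easy to guess, the loop can be simulated with a very simple predictor that assumes each branch behaves like its previous execution. This is only a sketch under assumptions: the slides do not name a prediction scheme, `eax` is assumed to start at 0 (the code before `inc eax` is elided), and `run_loop` is a hypothetical helper.

```python
def run_loop(limit=10000):
    """Simulate the `jl loop` branch with a last-outcome predictor."""
    predicted_taken = False  # assumed initial prediction
    mispredictions = 0
    branches = 0
    eax = 0                  # assumed initial value
    while True:
        eax += 1                 # inc eax
        taken = eax < limit      # cmpl eax, 10000 ; jl loop
        branches += 1
        if taken != predicted_taken:
            mispredictions += 1
        predicted_taken = taken  # remember the last outcome
        if not taken:
            break
    return branches, mispredictions


branches, mispredictions = run_loop()
print(branches, mispredictions)  # 10000 2
```

Under these assumptions only the first and the last execution of the branch are mispredicted.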
Example: image processing, scientific computing
Using vector instructions (MMX, SSE, AVX, …)
Instructions specific to a processor type
Apply the same operation to multiple data elements at once
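The idea can be modeled in plain Python by counting operations. This is not real SIMD code, only a sketch: the 4-lane width and the `scalar_add` / `vector_add` helpers are assumptions made for illustration, and each loop iteration of `vector_add` stands for one vector instruction.

```python
LANES = 4  # assumed vector width, in elements


def scalar_add(a, b):
    """One addition per element: len(a) operations."""
    out, ops = [], 0
    for x, y in zip(a, b):
        out.append(x + y)
        ops += 1
    return out, ops


def vector_add(a, b, lanes=LANES):
    """One modeled vector operation per group of `lanes` elements."""
    out, ops = [], 0
    for i in range(0, len(a), lanes):
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        ops += 1
    return out, ops


a = list(range(8))
b = [10] * 8
print(scalar_add(a, b)[1])  # 8 operations
print(vector_add(a, b)[1])  # 2 operations
```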
Problem with superscalar / vector processors:
Simultaneous Multi-Threading (SMT, or Hyper-Threading)
Limited scalability of SMT
dispatcher is shared
FPU is shared
→ Duplicate all the circuits
→ Non-Uniform Memory Architecture
→ Mainly used for small caches (e.g., the TLB)
→ Direct access to the cache line
Warning: risk of collision
example: 0x12345678 and 0xbff72678 (same index digits 67, so both addresses map to the same cache line)
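The collision can be checked by splitting each address into tag, index and offset. The sketch below reads the two example addresses as 0x12345678 and 0xbff72678 and assumes the field widths suggested by the way they are split: a 4-bit offset (last hex digit) and an 8-bit index (the digits 67). The `split_address` helper is hypothetical.

```python
OFFSET_BITS = 4  # assumed: 16-byte cache lines
INDEX_BITS = 8   # assumed: 256 lines in the cache


def split_address(addr):
    """Return the (tag, index, offset) fields of an address."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


for addr in (0x12345678, 0xbff72678):
    tag, index, offset = split_address(addr)
    print(hex(addr), hex(tag), hex(index), hex(offset))
# 0x12345678 0x12345 0x67 0x8
# 0xbff72678 0xbff72 0x67 0x8
```

Both addresses share index 0x67 but have different tags, so in a direct-mapped cache each one evicts the other.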
→ K-way associative cache (in French: Cache associatif K-voies)
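A small simulation can show how associativity absorbs such a collision. This is a sketch under assumptions, not a model of a specific processor: the field widths (4-bit offset, 8-bit index) are taken from the example addresses, read as 0x12345678 and 0xbff72678, the LRU replacement policy is an assumption, and `SetAssociativeCache` and `count_misses` are hypothetical names.

```python
from collections import OrderedDict

OFFSET_BITS = 4  # assumed: 16-byte cache lines
INDEX_BITS = 8   # assumed: 256 sets


class SetAssociativeCache:
    def __init__(self, ways):
        self.ways = ways  # ways=1 behaves like a direct-mapped cache
        self.sets = [OrderedDict() for _ in range(1 << INDEX_BITS)]
        self.misses = 0

    def access(self, addr):
        """Access one address; return True on a hit, False on a miss."""
        index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        lines = self.sets[index]
        if tag in lines:
            lines.move_to_end(tag)  # mark as most recently used
            return True
        self.misses += 1
        if len(lines) >= self.ways:
            lines.popitem(last=False)  # evict the least recently used line
        lines[tag] = True
        return False


def count_misses(ways, trace):
    cache = SetAssociativeCache(ways)
    for addr in trace:
        cache.access(addr)
    return cache.misses


# Alternate between the two colliding addresses.
trace = [0x12345678, 0xbff72678] * 4
print(count_misses(1, trace))  # 8: every access evicts the other line
print(count_misses(2, trace))  # 2: both lines fit in the same set
```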
What if 2 threads access the same cache line?
Concurrent read: replication in local caches
Concurrent write: need to invalidate data in other caches
Cache snooping: the writing cache broadcasts a message that invalidates the copies held in the other caches
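The read-replication and write-invalidation behavior can be sketched with a toy model. It is a simplification under stated assumptions: caches are write-through (an assumption that keeps memory always up to date), there are no line states as in real protocols, and `Bus` and `Cache` are hypothetical classes.

```python
class Bus:
    """Shared bus connecting the caches to main memory."""

    def __init__(self):
        self.memory = {}
        self.caches = []

    def invalidate(self, addr, sender):
        # Every other cache snoops the message and drops its copy.
        for cache in self.caches:
            if cache is not sender:
                cache.lines.pop(addr, None)


class Cache:
    def __init__(self, name, bus):
        self.name = name
        self.lines = {}  # addr -> locally cached value
        self.bus = bus
        bus.caches.append(self)

    def read(self, addr):
        # Concurrent read: each cache keeps its own replica.
        if addr not in self.lines:
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        # Concurrent write: invalidate the other replicas first.
        self.bus.invalidate(addr, sender=self)
        self.lines[addr] = value
        self.bus.memory[addr] = value  # assumed write-through


bus = Bus()
bus.memory[0x40] = 1
core0, core1 = Cache("core0", bus), Cache("core1", bus)

core0.read(0x40)
core1.read(0x40)             # both caches now hold a replica
core0.write(0x40, 2)         # core1's replica is invalidated
print(0x40 in core1.lines)   # False
print(core1.read(0x40))      # 2, reloaded from memory
```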
[bryant] Bryant, Randal E., and David Richard O’Hallaron. “Computer systems: a programmer’s perspective”. Prentice Hall, 2011.
[patterson2013] Patterson, David A and Hennessy, John L. “Computer organization and design: the hardware/software interface”. Newnes, 2013.
[patterson2011] Patterson, David A. “Computer architecture: a quantitative approach”. Elsevier, 2011.