Multiprocessor Systems
Bus Based Uniform Memory Access
- Bus bandwidth becomes bottleneck with multiple CPUs.
- Have a per-processor cache to minimise access to memory.
Cache Consistency
- Usually handled by hardware.
- A write to a cached line in one cache invalidates the copies of that line in all other caches.
- Also consumes bus bandwidth.
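The write-invalidate behaviour described above can be sketched as a tiny simulation. This is only an illustration under stated assumptions: a single shared word, four CPUs, and write-through to memory. The names `bus_system`, `cpu_read` and `cpu_write` are hypothetical, and real hardware tracks whole cache lines with a richer protocol.

```c
#include <assert.h>
#include <stdbool.h>

#define NCPU 4

/* Hypothetical model: one shared memory word and one cached copy per CPU. */
typedef struct {
    int      memory;            /* the shared word in main memory        */
    bool     valid[NCPU];       /* does CPU i hold a valid cached copy?  */
    int      copy[NCPU];        /* CPU i's cached value                  */
    unsigned bus_transactions;  /* how often the shared bus was used     */
} bus_system;

/* A read is served from the local cache when possible; a miss uses the bus. */
int cpu_read(bus_system *s, int cpu)
{
    assert(cpu >= 0 && cpu < NCPU);
    if (!s->valid[cpu]) {
        s->copy[cpu] = s->memory;
        s->valid[cpu] = true;
        s->bus_transactions++;
    }
    return s->copy[cpu];
}

/* A write invalidates every other cached copy, which costs a bus transaction. */
void cpu_write(bus_system *s, int cpu, int value)
{
    assert(cpu >= 0 && cpu < NCPU);
    for (int i = 0; i < NCPU; i++) {
        if (i != cpu)
            s->valid[i] = false;
    }
    s->copy[cpu] = value;
    s->valid[cpu] = true;
    s->memory = value;          /* write-through: an assumption of this sketch */
    s->bus_transactions++;
}
```

Repeated reads by one CPU cost a single bus transaction, while each write forces the other CPUs to miss on their next read, which is where the extra bus traffic comes from.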
Operating Systems
Separate OS For Each CPU
Used in early multiprocessor systems.
- Simple to implement.
- Avoids CPU-based concurrency issues.
- Scales - no shared serial sections.
- Still used today in cloud virtualisation, where each virtual machine runs its own OS.
Problems:
- Each processor has own scheduling queue:
- Can have one processor overloaded while rest are idle.
- Each processor has own memory partition:
- Can have one processor thrashing, and others with free memory.
- No way to move free memory from one OS to another.
Symmetric Multiprocessor
Used in modern multiprocessor systems.
- OS kernel runs on all processors.
- Load and resource balanced between all processors.
- Including kernel execution.
- Issue: Real concurrency in the kernel.
- Need to carefully apply synchronisation primitives.
Synchronisation
Test and Set
TSL does not work on multiprocessor systems without extra hardware support.
- Hardware solution: the CPU executing TSL locks the bus for the duration of the instruction, preventing memory accesses by other CPUs.
- The executing CPU has mutually exclusive access to memory for the duration of the instruction.
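A minimal sketch of a TSL-based spinlock, assuming C11 atomics, with `atomic_exchange_explicit` standing in for the TSL instruction. The names `tsl_lock`, `tsl`, `tsl_try_acquire`, `tsl_acquire` and `tsl_release` are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: false = free, true = held. */
typedef struct {
    atomic_bool held;
} tsl_lock;

void tsl_init(tsl_lock *l)
{
    atomic_init(&l->held, false);
}

/* TSL: atomically store "held" and return the previous value. */
bool tsl(tsl_lock *l)
{
    return atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

/* One attempt: the lock was acquired only if it was free before the TSL. */
bool tsl_try_acquire(tsl_lock *l)
{
    return !tsl(l);
}

/* Busy-wait: every failed iteration is another atomic read-modify-write. */
void tsl_acquire(tsl_lock *l)
{
    while (tsl(l)) {
        /* spin */
    }
}

void tsl_release(tsl_lock *l)
{
    assert(atomic_load_explicit(&l->held, memory_order_relaxed));
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

Note that the loop in `tsl_acquire` issues an atomic operation on every iteration, whether or not the lock has changed.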
Problem:
- TSL is a busy-wait synchronisation primitive (spinlock).
- Lock contention leads to spinning on the lock.
- Each spin iteration executes TSL, which locks the bus and slows all other CPUs down.
- Independent of whether other CPUs need lock or not.
- Causes bus contention.
Read Before Test and Set
- Spin reading cached copy of lock variable waiting for it to change.
- When it does, use TSL to acquire the lock and invalidate all cached copies.
- Greatly reduces bus traffic.
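The read-before-TSL idea can be sketched as follows, again assuming C11 atomics with `atomic_exchange_explicit` standing in for TSL. The names `ttas_lock`, `ttas_try_acquire`, `ttas_acquire` and `ttas_release` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: false = free, true = held. */
typedef struct {
    atomic_bool held;
} ttas_lock;

void ttas_init(ttas_lock *l)
{
    atomic_init(&l->held, false);
}

/* Read first; only issue the TSL when the lock looks free. */
bool ttas_try_acquire(ttas_lock *l)
{
    /* Plain read: can be served from the local cached copy while the lock is held. */
    if (atomic_load_explicit(&l->held, memory_order_relaxed))
        return false;
    /* Lock looked free: the TSL may still fail if another CPU got there first. */
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

/* Spin on reads until a TSL succeeds. */
void ttas_acquire(ttas_lock *l)
{
    while (!ttas_try_acquire(l)) {
        /* spin */
    }
}

void ttas_release(ttas_lock *l)
{
    assert(atomic_load_explicit(&l->held, memory_order_relaxed));
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

While the lock is held, waiters only perform reads, so the atomic operation is attempted only after the holder's release has invalidated the cached copies.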
Spinning vs Blocking
Spinning (busy waiting) on a lock makes no sense on a uniprocessor.
- No other running process to release the lock.
- Blocking and (eventually) switching to the lock holder is the only sensible option.
Decision is not as clear on SMP systems.
- Lock is held by a process running on another CPU and will be freed without necessarily switching away from the requester.
- Sometimes the lock is held for less time than the overhead of switching to another process and back, so spinning is more efficient.
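The trade-off above can be written as a simple comparison. This is a sketch under the assumption that blocking costs one context switch away and one back, each with the same cost. The function names and the figures in the examples are hypothetical and purely illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed cost of blocking: switch to another process, then switch back. */
uint64_t blocking_overhead_ns(uint64_t switch_cost_ns)
{
    return 2 * switch_cost_ns;
}

/* Spinning wastes the wait time itself, so it wins only for short waits. */
bool spinning_is_cheaper(uint64_t expected_wait_ns, uint64_t switch_cost_ns)
{
    return expected_wait_ns < blocking_overhead_ns(switch_cost_ns);
}

/* Worked examples with illustrative (not measured) numbers. */
void worked_examples(void)
{
    /* 1 us expected wait, 5 us per switch: spinning costs less than 10 us. */
    assert(spinning_is_cheaper(1000, 5000));
    /* 50 us expected wait, 5 us per switch: blocking costs less. */
    assert(!spinning_is_cheaper(50000, 5000));
}
```

In practice the expected wait is not known in advance, so this only illustrates why short critical sections favour spinning.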
Spinlock Implementation
- Critical sections synchronised via spinlocks are expected to be short to avoid other CPUs wasting cycles spinning.
- Spinlock holder must not be preempted.
- Otherwise, other CPUs will spin until the holder is scheduled again.
- Interrupts are hence disabled on the holder's CPU while the spinlock is held.
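The points above can be combined into a sketch of a spinlock that disables interrupts before acquiring. User-space C cannot disable interrupts, so `irq_enabled`, `irq_save_and_disable` and `irq_restore` are hypothetical stand-ins that model one CPU's interrupt flag as a plain variable. A real kernel would use architecture-specific instructions instead. The `spinlock` type and its functions are likewise illustrative names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for this CPU's interrupt-enable flag (assumption for illustration). */
bool irq_enabled = true;

/* Disable interrupts and report whether they were enabled before. */
bool irq_save_and_disable(void)
{
    bool was_enabled = irq_enabled;
    irq_enabled = false;
    return was_enabled;
}

void irq_restore(bool was_enabled)
{
    irq_enabled = was_enabled;
}

typedef struct {
    atomic_bool held;   /* false = free, true = held */
} spinlock;

void spin_init(spinlock *l)
{
    atomic_init(&l->held, false);
}

/* Disable interrupts first so the holder cannot be preempted, then spin. */
bool spin_lock_irqsave(spinlock *l)
{
    bool was_enabled = irq_save_and_disable();
    while (atomic_exchange_explicit(&l->held, true, memory_order_acquire)) {
        /* spin */
    }
    return was_enabled;
}

/* Release the lock, then restore the interrupt state saved at acquire time. */
void spin_unlock_irqrestore(spinlock *l, bool was_enabled)
{
    assert(!irq_enabled);   /* interrupts must still be off while holding */
    atomic_store_explicit(&l->held, false, memory_order_release);
    irq_restore(was_enabled);
}
```

Saving and restoring the previous state, rather than unconditionally re-enabling, keeps interrupts off until the outermost lock is released when locks are nested.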