Execution Units are Often Pipelined

2024-12-30

This blog post explores how execution units are pipelined in out-of-order microarchitectures. The author initially assumed that an execution unit stays occupied until the µop it is executing completes. Using the Firestorm microarchitecture (A14 and M1) as an example, the post shows otherwise: two integer execution units can have several multiplications in flight at once, even though each multiplication takes three cycles. Comparing dependent and independent instruction sequences reveals that many execution unit/µop combinations are heavily pipelined, so a new µop can be issued to a unit while it is still processing earlier ones. For the independent sequence, this brings execution time down from the 6 cycles predicted by the unpipelined model to 4. Finally, the author explains why instruction latency and bandwidth tables list reciprocal throughput: it is simply cycles per instruction.
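The 6-versus-4 cycle figures can be reproduced with a small back-of-the-envelope model. The sketch below is not the author's measurement code. It assumes four independent multiplications spread evenly over the two units (the count of four is an assumption chosen to match the figures above), and the helper names `total_cycles` and `reciprocal_throughput` are hypothetical. An unpipelined unit is modelled as accepting a new µop only every `latency` cycles, a fully pipelined one as accepting a new µop every cycle.

```python
import math


def total_cycles(n_uops, n_units, latency, issue_interval):
    """Cycles until the last of n_uops independent µops completes.

    issue_interval is how many cycles a unit waits before it can accept
    the next µop: equal to latency for an unpipelined unit, 1 for a
    fully pipelined one (both are modelling assumptions).
    """
    if n_uops == 0:
        return 0
    # The busiest unit determines when the whole sequence finishes.
    per_unit = math.ceil(n_uops / n_units)
    # Its last µop is issued after (per_unit - 1) intervals, then needs
    # `latency` more cycles to produce its result.
    return (per_unit - 1) * issue_interval + latency


def reciprocal_throughput(n_units, issue_interval):
    """Steady-state cycles per instruction across all units."""
    return issue_interval / n_units


# Assumed workload: 4 independent multiplies, 2 units, 3-cycle latency.
unpipelined = total_cycles(4, 2, 3, issue_interval=3)
pipelined = total_cycles(4, 2, 3, issue_interval=1)
print(unpipelined, pipelined)           # 6 4
print(reciprocal_throughput(2, 1))      # 0.5
```

Under these assumptions the latency stays at three cycles in both models. Only the issue interval changes, and it is the issue interval that reciprocal throughput captures.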