Writing CPU-Friendly Code: A Guide to Hardware-Aware Programming

2025-03-23

This article uses the analogy of a drive-through restaurant to explain three core CPU architecture concepts: instruction pipelining, memory caching, and speculative execution. The author argues that understanding these mechanisms and writing code that works with them (hardware-aware programming) can dramatically improve software performance. The article covers two groups of optimization techniques: loop unrolling, which takes advantage of superscalar execution, and choosing data structure layouts and access patterns that make full use of the cache. The author's closing advice is to write clean, maintainable code first, then profile to find the bottlenecks, and only then apply hardware-aware techniques to those bottlenecks.

Unix Spell: The 64kB RAM Miracle

2025-01-19

In the 1970s, the Unix spell checker had to fit a 250kB dictionary into just 64kB of RAM on a PDP-11. Douglas McIlroy's solution evolved in stages. At first a Bloom filter provided fast lookups, but as the dictionary grew he developed a new hash compression scheme: he recognized that the differences between sorted hash codes follow a geometric distribution and encoded them with Golomb coding, coming close to the theoretical compression limit. Finally, partitioning the compressed data improved lookup speed. The story is a masterclass in constrained optimization, showing how clever algorithms can overcome seemingly impossible limitations.
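
The gap-encoding idea in the summary above can be sketched in a few lines. This is a minimal illustration, not McIlroy's implementation: it uses Golomb coding restricted to a power-of-two parameter (the Rice special case), and the function names, the parameter `k`, and the sample hash values are all assumptions made for the example.

```python
def golomb_rice_encode(sorted_values, k):
    """Encode the gaps between sorted non-negative integers as a bit string.

    Each gap is split into a quotient (written in unary) and a k-bit
    remainder (written in binary). Assumes the input is sorted ascending.
    """
    bits = []
    previous = 0
    for value in sorted_values:
        gap = value - previous
        previous = value
        quotient = gap >> k
        remainder = gap & ((1 << k) - 1)
        bits.append("1" * quotient + "0")  # unary quotient, terminated by 0
        if k:
            bits.append(format(remainder, f"0{k}b"))  # fixed-width remainder
    return "".join(bits)


def golomb_rice_decode(bits, k):
    """Invert golomb_rice_encode: rebuild the sorted values from the bit string."""
    values = []
    previous = 0
    i = 0
    while i < len(bits):
        quotient = 0
        while bits[i] == "1":  # count the unary quotient
            quotient += 1
            i += 1
        i += 1  # skip the terminating 0
        remainder = int(bits[i:i + k], 2) if k else 0
        i += k
        previous += (quotient << k) | remainder  # add the gap back
        values.append(previous)
    return values


# Hypothetical sorted hash codes, chosen only for illustration.
hashes = [3, 10, 11, 25, 40]
encoded = golomb_rice_encode(hashes, k=2)
decoded = golomb_rice_decode(encoded, k=2)
```

Small gaps cost only a few bits, which is why the scheme works well when gaps follow a geometric distribution: most gaps are short, and the rare long ones pay for themselves with a longer unary prefix.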

Development compression

Linux Context Switching Internals: Process State and Memory

2025-01-02

This article examines how the Linux kernel represents processes and their state, focusing on two key data structures: task_struct and mm_struct. task_struct holds the execution state, including process state, CPU time tracking, and scheduling information. mm_struct holds the memory state, including page tables, memory segment boundaries, and architecture-specific details. The article walks through the fields of both structures and the role each plays in context switching, giving a detailed view of the kernel's inner workings.

The Impossibility Theorem of Clustering: Why Perfect Algorithms Don't Exist

2024-12-26

This article explores the 'impossible triangle' of clustering algorithms. Drawing a parallel to the CAP theorem, the author argues that no clustering algorithm can have all three of the following desirable properties at once: scale invariance, richness, and consistency. Every algorithm must give up at least one. The article defines each property and illustrates how algorithms like k-means compromise on one to achieve the others. The conclusion is that developers should choose algorithms based on the specific needs of their application, accepting that a perfect clustering algorithm is mathematically impossible.
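
To make one of the three properties concrete, here is a small sketch of a clustering rule that gives up scale invariance. It is not taken from the article and it is not k-means: it is a simple distance-threshold rule on 1-D points (points closer than `r` are chained into the same cluster), chosen because it fits in a few lines. The function name, the threshold, and the sample points are assumptions for illustration.

```python
def threshold_clusters(points, r):
    """Cluster 1-D points by chaining neighbours that lie within distance r.

    Returns clusters as lists of indices into `points`, ordered by position.
    """
    order = sorted(range(len(points)), key=lambda i: points[i])
    clusters = []
    for i in order:
        # Join the current cluster if the gap to its last point is at most r.
        if clusters and points[i] - points[clusters[-1][-1]] <= r:
            clusters[-1].append(i)
        else:
            clusters.append([i])
    return clusters


points = [0.0, 1.0, 5.0, 6.0]

# With the original distances, the gap of 4.0 exceeds r, so two clusters form.
original = threshold_clusters(points, r=2.0)

# Shrinking every distance by the same factor merges everything into one
# cluster, so the result depends on the unit of measurement.
scaled = threshold_clusters([0.25 * p for p in points], r=2.0)
```

A scale-invariant algorithm would return the same partition for both inputs, since only the unit of measurement changed. Here `original` splits the points into two groups while `scaled` puts them all in one.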
