Linear Scan Register Allocation: Handling Lifetime Holes

2025-08-26
Linear Scan Register Allocation: Handling Lifetime Holes

This post details improvements to the linear scan register allocation algorithm to handle lifetime holes. The author explains how lifetime holes arise from reducing the control flow graph to a linear instruction sequence, creating discontinuities in virtual register lifetimes. The solution involves modifying the interval data structure to support multiple disjoint ranges, allowing the identification and exploitation of these holes. The linear scan algorithm is then adapted to consider these holes during register assignment, improving register utilization. This enhances the compiler's ability to leverage register resources, ultimately boosting code performance.

Read more
Development linear scan algorithm

Safely Using snprintf: Avoid Buffer Overflows

2025-08-19
Safely Using snprintf: Avoid Buffer Overflows

This article highlights a lesser-known feature of the `snprintf` function: its ability to determine the required buffer size before formatting, thus preventing buffer overflows. By calling `snprintf` twice – once with `NULL` and 0 to get the size, and again with a properly allocated buffer – the need for manual buffer size calculations is eliminated. The author also recommends a lightweight header-only library for easier usage.

Read more
Development buffer overflow

A 300-Line Python Compiler: Closure Conversion Explained

2025-08-11
A 300-Line Python Compiler: Closure Conversion Explained

While working through the Ghuloum tutorial, the author re-implemented a compiler originally written in C, achieving a concise 300-line Python version (including tests). This compiler performs closure conversion, handling variable binding, free variable tracking, and code object management. The post details the implementation, covering `lambda` and `let` expressions, function calls, and providing test cases and assembly code examples. The result is a surprisingly compact compiler capable of handling closures and indirect function calls, showcasing elegant solutions to complex problems.

Read more
Development closure conversion

Compiler IR Design: Local Decisions and Optimization

2025-06-17
Compiler IR Design: Local Decisions and Optimization

This post explores compiler intermediate representation (IR) design, focusing on making decisions using only local information. The author compares control-flow graphs (CFGs), register-based IRs, and Static Single Assignment (SSA) form, introducing more advanced designs like Static Single Information (SSI) and Sea of Nodes (SoN). SSA simplifies analysis by assigning each variable only once, while SSI allows adding finer-grained information to the same variable across different program branches. SoN represents all instructions as graph nodes, explicitly representing data and control dependencies for more flexible optimization. These designs aim to make compiler optimizers more efficient, ultimately generating more optimized code.

Read more

Building a Blog Search Engine from Scratch with Word2Vec

2025-05-20
Building a Blog Search Engine from Scratch with Word2Vec

The authors built a blog search engine from scratch using Python and Word2Vec embeddings. Posts and search queries are embedded into a 300-dimensional vector space, and cosine similarity is used to rank results. To make it web-friendly, the Word2Vec model is split into an index and vectors, with HTTP Range requests used to download only necessary data, reducing web load significantly. An evaluation metric is designed to assess the search engine's accuracy, and future improvements, such as using TF-IDF to reduce noise, are discussed.

Read more
Development

Game-Changing Papers & Blog Posts on Programming Languages

2025-05-14
Game-Changing Papers & Blog Posts on Programming Languages

This blog post lists several papers and blog posts that profoundly impacted the author's understanding of programming languages and compilers. Topics covered include garbage collection, code optimization, register allocation, regular expression engines, machine learning, SSA form, and compiler design. The author highlights the insightful approaches presented, such as using Z3 as a proof engine, leveraging fuzzing for bug detection, and efficient expression parsing techniques. The collection showcases the author's deep dive into the intricacies of programming language design and implementation.

Read more
Development

Manually Building a Nix Derivation: A Deep Dive into Hash Generation

2025-04-09
Manually Building a Nix Derivation: A Deep Dive into Hash Generation

This blog post details the author's journey in manually building a simple Nix derivation. By dissecting Farid's blog post step-by-step, the author delves into the inner workings of Nix derivations, specifically the hash generation process. The journey involved overcoming challenges such as understanding ATerm representation, SHA256 hashing, and Nix's unique base32 encoding. Ultimately, the author successfully generated the same hash value as in Farid's blog post and successfully built a simple "hello world" derivation.

Read more
Development Hash Generation

100x Speedup: Garbage Collection and GPUs in Python

2025-03-25
100x Speedup: Garbage Collection and GPUs in Python

This post details how the author achieved a 100x speedup of a Python program through simple code optimizations. The initial program used NumPy for parallel computation but was slow and memory-intensive due to poor memory management. By implementing a simple garbage collection mechanism to release unused intermediate variables, the author reduced runtime from 40 seconds to 10 seconds, significantly decreasing memory usage. Subsequently, using CuPy to offload computation to the GPU further reduced runtime to 1.5 seconds, demonstrating a dramatic performance improvement.

Read more
Development Python Optimization

Cinder JIT: Efficient Type Representation Using Bitsets and Semilattices

2025-03-11
Cinder JIT:  Efficient Type Representation Using Bitsets and Semilattices

The Cinder JIT compiler employs a clever type representation, treating types as sets (even lattices) and choosing a compact bitset representation. This article delves into how Cinder leverages bitsets and semilattice structures for efficient type information handling, covering basic type representation, type unions, and specialization. By encoding type information into bitsets, Cinder effectively represents type unions and allows for finer-grained type distinctions. Furthermore, Cinder introduces a specialization mechanism to track the specific value of individual objects, further improving compiler optimization efficiency. The article also discusses the Bottom type and details on generating the type lattice.

Read more
Development bitsets

A Deep Dive into Static Single Assignment (SSA) Compiler Optimizations

2025-02-11
A Deep Dive into Static Single Assignment (SSA) Compiler Optimizations

This article chronicles the decades-long evolution of Static Single Assignment (SSA) compiler optimization techniques. From the early papers on code motion and global value numbering, through Cytron's seminal work on minimizing phi instructions, to Brandis and Mössenböck's single-pass generation approach, and Click and Paleczny's Sea of Nodes IR, the article traces several key papers and discusses their strengths and weaknesses. It also touches upon Appel's work on the relationship between functional programming and SSA, Aycock and Horspool's iterative phi node removal, and more recent approaches based on abstract interpretation. The article concludes with a list of further papers and resources, providing a more comprehensive perspective for readers interested in learning more about SSA.

Read more

Deep Dive into CPS: A Journey into Functional Programming Compilation

2024-12-25
Deep Dive into CPS: A Journey into Functional Programming Compilation

This article delves into Continuation-Passing Style (CPS) and its application in compiling functional programming languages. The author builds a CPS transformer step-by-step for a simple Scheme-like language, explaining optimization strategies and code generation methods. The article details the transformation of integers, variables, function calls, arithmetic operators, lambda expressions, and if expressions into CPS form. It also discusses meta-continuations and optimization techniques such as constant folding and beta reduction. Finally, it outlines several approaches to generating executable code from CPS, including generating C code, using trampolines, and employing a single large switch statement.

Read more