It's Time to Stop Building KV Databases

2025-03-25

The author argues that key-value (KV) databases are too simplistic and lack expressive power, which makes them painful to use. Although popular among storage engine vendors, KV databases are merely building blocks for a reasonable data model, so users are forced to build that model from scratch, often with suboptimal results. The author proposes a middle ground: an embedded database with typed records that separates the logical schema from the physical schema but has queries written against the physical schema. This avoids the need for a complex query planner while still supporting asynchronous schema changes and layout switching. The approach balances data independence with the simplicity that embedded systems need, offering an alternative to both plain KV stores and full relational databases.
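
To make the "build the model from scratch" complaint concrete, here is a minimal sketch of a typed record layered by hand on a KV store. Everything in it is an assumption for illustration: the `User` record, the key layout, the JSON encoding, and the use of a plain `dict` as a stand-in for a real KV engine. It is not the design the author proposes.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Optional

# Hypothetical stand-in for a KV engine: opaque byte keys to opaque byte values.
KVStore = Dict[bytes, bytes]


@dataclass
class User:
    id: int
    name: str
    email: str


def _user_key(user_id: int) -> bytes:
    return f"user/{user_id}".encode()


def _email_key(email: str) -> bytes:
    return f"user_by_email/{email}".encode()


def get_user(store: KVStore, user_id: int) -> Optional[User]:
    raw = store.get(_user_key(user_id))
    return None if raw is None else User(**json.loads(raw))


def put_user(store: KVStore, user: User) -> None:
    # The store knows nothing about records, so the secondary index
    # has to be kept consistent by hand on every write.
    old = get_user(store, user.id)
    if old is not None and old.email != user.email:
        del store[_email_key(old.email)]
    store[_user_key(user.id)] = json.dumps(asdict(user)).encode()
    store[_email_key(user.email)] = str(user.id).encode()


def find_by_email(store: KVStore, email: str) -> Optional[User]:
    raw = store.get(_email_key(email))
    return None if raw is None else get_user(store, int(raw.decode()))
```

Encoding, key layout, and index maintenance all live in application code here, which is the kind of work the summary says KV stores push onto their users.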

Development

Verification-First Development: Beyond Test-Driven Development

2025-03-18

This article explores Verification-First Development (VFD), an approach that establishes verification mechanisms before the code is written. The verification can take the form of tests, type invariants, contracts, or other methods. Test-Driven Development (TDD) is a specific case of VFD that uses tests to drive code design. VFD's advantages include making it less likely that verification gets skipped, catching errors earlier, and improving code quality. Its drawbacks are that it can slow development, hinder exploratory coding, and let the verification method shape the code's design. The author argues that VFD is better treated as a technique than as a paradigm, which makes it more flexible and easier to combine with other approaches.
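
As a rough illustration of "verification first", the sketch below writes a checker for a sorting function before the function itself. The names `verify_sort` and `insertion_sort` and the choice of sorting as the task are assumptions made for this example, not taken from the article.

```python
from collections import Counter
from typing import Callable, Iterable, List


# Step 1: write the verification before any implementation exists.
def verify_sort(sort_fn: Callable[[List[int]], List[int]],
                samples: Iterable[List[int]]) -> None:
    for xs in samples:
        out = sort_fn(list(xs))
        # Output must be in non-decreasing order.
        assert all(a <= b for a, b in zip(out, out[1:])), out
        # Output must contain exactly the input elements.
        assert Counter(out) == Counter(xs), (xs, out)


# Step 2: write the code that has to pass the verification.
def insertion_sort(xs: List[int]) -> List[int]:
    result: List[int] = []
    for x in xs:
        i = len(result)
        # Walk left past every element greater than x.
        while i > 0 and result[i - 1] > x:
            i -= 1
        result.insert(i, x)
    return result


if __name__ == "__main__":
    verify_sort(insertion_sort, [[], [1], [3, 1, 2], [2, 2, 1], [5, 4, 3, 2, 1]])
```

The checker here is a property-style test, but the same ordering of work applies if the verification is a type invariant or a contract instead.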

Five Types of Nondeterminism: Practical Insights from Formal Methods

2025-02-20

This article explores five types of nondeterminism in system modeling: true randomness, concurrency, user input, external forces, and abstraction, each illustrated with practical examples. True randomness, although often simulated with pseudorandom number generators, is usually modeled as a nondeterministic choice. Concurrency is a major source of nondeterminism and needs special handling because of state space explosion. User input and external forces are treated as nondeterministic influences from outside the system. Abstraction, critically, replaces a complex deterministic process with a nondeterministic choice, which simplifies the model and makes it more likely to expose potential errors. The article offers a practical view of nondeterminism and how it applies to software development.
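
The concurrency case can be sketched by enumerating every interleaving of two threads that each perform a non-atomic increment (read, then write). This is an assumed toy model written for this summary, not code from the article, and the step encoding is hypothetical.

```python
from typing import Dict, Iterator, Sequence, Set, Tuple

Step = Tuple[str, int]  # (operation, thread id)


def interleavings(a: Sequence[Step], b: Sequence[Step]) -> Iterator[Tuple[Step, ...]]:
    """Yield every merge of a and b that keeps each thread's own order."""
    if not a:
        yield tuple(b)
        return
    if not b:
        yield tuple(a)
        return
    for rest in interleavings(a[1:], b):
        yield (a[0],) + rest
    for rest in interleavings(a, b[1:]):
        yield (b[0],) + rest


def run(schedule: Sequence[Step]) -> int:
    """Execute one schedule against a shared counter that starts at 0."""
    shared = 0
    local: Dict[int, int] = {}
    for op, tid in schedule:
        if op == "read":
            local[tid] = shared       # copy the shared value
        else:
            shared = local[tid] + 1   # write back the stale copy plus one
    return shared


def possible_results() -> Set[int]:
    t1 = (("read", 1), ("write", 1))
    t2 = (("read", 2), ("write", 2))
    return {run(s) for s in interleavings(t1, t2)}
```

With two steps per thread there are six schedules, and `possible_results()` returns `{1, 2}`: the value 1 is the lost update. Adding threads or steps grows the number of schedules combinatorially, which is the state space explosion the summary mentions.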

Efficiency vs. Horizontal Scalability: A Necessary Trade-off?

2025-02-12

This article explores the tension between software efficiency and horizontal scalability. The author argues that software optimized for scalability often performs poorly on a single machine, and vice versa. The causes include Amdahl's Law, coordination overhead, and limits on shared resources. Efficient algorithms often rely on assumptions about the system and the problem that stop holding once the work is spread across machines. The author also discusses how cultural factors and task types influence the choice, using examples such as the TigerBeetle database and CPython's GIL. Ultimately, a deep understanding of the problem and the environment is key to achieving both high efficiency and scalability.
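
For reference, Amdahl's Law bounds the speedup from adding workers when only part of a task can run in parallel: speedup = 1 / ((1 - p) + p / n), where p is the parallel fraction and n the number of workers. The sketch below is a plain restatement of that formula; the function name and parameter names are assumptions, and it ignores the coordination overhead the article also discusses.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup under Amdahl's Law, with no coordination overhead."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # With 95% of the work parallelizable, speedup can never reach 20x,
    # no matter how many workers are added.
    for n in (1, 2, 8, 64, 1024):
        print(n, round(amdahl_speedup(0.95, n), 2))
```

The serial fraction sets the ceiling: as `workers` grows, the result approaches `1 / (1 - parallel_fraction)` and never exceeds it.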

Development

The Curious History of Regex Anchors: Why `$` and `^`?

2025-01-21

This post traces the historical origins of `$` and `^` as line anchors in regular expressions. In the QED text editor, `$` originally stood for the end of the buffer, and Ken Thompson later adapted it to mean the end of a line in regexes. The choice of `^` likely stemmed from the limited character set of the Teletype Model 35 teleprinter, where `^` was already available as part of ASCII-67. The choice was less a deliberate design decision than a consequence of the hardware and character-set limitations of the era, and it later became a regex convention.
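
For readers who want a reminder of what the anchors do today, here is a small sketch using Python's standard `re` module. It shows current Python behavior only, not QED's, and the sample text is made up for the example.

```python
import re

text = "first line\nsecond line\nthird"

# Default mode: ^ matches only at the start of the string,
# and $ only at the end (or just before a final newline).
first_word = re.findall(r"^\w+", text)   # ['first']
last_word = re.findall(r"\w+$", text)    # ['third']

# MULTILINE mode: ^ and $ also match at the start and end of every line.
line_starts = re.findall(r"^\w+", text, flags=re.MULTILINE)  # ['first', 'second', 'third']
line_ends = re.findall(r"\w+$", text, flags=re.MULTILINE)    # ['line', 'line', 'third']
```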

Development regular expressions

Mathematical Modeling Reveals Just How Bad the Dreidel Game Is

2024-12-18

Last year, the author used the PRISM probabilistic modeling language to model the traditional holiday game Dreidel and show how little fun it is. This year, he refined the model to simulate the entire game through to its conclusion. The new model fixes the earlier flaw of simulating only until the first player is eliminated, and it improves the logic for betting and player elimination. Simulation shows that a four-player game takes 760 spins on average to finish, and the longest games can exceed 6 hours. The results support the conclusion that Dreidel is long, tedious, and frustrating.
