494x Faster Word Counting with SIMD and Threads

2025-08-17

This article walks through the optimization of a word-counting program, ending with a 494x overall speedup. Starting from a naive Python implementation (89.6 seconds), the author moves to CPython's `re` module (13.7 seconds), then a scalar C loop (1.205 seconds), and finally SIMD instructions plus multithreading (181 milliseconds). Each step is explained in turn: leveraging C extensions, writing an efficient C loop, and using multiple CPU cores. Multithreading yielded smaller gains than expected, but the final version still reaches a processing speed of 5.52 GiB/s. The author invites readers to suggest further optimizations.
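To make the first two steps concrete, here is a minimal sketch of what a naive byte-by-byte counter and an `re`-based counter might look like. It is not the author's code: the function names and the definition of a word (a maximal run of non-whitespace ASCII bytes) are assumptions chosen for illustration.

```python
import re

# Assumption: a "word" is a maximal run of bytes that are not ASCII whitespace.
_WHITESPACE = b" \t\n\r\v\f"
_WORD = re.compile(rb"\S+")  # for bytes patterns, \S means "not ASCII whitespace"


def count_words_naive(data: bytes) -> int:
    """Count words by inspecting one byte at a time in pure Python."""
    count = 0
    in_word = False
    for byte in data:
        is_space = byte in _WHITESPACE
        # A new word starts on every whitespace -> non-whitespace transition.
        if not is_space and not in_word:
            count += 1
        in_word = not is_space
    return count


def count_words_re(data: bytes) -> int:
    """Count words by letting the C-implemented regex engine do the scanning."""
    return sum(1 for _ in _WORD.finditer(data))
```

Both functions return the same count; the second one simply moves the per-byte loop out of the Python interpreter, which is the kind of change the summary credits for the first large speedup.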

Development

Dumac: Significant Performance Improvements in a macOS Disk Usage Tool

2025-08-11

Dumac is a macOS disk usage utility that runs faster than `du -sh` by using the macOS-specific `getattrlistbulk` syscall. After incorporating community feedback, the author improved performance by roughly 33% through two changes: switching from Tokio to Rayon for better parallelism, and refining how the inode hash set is sharded, which reduced contention between threads.
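The sharding idea can be sketched as follows. This is an illustrative Python model, not Dumac's implementation (which is written in Rust): the class name, the shard count, and the modulo-based shard selection are all assumptions, as is the premise that the set exists to avoid counting the same inode twice.

```python
import threading


class ShardedInodeSet:
    """Hypothetical set of seen inodes, split into independently locked shards."""

    def __init__(self, shards: int = 16) -> None:
        # Each shard has its own lock, so threads touching different shards
        # do not wait on each other.
        self._shards = [(threading.Lock(), set()) for _ in range(shards)]

    def insert(self, inode: int) -> bool:
        """Record an inode; return True only the first time it is seen."""
        lock, seen = self._shards[inode % len(self._shards)]
        with lock:
            if inode in seen:
                return False
            seen.add(inode)
            return True
```

With a single lock around one large set, every worker thread would serialize on that lock. Spreading inodes across shards means two threads collide only when their inodes map to the same shard, so how inodes are assigned to shards directly affects contention.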

Development

Rapid Game Prototyping with LÖVE

2024-12-31

A programmer, aiming to complete a full game in 2025, built chess and card game prototypes using the LÖVE2D framework in Lua. LÖVE's simple yet powerful API allowed for complex UI interactions with minimal code, further accelerated by LLM-assisted code generation. The author found LÖVE ideal for prototyping, especially UI, but noted the need for improvements in hot reloading and logic separation for larger projects. The plan is to use LÖVE to develop a basic game MVP.
