Analyzing the Entire Hacker News Dataset with DuckDB

2025-04-30

The author downloaded the complete Hacker News dataset, a 20 GB JSON file containing everything ever posted on the site, and analyzed it with DuckDB. The post walks through the download process and uses SQL queries to compute 12-week moving averages of how often specific programming languages (Python, JavaScript, Java, Ruby, Rust) are mentioned. The author praises DuckDB's ease of use and speed, and jokingly suggests future uses such as training LLMs on the dataset.
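To make the moving-average idea concrete, here is a minimal sketch of a trailing moving average over weekly mention shares. It is not the author's query. The table name `items`, the columns `week` and `text`, the helper `moving_average_mentions`, and the sample rows are all hypothetical, and "frequency" is assumed to mean the fraction of items in a week that mention the keyword. The sketch uses Python's built-in `sqlite3` module as a stand-in so it runs without extra installs; the window clause is standard SQL that DuckDB also supports.

```python
import sqlite3


def moving_average_mentions(rows, keyword, window=12):
    """Return [(week, share, moving_avg)] for a keyword.

    rows:    iterable of (week_index, text) pairs (hypothetical schema)
    keyword: term to look for, matched case-insensitively as a substring
    window:  number of weeks in the trailing average (12 in the post)
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE items (week INTEGER, text TEXT)")
    con.executemany("INSERT INTO items VALUES (?, ?)", rows)

    # Frame bounds must be constants, so the window size is inlined as an int.
    preceding = int(window) - 1
    query = f"""
        WITH weekly AS (
            -- share of items per week that mention the keyword
            SELECT week,
                   AVG(CASE WHEN LOWER(text) LIKE ? THEN 1.0 ELSE 0.0 END) AS share
            FROM items
            GROUP BY week
        )
        SELECT week,
               share,
               AVG(share) OVER (
                   ORDER BY week
                   ROWS BETWEEN {preceding} PRECEDING AND CURRENT ROW
               ) AS moving_avg
        FROM weekly
        ORDER BY week
    """
    result = con.execute(query, (f"%{keyword.lower()}%",)).fetchall()
    con.close()
    return result


# Made-up sample data: three weeks, two items each.
sample_rows = [
    (0, "I love Python"), (0, "Rust is fast"),
    (1, "python tips"), (1, "PYTHON 3 released"),
    (2, "Java news"), (2, "Go talk"),
]

if __name__ == "__main__":
    for week, share, avg in moving_average_mentions(sample_rows, "python", window=2):
        print(week, share, avg)
```

Substring matching is a deliberate simplification: a search for "java" would also count "javascript", so a real analysis would likely need word-boundary matching.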

Development