Insane Compression: Shrinking 1.1GB of RATP Transit Data to 530KB with Rust
This weekend project started with browsing the open-data repository of Paris’ public transport network. The author noticed a section on data reuse that showcases external projects built on this open data, in particular the RATP status website, which visualizes historical disruptions. Its GitHub repository contains JSON files from queries made every 2 minutes for almost a year, totaling over 10GB. The author wondered whether this data could be compressed better. This post details how they used the interning design pattern in Rust to achieve a compression factor of roughly 2000! Techniques explored include optimizing the interner structure, tuning the data schema, and leveraging interning in serialization. The result? A staggering reduction from 1.1GB of JSON files to a mere 530KB, which is where the roughly 2000x figure comes from.
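For readers unfamiliar with interning, here is a minimal sketch of the general idea, not the author's actual implementation. Each distinct string is stored once, and every repeat is replaced by a small integer id. The `Interner` type, its method names, and the sample status strings are all hypothetical, chosen only for illustration.

```rust
use std::collections::HashMap;

/// A minimal string interner (illustrative sketch): each distinct string
/// is stored once and referred to by a small integer id.
#[derive(Default)]
struct Interner {
    ids: HashMap<String, u32>, // string -> id, for fast deduplication
    strings: Vec<String>,      // id -> string, for resolving ids back
}

impl Interner {
    /// Returns the id for `s`, allocating a new id the first time `s` is seen.
    fn intern(&mut self, s: &str) -> u32 {
        if let Some(&id) = self.ids.get(s) {
            return id;
        }
        let id = self.strings.len() as u32;
        self.ids.insert(s.to_owned(), id);
        self.strings.push(s.to_owned());
        id
    }

    /// Resolves an id back to the original string.
    fn lookup(&self, id: u32) -> &str {
        &self.strings[id as usize]
    }
}

fn main() {
    let mut interner = Interner::default();
    // Hypothetical status messages: consecutive polls mostly repeat the same text.
    let polls = ["normal service", "normal service", "disrupted", "normal service"];
    let ids: Vec<u32> = polls.iter().map(|s| interner.intern(s)).collect();
    println!("{:?}", ids); // prints [0, 0, 1, 0]
    println!("{}", interner.lookup(ids[2])); // prints disrupted
}
```

Because data polled every 2 minutes mostly repeats the previous sample, the table of distinct strings stays small while the bulk of the data becomes a sequence of integers. This is the property that the schema and serialization techniques mentioned above can then exploit.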