ts_zip: Text Compression with Large Language Models
Fabrice Bellard's new tool, ts_zip, uses the RWKV 169M v4 language model to achieve significantly higher text compression ratios than traditional compressors. It needs a GPU and about 4 GB of RAM, and it is far slower than conventional tools (up to roughly 1 MB/s on an RTX 4090), but it compresses English and other languages, as well as source code, impressively well. Compression ratio is reported in bits per byte (bpb), and on datasets such as enwik8 and enwik9 ts_zip outperforms xz. Deterministic model evaluation guarantees that decompression reproduces the exact original output across different hardware and software configurations.
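The core idea behind LLM-based compression is that an entropy coder driven by a model's next-token probabilities spends about -log2 p(token) bits per token, so the achievable ratio in bpb can be estimated directly from the model's cross-entropy on the text. The sketch below illustrates that estimate; it is not ts_zip itself, and the Hugging Face model identifier and use of the transformers library are assumptions for illustration only.

```python
# Rough estimate of the compression ratio (in bits per byte) an ideal
# arithmetic coder would reach when guided by a causal language model.
# NOTE: this is a conceptual sketch, not Bellard's implementation; the
# model name "RWKV/rwkv-4-169m-pile" is an assumed Hugging Face mirror
# of the RWKV 169M v4 checkpoint.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer


def estimate_bpb(text: str, model_name: str = "RWKV/rwkv-4-169m-pile") -> float:
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits  # shape: (1, seq_len, vocab_size)

    # Each token after the first costs about -log2 p(token | prefix) bits
    # under an ideal entropy coder driven by the model's predictions.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = ids[:, 1:]
    token_bits = -log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1) / math.log(2)

    total_bits = token_bits.sum().item()
    return total_bits / len(text.encode("utf-8"))  # bits per original byte


if __name__ == "__main__":
    sample = "Text compression with a language model: the better the model "
    sample += "predicts the next token, the fewer bits the coder has to emit."
    print(f"estimated ratio: {estimate_bpb(sample):.3f} bpb")
```

In practice ts_zip also has to make the model evaluation bit-exact on every machine, since even a tiny numerical difference between compressor and decompressor would corrupt the arithmetic-coded stream; the sketch above ignores that concern and only estimates the ratio.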