ts_zip: Text Compression with Large Language Models

2024-12-30

Fabrice Bellard's new tool, ts_zip, uses the RWKV 169M v4 language model to compress text at significantly higher ratios than traditional methods. The trade-off is cost: it requires a GPU and 4 GB of RAM, and it is slower than conventional compressors, reaching up to 1 MB/s on an RTX 4090. It works on English and other languages, as well as source code. Compression is reported in bits per byte (bpb), where lower is better, and ts_zip outperforms xz on datasets such as enwik8 and enwik9. Because the model is evaluated deterministically, decompression gives the same result across different hardware and software configurations.
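To make the bits-per-byte metric concrete, here is a minimal Python sketch. It does not use ts_zip; it measures bpb for the xz format through Python's standard `lzma` module as a baseline. The function names and the sample text are illustrative assumptions, not part of the tool.

```python
import lzma


def bits_per_byte(original_size: int, compressed_size: int) -> float:
    """Return compressed bits spent per byte of original input (lower is better)."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return 8.0 * compressed_size / original_size


def measure_xz_bpb(data: bytes) -> float:
    """Compress data in the xz format with the stdlib and report bpb."""
    compressed = lzma.compress(data, format=lzma.FORMAT_XZ)
    return bits_per_byte(len(data), len(compressed))


# Hypothetical sample input; real benchmarks use files such as enwik8.
sample = b"the quick brown fox jumps over the lazy dog. " * 2000
xz_bpb = measure_xz_bpb(sample)
print(f"xz: {xz_bpb:.3f} bpb")
```

Uncompressed data costs 8 bpb by definition, so any value below 8 means the compressor saved space. For example, shrinking a 1,000,000-byte file to 150,000 bytes gives 8 × 150,000 / 1,000,000 = 1.2 bpb.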

Development text compression

New Short Message Compression Tool: ts_sms

2024-12-30

ts_sms is a new tool that compresses short messages using a large language model. The compression is lossless: it reduces the bit count by identifying and removing statistical redundancy, so the original message is recovered exactly. On small messages it shows advantages over tools such as brotli. Linux and Windows versions are available. Its core technology is similar to that of ts_zip, with a specific padding system that is compatible with arithmetic coding, so the message length does not need to be encoded explicitly.
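The link between a predictive model and compressed size can be sketched in a few lines of Python. An arithmetic coder spends roughly -log2(p) bits on a symbol the model predicted with probability p, so better predictions mean fewer bits. The example below is only an illustration: it swaps the language model for a toy adaptive byte-frequency model, it does not reproduce the ts_sms padding scheme, and the function name is hypothetical.

```python
import math


def ideal_code_length_bits(message: bytes) -> float:
    """Estimate the bits an ideal arithmetic coder would spend on message.

    Assumption: a toy adaptive order-0 model stands in for the language
    model. Every byte value starts with a count of 1, and counts are updated
    after each symbol, so a decoder could rebuild the same probabilities.
    """
    counts = [1] * 256  # one pseudo-count per possible byte value
    total = 256
    bits = 0.0
    for byte in message:
        probability = counts[byte] / total
        bits += -math.log2(probability)  # cost of coding this symbol
        counts[byte] += 1  # adapt the model after coding
        total += 1
    return bits


message = b"see you at the station at seven, see you then"
estimate = ideal_code_length_bits(message)
print(f"{len(message) * 8} raw bits, about {estimate:.1f} bits under the toy model")
```

With this toy model the first byte always costs exactly 8 bits, and repeated bytes become cheaper as their counts grow. A language model plays the same role but predicts far more sharply, which is where the gains on short messages come from.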

Development