Blosc2 3.0: Breaking the Memory Wall, Computing with TB-Sized Datasets

2025-03-31

Blosc2 3.0 introduces an integrated compute engine that enables high-performance computation on datasets more than 100 times larger than available RAM. It addresses the memory wall problem by tightly integrating compression with computation: data stays compressed in memory or on disk, and each chunk is decompressed and processed while it fits in the CPU caches. Experiments show that Blosc2 maintains high performance on datasets of up to 8 TB, and that for very large datasets on-disk storage can even outperform in-memory operation. This is a significant advance for big data analytics and high-performance computing.
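
The idea of computing on compressed chunks can be illustrated with a minimal sketch. This is not Blosc2's API or its implementation: it swaps in Python's standard-library `zlib` as the compressor, and the names `compress_chunks`, `chunked_sum_of_squares`, and the `CHUNK_ITEMS` value are hypothetical, chosen only for illustration. The point is that only one decompressed chunk exists at any time, so the working set stays small even when the full dataset does not.

```python
import zlib
from array import array

# Hypothetical chunk size: 64,000 float64 values is about 512 KB,
# roughly the size of a per-core cache on many CPUs (an assumption).
CHUNK_ITEMS = 64_000


def compress_chunks(values, chunk_items=CHUNK_ITEMS):
    """Yield zlib-compressed chunks holding up to chunk_items float64 values."""
    buf = array("d")
    for v in values:
        buf.append(v)
        if len(buf) == chunk_items:
            yield zlib.compress(buf.tobytes())
            buf = array("d")
    if buf:  # flush the final, possibly shorter chunk
        yield zlib.compress(buf.tobytes())


def chunked_sum_of_squares(chunks):
    """Reduce over compressed chunks, decompressing one chunk at a time."""
    total = 0.0
    for blob in chunks:
        chunk = array("d")
        chunk.frombytes(zlib.decompress(blob))
        total += sum(x * x for x in chunk)
    return total


n = 200_000
# Only the compressed chunks are kept; the raw data is never held whole.
compressed = list(compress_chunks(float(i % 100) for i in range(n)))
stored = sum(len(c) for c in compressed)  # bytes actually stored
result = chunked_sum_of_squares(compressed)
print(f"{len(compressed)} chunks, {stored} bytes stored vs {n * 8} raw")
```

In this toy version the compressed chunks live in a Python list; the same loop would work unchanged if they were read from disk one by one, which is the situation the out-of-core results above describe.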

Development