BloomSearch: A Keyword Search Engine for Massive Datasets using Hierarchical Bloom Filters

2025-07-16

BloomSearch is a keyword search engine designed for massive datasets. It combines hierarchical Bloom filters with pluggable storage interfaces to keep memory usage very low and cold-start searches fast. Queries can match on a field, a token, or a field:token pair, and these conditions can be combined with AND/OR operators. Partitioning and min-max indexes narrow the data a query has to examine, and TTL expires old data. The architecture is designed to scale ingest and query throughput without a built-in ceiling, which makes it a good fit for logs, JSON documents, and high-cardinality keyword search.
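To make the idea concrete, here is a minimal sketch of a two-level Bloom filter lookup for field, token, and field:token keys. It is not BloomSearch's actual API or on-disk layout: the names `BloomFilter`, `Block`, `DataFile`, `make_key`, and `search`, the filter sizes, and the reading of "hierarchical" as "one filter per file plus one per block" are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter backed by a Python int (illustrative sizes only)."""

    def __init__(self, num_bits=2048, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def make_key(kind, *parts):
    """Hypothetical key scheme: 'field', 'token', or 'pair' plus its parts."""
    return "\x00".join((kind, *parts))


class Block:
    """A group of rows with its own filter (assumed layout)."""

    def __init__(self, rows):
        self.rows = rows
        self.filter = BloomFilter()
        for row in rows:
            for field, value in row.items():
                self.filter.add(make_key("field", field))
                for token in str(value).lower().split():
                    self.filter.add(make_key("token", token))
                    self.filter.add(make_key("pair", field, token))


class DataFile:
    """Top level of the hierarchy: the file filter is the union of block filters."""

    def __init__(self, blocks):
        self.blocks = blocks
        self.filter = BloomFilter()
        for block in blocks:
            self.filter.bits |= block.filter.bits


def search(data_file, field, token):
    """Return rows where `field` contains `token`, skipping filtered-out blocks."""
    key = make_key("pair", field, token)
    if not data_file.filter.might_contain(key):
        return []  # the whole file is skipped without reading any block
    hits = []
    for block in data_file.blocks:
        if not block.filter.might_contain(key):
            continue  # block definitely lacks the pair
        # Bloom filters can give false positives, so verify against the rows.
        for row in block.rows:
            if token in str(row.get(field, "")).lower().split():
                hits.append(row)
    return hits


block_a = Block([{"level": "error", "msg": "disk full"},
                 {"level": "info", "msg": "started"}])
block_b = Block([{"level": "info", "msg": "disk ok"}])
data_file = DataFile([block_a, block_b])

print(search(data_file, "level", "error"))
# prints [{'level': 'error', 'msg': 'disk full'}]
```

The filters only ever rule data out, so the final row scan is what guarantees exact results. An AND query could be sketched the same way by requiring every key to pass a filter before a block is read, and an OR query by requiring at least one.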