Apache DataFusion: A Powerful and Extensible Query Engine in Rust

2025-01-16

Apache DataFusion is an extensible query engine written in Rust that uses Apache Arrow as its in-memory format. It exposes both SQL and DataFrame APIs, is built for high performance, and has built-in support for CSV, Parquet, JSON, and Avro. The engine includes a full query planner; a columnar, streaming, multi-threaded, vectorized execution engine; and partitioned data sources. It is designed to be customized: users can add their own data sources, query languages, functions, custom operators, and more. Related subprojects include DataFusion Python (Python bindings), DataFusion Ray (a distributed version), and DataFusion Comet (an Apache Spark accelerator).
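To make "columnar, streaming, vectorized" more concrete, here is a minimal toy sketch in plain Rust. It is not DataFusion's or Arrow's API: `Batch`, `filter_mask`, `masked_sum`, `total_price`, and `sample_batches` are hypothetical names invented for illustration, and the sketch is single-threaded. It only shows the general idea of storing data column by column, applying operators to whole columns at a time, and pulling batches through the pipeline one by one.

```rust
// Toy illustration only: these types and functions are hypothetical
// and are NOT part of DataFusion or Apache Arrow.

/// A batch of rows stored column by column (one Vec per column).
struct Batch {
    price: Vec<f64>,
    quantity: Vec<i64>,
}

/// Vectorized filter: evaluate the predicate over a whole column
/// and return a boolean selection mask, one entry per row.
fn filter_mask(quantity: &[i64], min_quantity: i64) -> Vec<bool> {
    quantity.iter().map(|&q| q >= min_quantity).collect()
}

/// Vectorized aggregate: sum the prices of the rows selected by the mask.
fn masked_sum(price: &[f64], mask: &[bool]) -> f64 {
    let mut total = 0.0;
    for (p, keep) in price.iter().zip(mask.iter()) {
        if *keep {
            total += *p;
        }
    }
    total
}

/// Streaming execution: batches are pulled from an iterator one at a time,
/// so the whole data set never has to be in memory at once.
fn total_price<I: Iterator<Item = Batch>>(batches: I, min_quantity: i64) -> f64 {
    let mut total = 0.0;
    for batch in batches {
        let mask = filter_mask(&batch.quantity, min_quantity);
        total += masked_sum(&batch.price, &mask);
    }
    total
}

/// Two small made-up batches standing in for a partitioned data source.
fn sample_batches() -> Vec<Batch> {
    vec![
        Batch { price: vec![10.0, 20.0, 5.0], quantity: vec![1, 3, 2] },
        Batch { price: vec![2.5, 7.5], quantity: vec![5, 0] },
    ]
}

fn main() {
    // Roughly: SELECT SUM(price) WHERE quantity >= 2
    let total = total_price(sample_batches().into_iter(), 2);
    println!("total = {total}"); // prints "total = 27.5"
}
```

A real engine such as DataFusion operates on Arrow arrays rather than `Vec`s and runs partitions in parallel, but the shape of the computation (column-at-a-time operators over a stream of batches) is the same idea.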
