Apache Hudi: Upserts, Deletes, and Incremental Processing for Big Data
2025-01-23
Apache Hudi is an open-source data lakehouse platform built on a high-performance open table format for ingesting, indexing, storing, serving, transforming, and managing data across multiple cloud data environments. It supports a variety of data formats and sources, provides atomic commits with rollback and restore, and performs upserts and deletes efficiently. Its scalable indexing subsystem maps record keys to the files that hold them, which speeds up writes and can also accelerate queries. Hudi offers three kinds of reads. Snapshot queries return the latest state of a table. Incremental queries return only the records that changed after a given commit. Time-travel queries return the table as it was at an earlier commit. Hudi can also sync table metadata to catalogs such as the Apache Hive Metastore.
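The following minimal in-memory sketch in Python illustrates how upserts, deletes, and the three read types relate. It is not Hudi's API. The class `ToyKeyedTable`, its methods, and the integer commit numbers are hypothetical stand-ins. Hudi itself records commits as timestamped instants on its timeline and stores data in files rather than Python dicts. The sketch models only the semantics: an upsert inserts new record keys and replaces existing ones, a delete removes keys, and every read replays the commit log up to a chosen point.

```python
from typing import Any, Dict, List, Optional, Tuple

Record = Dict[str, Any]


class ToyKeyedTable:
    """Hypothetical in-memory keyed table with a commit log (not Hudi's API)."""

    def __init__(self, key_field: str) -> None:
        self.key_field = key_field
        # Ordered commits: (instant, {record_key: record, or None for a delete}).
        self.timeline: List[Tuple[int, Dict[Any, Optional[Record]]]] = []

    def _commit(self, changes: Dict[Any, Optional[Record]]) -> int:
        # Each write becomes one commit; instants here are simply 1, 2, 3, ...
        instant = len(self.timeline) + 1
        self.timeline.append((instant, changes))
        return instant

    def upsert(self, records: List[Record]) -> int:
        # New keys are inserted and existing keys replaced; within one batch
        # the last record for a given key wins.
        return self._commit({r[self.key_field]: dict(r) for r in records})

    def delete(self, keys: List[Any]) -> int:
        # A delete is recorded as a tombstone (None) for each key.
        return self._commit({k: None for k in keys})

    def snapshot(self, as_of: Optional[int] = None) -> Dict[Any, Record]:
        # Replay commits up to and including `as_of`; None means the latest state.
        # Passing an earlier instant gives a time-travel read.
        state: Dict[Any, Record] = {}
        for instant, changes in self.timeline:
            if as_of is not None and instant > as_of:
                break
            for key, rec in changes.items():
                if rec is None:
                    state.pop(key, None)
                else:
                    state[key] = rec
        return state

    def incremental(self, begin: int) -> Dict[Any, Record]:
        # Records inserted or updated by commits after `begin`, at their
        # latest values; keys deleted since then are omitted.
        written_after = set()
        for instant, changes in self.timeline:
            if instant > begin:
                written_after.update(k for k, rec in changes.items() if rec is not None)
        return {k: v for k, v in self.snapshot().items() if k in written_after}


table = ToyKeyedTable(key_field="id")
c1 = table.upsert([{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Lima"}, {"id": 3, "city": "Pune"}])
c2 = table.upsert([{"id": 2, "city": "Quito"}, {"id": 4, "city": "Kyiv"}])
c3 = table.delete([1])

print(sorted(table.snapshot()))     # [2, 3, 4]
print(table.snapshot(as_of=c1)[2])  # {'id': 2, 'city': 'Lima'}
print(table.incremental(begin=c1))  # {2: {'id': 2, 'city': 'Quito'}, 4: {'id': 4, 'city': 'Kyiv'}}
```

Replaying the entire log on every read keeps the sketch short, but its cost grows with history. A production table format instead relies on indexes and file-level metadata so that it does not have to rescan every commit.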