Arroyo: A Blazing Fast JSON Decoder Built on Arrow

2025-03-26

The Arroyo stream processing engine faces a core challenge: efficiently handling massive JSON data streams. This article details how Arroyo uses Arrow's columnar in-memory format and a two-pass JSON decoding strategy to speed up JSON deserialization. The first pass constructs a flattened "tape" data structure, and the second pass builds Arrow arrays concurrently from that tape according to the schema. In benchmarks, this approach is up to 2.3x faster than Jackson-based deserializers. Arroyo also extends the decoder with support for raw JSON and bad-data handling, which allows more flexible processing of real-world streaming data.
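
The two-pass idea can be illustrated with a small sketch. This is not Arroyo's or Arrow's actual implementation: the tag names, `build_tape`, `decode_column`, and `decode` are hypothetical, `json.loads` stands in for a real streaming tokenizer, plain Python lists stand in for Arrow arrays, columns are built one after another rather than concurrently, and only flat objects with scalar values are handled.

```python
import json

# Hypothetical tape element tags for this sketch.
START_OBJECT, END_OBJECT, KEY, STRING, NUMBER, BOOL, NULL = range(7)


def build_tape(lines):
    """Pass 1: flatten every record into one shared tape of (tag, payload) pairs."""
    tape, row_starts = [], []
    for line in lines:
        record = json.loads(line)  # stand-in for a streaming tokenizer
        row_starts.append(len(tape))  # remember where each row begins
        tape.append((START_OBJECT, None))
        for key, value in record.items():
            tape.append((KEY, key))
            if value is None:
                tape.append((NULL, None))
            elif isinstance(value, bool):  # check bool before int: bool is an int subclass
                tape.append((BOOL, value))
            elif isinstance(value, (int, float)):
                tape.append((NUMBER, value))
            elif isinstance(value, str):
                tape.append((STRING, value))
            else:
                raise ValueError("this sketch only handles flat objects with scalar values")
        tape.append((END_OBJECT, None))
    return tape, row_starts


def decode_column(tape, row_starts, name, py_type):
    """Pass 2: walk the tape once per column, producing one value (or None) per row."""
    values = []
    for start in row_starts:
        i = start + 1  # skip START_OBJECT
        found = None
        while tape[i][0] != END_OBJECT:
            # In a flat object, entries alternate: KEY, value, KEY, value, ...
            if tape[i][1] == name:
                tag, payload = tape[i + 1]
                if tag != NULL:
                    found = py_type(payload)
            i += 2
        values.append(found)  # a missing or null field becomes None
    return values


def decode(lines, schema):
    """Run both passes; schema is a list of (column name, Python type) pairs."""
    tape, row_starts = build_tape(lines)
    return {name: decode_column(tape, row_starts, name, t) for name, t in schema}


if __name__ == "__main__":
    rows = ['{"id": 1, "name": "a"}', '{"id": 2, "name": null}', '{"name": "c"}']
    print(decode(rows, [("id", int), ("name", str)]))
    # prints {'id': [1, 2, None], 'name': ['a', None, 'c']}
```

Separating tokenizing from array building means the second pass can append all values of one column to a single typed buffer in a tight loop, which is the access pattern a columnar format such as Arrow is designed for.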

Development JSON decoding