Arroyo: A Blazing Fast JSON Decoder Built on Arrow
2025-03-26

The Arroyo stream processing engine must deserialize massive JSON data streams efficiently. This article details how Arroyo uses Arrow's columnar in-memory format and a two-pass JSON decoding strategy to speed up JSON deserialization. The first pass constructs a flattened "tape" data structure; the second pass walks that tape to build Arrow arrays column by column according to the schema. In benchmarks, this approach is up to 2.3x faster than Jackson-based deserializers. Arroyo also adds support for raw JSON and bad-data handling, enabling more flexible processing of real-world streaming data.
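The two-pass idea can be illustrated with a minimal Python sketch. This is not Arroyo's or Arrow's actual implementation: the function names `build_tape` and `build_columns`, the tape entry layout, and the dict-of-casts schema are all hypothetical, plain Python lists stand in for Arrow arrays, and the standard `json` module does the tokenizing (so values arrive already parsed, whereas a real tape would typically hold unparsed tokens). It also assumes each input line is a flat JSON object.

```python
import json


def build_tape(lines):
    """Pass 1: flatten newline-delimited JSON objects into a single tape."""
    tape = []        # flat list of [tag, payload] entries (hypothetical layout)
    row_starts = []  # tape index where each record begins
    for line in lines:
        # object_pairs_hook=list keeps the object as ordered (key, value) pairs
        pairs = json.loads(line, object_pairs_hook=list)
        start = len(tape)
        row_starts.append(start)
        tape.append(["start_object", None])  # payload patched below
        for key, value in pairs:
            tape.append(["key", key])
            tape.append(["value", value])
        tape.append(["end_object", start])
        tape[start][1] = len(tape) - 1  # start entry points at its end entry
    return tape, row_starts


def build_columns(tape, row_starts, schema):
    """Pass 2: build one column at a time by scanning every row on the tape."""
    columns = {}
    for name, cast in schema.items():
        column = []
        for start in row_starts:
            end = tape[start][1]
            found = None  # missing fields become nulls
            i = start + 1
            while i < end:  # entries alternate key, value, key, value, ...
                if tape[i][1] == name:
                    raw = tape[i + 1][1]
                    found = None if raw is None else cast(raw)
                    break
                i += 2
            column.append(found)
        columns[name] = column
    return columns


lines = ['{"id": 1, "name": "a"}', '{"id": 2, "extra": true}']
tape, row_starts = build_tape(lines)
columns = build_columns(tape, row_starts, {"id": int, "name": str})
print(columns)
```

Fields that are absent from the schema (such as `extra` here) are skipped during the second pass, and fields absent from a record become nulls, which mirrors how a schema-driven columnar decoder only materializes the columns it was asked for.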
Development
JSON decoding