xorq: Simplifying Multi-Engine ML Pipelines

2025-03-27
xorq: Simplifying Multi-Engine ML Pipelines

xorq is a deferred computation framework bringing the reproducibility and performance of declarative pipelines to the Python ML ecosystem. It lets you write pandas-style transformations that never run out of memory, automatically caches intermediate results, and seamlessly moves between SQL engines and Python UDFs—all while maintaining reproducibility. Built on Ibis and DataFusion, xorq features declarative expressions, multi-engine support, built-in caching, serializable pipelines, portable UDFs, and an Arrow-native architecture. It offers both an interactive library and a CLI for a smooth transition from exploratory research to production-ready artifacts.

Development