Kreuzberg: A Powerful Local Document Text Extraction Python Library
2025-02-15
Kreuzberg is a Python library for extracting text from a range of document types. It provides a unified asynchronous interface that supports PDFs, images, office documents, and more. The library processes everything locally and requires no external APIs or cloud services; it is designed for resource efficiency, minimal dependencies, and batch processing. For PDFs, Kreuzberg first attempts direct text extraction and falls back to OCR if necessary. It also offers comprehensive error handling, both async and sync APIs, metadata extraction, and concurrent processing.
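To make the "direct extraction first, OCR as fallback" idea concrete, here is a minimal sketch of that pattern using only the Python standard library. It is not Kreuzberg's actual API or implementation: every name below (`ExtractionResult`, `looks_usable`, `extract_with_fallback`, `extract_many`, and the two `fake_*` stand-in extractors) is hypothetical, and the 20-character usability threshold is an arbitrary assumption chosen for illustration.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List

# Hypothetical type: any async callable that maps a file path to text.
Extractor = Callable[[str], Awaitable[str]]


@dataclass
class ExtractionResult:
    text: str
    method: str  # "direct" or "ocr"


def looks_usable(text: str, min_chars: int = 20) -> bool:
    """Heuristic (an assumption): enough non-whitespace-padded text came back."""
    return len(text.strip()) >= min_chars


async def extract_with_fallback(
    path: str, direct: Extractor, ocr: Extractor, min_chars: int = 20
) -> ExtractionResult:
    """Try the cheap direct extractor first; run OCR only if it yields too little."""
    text = await direct(path)
    if looks_usable(text, min_chars):
        return ExtractionResult(text=text, method="direct")
    return ExtractionResult(text=await ocr(path), method="ocr")


async def extract_many(
    paths: List[str], direct: Extractor, ocr: Extractor, max_concurrency: int = 4
) -> List[ExtractionResult]:
    """Process a batch concurrently, capping how many files are in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(path: str) -> ExtractionResult:
        async with semaphore:
            return await extract_with_fallback(path, direct, ocr)

    # gather() returns results in the same order as the input paths.
    return await asyncio.gather(*(worker(p) for p in paths))


# Stand-in extractors for demonstration only; a real setup would call a
# PDF parser and an OCR engine here.
async def fake_direct(path: str) -> str:
    # Pretend files with "scanned" in the name have no embedded text layer.
    return "" if "scanned" in path else f"Embedded text layer of {path}"


async def fake_ocr(path: str) -> str:
    return f"OCR text recognized from {path}"
```

A batch could then be run with `asyncio.run(extract_many(["report.pdf", "scanned.pdf"], fake_direct, fake_ocr))`. Passing the extractors in as parameters keeps the fallback logic independent of any particular PDF or OCR backend, and the semaphore is one simple way to bound resource use during concurrent processing.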