Near 100% GPU Utilization for Embedding Millions of Documents with Daft
2025-08-17

The Daft team reached near-100% GPU utilization while embedding millions of text documents with the Qwen3-Embedding-0.6B model. This blog post walks through a three-step data pipeline (text chunking, embedding generation, and distributed processing) and includes code examples. The team later improved performance by a further 3x through changes that did not depend on maximizing GPU utilization.
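To make the first two pipeline steps concrete, here is a minimal sketch in plain Python. It is not the code from the post and uses neither Daft nor the Qwen3 model: `chunk_text`, `batched`, `fake_embed`, and `embed_documents` are hypothetical names, and `fake_embed` is a hash-based stand-in for a real model call. The third step, distributing the work across machines, is left out here; in the post that role belongs to Daft.

```python
import hashlib
from typing import Iterator, List, Tuple


def chunk_text(text: str, max_chars: int = 200) -> List[str]:
    """Greedily pack whitespace-separated words into chunks of at most max_chars.

    A single word longer than max_chars still becomes its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def batched(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of items, each with at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def fake_embed(batch: List[str], dim: int = 8) -> List[List[float]]:
    """Stand-in for a model call (assumption): maps each string to a
    deterministic vector derived from its SHA-256 digest. dim must be <= 32."""
    vectors = []
    for text in batch:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        vectors.append([byte / 255.0 for byte in digest[:dim]])
    return vectors


def embed_documents(
    docs: List[str], max_chars: int = 200, batch_size: int = 32
) -> List[Tuple[int, str, List[float]]]:
    """Chunk every document, then embed the chunks batch by batch.

    Returns (document index, chunk text, vector) rows.
    """
    doc_ids: List[int] = []
    chunks: List[str] = []
    for doc_id, doc in enumerate(docs):
        for chunk in chunk_text(doc, max_chars):
            doc_ids.append(doc_id)
            chunks.append(chunk)

    vectors: List[List[float]] = []
    for batch in batched(chunks, batch_size):
        vectors.extend(fake_embed(batch))
    return list(zip(doc_ids, chunks, vectors))
```

Batching chunks rather than whole documents is the design choice worth noting: it keeps each model call close to a fixed size regardless of how long individual documents are, which is one common way to keep an accelerator busy.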
Development
large-scale text processing