Voyage-code-3: More Accurate Code Retrieval with Lower Costs

2025-01-14

Voyage AI unveiled Voyage-code-3, a next-generation embedding model for code retrieval. Across 32 datasets, it outperforms OpenAI-v3-large and CodeSage-large by an average of 13.80% and 16.81%, respectively. Through Matryoshka learning and quantization (int8 and binary), Voyage-code-3 substantially reduces storage and search costs with minimal impact on retrieval quality. The model supports embeddings of 2048, 1024, 512, and 256 dimensions in several quantization formats, and has a 32K-token context length. Trained on a large, diverse code corpus, it performs especially well on retrieval that requires algorithmic reasoning or attention to nuanced syntax, and it has been evaluated for both robustness and accuracy.
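
To make the storage claim concrete, the following is a minimal sketch of how Matryoshka-style truncation and quantization shrink an embedding. It is not Voyage AI's implementation and does not call any Voyage API. The function names (`truncate_and_normalize`, `binarize`, `bytes_per_vector`) are hypothetical, and the assumptions that a shorter embedding is the leading prefix of the full vector and that binary quantization keeps one sign bit per dimension are the usual conventions for these techniques, not details confirmed by the announcement.

```python
import math

# Bits used per dimension for each storage format (float32 is the
# assumed unquantized baseline; int8 and binary are named in the text).
BITS_PER_DIM = {"float32": 32, "int8": 8, "binary": 1}


def bytes_per_vector(dim, fmt):
    """Storage needed for one embedding of `dim` dimensions in format `fmt`."""
    return dim * BITS_PER_DIM[fmt] // 8


def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components (assumed Matryoshka convention)
    and rescale the result to unit length for cosine similarity."""
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head


def binarize(vec):
    """Pack one sign bit per dimension into bytes (1 if x > 0, else 0).
    The final byte is zero-padded when len(vec) is not a multiple of 8."""
    packed = bytearray()
    for start in range(0, len(vec), 8):
        chunk = vec[start:start + 8]
        byte = 0
        for x in chunk:
            byte = (byte << 1) | (1 if x > 0 else 0)
        byte <<= 8 - len(chunk)  # pad a short final chunk on the right
        packed.append(byte)
    return bytes(packed)


if __name__ == "__main__":
    # Compare storage across the dimensions listed in the announcement.
    for dim in (2048, 1024, 512, 256):
        sizes = {fmt: bytes_per_vector(dim, fmt) for fmt in BITS_PER_DIM}
        print(dim, sizes)
```

Under these assumptions, a 2048-dimensional float32 vector takes 8192 bytes, while a 256-dimensional binary vector takes 32 bytes, a 256-fold reduction per stored embedding. How much retrieval quality survives at each setting is an empirical question that this arithmetic does not address.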