Beyond RAG: LLM Tool Calling Ushers in a New Era for Semantic Search

This article surveys ways to implement semantic search, focusing on LLM-powered vector embedding search. Directly embedding the user's search terms and the documents sometimes yields suboptimal results, because a question and its answer can sit far apart in vector space. Newer embedding models such as Nomic Embed Text v2 mitigate this by training questions and answers to land closer together. Another remedy is to have an LLM synthesize a plausible answer first and then search with that answer's embedding, a technique known as HyDE (Hypothetical Document Embeddings). Both ideas are sketched in the code below.

The article then introduces LLM-based Retrieval-Augmented Generation (RAG), emphasizing that RAG is not tied to vector embeddings: the retrieval step can just as well be keyword search, or a hybrid that fuses keyword and vector results (see the fusion sketch below). The author also argues that RAG won't disappear despite the emergence of long-context models, because the amount of data to search will always exceed any model's context capacity.

Finally, the author favors the LLM tool-calling approach exemplified by OpenAI's o3 and o4-mini over traditional RAG, where a single retrieval pass feeds directly into answer generation: instead, the model calls a search tool repeatedly, inspects the results, and refines its queries until it can answer (sketched in the final example below).
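
A minimal sketch of prefix-aware embedding search, assuming the sentence-transformers library and the nomic-ai/nomic-embed-text-v2-moe checkpoint; the `search_query:`/`search_document:` prefixes follow Nomic's documented convention, but verify them against the model card for your version:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Nomic Embed Text v2 is prefix-aware: queries and documents get different
# task prefixes so that questions and answers embed near each other.
model = SentenceTransformer("nomic-ai/nomic-embed-text-v2-moe",
                            trust_remote_code=True)

documents = [
    "Reciprocal rank fusion merges ranked lists from several retrievers.",
    "Long-context models accept hundreds of thousands of tokens of input.",
]

doc_vecs = model.encode([f"search_document: {d}" for d in documents])
query_vec = model.encode("search_query: how do I combine keyword and vector results?")

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # encode() output is not guaranteed to be unit-length, so normalize here.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = [cosine(query_vec, v) for v in doc_vecs]
print(documents[int(np.argmax(scores))])  # best-matching document
```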
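
The synthesize-then-search (HyDE) idea can be sketched as follows; the model names are placeholders, and `index` stands for any vector store exposing a `search(vector, top_k)` method, not a real API:

```python
from openai import OpenAI

client = OpenAI()

def hyde_search(question: str, index, top_k: int = 5):
    # 1. Ask an LLM to draft a plausible answer. It may be partly wrong;
    #    what matters is that it is phrased like an answer passage.
    draft = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; any capable chat model works
        messages=[{
            "role": "user",
            "content": f"Write a short passage that answers: {question}",
        }],
    ).choices[0].message.content

    # 2. Embed the draft instead of the raw question: a fabricated answer
    #    usually lies closer to real answer passages than the question does.
    vec = client.embeddings.create(
        model="text-embedding-3-small",  # placeholder embedding model
        input=draft,
    ).data[0].embedding

    # 3. Nearest-neighbor lookup in the assumed vector index.
    return index.search(vec, top_k=top_k)
```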
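
Hybrid search needs a way to merge the keyword ranking and the vector ranking into one list; reciprocal rank fusion (RRF) is a common choice, though the article doesn't prescribe one. A self-contained sketch:

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs with reciprocal rank fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            # Earlier positions contribute more; k damps the head of the list.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]  # e.g. from BM25
vector_hits = ["doc1", "doc5", "doc3"]   # e.g. from embedding search
print(rrf([keyword_hits, vector_hits]))  # doc1 and doc3 rise to the top
```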
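
And a sketch of the tool-calling loop the author prefers, using the OpenAI chat completions tools interface; the model name and the `search_corpus` retriever are stand-ins:

```python
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "search",
        "description": "Search the document corpus and return matching snippets.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

def search_corpus(query: str) -> str:
    """Placeholder retriever: swap in keyword search, a vector index, or both."""
    return f"(snippets matching {query!r} would go here)"

def answer(question: str) -> str:
    messages = [{"role": "user", "content": question}]
    while True:
        resp = client.chat.completions.create(
            model="gpt-4o",  # placeholder for a tool-capable model
            messages=messages,
            tools=tools,
        )
        msg = resp.choices[0].message
        if not msg.tool_calls:
            # No more searches requested: the model is ready to answer.
            return msg.content
        messages.append(msg)
        for call in msg.tool_calls:
            query = json.loads(call.function.arguments)["query"]
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": search_corpus(query),
            })
```

The loop is the point: unlike single-shot RAG, the model decides when it has gathered enough evidence, and can reformulate its queries based on what earlier searches returned.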