Common Questions and Answers on AI Evaluation: Lessons from 700+ Engineers and PMs

2025-07-03

This post summarizes questions frequently asked while teaching an AI evaluation course to 700+ engineers and product managers. Topics include whether RAG is dead, model selection, annotation tools, evaluation methodologies, synthetic data generation, and gaps in existing evaluation tooling. The authors stress the importance of error analysis, advocate for binary evaluations over Likert scales, and share best practices for building custom annotation tools, choosing appropriate chunk sizes, and evaluating RAG systems. The post also covers the differences between guardrails and evaluators, a minimum viable evaluation setup, evaluating agentic workflows, and how evaluations are used differently in CI/CD versus production monitoring.

Development Error Analysis