AI Agent Evals: From Testing to Trust
🎥 Recorded live at the MLOps World | GenAI Summit 2025, Austin, TX (October 8, 2025)

Session Title: AI Agent Evals: From Testing to Trust
Speaker: Vaibhavi Gangwar, CEO & Co-Founder, Maxim AI
Talk Track: LLM Observability

Abstract:
What separates AI systems that work in demos from those that work in production? It all comes down to evaluation. In this talk, Vaibhavi Gangwar, CEO & Co-Founder of Maxim AI, shows how high-performing AI teams are rethinking evaluation workflows to build reliable, trustworthy, high-quality products. You’ll learn how leading teams integrate testing and evaluation into every stage of the product lifecycle, from pre-launch experimentation to post-deployment monitoring, and how they close the loop between data, decisions, and user trust. Vaibhavi also shares real lessons from startups and enterprises, breaking down what is actually working (and what isn’t) when scaling LLM evaluation pipelines in production environments.

What you’ll learn:
• Why evaluation is the foundation of trust in AI agents
• How to integrate testing and monitoring into your development cycle
• How to optimize context using strategies like write, select, compress, and isolate (see the sketch below)
• What real-world teams are doing to improve quality and reliability
• How to design LLM evaluation workflows that scale across environments
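The context-optimization bullet above names four strategies: write, select, compress, and isolate. The Python sketch below is a minimal illustration of how those ideas could map onto an agent's working memory. All names here (AgentContext, write, select, compress, isolate) are hypothetical and for illustration only; they are not part of Maxim AI's product or the talk itself, and real systems would use retrieval and summarization rather than the naive stand-ins shown.

```python
# Illustrative sketch of the four context-optimization strategies named in the talk.
# All class and function names are hypothetical; assumptions are noted inline.

from dataclasses import dataclass, field


@dataclass
class AgentContext:
    """Working memory an agent assembles into a prompt on each turn."""
    scratchpad: list[str] = field(default_factory=list)  # notes kept outside the prompt
    history: list[str] = field(default_factory=list)     # full conversation history


def write(ctx: AgentContext, note: str) -> None:
    """WRITE: persist an intermediate result outside the prompt (scratchpad/memory)."""
    ctx.scratchpad.append(note)


def select(ctx: AgentContext, query: str, k: int = 3) -> list[str]:
    """SELECT: pull only the most relevant notes back into the prompt.
    Relevance here is naive keyword overlap; a real system would use retrieval."""
    scored = sorted(
        ctx.scratchpad,
        key=lambda note: len(set(query.lower().split()) & set(note.lower().split())),
        reverse=True,
    )
    return scored[:k]


def compress(history: list[str], max_items: int = 5) -> list[str]:
    """COMPRESS: shrink older turns to stay within the token budget.
    Here we keep only the most recent turns; a real system would summarize them."""
    if len(history) <= max_items:
        return history
    return [f"[summary of {len(history) - max_items} earlier turns]"] + history[-max_items:]


def isolate(task: str, relevant_notes: list[str]) -> str:
    """ISOLATE: give a sub-task its own minimal context instead of the full history."""
    return "\n".join(["Sub-task: " + task, *relevant_notes])


if __name__ == "__main__":
    ctx = AgentContext()
    write(ctx, "User prefers concise answers.")
    write(ctx, "API rate limit is 100 requests/minute.")
    ctx.history = [f"turn {i}" for i in range(12)]

    notes = select(ctx, "answer the question about rate limits")
    trimmed = compress(ctx.history)
    sub_prompt = isolate("check current rate-limit usage", notes)

    print(sub_prompt)
    print(trimmed)
```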
