AI Oct 7, 2025

Building and Evaluating Agents

🎥 Recorded live at the MLOps World | GenAI Summit 2025, Austin, TX (October 9, 2025).

Session Title: Building and Evaluating Agents
Speaker: Anish Shah, AI Engineer, Weights & Biases
Talk Track: Agents in Production

Abstract: As large language models mature from single-prompt systems into agentic architectures, the real challenge lies in designing, evaluating, and deploying them reliably. In this session, Anish Shah, AI Engineer at Weights & Biases, walks through the evolution of LLMs into agents capable of solving real-world business problems. He breaks down the core design principles that make agents successful (reflection, tool use, planning, and collaboration) and demonstrates how they translate into scalable, production-ready systems. The talk also explores the often-overlooked side of the equation: evaluation. Attendees will learn practical methods for measuring and improving agent performance, from automated judges and process-level metrics to continuous monitoring and iterative improvement pipelines.

What you’ll learn:
• How LLMs evolve into fully capable agentic systems
• Core principles behind successful agent architectures
• Evaluation methods for reliability, efficiency, and user trust
• Practical tools and workflows for continuous monitoring
• How to systematically improve agent performance in production