AI · Oct 7, 2025

Testing AI Agents: A Practical Framework for Reliability and Performance

🎥 Recorded live at the MLOps World | GenAI Summit 2025, Austin, TX (October 9, 2025)

Session Title: Testing AI Agents: A Practical Framework for Reliability and Performance
Speaker: Irena Grabovitch-Zuyev, Staff Applied Scientist, PagerDuty
Talk Track: Agents in Production

Abstract:
As AI agents powered by large language models become core components of production systems, ensuring their reliability, safety, and consistency has become one of the toughest challenges in applied AI. In this talk, Irena Grabovitch-Zuyev, Staff Applied Scientist at PagerDuty, presents a practical, end-to-end testing framework for AI agents built from real-world deployment experience. She covers the fundamentals of iterative regression testing: how to design, execute, and refine tests that detect failures and performance drift as agents evolve over time.

Through a concrete case study, Irena shares lessons learned from developing and deploying production-grade AI agents, including how her team implements unit tests for tools, adversarial testing for robustness, and ethical testing for bias and compliance. She also discusses the pipelines PagerDuty built to automate test execution, scoring, and benchmarking, enabling faster iteration and continuous improvement.

What you'll learn:
• How to design and run regression tests for evolving AI agents
• Techniques for testing correctness, robustness, and ethical alignment
• How to automate testing pipelines for rapid iteration and benchmarking
• Why conventional testing methods fail for agentic systems, and what replaces them
• Real-world lessons from deploying reliable AI agents at scale
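
To make the idea of iterative regression testing more concrete, here is a minimal sketch (not PagerDuty's actual pipeline): it replays a fixed evaluation set through an agent, scores each answer with a simple keyword heuristic, and flags the run if the mean score drifts below a stored baseline. All names here (`EvalCase`, `keyword_score`, `baseline.json`) are illustrative assumptions, not details from the talk.

```python
# A minimal regression-harness sketch for an LLM agent.
# Assumed, illustrative design: fixed eval cases, a crude keyword-based score,
# and a JSON file holding the best score seen so far as the baseline.
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class EvalCase:
    prompt: str
    expected_keywords: list[str]  # crude correctness proxy for this sketch


def keyword_score(answer: str, case: EvalCase) -> float:
    """Fraction of expected keywords present in the answer (0.0 to 1.0)."""
    hits = sum(1 for kw in case.expected_keywords if kw.lower() in answer.lower())
    return hits / len(case.expected_keywords)


def run_regression(agent: Callable[[str], str],
                   cases: list[EvalCase],
                   baseline_path: Path,
                   tolerance: float = 0.05) -> bool:
    """Return True if the current agent stays within tolerance of the baseline."""
    scores = [keyword_score(agent(c.prompt), c) for c in cases]
    current = sum(scores) / len(scores)

    baseline = (json.loads(baseline_path.read_text())["mean_score"]
                if baseline_path.exists() else current)
    passed = current >= baseline - tolerance

    # Persist the best score seen so far as the baseline for the next iteration.
    baseline_path.write_text(json.dumps({"mean_score": max(current, baseline)}))
    print(f"mean_score={current:.3f} baseline={baseline:.3f} passed={passed}")
    return passed


if __name__ == "__main__":
    cases = [EvalCase("Summarize incident INC-1", ["INC-1", "summary"])]
    fake_agent = lambda prompt: f"Summary for INC-1: ... (response to {prompt})"
    run_regression(fake_agent, cases, Path("baseline.json"))
```

Running this on every agent change gives a coarse pass/fail signal; in practice the keyword heuristic would be replaced by task-specific scorers or LLM-based judges.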
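
The abstract also mentions unit tests for individual agent tools. As a rough illustration of that idea (again, not taken from the talk), the sketch below tests a hypothetical `get_open_incidents` tool against a fake client, checking both the happy path and input validation with pytest.

```python
# A minimal sketch of unit-testing a single agent tool.
# `get_open_incidents` and `FakeClient` are hypothetical names for illustration.
import pytest


def get_open_incidents(client, service_id: str) -> list[dict]:
    """Hypothetical agent tool: list open incidents for a given service."""
    if not service_id:
        raise ValueError("service_id is required")
    return client.list_incidents(service_id=service_id, status="open")


class FakeClient:
    """Deterministic stand-in for the real API client, so tests are repeatable."""

    def list_incidents(self, service_id: str, status: str) -> list[dict]:
        return [{"id": "INC-1", "service_id": service_id, "status": status}]


def test_tool_returns_only_open_incidents():
    incidents = get_open_incidents(FakeClient(), service_id="svc-123")
    assert all(i["status"] == "open" for i in incidents)


def test_tool_rejects_missing_service_id():
    with pytest.raises(ValueError):
        get_open_incidents(FakeClient(), service_id="")
```

Testing tools deterministically, in isolation from the LLM, keeps one layer of the agent stack cheap and fast to verify before the more expensive end-to-end regression runs.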