Measuring Agents With Interactive Evaluations
Agents explore, plan, and reliably execute across diverse, long-horizon tasks, capabilities that static benchmarks can’t measure. Hear from Greg Kamradt, President of the ARC Prize Foundation, on why evaluating agentic performance requires interactive evaluations.
