Confs Space
Frontend AI Backend DevOps Mobile Security UX
AI • Oct 6, 2025

Measuring Agents With Interactive Evaluations

OpenAI DevDay 2025

Agents explore, plan, and reliably execute across diverse, long-horizon tasks. These are challenges that static benchmarks can't measure. Greg Kamradt, President of the ARC Prize Foundation, explains why evaluating agentic performance requires interactive evaluations.

#Agents
