AI Oct 7, 2025

A Practical Field Guide to Optimizing Cost, Speed & Accuracy of LLMs

🎥 From the MLOps World | GenAI Summit 2025: Virtual Session (October 7, 2025)

Session Title: A Practical Field Guide to Optimizing the Cost, Speed, and Accuracy of LLMs for Domain-Specific Agents
Speaker: Niels Bantilan, Chief ML Engineer, Union.ai
Talk Track: Agents in Production

Abstract: As the initial boom of LLM applications settles, many teams are realizing the challenges of scaling: latency spikes, high compute costs, and context limitations. In this hands-on session, Niels Bantilan provides a practical roadmap for bridging the gap between experiments and production. He demonstrates how Small Language Models (SLMs) can replace or augment LLMs in domain-specific applications, trading broad generalization for speed, cost efficiency, and precision. Using the example of an agent that translates natural language into SQL queries, Niels outlines when and how to deploy SLMs in production, how to progressively replace LLM calls, and which AI-native orchestration strategies help maintain quality while reducing cost.

Key topics include:
• Identifying leverage points to swap out LLMs for SLMs without performance loss
• Speed optimization techniques: parallelization, intelligent caching, and task fanout
• Cost management strategies: resource-aware orchestration and zero-scaling patterns
• Accuracy improvements: using "AI unit tests," synthetic datasets, and LLM judges to detect regressions early

What you'll learn:
• Actionable strategies for cost-effective AI deployment
• A clear decision framework for SLM adoption in production
• Orchestration patterns that enhance performance and reduce overhead
• How smaller models can deliver greater value in domain-specific AI systems