AI • Oct 7, 2025

Query Inside the File: AI Engineering for Audio, Video & Sensor Data

🎥 Recorded live at the MLOps World | GenAI Summit 2025, Austin, TX (October 8, 2025)

Session Title: Query Inside the File: AI Engineering for Audio, Video, and Sensor Data
Speaker: Dmitry Petrov, Co-Founder & CEO, DataChain
Talk Track: Multimodal Systems in Production

Abstract:

AI models rarely need entire videos, recordings, or sensor files; they need the right slice. In this session, Dmitry Petrov (Co-Founder & CEO, DataChain) explores how to unlock the full potential of unstructured data by querying inside files. You’ll see real-world use cases where audio, video, and sensor data are transformed into structured, queryable assets directly from S3, powering segmentation, object detection, event filtering, and precise, context-aware LLM prompts.

Dmitry demonstrates how DataChain enables developers to manipulate complex data types like bounding boxes, video frames, and time-based slices using Pydantic data models, dramatically improving inference speed, cost, and accuracy. Instead of processing gigabytes of data, you’ll learn how to ask targeted questions like: “What’s happening in this 12-second clip where two people enter the car?”

This talk redefines how we think about multimodal data engineering, turning massive media files into searchable, analyzable, and production-ready assets.

What you’ll learn:

• How to query specific clips or segments inside large audio, video, or sensor files
• How to turn raw media into structured, queryable data for LLM pipelines
• How this approach improves inference cost, speed, and precision
• The power of Pydantic data models for representing multimodal assets
• How AI teams can build scalable pipelines for real-world multimodal workflows

The short sketches below illustrate, under stated assumptions, what these ideas can look like in code.
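First, the core idea of querying inside a file: read only the byte range covering a segment of interest instead of downloading the whole object. This minimal sketch uses s3fs rather than DataChain (an assumption for illustration), and the bucket path and offsets are hypothetical; in practice the offsets would come from an index into the file’s structure, such as a video container’s keyframe table.

```python
import s3fs  # assumption: s3fs is installed and AWS credentials are configured

fs = s3fs.S3FileSystem()

# Pull only the bytes for the region of interest, not the whole object.
# The path and offsets below are hypothetical placeholders.
with fs.open("s3://my-bucket/drive-001/cam0.mp4", "rb") as f:
    f.seek(48_000_000)          # jump to the byte offset of the target segment
    chunk = f.read(4_000_000)   # read ~4 MB covering the 12-second clip
```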
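The talk leans on Pydantic data models to represent multimodal assets such as bounding boxes, video frames, and time-based slices. The sketch below shows what such models can look like; the class and field names are illustrative assumptions, not DataChain’s actual schema.

```python
from pydantic import BaseModel, Field


class BBox(BaseModel):
    """Axis-aligned bounding box in pixel coordinates (illustrative fields)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    label: str
    confidence: float


class VideoFrame(BaseModel):
    """A single frame addressed by its position inside a larger video file."""
    source: str                # e.g. the S3 URI of the parent video
    frame_index: int           # frame number within the video
    timestamp_sec: float       # offset from the start of the file, in seconds
    detections: list[BBox] = Field(default_factory=list)


class VideoSlice(BaseModel):
    """A time-based slice of a video, addressable without copying the media."""
    source: str
    start_sec: float
    end_sec: float
```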
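Building on those models, a targeted question like the 12-second-clip example reduces to filtering structured records. The helper below is hypothetical, not DataChain’s API: it merges frames whose detections match an event into padded, time-based slices that can anchor a narrow LLM prompt.

```python
from typing import Iterable


def find_event_slices(
    frames: Iterable[VideoFrame],
    label: str,
    min_count: int,
    pad_sec: float = 2.0,
) -> list[VideoSlice]:
    """Merge frames with at least `min_count` detections of `label` into
    time-based slices. Assumes frames come from one video, in timestamp order.
    """
    slices: list[VideoSlice] = []
    for frame in frames:
        hits = [d for d in frame.detections if d.label == label]
        if len(hits) < min_count:
            continue
        t = frame.timestamp_sec
        if slices and t - slices[-1].end_sec <= pad_sec:
            slices[-1].end_sec = t + pad_sec   # extend the open slice
        else:
            slices.append(VideoSlice(
                source=frame.source,
                start_sec=max(t - pad_sec, 0.0),
                end_sec=t + pad_sec,
            ))
    return slices


# e.g. find_event_slices(frames, label="person", min_count=2) yields the
# slices where two or more people appear, ready to prompt an LLM about
# "this 12-second clip where two people enter the car".
```

The resulting slices point at seconds of media instead of gigabytes of file, which is where the gains in inference cost, speed, and precision come from.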