AI • Oct 7, 2025

The Rise of Self-Aware Data Lakehouses

🎥 From the MLOps World | GenAI Summit 2025, Virtual Session (October 7, 2025)

Session Title: The Rise of Self-Aware Data Lakehouses
Speaker: Srishti Bhargava, Software Engineer, Amazon Web Services
Talk Track: Data Engineering in an LLM Era

Abstract: If you’re managing dozens of data models and hundreds of tables, you’ve likely felt the pain: schema changes break production, impact analysis takes hours, and onboarding new engineers can take weeks. In this session, Srishti Bhargava demonstrates how to build an AI assistant that understands your data platform. It is not just another chatbot, but a system that analyzes schemas, parses dependencies, and predicts which models will break before they do.

Using metadata extracted from Apache Iceberg tables, Srishti walks through how to:
• Analyze SQL dependencies and automatically surface insights
• Identify performance bottlenecks and zombie processes
• Pinpoint storage overuse, customer PII exposure, and compaction opportunities
• Build a natural language interface that allows you to query your data infrastructure in plain English

This session delivers a practical blueprint for leveraging LLMs to make your data lakehouse self-aware, intelligent, and explainable.

What you’ll learn:
- Why metadata challenges are worsening as systems scale
- How LLMs can interpret and reason over data architectures
- How simple structural changes in tables can yield massive gains
- Why manual analysis breaks at scale, and how LLMs fix it
- How to extract and embed metadata from Iceberg tables into AI-powered systems
- How to unlock business value hidden in schemas, dependencies, and usage patterns
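To make the dependency-analysis idea in the abstract concrete, here is a minimal Python sketch of impact analysis over SQL models. It is not the speaker's implementation: the model names, the SQL text, and the helper functions (`upstream_tables`, `build_downstream_index`, `impacted_models`) are all hypothetical, and the regex-based table extraction is a deliberate simplification that a real system would likely replace with a proper SQL parser and with metadata read from Iceberg.

```python
import re
from collections import defaultdict, deque

# Hypothetical model definitions: model name -> SQL text (illustrative only).
MODELS = {
    "stg_orders": "SELECT id, customer_id, amount FROM raw.orders",
    "stg_customers": "SELECT id, email FROM raw.customers",
    "fct_revenue": (
        "SELECT c.id, SUM(o.amount) AS revenue "
        "FROM stg_orders o JOIN stg_customers c ON o.customer_id = c.id "
        "GROUP BY c.id"
    ),
    "rpt_top_customers": "SELECT id, revenue FROM fct_revenue ORDER BY revenue DESC",
}

# Simplifying assumption: a table reference is the identifier right after
# FROM or JOIN. This ignores CTEs, subqueries, and quoted identifiers.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def upstream_tables(sql):
    """Return the set of table names a SQL statement reads from."""
    return {name.lower() for name in TABLE_REF.findall(sql)}


def build_downstream_index(models):
    """Map each table to the models that read from it directly."""
    downstream = defaultdict(set)
    for model_name, sql in models.items():
        for table in upstream_tables(sql):
            downstream[table].add(model_name)
    return downstream


def impacted_models(changed_table, models):
    """Breadth-first walk of the dependency graph from a changed table."""
    downstream = build_downstream_index(models)
    seen = set()
    queue = deque([changed_table.lower()])
    while queue:
        current = queue.popleft()
        for dependent in downstream.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)


if __name__ == "__main__":
    # Which models could break if the schema of raw.orders changes?
    print(impacted_models("raw.orders", MODELS))
```

A list like this, combined with table schemas, is the kind of structured context that could be passed to an LLM so it can answer questions such as "what breaks if I drop this column?" in plain English.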