AI Jun 5, 2025


Flexible OpenSearch Data Management With Apache Iceberg: Data Versioning and Incremental Processing - Sotaro Hikita & Shuhei Fukami, Amazon Web Services

Managing data for OpenSearch workloads sometimes requires organizations to rebuild indices, test against different data versions, and handle incremental data updates. These tasks demand sophisticated data management strategies. This session introduces an architectural pattern that places Apache Iceberg between source systems and OpenSearch as the source of truth. Iceberg's data management features are valuable for OpenSearch: they enable point-in-time data loading, efficient version management for A/B testing, and schema evolution that aligns naturally with OpenSearch's dynamic data structures. With PyIceberg and Spark with OpenSearch Hadoop, Iceberg tables can be integrated with OpenSearch at scale. We will demonstrate how the combination of Iceberg and OpenSearch creates a robust foundation for search data management. Through practical examples and architectural patterns, we will explore building resilient and maintainable search solutions that can evolve alongside your organization's search requirements.
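To make the point-in-time loading idea concrete, here is a minimal sketch of how an Iceberg snapshot might be read with PyIceberg and bulk-indexed into OpenSearch with opensearch-py. It is not taken from the session. The function names, the catalog and table names, the index name, and the `id_field` column are all hypothetical, and the sketch assumes the table is small enough to materialize in memory.

```python
from typing import Any, Dict, Iterable, Iterator, Optional


def to_bulk_actions(
    rows: Iterable[Dict[str, Any]], index: str, id_field: str
) -> Iterator[Dict[str, Any]]:
    """Turn row dicts into opensearch-py bulk-helper actions.

    Using a stable _id taken from the row means that re-loading the same
    snapshot overwrites documents instead of duplicating them.
    """
    for row in rows:
        yield {
            "_op_type": "index",
            "_index": index,
            "_id": str(row[id_field]),
            "_source": row,
        }


def load_snapshot(
    client: Any,
    catalog_name: str,
    table_name: str,
    index: str,
    id_field: str,
    snapshot_id: Optional[int] = None,
):
    """Index one Iceberg snapshot into OpenSearch (hypothetical helper).

    snapshot_id=None reads the current snapshot; passing an older snapshot
    ID reads the table as it was at that point in time.
    """
    # Imported lazily so to_bulk_actions can be used without these packages.
    from opensearchpy import helpers
    from pyiceberg.catalog import load_catalog

    table = load_catalog(catalog_name).load_table(table_name)
    # Assumption: the whole snapshot fits in memory as a list of dicts.
    rows = table.scan(snapshot_id=snapshot_id).to_arrow().to_pylist()
    return helpers.bulk(client, to_bulk_actions(rows, index, id_field))
```

Writing each data version to its own index (for example one index per snapshot) is one way to keep versions side by side for A/B testing. For tables too large to materialize in one process, the Spark route named in the abstract would be the more likely fit.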