Derived Source: Slash Storage Costs Without Losing Data in OpenSearch
Mohit Godwani & Tanik Pansuriya, Amazon Web Services

In today’s data-intensive world, OpenSearch users struggle to balance storage costs with query performance. The _source field, essential for accessing original document data, often leads to significant storage overhead, which is especially challenging in time-series data scenarios.

This talk explores our journey implementing the derived source feature, which addresses mounting storage costs in cases where storing the full _source is unnecessary. We’ll examine the various solutions we considered and demonstrate why derived source emerged as the optimal choice. Learn how our framework reconstructs document sources from stored fields at query time, reducing storage while maintaining data accessibility.

We’ll cover:
* Real-world use cases and pain points
* Technical alternatives explored
* Architecture deep-dive
* Performance optimization strategies
* Storage cost impact metrics
* Extensibility for various field types
* Implementation best practices

Aimed at anyone working with time-series data, logs, or storage-sensitive applications, this session provides practical insights into optimizing OpenSearch storage efficiency.
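
The idea of reconstructing a document from per-field values at query time can be illustrated with a minimal Python sketch. This is not the OpenSearch implementation: the function name derive_source, the dotted-path input format, and the sample field values are all assumptions made for illustration only.

```python
from typing import Any, Dict


def derive_source(field_values: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild a nested document from flat, dotted field paths.

    Hypothetical helper: the input stands in for values read back
    field by field, instead of from a stored copy of the original JSON.
    """
    source: Dict[str, Any] = {}
    for path, value in sorted(field_values.items()):
        node = source
        *parents, leaf = path.split(".")
        # Walk (and create as needed) the intermediate objects.
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return source


# Hypothetical per-field values for a single log document.
stored = {
    "@timestamp": "2024-05-01T12:00:00Z",
    "host.name": "web-01",
    "http.response.status": 200,
}
document = derive_source(stored)
```

In this sketch the rebuilt document carries the same field values as the input, but it need not be byte-identical to what was originally ingested: key order, for instance, follows the sorted field paths rather than the original JSON.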
