AI Jun 5, 2025

Resilient Segment Replication With Adaptive Refresh

Vigya Sharma, Amazon Web Services

Segment replication is a powerful index replication strategy for high-QPS systems. It decouples indexing and search workloads and enables rapid replica failovers and seamless point-in-time restores. OpenSearch introduced segment replication in 2023 and saw up to 78% improvement in indexing throughput for highly replicated clusters. It’s also our chosen replication strategy for Amazon’s Lucene-based e-commerce search engine.

However, decoupled indexing and search is not without challenges. Large replication checkpoints, caused by indexing bursts or network issues, can overwhelm searchers with high page faults, thrashing, and degraded search performance. To address this, we created an ‘Adaptive Refresh’ strategy for Lucene searchers. Instead of refreshing on the entire checkpoint in one fell swoop, searchers step through intermediate ‘safe to refresh’ commit points, absorbing the large checkpoint without excessive page faults. In this talk, we’ll present how adaptive refresh works. You’ll learn how to leverage these changes to make your OpenSearch clusters resilient to indexing bursts and network glitches.
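
To make the idea of stepping through intermediate commit points concrete, here is a minimal Python simulation. It is only a sketch under stated assumptions, not the implementation presented in the talk: `CommitPoint`, `new_bytes`, `plan_adaptive_refresh`, and the byte budget used to decide what counts as "safe to refresh" are all hypothetical, and no Lucene or OpenSearch API is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommitPoint:
    """Hypothetical stand-in for a commit point inside a replication checkpoint."""
    generation: int
    segments: dict  # segment name -> size in bytes


def new_bytes(current, candidate):
    """Bytes in `candidate` that the searcher has not already loaded via `current`."""
    return sum(size for name, size in candidate.segments.items()
               if name not in current.segments)


def plan_adaptive_refresh(current, pending, budget_bytes):
    """Return the commit points to refresh on, in order.

    Assumption: a commit is 'safe to refresh' when the new segment bytes it
    brings in fit within `budget_bytes`. The searcher always advances by at
    least one commit per step so that it keeps making progress.
    """
    steps = []
    i = 0
    while i < len(pending):
        j = i  # the next pending commit is always a candidate
        # Extend to the furthest commit whose new bytes still fit the budget.
        while (j + 1 < len(pending)
               and new_bytes(current, pending[j + 1]) <= budget_bytes):
            j += 1
        current = pending[j]
        steps.append(current)
        i = j + 1
    return steps


current = CommitPoint(1, {"_0": 100})
pending = [
    CommitPoint(2, {"_0": 100, "_1": 50}),
    CommitPoint(3, {"_0": 100, "_1": 50, "_2": 50}),
    CommitPoint(4, {"_0": 100, "_1": 50, "_2": 50, "_3": 200}),
    CommitPoint(5, {"_0": 100, "_1": 50, "_2": 50, "_3": 200, "_4": 30}),
]

plan = plan_adaptive_refresh(current, pending, budget_bytes=100)
print([c.generation for c in plan])
```

With a budget of 100 bytes, the simulated searcher refreshes in several smaller steps instead of jumping straight to generation 5. With a budget larger than the whole checkpoint, the plan collapses to a single refresh on the latest commit, which corresponds to the non-adaptive behavior.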