
[RW Separation] [Feature Request] Scale to Zero (Indexing Shards) with Reader/Writer Separation. #16720

@prudhvigodithi

Is your feature request related to a problem? Please describe

This follows from the META issue #15306: achieve Scale to Zero with Reader/Writer Separation. With scale to zero, we should be able to scale down the primary and regular replica shards and keep only the search replicas to serve search traffic. We should also be able to bring the primary and regular replicas back to accept write (index) traffic.

Describe the solution you'd like

graph TD
    A[OpenSearch Cluster Healthy] --> B[Index Created]

    subgraph "Shard Allocation (Healthy)"
        B --> P0[Primary Shard 0 - Node 1]
        B --> R0[Replica Shard 0 - Node 2]
        B --> S0[Search Replica 0 - Node 3]
    end

    subgraph "Traffic Distribution"
        C[Write Traffic] --> P0
        D[Search Traffic] --> S0
    end

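The traffic split in the healthy-state diagram can be sketched as a small routing function. This is an illustrative model only, not OpenSearch's routing code; `ShardRole` and `route` are hypothetical names introduced for this example.

```python
from enum import Enum


class ShardRole(Enum):
    PRIMARY = "primary"
    REPLICA = "replica"
    SEARCH_REPLICA = "search_replica"


def route(operation: str) -> ShardRole:
    """Pick the shard role that serves an operation, per the diagram."""
    if operation == "write":
        return ShardRole.PRIMARY          # write traffic goes to the primary
    if operation == "search":
        return ShardRole.SEARCH_REPLICA   # search traffic goes to search replicas
    raise ValueError(f"unknown operation: {operation!r}")
```

The regular replica appears in the enum but is never a routing target here, which mirrors the diagram: it exists for the write path, not for search traffic.
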
graph LR
    A[Enable search_only Mode] --> B[Sync Data to Remote Store] --> C[Remove Primary & Replica Shards]

    S0[Search Replica 0 - Node 3]

    subgraph "🟡 Search Traffic"
        T[Search Traffic Routed to Search Replicas] --> S0
    end


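The ordering in the diagram above (sync to the remote store first, then remove the primary and replica shards) can be modelled as a guard on a pure function. This is a minimal sketch under assumed names (`Shard`, `enable_search_only`, and the node labels are hypothetical); it does not reflect how the cluster actually tracks shard routing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    role: str   # "primary", "replica", or "search_replica"
    node: str


def enable_search_only(shards, synced_to_remote_store):
    """Return the shards left after switching an index to search_only."""
    if not synced_to_remote_store:
        # The diagram syncs data to the remote store before removing shards.
        raise RuntimeError("sync to remote store must finish first")
    return [s for s in shards if s.role == "search_replica"]


healthy = [
    Shard("primary", "node-1"),
    Shard("replica", "node-2"),
    Shard("search_replica", "node-3"),
]
remaining = enable_search_only(healthy, synced_to_remote_store=True)
```

After the call, `remaining` holds only the search replica, which is the state the second diagram shows serving search traffic.
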
graph TD
    A[Disable search_only Mode] --> B[Restore Primary & Replica Shards]

    subgraph "Shard Allocation (search_only Disabled)"
        B --> P0[Primary Shard 0 - Node 1]
        B --> R0[Replica Shard 0 - Node 2]
        B --> S0[Search Replica 0 - Node 3]
    end

    subgraph "Traffic Distribution"
        C[Write Traffic] --> P0
        D[Search Traffic] --> S0
    end
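Enabling and disabling search_only could both be driven by a single request that carries a boolean flag. The sketch below only builds such a request and sends nothing. It assumes an index-level `_scale` endpoint that accepts a `search_only` field; this issue does not name the endpoint, so treat the path and body shape as assumptions to verify against the OpenSearch documentation. `scale_request` is a hypothetical helper.

```python
import json


def scale_request(index: str, search_only: bool):
    """Build (method, path, body) for toggling search_only; nothing is sent."""
    if not index or "/" in index:
        raise ValueError("index must be a non-empty name without '/'")
    # Assumed body shape: a single boolean flag.
    body = json.dumps({"search_only": search_only})
    return "POST", f"/{index}/_scale", body


# Scale indexing shards to zero, then bring them back.
scale_down = scale_request("my-index", True)
scale_up = scale_request("my-index", False)
```

Returning a plain tuple keeps the sketch independent of any HTTP client; the caller can pass the pieces to whichever client the cluster setup requires.
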

Solution during Cluster Recovery

An index without search_only mode enabled follows the default recovery behavior:

Why the user has to run the restore API: #17299 (comment).

graph TD
    A[Cluster Restarted] --> B[No Persistent Data Directory]
    B --> C[Primary & Replica Shards Lost]
    C --> D[Cluster State In Remote Store]

    subgraph "Shard State After Restart"
        C --> P0[Primary Shard - UNASSIGNED]
        C --> R0[Replica Shard - UNASSIGNED]
        C --> S0[Search Replica Shard - UNASSIGNED]
    end

    D --> E[User Runs _remotestore/_restore]
    E --> F[Primary, Replica and Search Replica Shards Restored]
    F --> G[Cluster Fully Recovered]
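The manual step in the diagram above, where the user runs `_remotestore/_restore`, can be sketched as a request builder. The path comes from the diagram; the `POST` method and the `{"indices": [...]}` body reflect my understanding of the remote store restore API and should be checked against the documentation. `restore_request` is a hypothetical helper, and nothing is sent over the network.

```python
import json


def restore_request(indices):
    """Build (method, path, body) for a remote store restore; nothing is sent."""
    names = list(indices)
    if not names:
        raise ValueError("at least one index name is required")
    # Assumed body shape: the list of indices to restore.
    return "POST", "/_remotestore/_restore", json.dumps({"indices": names})


request = restore_request(["my-index"])
```
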

An index with search_only enabled has the advantage that its search replicas recover automatically.

More testing details: #17299 (comment). Why search replicas can be restored automatically: #17299 (comment).

graph TD
    A[Cluster Restarted] --> B[Search-Only Mode Enabled]
    B --> C[Primary & Replica Shards Skipped]
    C --> D[Search Replicas Auto-Start]

    subgraph "Shard State After Restart"
        D --> S0[Search Replica 0 - STARTED on Node 3]
        D --> S1[Search Replica 1 - STARTED on Node 5]
        D --> S2[Search Replica 2 - STARTED on Node 7]
    end

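The two recovery paths can be compared side by side with a small model of shard states after a restart that has no persistent data directory. This is a sketch of the behavior the diagrams describe, not of the allocator itself. `states_after_restart` is a hypothetical name, and mapping "restored" to the `STARTED` state is an assumption made for illustration.

```python
def states_after_restart(search_only: bool, restored: bool = False) -> dict:
    """Model shard states after a restart with no persistent data directory."""
    if search_only:
        # Primary and replica shards are skipped; search replicas auto-start.
        return {"search_replica": "STARTED"}
    # Default behavior: every shard stays unassigned until the user
    # runs _remotestore/_restore.
    state = "STARTED" if restored else "UNASSIGNED"
    return {"primary": state, "replica": state, "search_replica": state}
```

The `restored` flag only matters on the default path, which captures the difference between the two cases: search_only indices need no manual restore before they can serve search traffic again.
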

Related component

Search:Performance

Labels

Roadmap:Search (Project-wide roadmap label), Search:Performance, enhancement (Enhancement or improvement to existing feature or request), v3.0.0 (Issues and PRs related to version 3.0.0)
