[Searchable Snapshot] [Design Proposal] Remote Searchable Snapshots #3895

Closed · Tracked by #2919 · andrross opened this issue on Jul 13, 2022
Labels: discuss, enhancement, feature

This document outlines a proposal for implementing remote searchable snapshots. Please feel free to leave comments below with any thoughts or suggestions!

Goal

Users should be able to search indexes within snapshots in remote repositories without downloading all index data to disk ahead of time. This goal corresponds to phase 2 of the storage roadmap proposal (#3739).

Requirements

  1. Provide the mechanism to search a previously generated snapshot in any supported repository (Azure, GCP, S3, HDFS, etc.) without first downloading the entire contents of the snapshot to the node
  2. Minimize the impact of remote fetch latency via strategies such as caching previously searched data and prefetching data during query execution
  3. Provide the capability for a node to act as a remote reader, either exclusively or in addition to its other roles
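Requirement 2 names two latency strategies: caching previously searched data and prefetching during query execution. As a minimal sketch of both ideas (not the planned OpenSearch implementation), the following bounded LRU cache keys fixed-size blocks by file name and block index, calls a hypothetical `fetch` callback only on a miss, and optionally warms the next few blocks of the same file. The class, parameter, and counter names are illustrative assumptions.

```python
from collections import OrderedDict
from typing import Callable, Tuple

# (file name, block index); an illustrative key, not OpenSearch's.
BlockKey = Tuple[str, int]


class BlockCache:
    """Bounded LRU cache of fixed-size blocks read from a remote repository."""

    def __init__(self, fetch: Callable[[str, int], bytes], capacity: int, prefetch: int = 0):
        self._fetch = fetch          # hypothetical remote read: (file, block) -> bytes
        self._capacity = capacity    # maximum number of cached blocks
        self._prefetch = prefetch    # how many following blocks to warm on each get()
        self._blocks: "OrderedDict[BlockKey, bytes]" = OrderedDict()
        self.remote_fetches = 0      # counts round trips to the remote store

    def _load(self, key: BlockKey) -> bytes:
        if key in self._blocks:
            self._blocks.move_to_end(key)  # hit: mark as most recently used
            return self._blocks[key]
        data = self._fetch(*key)           # miss: one remote round trip
        self.remote_fetches += 1
        self._blocks[key] = data
        if len(self._blocks) > self._capacity:
            self._blocks.popitem(last=False)  # evict the least recently used block
        return data

    def get(self, file: str, block: int) -> bytes:
        data = self._load((file, block))
        # Naive, synchronous sequential prefetch of the next blocks in the same
        # file. It does not check the file length, so `fetch` must tolerate
        # block indices past the end of the file (e.g. by returning b"").
        for i in range(1, self._prefetch + 1):
            self._load((file, block + i))
        return data
```

Prefetching here is deliberately simple; a real implementation would presumably prefetch asynchronously and use the query's access pattern to decide which blocks to warm.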

Approach

An OpenSearch snapshot stored in a repository consists of the index metadata, the shard metadata, and the corresponding state and segment files for the backed-up index. The searchable snapshot feature will add the capability to restore a snapshot to an index without downloading the index data, instead accessing the remote data on demand at query time. The high-level approach is outlined below, with links to the corresponding issues for more details.

  • Introduce a searchable snapshot feature flag to gate the new functionality while it is under development.
  • API: Implement an API for creating an index backed by a remote snapshot, starting with a naive storage layer to prove out the end-to-end functionality.
  • Storage: Implement a block-based storage mechanism to efficiently fetch remote data on demand.
  • Configuration: Introduce a new node role for a remote searcher, with corresponding node settings that define the location and size of the disk-based cache. This work includes the required shard allocation changes, because a searchable snapshot shard must be allocated to a node with the appropriate role. Full design TBD.
  • Optimizations: UltraWarm prefetches data during search execution to reduce time spent waiting on round trips; a similar mechanism should be implemented here for best performance. Concurrent segment search is another mechanism that may reduce time spent waiting on remote access. Full benchmarking and design TBD.
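To make the Storage item concrete, here is a minimal sketch (not the planned OpenSearch design) of serving a byte-range read of a Lucene file by fetching only the fixed-size blocks it overlaps. `fetch_block` is a hypothetical callback standing in for a ranged read from the repository, and the 8 MiB block size is an arbitrary illustrative value.

```python
from typing import Callable, List

# Illustrative block size; the proposal does not specify one.
BLOCK_SIZE = 8 * 1024 * 1024


def blocks_for_range(offset: int, length: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Indices of the fixed-size blocks covering bytes [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))


def read_range(fetch_block: Callable[[int], bytes], offset: int, length: int,
               block_size: int = BLOCK_SIZE) -> bytes:
    """Serve a positional read by fetching only the blocks it overlaps.

    `fetch_block(i)` is a hypothetical callback returning block i of the remote
    file; the last block of a file may be shorter than `block_size`.
    """
    out = bytearray()
    for idx in blocks_for_range(offset, length, block_size):
        block = fetch_block(idx)
        block_start = idx * block_size
        # Slice out the part of this block that lies inside the requested range.
        lo = max(offset, block_start) - block_start
        hi = min(offset + length, block_start + len(block)) - block_start
        out += block[lo:hi]
    return bytes(out)
```

In a fuller design, `fetch_block` would sit behind the disk-based cache described under Configuration, so repeated reads of hot blocks avoid the remote round trip.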

API

Summarized from here:

A new parameter will be introduced in the snapshot restore API: storage_type.

| Setting | Description |
|---|---|
| storage_type | Must be one of local or remote_snapshot. local is the default if not specified, and indicates that all snapshot metadata and index data will be downloaded to local instance storage. remote_snapshot indicates that snapshot metadata will be downloaded to the cluster but the remote repository will remain the authoritative store of the index data. Data will be downloaded and cached as necessary to service queries. At least one node in the cluster must be configured for the search role in order to restore a snapshot of type remote_snapshot. |
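The rules in the table can be expressed as a small validation step. The sketch below assumes, hypothetically, that each node's roles are available as a set of strings and that the role is literally named `search`, as the table's wording suggests; the function names and error messages are illustrative, not OpenSearch's.

```python
from enum import Enum
from typing import Iterable, Optional, Set


class StorageType(Enum):
    LOCAL = "local"
    REMOTE_SNAPSHOT = "remote_snapshot"


def parse_storage_type(value: Optional[str]) -> StorageType:
    """Parse the storage_type restore parameter, defaulting to local."""
    if value is None:
        return StorageType.LOCAL
    try:
        return StorageType(value)
    except ValueError:
        allowed = ", ".join(t.value for t in StorageType)
        raise ValueError(f"unknown storage_type [{value}], expected one of [{allowed}]") from None


def validate_restore(value: Optional[str], node_roles: Iterable[Set[str]]) -> StorageType:
    """Check the restore request against the cluster's node roles.

    `node_roles` holds one set of role names per node (a hypothetical input
    shape); a remote_snapshot restore needs at least one node with "search".
    """
    storage_type = parse_storage_type(value)
    if storage_type is StorageType.REMOTE_SNAPSHOT and not any("search" in r for r in node_roles):
        raise ValueError("remote_snapshot restore requires at least one node with the search role")
    return storage_type
```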

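For example, the following request restores the indices matching my-index* from snapshot 2 as remote snapshot indices, renaming them according to rename_pattern and rename_replacement. storage_type is the new parameter:

POST _snapshot/my-repository/2/_restore
{
  "indices": "my-index*",
  "storage_type": "remote_snapshot",
  "rename_pattern": "my-index(.+)",
  "rename_replacement": "restored-my-index$1"
}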
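The request can also be prepared programmatically. The sketch below builds the same body and previews which snapshot indices it would select and what they would be renamed to, assuming `indices` behaves like comma-separated `*` wildcards (approximated here with `fnmatch`) and that `rename_replacement` uses Java-style `$1` group references, as the example suggests. The helper names and sample index names are hypothetical; this is a local preview, not a call to the restore API.

```python
import fnmatch
import json
import re
from typing import Dict, List


def java_replacement_to_python(replacement: str) -> str:
    """Rewrite Java-style $N group references as Python's \\g<N>.

    Simplified: escaped dollar signs (\\$) are not handled.
    """
    return re.sub(r"\$(\d+)", r"\\g<\1>", replacement)


def restored_names(snapshot_indices: List[str], indices: str,
                   rename_pattern: str, rename_replacement: str) -> Dict[str, str]:
    """Map each snapshot index selected by `indices` to its name after renaming.

    `indices` is treated as a comma-separated list of wildcard patterns;
    OpenSearch's full index-expression syntax (e.g. exclusions) is not modeled.
    """
    patterns = [p.strip() for p in indices.split(",")]
    repl = java_replacement_to_python(rename_replacement)
    return {
        name: re.sub(rename_pattern, repl, name)
        for name in snapshot_indices
        if any(fnmatch.fnmatchcase(name, p) for p in patterns)
    }


# Body of the example restore request above.
restore_body = {
    "indices": "my-index*",
    "storage_type": "remote_snapshot",
    "rename_pattern": "my-index(.+)",
    "rename_replacement": "restored-my-index$1",
}

if __name__ == "__main__":
    print(json.dumps(restore_body, indent=2))
    print(restored_names(["my-index-2022.07", "logs-2022.07"],
                         restore_body["indices"],
                         restore_body["rename_pattern"],
                         restore_body["rename_replacement"]))
    # prints {'my-index-2022.07': 'restored-my-index-2022.07'}
```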
Project status: Done