[Feature Proposal] Writable Remote Index #7804
Labels: enhancement, RFC, Storage
Goal
As an extension to the remote store feature, searchable remote index will introduce data tier support in OpenSearch. A hot index has data on local disk as well as in the remote store, whereas a warm index has data only in the remote store. The next step is a writable warm index. This RFC covers the requirements for writable warm indexes, the different approaches to supporting writes, and the pros and cons of each approach, and recommends one of them.
Background
This doc assumes the following index structure with data tiers. The example only illustrates a sample pattern and can be changed to fit the user's requirements.
- orders - Live index, normal writes go to this index.
- order-history-<DATE> - orders index is rotated on a daily basis and the rotated index is suffixed with the date.
- orders-alias points to indexes containing the last 30 days of data. orders is added to this alias with is_write_index=true. That means, if we use the alias to write data, it will always write to the orders index.
- order-history-2023-02-22 to order-history-2023-02-16 are hot indexes and can be written in the same way we write data to an index today.
- order-history-2023-02-15 to order-history-2023-01-16 are warm indexes.
Requirements
Functional
Non-Functional
Non-Requirements
order-history-2023-02-22 would need the same index name to be provided. Writing to alias will only write to live hot index.
orders alias as per the example above). Based on a configured field (like timestamp), OpenSearch decides which index to write the data to. Even though this is a valid requirement, this can be built incrementally.
Use Cases
Write New Data
Add new documents to an existing warm index. This use case is mostly driven by back-filling data that was not ingested earlier for some reason. It assumes that the user knows which index to use for writing the new data.
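As an illustration of "the user knows which index to use", a client back-filling data under the naming pattern from the Background section could derive the target index from the document's date. This is only a sketch: the helper name and the default prefix are assumptions taken from the example pattern, not part of any OpenSearch API.

```python
from datetime import date


def history_index_for(day: date, prefix: str = "order-history") -> str:
    """Build the rotated index name for a given day (hypothetical helper).

    Follows the order-history-<DATE> pattern used in the Background example.
    """
    return f"{prefix}-{day.isoformat()}"


# A document dated 2023-02-10 would be back-filled into this warm index.
target_index = history_index_for(date(2023, 2, 10))
```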
Update Existing Data
To update existing data, we need to fetch the existing document first. To improve latency, we need to perform block-level fetches. Once the document is fetched and the new changes are applied to it, the next step is the same as in Write New Data.
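To make the block-level fetch idea concrete, here is a minimal sketch of how a byte range could be mapped onto fixed-size blocks so that only the blocks covering the document are read. The function names, the fixed block size, and the read_block callback are assumptions for illustration; the actual block layout would be defined in the design.

```python
def blocks_for_range(offset: int, length: int, block_size: int) -> range:
    """Return indices of the fixed-size blocks covering [offset, offset + length)."""
    if length <= 0:
        return range(0)
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return range(first, last + 1)


def fetch_bytes(read_block, offset: int, length: int, block_size: int) -> bytes:
    """Read only the needed blocks via read_block(i) and slice out the requested bytes.

    read_block is a hypothetical callback standing in for a remote store read.
    """
    blocks = blocks_for_range(offset, length, block_size)
    if not blocks:
        return b""
    data = b"".join(read_block(i) for i in blocks)
    start = offset - blocks[0] * block_size
    return data[start:start + length]
```

With a block size of 100 bytes, a 120-byte document at offset 250 touches blocks 2 and 3 only, so the rest of the file is never downloaded.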
Potential Approaches
These approaches provide a solution for the Write New Data use case only, as the Update Existing Data use case internally depends on Write New Data.
[Recommended]
Once a write request hits the warm index, we open the engine in read-write mode, with the metadata from local disk. We could potentially keep the warm index's engine open in read-write mode from the start to support writes.
For non-append-only cases, we do a block fetch of the document that needs to be updated. We then update the document, writing to the remote translog before we ack back.
For append-only use cases, we can skip the block fetch altogether, since we know it's a new document, and write directly to the remote translog. Based on a configurable delay, we refresh the segments and move the newly created segments and updated bitsets to the remote segment store. More details of this approach will be covered in the design review.
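The write path described above can be sketched with a toy in-memory model. Everything here is hypothetical (class, field, and method names are invented for illustration, and dictionaries stand in for the remote translog and remote segment store); it only shows the ordering of steps, not the real engine.

```python
from dataclasses import dataclass, field


@dataclass
class WarmShardSketch:
    """Toy model of the recommended write path; all names are hypothetical."""
    remote_segments: dict = field(default_factory=dict)  # stands in for the remote segment store
    remote_translog: list = field(default_factory=list)  # stands in for the remote translog
    pending: dict = field(default_factory=dict)          # writes not yet refreshed into segments
    block_fetches: int = 0

    def _block_fetch(self, doc_id):
        # Counts as one block fetch of the existing document.
        self.block_fetches += 1
        if doc_id in self.pending:
            return self.pending[doc_id]
        return self.remote_segments.get(doc_id)

    def index(self, doc_id, fields, append_only=False):
        if append_only:
            # Known to be a new document: skip the block fetch.
            doc = dict(fields)
        else:
            existing = self._block_fetch(doc_id) or {}
            doc = {**existing, **fields}
        # Write to the remote translog before acknowledging.
        self.remote_translog.append((doc_id, doc))
        self.pending[doc_id] = doc
        return "ack"

    def refresh(self):
        # Would run after the configurable delay: move new segments to the remote store.
        uploaded = len(self.pending)
        self.remote_segments.update(self.pending)
        self.pending.clear()
        return uploaded
```

In this sketch an append-only write never increments block_fetches, while an update does one fetch, merges the change, and is logged to the translog before the ack is returned.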
Alternative Approaches
Download All Data
In this approach, we make the index hot by downloading all data from the remote store to local disk. Once the data is downloaded, new data is ingested into it. As this is a warm index, we can't keep the data on local disk forever. To avoid downloading the data frequently, we wait for X minutes after the last write, then flush and delete the data (and metadata, depending on the data tier type) from local disk.
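The wait-then-evict behaviour of this alternative can be sketched as follows. The class and its fields are hypothetical, timestamps are plain numbers in seconds, and a counter stands in for the actual download; the idle threshold plays the role of the "X minutes" above.

```python
class DownloadAllSketch:
    """Toy model of the Download All Data approach; all names are hypothetical."""

    def __init__(self, idle_seconds):
        self.idle_seconds = idle_seconds  # the "X minutes" wait, in seconds
        self.local = False                # whether the data is currently on local disk
        self.last_write = None
        self.downloads = 0

    def write(self, now):
        if not self.local:
            # Stands in for downloading all data from the remote store.
            self.downloads += 1
            self.local = True
        self.last_write = now

    def maybe_evict(self, now):
        """Flush and drop local data once the index has been idle long enough."""
        if self.local and now - self.last_write >= self.idle_seconds:
            self.local = False
            return True
        return False
```

Two writes close together trigger a single download, while a write arriving after eviction pays for the full download again, which is the main cost of this approach.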
Comparison
Potential Issues
Next Steps
RemoteDirectory instead of FSDirectory in IndexShard.Store