[CEP] Use native Elasticsearch reindexing for index changes #26516
Comments
I'm not actually offering to do this or to have SaaS unilaterally prioritize it. @snopoke it sounds like you've been thinking about this and I wanted to create a public place for that discussion. If ICDS wants to take some initiative at the planning level, I can see SaaS being willing to pitch in effort as well, since we'd clearly also get some benefit from it.
I haven't thought about this much, but reading the docs I see there are options for updating, overwriting, or ignoring documents that already exist in the target index. One option would be to start the pillow writing to both the old and new indexes before the reindex starts, and configure the ES reindex to ignore existing docs. Looking at our current reindex workflow, I think the part that sets the pillow checkpoints is broken: either it does not set the checkpoint at all (e.g. the SQL form reindexer) or it uses the old pillows (e.g. the user reindexer).
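A minimal sketch of the "ignore existing docs" option from the comment above, assuming the request shape described in the Elasticsearch 2.4 reindex docs: `op_type: create` on the destination makes writes to ids that already exist fail as version conflicts, and `conflicts: proceed` tells the reindex to count those conflicts rather than abort. The function names and index names are hypothetical, nothing here is taken from the HQ codebase, and only the standard library is used.

```python
import json
import urllib.request


def build_reindex_body(source_index, dest_index):
    """Build a _reindex request body that only creates missing documents."""
    return {
        # Count version conflicts instead of aborting on the first one.
        "conflicts": "proceed",
        "source": {"index": source_index},
        # With op_type "create", a write to an id that already exists in the
        # destination is rejected as a conflict, so docs the pillow has
        # already written to the new index are left untouched.
        "dest": {"index": dest_index, "op_type": "create"},
    }


def start_reindex(es_url, source_index, dest_index):
    """POST the body to the _reindex endpoint and return the parsed response."""
    body = build_reindex_body(source_index, dest_index)
    request = urllib.request.Request(
        es_url.rstrip("/") + "/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling something like `start_reindex("http://localhost:9200", "forms_old", "forms_new")` (hypothetical URL and index names) would only make sense after the pillow is already writing to both indexes, which is the ordering the comment proposes.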
This sounds like a good path to go down, but I agree we might need to think a bit more about the details. One nice outcome would be if reindexing etc. could be decoupled from env to env.
I looked into this as part of reindexing the large index on ICDS. There are a few challenges to using native ES reindexing.
Given all these challenges, native reindexing might not be better than our HQ reindex tooling. The above is a concise summary of the doc where I took notes while researching this, which has more detail on each point.
Adding some notes from the staging test
@sravfeyn can you update this with the current state of the reindex tools you used?
Abstract
Incorporate https://www.elastic.co/guide/en/elasticsearch/reference/2.4/docs-reindex.html into our automatic elasticsearch reindexing setup.
Motivation
Native reindexing is supposedly much faster than resyncing all the docs ourselves.
Specification
There should likely be a fallback method for when we need to reindex data in place because of an issue with the pillows, as opposed to reindexing because we changed the mapping, which is the more common case.
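One way to picture that fallback is a small decision function. This is only a sketch under stated assumptions: the function and constant names are hypothetical, and it also folds in the ES 1 fallback mentioned under "Impact on hosting". The reasoning is that a native reindex copies from the existing index, so it can only help when the existing index data is good (a mapping change); if the pillows left the index wrong or incomplete, the docs have to be resynced from the primary database.

```python
NATIVE = "native_es_reindex"
LEGACY = "hq_reindex"


def choose_reindex_strategy(es_major_version, mapping_changed):
    """Pick a reindex strategy (hypothetical helper, not HQ code).

    es_major_version: e.g. the value of ELASTICSEARCH_MAJOR_VERSION.
    mapping_changed: True when reindexing because of a mapping change,
        False when repairing an index after a pillow problem.
    """
    if es_major_version < 2:
        # The _reindex API does not exist on ES 1; keep current behavior.
        return LEGACY
    if not mapping_changed:
        # The source index itself is suspect, so copying from it won't help.
        return LEGACY
    return NATIVE
```

For example, `choose_reindex_strategy(2, mapping_changed=True)` selects the native path, while any call with `es_major_version=1` keeps the current behavior.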
Impact on users
This should not affect users at all.
Impact on hosting
This change should be transparent to local hosting setups. If done before the EOL of our ES 1 backend option, it should fall back to current behavior if the setting ELASTICSEARCH_MAJOR_VERSION = 1 is used.
Backwards compatibility
Besides backwards compatibility with ELASTICSEARCH_MAJOR_VERSION = 1 described above, this should be an in-place replacement of our current system with no major effects on users or devops, other than reindexes being faster.
Release Timeline
There is no hard date by which we must do this, but we'd probably want to do it before the next time we reindex forms or cases, as in #25666.
Open questions and issues
I'm not sure we fully understand the behavior of the native elasticsearch reindex functionality. There's always the tricky issue of how to make sure we don't skip any items that have come in between when we start the reindex and when we flip all new reads and writes to it; it's possible that our current code already handles this correctly and in a way that cleanly applies to the proposed reindex implementation.
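To make the "don't skip any items" concern concrete, here is a toy in-memory model of one possible ordering: dual writes start first, then a reindex copies a point-in-time snapshot and skips ids that already exist in the new index. Plain dicts stand in for indexes and every name is hypothetical. It does not exercise Elasticsearch itself, and deletes are deliberately not modelled (a doc deleted during the reindex could be copied back from the snapshot), so treat it as an aid for reasoning rather than evidence about real behavior.

```python
def pillow_write(doc_id, doc, *indexes):
    """Simulate a pillow that writes each change to every given index."""
    for index in indexes:
        index[doc_id] = doc


def native_reindex(snapshot, new_index):
    """Copy a point-in-time snapshot, skipping ids that already exist.

    The skip mimics a create-only reindex that proceeds past conflicts.
    """
    for doc_id, doc in snapshot.items():
        if doc_id not in new_index:
            new_index[doc_id] = doc


def run_simulation():
    old_index = {"a": {"rev": 1}, "b": {"rev": 1}}
    new_index = {}

    # The reindex starts from a snapshot of the old index, and the pillow
    # is already writing to both indexes.
    snapshot = dict(old_index)

    # Changes arrive while the reindex is still running.
    pillow_write("b", {"rev": 2}, old_index, new_index)  # update
    pillow_write("c", {"rev": 1}, old_index, new_index)  # brand new doc

    # The reindex copies the (now stale) snapshot; "b" is skipped because
    # the newer version is already in the new index.
    native_reindex(snapshot, new_index)
    return old_index, new_index


old_index, new_index = run_simulation()
assert new_index == old_index
```

If the reindex copied "b" before the update arrived, the later pillow write would simply overwrite it in the new index, so that ordering is also covered in this simplified model.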