Clean up all previous indexes after successfully switching to a new one during data refresh #1481

Closed
1 task
sarayourfriend opened this issue Aug 11, 2022 · 4 comments
Labels
💻 aspect: code Concerns the software code in the repository ✨ goal: improvement Improvement to an existing user-facing feature 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: catalog Related to the catalog and Airflow DAGs

Comments

@sarayourfriend
Contributor

Problem

If the data refresh fails after the ES indexes are created, they never get cleaned up. This leaves indexes lying around in ES that are unused and useless.

Description

Once we've successfully switched over to the new index at the end of the data refresh and are getting ready to delete the previous index, we could extend that delete operation to cover all indexes other than the one currently in use.
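A minimal sketch of the selection step, assuming index names share a per-media-type prefix (such as `image-`) and that the name of the index currently in use is already known. The function name and the index names are hypothetical, and the actual deletion call is deliberately left out.

```python
def indexes_to_delete(all_indexes, current_index, prefix):
    """Return every index with the given prefix except the one in use."""
    return sorted(
        name
        for name in all_indexes
        if name.startswith(prefix) and name != current_index
    )


# Hypothetical index names, for illustration only.
existing = ["image-aaa", "image-bbb", "image-ccc", "audio-aaa"]
candidates = indexes_to_delete(existing, current_index="image-ccc", prefix="image-")
print(candidates)
```

Filtering by prefix keeps the clean-up for one media type from touching the indexes of another.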

Outstanding questions

Is it wise to delete the previous index immediately after the data refresh? Would it be prudent to keep the immediately previous index around and delete only the others? If so, would date versioning for indexes be a good way to track their order, or does ES track index creation dates in a way that would let us tell which indexes are the "current", the "immediately previous", and "all others"?
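On the creation date question: Elasticsearch records an `index.creation_date` setting (epoch milliseconds) for every index, so ordering by it is possible without date-versioned names. The sketch below assumes those values have already been fetched into a plain dict; the function name and sample data are hypothetical. It treats the "immediately previous" index as the newest one created before the current index, which is an assumption rather than an agreed definition.

```python
def classify_indexes(creation_dates, current):
    """Split indexes into (current, immediately previous, all others).

    creation_dates maps index name -> creation time in epoch milliseconds.
    """
    current_created = creation_dates[current]
    # Only indexes created before the current one can be "previous".
    older = [name for name, ts in creation_dates.items() if ts < current_created]
    previous = max(older, key=creation_dates.get, default=None)
    others = sorted(
        name for name in creation_dates if name not in (current, previous)
    )
    return current, previous, others


# Hypothetical data: image-d was created by a later refresh that failed.
dates = {"image-a": 1000, "image-b": 2000, "image-c": 3000, "image-d": 4000}
result = classify_indexes(dates, current="image-c")
print(result)
```

With this definition, an index left behind by a failed refresh that ran after the current one lands in "all others" rather than being mistaken for the previous index.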

Implementation

  • 🙋 I would be interested in implementing this feature.
@sarayourfriend sarayourfriend added 🟨 priority: medium Not blocking but should be addressed soon ✨ goal: improvement Improvement to an existing user-facing feature 💻 aspect: code Concerns the software code in the repository labels Aug 11, 2022
@obulat obulat added the 🧱 stack: catalog Related to the catalog and Airflow DAGs label Feb 24, 2023
@obulat obulat transferred this issue from WordPress/openverse-catalog Apr 17, 2023
@obulat
Contributor

obulat commented Nov 24, 2023

Do you think we should add the clean up of all previous indexes after a failed indexing run (#1756) into this issue, @sarayourfriend?

@krysal
Member

krysal commented Nov 24, 2023

@obulat This looks more like a duplicate of the issue you mention.

But regardless of which one we use, unless this behavior is desirable only for the production environment, it will conflict with the Search relevancy sandbox project, which intends to create multiple indexes to test the performance of different configurations. We could find a workaround by keeping some aliases or by excluding patterns in the index name (e.g. proportional-by-provider), but I'd prefer that we avoid automatically deleting all the non-production indexes.

An alternative is to create a notification for dangling indexes, similar to the one sent for new Flickr subproviders. What do you think?
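A rough sketch of what such a check might look like, assuming "dangling" means an index with no alias pointing at it and assuming the index-to-alias mapping has already been fetched from ES. The function names, the exclusion pattern, and the message wording are all hypothetical; how the message would actually be sent is out of scope here.

```python
from fnmatch import fnmatchcase


def find_dangling(aliases_by_index, keep_patterns=()):
    """Return indexes that have no alias and match no keep pattern."""
    dangling = []
    for name, aliases in aliases_by_index.items():
        if aliases:
            continue  # an alias points here, so the index is in use
        if any(fnmatchcase(name, pattern) for pattern in keep_patterns):
            continue  # deliberately kept, e.g. a sandbox index
        dangling.append(name)
    return sorted(dangling)


def build_notification(dangling):
    """Return a message listing dangling indexes, or None if there are none."""
    if not dangling:
        return None
    lines = "\n".join(f"- {name}" for name in dangling)
    return f"Found {len(dangling)} index(es) without an alias:\n{lines}"


# Hypothetical state: one live index per media type, one leftover, one sandbox.
indexes = {
    "image-aaa": ["image"],
    "image-bbb": [],
    "image-proportional-by-provider": [],
    "audio-aaa": ["audio"],
}
found = find_dangling(indexes, keep_patterns=["*proportional-by-provider*"])
print(build_notification(found))
```

Because the check only reports and never deletes, a maintainer still decides what to do with each index it lists.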

@obulat
Contributor

obulat commented Nov 25, 2023

@krysal, I agree with the preference for not deleting the indexes automatically, and like your idea of notifications for dangling indexes.
It would be nice to have a dashboard showing all of the indexes with a short description of what each index is for (and if it's dangling) after we start working with more indexes from the Search relevancy sandbox. Then a maintainer would know whether they should delete an index after seeing the notification, or leave it if it's not really dangling.

@sarayourfriend
Contributor Author

I think automatically deleting the indexes is probably a big can of worms that could go disastrously wrong. Leftover indexes don't happen very often, if at all, and when they do, cleaning them up by hand is easy and less likely to accidentally delete an index we actually needed to keep around.

I'm going to close this issue. If we need an alert for this kind of thing, that should be a separate issue, and we should probably wait until we have actually experienced the need for it; otherwise it's work without precedent.

@sarayourfriend sarayourfriend closed this as not planned Nov 26, 2023
3 participants