Reduce ena submission induced db load #2875
IIUC, we check every 2 minutes for new data? That still seems a bit excessive, though maybe good for testing. Should we make this configurable, so we can change it easily through the values YAML depending on whether one is testing or not? We should also use 304 caching here if this is something that's constantly running. None of this is blocking, but it might be worth doing before we enable this in prod/staging etc.
Maybe I misunderstood. Can you quickly outline which parts poll constantly to which endpoints? There seem to be at least 3 different hosts we talk to:
Which of these endpoints are talked to every minute or so?
I poll the Postgres db for entries in a specific state (every 10 seconds now), then I poll GitHub for data added to https://github.com/pathoplexus/ena-submission (every 2 min now). Only after assemblies have been submitted (i.e. there are entries in the assembly_table in state WAITING) do I poll ENA for accessions, every 5 min (not changed by this PR).
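To make the three cadences above easier to tune per environment, here is a minimal sketch of how they could be held in one config object and overridden from a parsed values file. The names `PollIntervals`, `intervals_from_values`, `is_due` and the key names are hypothetical and not taken from the PR; only the default values (10 s, 2 min, 5 min) come from the comment above.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class PollIntervals:
    """Polling cadences in seconds; defaults mirror the values quoted above."""

    db_seconds: float = 10        # Postgres: entries in a specific state
    github_seconds: float = 120   # GitHub: new data in the ena-submission repo
    ena_seconds: float = 300      # ENA: accessions for WAITING assemblies


def intervals_from_values(values: dict) -> PollIntervals:
    """Build the config from a parsed values mapping (hypothetical key names).

    Unknown keys are ignored; missing keys fall back to the defaults.
    """
    known = {f.name for f in fields(PollIntervals)}
    overrides = {k: float(v) for k, v in values.items() if k in known}
    return PollIntervals(**overrides)


def is_due(last_run: Optional[float], interval: float, now: float) -> bool:
    """Return True if a poll has never run or the interval has elapsed."""
    return last_run is None or now - last_run >= interval
```

A testing deployment could then pass short intervals while production passes longer ones, without touching the polling loop itself.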
This is for checking if new data has been uploaded to GitHub - I can make this configurable :-)
I'm not immediately sure how to do this for requests to GitHub but I will look into it!
Ah no - we can't do this with GitHub, I just meant to implement 304 for the repeated loculus db requests we're making.
Cool, so it's the first 10 s poll to the loculus db that we should use the 304 on.
Ah ok - so this is an actual SQL query, as I talk directly to the db. But I could add a trigger table (similar to the one we added for the backend) and check whether there have been any changes there before performing the SQL query for entries in a specific state. Does that make sense?
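The trigger-table idea above could look roughly like the following sketch. It uses an in-memory SQLite database only so the example is self-contained; the real service talks to Postgres, where a database trigger would bump the counter. The table `update_tracker`, its columns, and the `ChangeGate` class are hypothetical names, not the project's actual schema.

```python
import sqlite3


def latest_change(conn: sqlite3.Connection, table_name: str):
    """Cheap lookup: read the change counter for one tracked table."""
    row = conn.execute(
        "SELECT last_change FROM update_tracker WHERE table_name = ?",
        (table_name,),
    ).fetchone()
    return row[0] if row else None


class ChangeGate:
    """Remembers the last counter seen per table and reports changes."""

    def __init__(self) -> None:
        self._seen = {}

    def changed(self, conn: sqlite3.Connection, table_name: str) -> bool:
        current = latest_change(conn, table_name)
        if table_name in self._seen and self._seen[table_name] == current:
            return False  # nothing new: skip the state query this cycle
        self._seen[table_name] = current
        return True  # first check or counter moved: run the state query


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE update_tracker (table_name TEXT PRIMARY KEY, last_change INTEGER)"
    )
    conn.execute("INSERT INTO update_tracker VALUES ('submission_table', 1)")
    gate = ChangeGate()
    results = [gate.changed(conn, "submission_table")]      # first poll
    results.append(gate.changed(conn, "submission_table"))  # no change since
    # Stand-in for what a database trigger would do on insert/update.
    conn.execute(
        "UPDATE update_tracker SET last_change = 2 WHERE table_name = 'submission_table'"
    )
    results.append(gate.changed(conn, "submission_table"))  # counter moved
    return results
```

In the polling loop, the query for entries in a specific state would only run when `changed(...)` returns True, so an idle database sees one single-row lookup per cycle instead of the full query.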
I guess those specific queries here are cheap as they are only on submittable sequences of which we have very few at this point - so we can improve efficiency later. I was primarily worried about |
resolves #
preview URL: https://reduce-ena-db-load.loculus.org/