fix(backend, silo-preprocessing): add new header x-total-records to get-released-data to enable streaming result verification #2773
Merged
Conversation
anna-parker changed the title from "get-released-data: add record count to new header x-record-count" to "fix(backend, silo-preprocessing): add new header x-total-records to get-released-data to enable streaming result verification" on Sep 12, 2024
backend/src/main/kotlin/org/loculus/backend/controller/SubmissionController.kt
* check if zstd can be installed after container setup.
fengelniederhammer approved these changes on Sep 13, 2024
...end/src/test/kotlin/org/loculus/backend/controller/submission/GetReleasedDataEndpointTest.kt
chaoran-chen approved these changes on Sep 13, 2024
Thanks!
backend/src/main/kotlin/org/loculus/backend/service/submission/SubmissionDatabaseService.kt
…/SubmissionDatabaseService.kt Co-authored-by: Cornelius Roemer <cornelius.roemer@gmail.com>
corneliusroemer approved these changes on Sep 13, 2024
Partially resolves #2724
preview URL: http://add-count-in-header.loculus.org/
Summary
Because we send data in NDJSON format, clients (such as the silo_import_job) have no way of knowing whether the data they have received is complete or whether the stream was interrupted early.
To fix this, /get-released-data adds the header x-total-records to the response, containing the total number of records expected in the accompanying stream.
Clients can then compare this number with the number of records in the response body to see whether they have received everything, and discard and re-request the data if the stream was aborted early.
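A minimal sketch of the client-side check, assuming a Python client (the function name and signature are illustrative, not the actual silo_import_job code):

```python
def verify_stream_complete(ndjson_lines: list[bytes], header_value: str) -> bool:
    """Compare the number of non-empty NDJSON records received with the
    value of the x-total-records header (sketch of the client-side check)."""
    expected = int(header_value)
    received = sum(1 for line in ndjson_lines if line.strip())
    return received == expected
```

If this returns False, the client would discard the partial data and re-request the full stream.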
Small caveat
The actual streaming is executed after getReleased is defined. This means the count calculated for the header and the results of the stream are not inside the same database transaction: if the table is updated between calculating the count and streaming, there could be more records in the stream than the header promises. The silo_import_job will not use the data received from get-released-data if the x-total-records header and the actual record count do not match. This means it could take slightly longer for new sequences to appear in the database, but the delay should be negligible. The check only fails silently if the stream is cut off after exactly as many records as the header promises (e.g. the count is 20, one new sequence is added, the stream should now include 21 records, the stream is cut off after record 20, and the client thinks the stream completed successfully, but we lost an arbitrary sequence).
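To make this edge case concrete, here is a toy illustration with hypothetical numbers (not the real client code) of a stream that is cut off after exactly the advertised number of records:

```python
def looks_complete(received: int, advertised: int) -> bool:
    """The completeness check: received record count equals the header value."""
    return received == advertised

# The header is computed first: 20 records exist at count time.
advertised = 20
# A new sequence is inserted before streaming starts, so the stream
# should actually carry 21 records, but the connection drops at record 20.
received = 20

# The check passes even though one arbitrary record was lost:
print(looks_complete(received, advertised))  # → True
```

Any other interruption point (19 or fewer records, or 21 with an intact stream) is detected correctly; only this exact coincidence slips through.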
PR Checklist
- Add zstd (and probably jq) to the silo-preprocessing docker image. jq is already there: feat(docker): Add zstd to the dockerimage used in silo preprocessing GenSpectrum/LAPIS-SILO#574. Update: I can install zstd after the container is set up; not very clean, but this works.
- Test locally by adding a Thread.sleep(1000) between stream iterations in streamAsNdjson and killing the backend mid-stream; the import script responds as expected: