Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(backend, silo-preprocessing): add new header x-total-records to get-released-data to enable streaming result verification #2773

Merged
merged 13 commits into from
Sep 13, 2024

Conversation

anna-parker
Copy link
Contributor

@anna-parker anna-parker commented Sep 12, 2024

partially resolves #2724

preview URL: http://add-count-in-header.loculus.org/

Summary

As we send data in ndjson format, clients (such as the silo_import_job) have no way of knowing if the data they have received is complete or if the stream was interrupted early.

To fix this, /get-released-data adds header x-rowcount to the response with the total number of expected records in the accompanying stream.

Clients can then compare this number with the responseBody to see if they have received all records and discard/re-request in case the stream was aborted early.

Small caveat

The actual streaming is executed after getReleased is defined. This means the count calculated for the header and the results of the stream are not inside the same database transaction. This means that if the table was updated in between calculating the record and streaming there could be more records in the stream than expected from the header.

silo_import_job will not use the data received from get-released-data if x-total-records header and the actual record count do not match. Meaning it could take slightly longer for new sequences to appear in the database but this should be negligible.

This only causes a problem if the stream is cut off after exactly as many records as expected from the header. (e.g. count is 20, add one new sequence, stream should include 21 records, stream is cut off after record 20, client thinks stream completed successfully - but we lost an arbitrary sequence).

PR Checklist

  • All necessary documentation has been adapted.
  • The implemented feature is covered by an appropriate test.
  • Add zstd (and probably jq to the silo-preprocessing docker image) - jq is already there: feat(docker): Add zstd to the dockerimage used in silo preprocessing GenSpectrum/LAPIS-SILO#574, update: I can install zstd after the container is set up, not very clean but this works
  • Break stream locally and check that this still works as intended.
    Test locally by adding a Thread.sleep(1000) between stream iterations in streamAsNdjson and killing the backend mid-stream, the import script responds as expected:
Response should contain a total of : 2141 records
jq: parse error: Invalid numeric literal at line 30, column 3
Response contained a total of : 29 records
Expected and actual number of records are not the same

@anna-parker anna-parker changed the title get-released-data: add record count to new header x-record-count fix(backend, silo-preprocessing): add new header x-total-records to get-released-data to enable streaming result verification Sep 12, 2024
@anna-parker anna-parker added the preview Triggers a deployment to argocd label Sep 12, 2024
@anna-parker anna-parker marked this pull request as ready for review September 12, 2024 15:48
Copy link
Member

@chaoran-chen chaoran-chen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

anna-parker and others added 2 commits September 13, 2024 15:47
…/SubmissionDatabaseService.kt

Co-authored-by: Cornelius Roemer <cornelius.roemer@gmail.com>
…/SubmissionDatabaseService.kt

Co-authored-by: Cornelius Roemer <cornelius.roemer@gmail.com>
@anna-parker anna-parker merged commit 9a76840 into main Sep 13, 2024
17 checks passed
@anna-parker anna-parker deleted the add_count_in_header branch September 13, 2024 14:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
preview Triggers a deployment to argocd
Projects
None yet
5 participants