fix(ena-submission): Parse get analysis process response better for single segment case, return partial results and submit projects with holdUntilDate #2896

Merged (20 commits) on Sep 30, 2024

@anna-parker (Contributor) commented on Sep 28, 2024

resolves #2895

preview URL: https://patch-create-assembly.loculus.org/

Summary

Note that chromosome accessions are the accessions of the nucleotide sequences of each submitted segment; in PP we call these insdcAccession_{segment}.
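
To make the mapping concrete, here is a minimal sketch of how chromosome accessions could be assigned to insdcAccession_{segment} fields. It is not the code from this PR. The helper name chromosome_accessions_to_fields, the segment names and example accessions, and the assumption that ENA reports accessions either as a single accession (`OZ000001`) or as a contiguous range (`OZ000001-OZ000002`) in segment order are all hypothetical.

```python
import re


def chromosome_accessions_to_fields(accession_range: str, segments: list[str]) -> dict[str, str]:
    """Hypothetical helper: map chromosome accessions to insdcAccession_{segment} fields.

    Assumes (for illustration only) that accessions are a letter prefix plus a
    zero-padded number, reported either as one accession or as a contiguous
    "START-END" range listed in the same order as `segments`.
    """
    parts = accession_range.split("-")
    if len(parts) == 1:
        start = end = parts[0]  # single-segment case: no range
    elif len(parts) == 2:
        start, end = parts  # multi-segment case: contiguous range
    else:
        raise ValueError(f"Unexpected accession format: {accession_range!r}")

    start_match = re.fullmatch(r"([A-Z]+)(\d+)", start)
    end_match = re.fullmatch(r"([A-Z]+)(\d+)", end)
    if not start_match or not end_match or start_match.group(1) != end_match.group(1):
        raise ValueError(f"Cannot parse accession range: {accession_range!r}")

    prefix, first_digits = start_match.group(1), start_match.group(2)
    width = len(first_digits)  # keep the original zero padding
    first, last = int(first_digits), int(end_match.group(2))
    accessions = [f"{prefix}{n:0{width}d}" for n in range(first, last + 1)]

    # One accession per segment is expected; anything else is a parsing error.
    if len(accessions) != len(segments):
        raise ValueError(f"Got {len(accessions)} accessions for {len(segments)} segments")

    return {f"insdcAccession_{segment}": acc for segment, acc in zip(segments, accessions)}
```

For a single-segment virus the same call yields one field: `chromosome_accessions_to_fields("OZ000009", ["main"])` gives `{"insdcAccession_main": "OZ000009"}`.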

  1. Rename check_ena to get_ena_analysis_process and refactor it so that get_chromsome_accessions is a separate sub-function. Results are now returned when either the GCA accession OR the chromosome accessions are available (previously, results were only returned once both were available).
  2. Add tests that get_chromsome_accessions works for both single- and multi-segmented viruses, now that we know the expected response format for each. One test mocks the response with a known response text for a 1-segment case and confirms that it is parsed correctly.
  3. Update the assembly_table.results column as soon as either the GCA or the chromosome accessions are known, but only change the status from WAITING to SUBMITTED once both are known. Do not update the table when there are no results or when the results are the same as those already in the table.
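
The update rules in item 3 can be sketched as a small pure function. This is an illustration under assumptions, not the PR's implementation: plan_assembly_update, AssemblyUpdate and the gca_accession results key are hypothetical names, and statuses are treated as plain strings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AssemblyUpdate:
    """Hypothetical container for the new row values of assembly_table."""
    results: dict[str, str]
    status: str


def plan_assembly_update(
    current_results: dict[str, str],
    gca_accession: Optional[str],
    chromosome_accessions: dict[str, str],
) -> Optional[AssemblyUpdate]:
    """Hypothetical sketch: decide how to update a row from ENA's response.

    Partial results are stored as soon as either accession type is known,
    but the status only moves from WAITING to SUBMITTED once both are present.
    Returns None when the table should not be touched.
    """
    new_results: dict[str, str] = {}
    if gca_accession:
        new_results["gca_accession"] = gca_accession  # hypothetical key name
    new_results.update(chromosome_accessions)

    # No results yet, or nothing new compared to the table: skip the update.
    if not new_results or new_results == current_results:
        return None

    both_known = bool(gca_accession) and bool(chromosome_accessions)
    return AssemblyUpdate(results=new_results, status="SUBMITTED" if both_known else "WAITING")
```

Keeping the decision separate from the database write makes the three cases (no update, partial results in WAITING, full results in SUBMITTED) easy to unit-test.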

This PR additionally modifies the project submission request so that, by default, projects are submitted with a hold_until_date parameter set to the day of submission. As we only submit data to ENA once it is OPEN anyway, the project should be made public right away.
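
A rough sketch of a project submission that defaults the hold date to today, assuming the request is expressed as ENA submission XML with an ADD action plus a HOLD action carrying a HoldUntilDate attribute. The function name build_project_submission_xml is hypothetical, and the actual request in the ena-submission code may be built differently.

```python
import datetime
import xml.etree.ElementTree as ET
from typing import Optional


def build_project_submission_xml(hold_until_date: Optional[datetime.date] = None) -> str:
    """Hypothetical helper: build submission XML that adds a project with a hold date.

    Defaulting the hold date to today means the project is released on the
    day it is submitted (assumed ENA behaviour, for illustration only).
    """
    if hold_until_date is None:
        hold_until_date = datetime.date.today()

    submission = ET.Element("SUBMISSION")
    actions = ET.SubElement(submission, "ACTIONS")
    # First action: add the project described in the accompanying project XML.
    ET.SubElement(ET.SubElement(actions, "ACTION"), "ADD")
    # Second action: hold the data only until the given date (ISO format).
    hold_action = ET.SubElement(actions, "ACTION")
    ET.SubElement(hold_action, "HOLD", HoldUntilDate=hold_until_date.isoformat())
    return ET.tostring(submission, encoding="unicode")
```

Passing an explicit date keeps the function deterministic in tests, while omitting it gives the "public on the day of submission" default described above.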

Testing

Tested on preview:

  • Assembly submission succeeds and the accession is returned:
17:36:33     INFO (  create_assembly.py: 513) - Assembly submission for accession LOC_000RBTK succeeded and accession returned!
  • Sadly, the ERZ accessions are private, so I could not fully confirm that this works for the single-segment case.
  • I additionally tested (by editing out the code that adds the GCA accession) that when only the chromosome accession is returned, the table stays in state WAITING:
17:52:59     INFO (  create_assembly.py: 489) - Partial results of assembly submission for accession LOC_000RBTK returned!

@anna-parker added the preview label (Triggers a deployment to argocd) on Sep 29, 2024
Review comment on ena-submission/Snakefile (outdated, resolved)
@anna-parker marked this pull request as ready for review on September 29, 2024 18:03
@anna-parker changed the title from "Update to allow for receiving only gca or insdc accessions, fix case …" to "fix(ena-submission): Parse get analysis process response better for single segment case, return partial results and submit projects with holdUntilDate" on Sep 29, 2024
@anna-parker merged commit 9920c16 into main on Sep 30, 2024
15 checks passed
@anna-parker deleted the patch_create_assembly branch on September 30, 2024 14:39
Development: successfully merging this pull request may close the linked issue "Correct mapping of chromosome/gca accession fields in check_ena function."