Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Vs 1416 modify ingest to correctly handle ploidy differences in dragen 3 7 8 samples #8994

Open
wants to merge 15 commits into
base: ah_var_store
Choose a base branch
from

Conversation

koncheto-broad
Copy link

Three major changes here.

  1. Added in logic to create the ploidy table during ingest (with necessary supporting class) and use it during extract automatically as part of the default joint workflow. Also removed a column that we won't need when creating it automatically.
  2. Rearranged the PAR checking logic to consolidate it in its own class (PloidyUtils.java).

Successful run against tiny sample set "PLOIDY_TEST" in echo callset project:

https://app.terra.bio/#workspaces/allofus-drc-wgs-dev/GVS%20AoU%20WGS%20Echo%20Callset%20v2/job_history/a93aa2ef-9cef-451d-8cf8-b31f1c6a8407

You'll need your aou credentials to see the results.

@gatk-bot
Copy link

gatk-bot commented Oct 9, 2024

Github actions tests reported job failures from actions build 11256533054
Failures in the following jobs:

Test Type JDK Job ID Logs
integration 17.0.6+10 11256533054.11 logs
integration 17.0.6+10 11256533054.0 logs

* https://github.com/broadinstitute/picard/blob/master/src/main/java/picard/vcf/MendelianViolations/FindMendelianViolations.java#L203
* Wikipedia data: https://en.wikipedia.org/wiki/Pseudoautosomal_region
*/
private static final Set<String> PSEUDO_AUTOSOMAL_REGIONS_DEFINITION = CollectionUtil .makeSet("X:60001-2699520", "X:154931044-155260560", "chrX:10001-2781479", "chrX:155701383-156030895", "Y:10001-2649520", "Y:59034050-59363566", "chrY:10001-2781479", "chrY:56887903-57217415");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little surprised this doesn't exist already somewhere in GATK

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So was I, to be honest. I grepped a bunch of terms related to PARs, and the best I could find is that FindMendelianViolations tool that I linked in the comments. Strangely, that one doesn't even include the regions on the y chromosome, for whatever reason. AND its definition and implementation was private, so I couldn't even just import it here.

It's possible I missed something, but at the very least I can say that anything that may exist is pretty non-obvious.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ask in the methods channel perhaps?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it would be defined in the reference fasta itself? I've seen some with XY or PAR as the contig

@@ -74,7 +74,7 @@ task GetToolVersions {
String cloud_sdk_slim_docker = "gcr.io/google.com/cloudsdktool/cloud-sdk:435.0.0-slim"
String variants_docker = "us-central1-docker.pkg.dev/broad-dsde-methods/gvs/variants:2024-09-30-alpine-27b8708f5808"
String variants_nirvana_docker = "us.gcr.io/broad-dsde-methods/variantstore:nirvana_2022_10_19"
String gatk_docker = "us-central1-docker.pkg.dev/broad-dsde-methods/gvs/gatk:2024_08_19-gatkbase-cd5b6b7821b2"
String gatk_docker = "us-central1-docker.pkg.dev/broad-dsde-methods/gvs/gatk:2024-10-08-gatkbase-2e8764db86c3"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a heads up you and @gbggrant appear to be ⚔️ here with your GATK Docker changes

* https://github.com/broadinstitute/picard/blob/master/src/main/java/picard/vcf/MendelianViolations/FindMendelianViolations.java#L203
* Wikipedia data: https://en.wikipedia.org/wiki/Pseudoautosomal_region
*/
private static final Set<String> PSEUDO_AUTOSOMAL_REGIONS_DEFINITION = CollectionUtil .makeSet("X:60001-2699520", "X:154931044-155260560", "chrX:10001-2781479", "chrX:155701383-156030895", "Y:10001-2649520", "Y:59034050-59363566", "chrY:10001-2781479", "chrY:56887903-57217415");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ask in the methods channel perhaps?

Copy link
Collaborator

@gbggrant gbggrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thumbs up, conditional on:

  1. Miguel's comments are addressed.
  2. Passing integration test (ideally on ALL chromosomes).

datatype = "sample_chromosome_ploidy",
schema_json = sample_chromosome_ploidy_schema_json,
max_table_id = 1,
superpartitioned = "false",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
superpartitioned = "false",
superpartitioned = false,

schema_json = sample_chromosome_ploidy_schema_json,
max_table_id = 1,
superpartitioned = "false",
partitioned = "false",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
partitioned = "false",
partitioned = false,

@gatk-bot
Copy link

gatk-bot commented Oct 24, 2024

Github actions tests reported job failures from actions build 11502878660
Failures in the following jobs:

Test Type JDK Job ID Logs
integration 17.0.6+10 11502878660.11 logs
integration 17.0.6+10 11502878660.0 logs

@gatk-bot
Copy link

gatk-bot commented Oct 24, 2024

Github actions tests reported job failures from actions build 11506462963
Failures in the following jobs:

Test Type JDK Job ID Logs
integration 17.0.6+10 11506462963.11 logs
integration 17.0.6+10 11506462963.0 logs

@gatk-bot
Copy link

gatk-bot commented Oct 24, 2024

Github actions tests reported job failures from actions build 11506500757
Failures in the following jobs:

Test Type JDK Job ID Logs
integration 17.0.6+10 11506500757.11 logs
integration 17.0.6+10 11506500757.0 logs

@gatk-bot
Copy link

gatk-bot commented Oct 25, 2024

Github actions tests reported job failures from actions build 11519677920
Failures in the following jobs:

Test Type JDK Job ID Logs
integration 17.0.6+10 11519677920.11 logs
integration 17.0.6+10 11519677920.0 logs

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants