Merge branch 'develop' into lk-paired-tag-trimming
ekiernan authored Apr 3, 2024
2 parents 5e07125 + a4aa631 commit 07ce4e7
Showing 40 changed files with 121 additions and 37 deletions.
5 changes: 5 additions & 0 deletions pipelines/broad/arrays/single_sample/Arrays.changelog.md
@@ -1,3 +1,8 @@
+ # 2.6.23
+ 2024-03-26 (Date of Last Commit)
+
+ * ValidateVcfs requires less memory when run without interval list; this does not affect this pipeline
+
# 2.6.22
2023-12-18 (Date of Last Commit)

2 changes: 1 addition & 1 deletion pipelines/broad/arrays/single_sample/Arrays.wdl
@@ -23,7 +23,7 @@ import "../../../../tasks/broad/Utilities.wdl" as utils
workflow Arrays {

- String pipeline_version = "2.6.22"
+ String pipeline_version = "2.6.23"

input {
String chip_well_barcode
@@ -1,3 +1,8 @@
+ # 2.1.12
+ 2024-03-26 (Date of Last Commit)
+
+ * ValidateVcfs requires less memory when run without interval list. This is useful for WGS samples that were previously running out of memory.
+
# 2.1.11
2023-12-18 (Date of Last Commit)

@@ -5,7 +5,7 @@ import "../../../../../../tasks/broad/Qc.wdl" as QC

workflow ReblockGVCF {

- String pipeline_version = "2.1.11"
+ String pipeline_version = "2.1.12"


input {
@@ -48,6 +48,7 @@ workflow ReblockGVCF {
ref_dict = ref_dict,
calling_interval_list = select_first([calling_interval_list, gvcf]), #nice trick so we don't have to pass around intervals; shouldn't be too much slower
calling_interval_list_index = gvcf_index,
+ calling_intervals_defined = defined(calling_interval_list),
is_gvcf = true,
extra_args = "--no-overlaps",
gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0"
@@ -1,3 +1,8 @@
+ # 3.1.19
+ 2024-03-26 (Date of Last Commit)
+
+ * ValidateVcfs requires less memory when run without interval list; this does not affect this pipeline
+
# 3.1.18
2023-12-18 (Date of Last Commit)

@@ -44,7 +44,7 @@ import "../../../../../../structs/dna_seq/DNASeqStructs.wdl"
# WORKFLOW DEFINITION
workflow ExomeGermlineSingleSample {

- String pipeline_version = "3.1.18"
+ String pipeline_version = "3.1.19"


input {
@@ -1,3 +1,8 @@
+ # 1.0.16
+ 2024-03-26 (Date of Last Commit)
+
+ * ValidateVcfs requires less memory when run without interval list; this does not affect this pipeline
+
# 1.0.15
2023-12-18 (Date of Last Commit)

@@ -50,7 +50,7 @@ workflow UltimaGenomicsWholeGenomeGermline {
filtering_model_no_gt_name: "String describing the optional filtering model; default set to rf_model_ignore_gt_incl_hpol_runs"
}

- String pipeline_version = "1.0.15"
+ String pipeline_version = "1.0.16"


References references = alignment_references.references
@@ -1,3 +1,8 @@
+ # 3.1.20
+ 2024-03-26 (Date of Last Commit)
+
+ * ValidateVcfs requires less memory when run without interval list; this does not affect this pipeline
+
# 3.1.19
2023-12-18 (Date of Last Commit)

@@ -40,7 +40,7 @@ import "../../../../../../structs/dna_seq/DNASeqStructs.wdl"
workflow WholeGenomeGermlineSingleSample {


- String pipeline_version = "3.1.19"
+ String pipeline_version = "3.1.20"


input {
@@ -1,3 +1,8 @@
+ # 2.1.18
+ 2024-03-26 (Date of Last Commit)
+
+ * ValidateVcfs requires less memory when run without interval list; this does not affect this pipeline
+
# 2.1.17
2023-12-18 (Date of Last Commit)

@@ -9,7 +9,7 @@ import "../../../../../tasks/broad/DragenTasks.wdl" as DragenTasks
workflow VariantCalling {


- String pipeline_version = "2.1.17"
+ String pipeline_version = "2.1.18"


input {
@@ -1,3 +1,8 @@
+ # 1.0.16
+ 2024-03-26 (Date of Last Commit)
+
+ * ValidateVcfs requires less memory when run without interval list; this does not affect this pipeline
+
# 1.0.15
2023-12-18 (Date of Last Commit)

@@ -43,7 +43,7 @@ workflow UltimaGenomicsWholeGenomeCramOnly {
save_bam_file: "If true, then save intermeidate ouputs used by germline pipeline (such as the output BAM) otherwise they won't be kept as outputs."
}

- String pipeline_version = "1.0.15"
+ String pipeline_version = "1.0.16"

References references = alignment_references.references

@@ -1,3 +1,8 @@
+ # 1.12.17
+ 2024-03-26 (Date of Last Commit)
+
+ * ValidateVcfs requires less memory when run without interval list; this does not affect this pipeline
+
# 1.12.16
2023-12-18 (Date of Last Commit)

@@ -21,7 +21,7 @@ import "../../../../tasks/broad/Qc.wdl" as Qc
workflow IlluminaGenotypingArray {

- String pipeline_version = "1.12.16"
+ String pipeline_version = "1.12.17"

input {
String sample_alias
@@ -1,3 +1,8 @@
+ # 1.1.7
+ 2024-03-26 (Date of Last Commit)
+
+ * ValidateVcfs requires less memory when run without interval list; this does not affect this pipeline
+
# 1.1.6
2023-12-18 (Date of Last Commit)

@@ -9,7 +9,7 @@ workflow BroadInternalArrays {
description: "Push outputs of Arrays.wdl to TDR dataset table ArraysOutputsTable."
}

- String pipeline_version = "1.1.6"
+ String pipeline_version = "1.1.7"

input {
# inputs to wrapper task
@@ -1,3 +1,8 @@
+ # 1.0.17
+ 2024-03-26 (Date of Last Commit)
+
+ * ValidateVcfs requires less memory when run without interval list; this does not affect this pipeline
+
# 1.0.16
2023-12-18 (Date of Last Commit)

@@ -6,7 +6,7 @@ import "../../../../../../../pipelines/broad/qc/CheckFingerprint.wdl" as FP

workflow BroadInternalUltimaGenomics {

- String pipeline_version = "1.0.16"
+ String pipeline_version = "1.0.17"

input {

@@ -1,3 +1,8 @@
+ # 1.0.29
+ 2024-03-26 (Date of Last Commit)
+
+ * ValidateVcfs requires less memory when run without interval list; this does not affect this pipeline
+
# 1.0.28
2023-12-18 (Date of Last Commit)

@@ -7,7 +7,7 @@ import "../../../../tasks/broad/Utilities.wdl" as utils

workflow BroadInternalRNAWithUMIs {

- String pipeline_version = "1.0.28"
+ String pipeline_version = "1.0.29"

input {
# input needs to be either "hg19" or "hg38"
5 changes: 5 additions & 0 deletions pipelines/broad/qc/CheckFingerprint.changelog.md
@@ -1,3 +1,8 @@
+ # 1.0.16
+ 2024-03-26 (Date of Last Commit)
+
+ * ValidateVcfs requires less memory when run without interval list; this does not affect this pipeline
+
# 1.0.15
2023-12-18 (Date of Last Commit)

2 changes: 1 addition & 1 deletion pipelines/broad/qc/CheckFingerprint.wdl
@@ -24,7 +24,7 @@ import "../../../tasks/broad/Qc.wdl" as Qc
workflow CheckFingerprint {

- String pipeline_version = "1.0.15"
+ String pipeline_version = "1.0.16"

input {
File? input_vcf
@@ -1,3 +1,8 @@
+ # 3.1.19
+ 2024-03-26 (Date of Last Commit)
+
+ * ValidateVcfs requires less memory when run without interval list; this does not affect this pipeline
+
# 3.1.18
2023-12-18 (Date of Last Commit)

2 changes: 1 addition & 1 deletion pipelines/broad/reprocessing/exome/ExomeReprocessing.wdl
@@ -7,7 +7,7 @@ import "../../../../structs/dna_seq/DNASeqStructs.wdl"
workflow ExomeReprocessing {


- String pipeline_version = "3.1.18"
+ String pipeline_version = "3.1.19"

input {
File? input_cram
@@ -1,3 +1,8 @@
+ # 3.1.21
+ 2024-03-26 (Date of Last Commit)
+
+ * ValidateVcfs requires less memory when run without interval list; this does not affect this pipeline
+
# 3.1.20
2023-12-18 (Date of Last Commit)

@@ -5,7 +5,7 @@ import "../../../../../tasks/broad/CopyFilesFromCloudToCloud.wdl" as Copy

workflow ExternalExomeReprocessing {

- String pipeline_version = "3.1.20"
+ String pipeline_version = "3.1.21"


input {
@@ -1,3 +1,8 @@
+ # 2.1.21
+ 2024-03-26 (Date of Last Commit)
+
+ * ValidateVcfs requires less memory when run without interval list; this does not affect this pipeline
+
# 2.1.20
2023-12-18 (Date of Last Commit)

@@ -6,7 +6,7 @@ import "../../../../../tasks/broad/CopyFilesFromCloudToCloud.wdl" as Copy
workflow ExternalWholeGenomeReprocessing {


- String pipeline_version = "2.1.20"
+ String pipeline_version = "2.1.21"

input {
File? input_cram
@@ -1,3 +1,8 @@
+ # 3.1.20
+ 2024-03-26 (Date of Last Commit)
+
+ * ValidateVcfs requires less memory when run without interval list; this does not affect this pipeline
+
# 3.1.19
2023-12-18 (Date of Last Commit)

@@ -6,7 +6,7 @@ import "../../../../structs/dna_seq/DNASeqStructs.wdl"

workflow WholeGenomeReprocessing {

- String pipeline_version = "3.1.19"
+ String pipeline_version = "3.1.20"

input {
File? input_cram
19 changes: 11 additions & 8 deletions tasks/broad/Qc.wdl
@@ -617,17 +617,17 @@ task ValidateVCF {
File? dbsnp_vcf
File? dbsnp_vcf_index
File calling_interval_list
- File? calling_interval_list_index # if the interval list is a VCF, than an index file is also required
+ File? calling_interval_list_index # if the interval list is a VCF, than an index file makes VcfToIntervalList run faster
+ Boolean calling_intervals_defined = true
Int preemptible_tries = 3
Boolean is_gvcf = true
String? extra_args
String gatk_docker = "us.gcr.io/broad-gatk/gatk:4.5.0.0"
Int machine_mem_mb = 7000
}
- Boolean calling_intervals_is_vcf = defined(calling_interval_list_index)
String calling_interval_list_basename = basename(calling_interval_list)
- String calling_interval_list_index_basename = if calling_intervals_is_vcf then basename(select_first([calling_interval_list_index])) else ""
+ String calling_interval_list_index_basename = if calling_intervals_defined then "" else basename(select_first([calling_interval_list_index]))
Int command_mem_mb = machine_mem_mb - 2000
Float ref_size = size(ref_fasta, "GiB") + size(ref_fasta_index, "GiB") + size(ref_dict, "GiB")
@@ -636,18 +636,21 @@ task ValidateVCF {
command {
set -e
- # We can't always assume the index was located with the vcf, so make a link so that the paths look the same
- ln -s ~{calling_interval_list} ~{calling_interval_list_basename}
- if [ ~{calling_intervals_is_vcf} == "true" ]; then
+ if [ ~{calling_intervals_defined} == "false" ]; then
+ # We can't always assume the index was located with the vcf, so make a link so that the paths look the same
+ ln -s ~{calling_interval_list} ~{calling_interval_list_basename}
ln -s ~{calling_interval_list_index} ~{calling_interval_list_index_basename}
+ gatk VcfToIntervalList -I ~{calling_interval_list_basename} -O intervals_from_gvcf.interval_list
+ INTERVALS="intervals_from_gvcf.interval_list"
+ else
+ INTERVALS="~{calling_interval_list}"
fi
# Note that WGS needs a lot of memory to do the -L *.vcf if an interval file is not supplied
gatk --java-options "-Xms~{command_mem_mb}m -Xmx~{command_mem_mb}m" \
ValidateVariants \
-V ~{input_vcf} \
-R ~{ref_fasta} \
- -L ~{calling_interval_list_basename} \
+ -L $INTERVALS \
~{true="-gvcf" false="" is_gvcf} \
--validation-type-to-exclude ALLELES \
~{"--dbsnp " + dbsnp_vcf} \
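
Reviewer note: the interval selection introduced in the hunk above can be read in isolation as the following shell sketch. It is an illustration under stated assumptions, not the task's actual command block: the function name `choose_intervals`, its positional arguments, and the overridable `CONVERTER` variable are hypothetical and exist only so the branch can be exercised without GATK installed. The `VcfToIntervalList -I ... -O ...` invocation and the `intervals_from_gvcf.interval_list` file name are taken from the diff.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Converter command. Defaults to the tool used in the diff; overridable
# (hypothetical knob) so the sketch can be run where gatk is not on PATH.
: "${CONVERTER:=gatk VcfToIntervalList}"

# choose_intervals <intervals_defined> <calling_interval_list>
#   intervals_defined:     "true" or "false"; stands in for ~{calling_intervals_defined}
#   calling_interval_list: the interval list, or the GVCF itself when none was supplied
# Prints the path that would be passed to ValidateVariants via -L.
choose_intervals() {
  local intervals_defined="$1"
  local calling_interval_list="$2"
  if [ "$intervals_defined" = "false" ]; then
    # No interval list was supplied: derive one from the GVCF first.
    # CONVERTER is intentionally unquoted so "gatk VcfToIntervalList" splits into two words.
    $CONVERTER -I "$calling_interval_list" -O intervals_from_gvcf.interval_list >&2
    printf '%s\n' "intervals_from_gvcf.interval_list"
  else
    # An interval list was supplied: use it directly.
    printf '%s\n' "$calling_interval_list"
  fi
}

# Usage with a supplied interval list (placeholder file name, no conversion step runs):
INTERVALS="$(choose_intervals true my_regions.interval_list)"
echo "-L $INTERVALS"
```

With no interval list, `-L` receives a compact interval list derived from the GVCF rather than the GVCF itself, which fits the changelog entries' statement that ValidateVcfs then needs less memory.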
@@ -7,7 +7,7 @@ slug: /Pipelines/Exome_Germline_Single_Sample_Pipeline/README

| Pipeline Version | Date Updated | Documentation Author | Questions or Feedback |
| :----: | :---: | :----: | :--------------: |
- | [ExomeGermlineSingleSample_v3.1.18](https://github.com/broadinstitute/warp/releases?q=ExomeGermlineSingleSample_v3.0.0&expanded=true) | February, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in WARP or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) |
+ | [ExomeGermlineSingleSample_v3.1.19](https://github.com/broadinstitute/warp/releases?q=ExomeGermlineSingleSample_v3.0.0&expanded=true) | March, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in WARP or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) |


The Exome Germline Single Sample pipeline implements data pre-processing and initial variant calling according to the GATK Best Practices for germline SNP and Indel discovery in human exome sequencing data.
@@ -2,13 +2,13 @@
sidebar_position: 2
---

- # Exome Germline Single Sample v3.1.18 Methods
+ # Exome Germline Single Sample v3.1.19 Methods

The following contains a detailed methods description outlining the pipeline’s process, software, and tools that can be modified for a publication methods section.

## Detailed Methods

- Preprocessing and variant calling was performed using the ExomeGermlineSingleSample 3.1.17 pipeline using Picard 2.26.10, GATK 4.5.0.0, and Samtools 1.11 with default tool parameters unless otherwise specified. All reference files are available in the public [Broad References Google Bucket](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0). The pipeline follows GATK Best Practices as previously described ([Van der Auwera & O'Connor, 2020](https://www.oreilly.com/library/view/genomics-in-the/9781491975183/)) as well as the Functional Equivalence specification ([Regier et al., 2018](https://www.nature.com/articles/s41467-018-06159-4)).
+ Preprocessing and variant calling was performed using the ExomeGermlineSingleSample 3.1.19 pipeline using Picard 2.26.10, GATK 4.5.0.0, and Samtools 1.11 with default tool parameters unless otherwise specified. All reference files are available in the public [Broad References Google Bucket](https://console.cloud.google.com/storage/browser/gcp-public-data--broad-references/hg38/v0). The pipeline follows GATK Best Practices as previously described ([Van der Auwera & O'Connor, 2020](https://www.oreilly.com/library/view/genomics-in-the/9781491975183/)) as well as the Functional Equivalence specification ([Regier et al., 2018](https://www.nature.com/articles/s41467-018-06159-4)).

### Pre-processing and QC

@@ -4,7 +4,7 @@ sidebar_position: 2

# VCF Overview: Illumina Genotyping Array

- The [Illumina Genotyping Array Pipeline](https://github.com/broadinstitute/warp/blob/develop/pipelines/broad/genotyping/illumina/IlluminaGenotypingArray.wdl) v1.12.16 pipeline produces a VCF (Variant Call Format) output with data processing and sample-specific genotype information. The VCF follows the format listed in the [VCF 4.2 specification](https://samtools.github.io/hts-specs/VCFv4.2.pdf), but additionally contains fields and attributes that are unique to the Arrays pipeline.
+ The [Illumina Genotyping Array Pipeline](https://github.com/broadinstitute/warp/blob/develop/pipelines/broad/genotyping/illumina/IlluminaGenotypingArray.wdl) v1.12.17 pipeline produces a VCF (Variant Call Format) output with data processing and sample-specific genotype information. The VCF follows the format listed in the [VCF 4.2 specification](https://samtools.github.io/hts-specs/VCFv4.2.pdf), but additionally contains fields and attributes that are unique to the Arrays pipeline.

This document describes the Array pipeline’s unique VCF fields and attributes that are not listed in the standard VCF specification. To learn more about the pipeline, see the [Illumina Genotyping Array Pipeline Overview](./README.md).

@@ -7,7 +7,7 @@ slug: /Pipelines/Illumina_Genotyping_Arrays_Pipeline/README

| Pipeline Version | Date Updated | Documentation Author | Questions or Feedback |
| :----: | :---: | :----: | :--------------: |
- | [Version 1.12.16](https://github.com/broadinstitute/warp/releases) | February, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) |
+ | [Version 1.12.17](https://github.com/broadinstitute/warp/releases) | March, 2024 | [Elizabeth Kiernan](mailto:ekiernan@broadinstitute.org) | Please file GitHub issues in warp or contact [the WARP team](mailto:warp-pipelines-help@broadinstitute.org) |

![The Illumina Genotyping Array Pipeline](./IlluminaGenotyping.png)

@@ -239,7 +239,7 @@ The Illumina Genotyping Array Pipeline is available on the cloud-based platform

## Citing the Illumina Genotyping Array Pipeline

- If you use the Illumina Genotyping Array Pipeline in your research, please consider citing our preprint:
+ If you use the Illumina Genotyping Array Pipeline in your research, please cite our preprint:

Degatano, K.; Awdeh, A.; Dingman, W.; Grant, G.; Khajouei, F.; Kiernan, E.; Konwar, K.; Mathews, K.; Palis, K.; Petrillo, N.; Van der Auwera, G.; Wang, C.; Way, J.; Pipelines, W. WDL Analysis Research Pipelines: Cloud-Optimized Workflows for Biological Data Processing and Reproducible Analysis. Preprints 2024, 2024012131. https://doi.org/10.20944/preprints202401.2131.v1
