Customize battenberg_wgs.R wrapper #18

Draft: wants to merge 13 commits into base: main
8 changes: 7 additions & 1 deletion CHANGELOG.md
@@ -6,8 +6,14 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

---
## Unreleased
### Added
- Add custom `battenberg_wgs.R`

### Removed
- Remove `modify_reference_path.sh`

-## [2.2.9] - 2023-06-27
+## [2.2.9] - 2023-06-27 [YANKED]
### Added
- Add `modify_reference_path.sh`
- Add GRCh37 and GRCh38 resource paths to `README.md`
8 changes: 4 additions & 4 deletions Dockerfile
@@ -30,14 +30,14 @@
"gridExtra","doParallel","foreach", "splines", "VariantAnnotation", "copynumber"))'

# Install devtools, ASCAT & Battenberg
FROM r-base:latest

Check warning on line 33 in Dockerfile

Wiz Inc. (8da00b022c) / Wiz IaC Scanner

Image Version Using 'latest'

Rule ID: 44bf85f5-c7c2-4e9f-813e-cdaae7810057
Severity: Medium
Resource: FROM={{r-base:latest}}
When building images, always tag them with useful tags which codify version information, intended destination (prod or test, for instance), stability, or other information that is useful when deploying the application in different environments. Do not rely on the automatically-created latest tag.
Raw output
Expected: FROM r-base:latest:'version' where version should not be 'latest'
Found: FROM r-base:latest'

Check notice on line 33 in Dockerfile

Wiz Inc. (8da00b022c) / Wiz IaC Scanner

Healthcheck Instruction Missing

Rule ID: b0f1f03a-461a-4b7b-8daf-a61ca12d86da
Severity: Low
Resource: FROM={{r-base:latest}}
Ensure that HEALTHCHECK is being used. The HEALTHCHECK instruction tells Docker how to test a container to check that it is still working.
Raw output
Expected: Dockerfile should contain instruction 'HEALTHCHECK'
Found: Dockerfile doesn't contain instruction 'HEALTHCHECK'
RUN R -q -e 'install.packages("devtools", dependencies = TRUE)' && \
R -q -e 'devtools::install_github("Crick-CancerGenomics/ascat/ASCAT@v3.1.2")' && \
R -q -e 'devtools::install_github("Wedge-Oxford/battenberg@v2.2.9")'

-# Modify paths to reference files
-COPY modify_reference_path.sh /usr/local/bin/modify_reference_path.sh
-RUN chmod +x /usr/local/bin/modify_reference_path.sh && \
-bash /usr/local/bin/modify_reference_path.sh /usr/local/lib/R/site-library/Battenberg/example/battenberg_wgs.R /usr/local/bin/battenberg_wgs.R
+# Add custom Battenberg R wrapper
+COPY battenberg_wgs.R /usr/local/bin/battenberg_wgs.R
+RUN chmod +x /usr/local/bin/battenberg_wgs.R

RUN ln -sf /usr/local/lib/R/site-library/Battenberg/example/filter_sv_brass.R /usr/local/bin/filter_sv_brass.R && \
ln -sf /usr/local/lib/R/site-library/Battenberg/example/battenberg_cleanup.sh /usr/local/bin/battenberg_cleanup.sh
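
The Wiz warning on `FROM r-base:latest` could be caught locally before review. The following is a minimal sketch, not part of this PR: `find_unpinned_from` is a hypothetical helper name, and it assumes simple single-line `FROM` instructions. It would also flag references to earlier build stages (for example `FROM builder`), since those carry no tag.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of this PR): print every FROM line whose
# image has no tag or uses ':latest'. Digest-pinned images are accepted.
find_unpinned_from() {
    local dockerfile="$1"
    awk '
        toupper($1) == "FROM" {
            image = $2
            # Skip optional flags such as --platform=linux/amd64
            for (i = 2; i <= NF && $i ~ /^--/; i++) { image = $(i + 1) }
            if (image == "scratch" || image ~ /@sha256:/) next
            # Look only at the last path component so that a registry
            # port (host:5000/name) is not mistaken for a tag
            n = split(image, parts, "/")
            last = parts[n]
            if (last !~ /:/ || last ~ /:latest$/) print NR ": " $0
        }
    ' "$dockerfile"
}
```

Running `find_unpinned_from Dockerfile` on this branch should report the `FROM r-base:latest` line; an empty result would mean every base image carries an explicit tag or digest.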
121 changes: 121 additions & 0 deletions battenberg_wgs.R
@@ -0,0 +1,121 @@
###############################################################################
# A pure R Battenberg v2.2.9 WGS pipeline implementation.
###############################################################################
library(Battenberg);
library(optparse);

option.list <- list(
make_option(c('-t', '--tumourname'), type = 'character', default = NULL, help = 'Samplename of the tumour', metavar = 'character'),
make_option(c('-n', '--normalname'), type = 'character', default = NULL, help = 'Samplename of the normal', metavar = 'character'),
make_option(c('--tb'), type = 'character', default = NULL, help = 'Tumour BAM file', metavar = 'character'),
make_option(c('--nb'), type = 'character', default = NULL, help = 'Normal BAM file', metavar = 'character'),
make_option(c('--sex'), type = 'character', default = NULL, help = 'Sex of the sample', metavar = 'character'),
make_option(c('-o', '--output'), type = 'character', default = NULL, help = 'Directory where output will be written', metavar = 'character'),
make_option(c('--skip_allelecount'), type = 'logical', default = FALSE, action = 'store_true', help = 'Provide when alleles don\'t have to be counted. This expects allelecount files on disk', metavar = 'character'),
make_option(c('--skip_preprocessing'), type = 'logical', default = FALSE, action = 'store_true', help = 'Provide when pre-processing has previously completed. This expects the files on disk', metavar = 'character'),
make_option(c('--skip_phasing'), type = 'logical', default = FALSE, action = 'store_true', help = 'Provide when phasing has previously completed. This expects the files on disk', metavar = 'character'),
make_option(c('--cpu'), type = 'numeric', default = 8, help = 'The number of CPU cores to be used by the pipeline (Default: 8)', metavar = 'character'),
make_option(c('--bp'), type = 'character', default = NULL, help = 'Optional two column file (chromosome and position) specifying prior breakpoints to be used during segmentation', metavar = 'character'),
make_option(c('--min_ploidy'), type = 'double', default = 1.6, help = 'The minimum ploidy to consider', metavar = 'character'),
make_option(c('--max_ploidy'), type = 'double', default = 4.8, help = 'The maximum ploidy to consider', metavar = 'character'),
make_option(c('--min_rho'), type = 'double', default = 0.1, help = 'The minimum cellularity to consider', metavar = 'character'),
make_option(c('--platform_gamma'), type = 'numeric', default = 1, help = 'Platform specific gamma value (0.55 for SNP6, 1 for NGS)', metavar = 'character'),
make_option(c('--phasing_gamma'), type = 'numeric', default = 1, help = 'Gamma parameter used when correcting phasing mistakes (Default: 1)', metavar = 'character'),
make_option(c('--segmentation_gamma'), type = 'numeric', default = 10, help = 'The gamma parameter controls the size of the penalty of starting a new segment during segmentation. It is therefore the key parameter for controlling the number of segments (Default: 10)', metavar = 'character'),
make_option(c('--segmentation_kmin'), type = 'numeric', default = 3, help = 'Kmin represents the minimum number of probes/SNPs that a segment should consist of (Default: 3)', metavar = 'character'),
make_option(c('--phasing_kmin'), type = 'numeric', default = 1, help = 'Kmin used when correcting for phasing mistakes (Default: 1)', metavar = 'character'),
make_option(c('--clonality_dist_metric'), type = 'numeric', default = 0, help = 'Distance metric to use when choosing purity/ploidy combinations (Default: 0)', metavar = 'character'),
make_option(c('--ascat_dist_metric'), type = 'numeric', default = 1, help = 'Distance metric to use when choosing purity/ploidy combinations (Default: 1)', metavar = 'character'),
make_option(c('--min_goodness_of_fit'), type = 'double', default = 0.63, help = 'Minimum goodness of fit required for a purity/ploidy combination to be accepted as a solution (Default: 0.63)', metavar = 'character'),
make_option(c('--balanced_threshold'), type = 'double', default = 0.51, help = 'The threshold beyond which BAF becomes uninformative (Default: 0.51)', metavar = 'character'),
make_option(c('--min_normal_depth'), type = 'numeric', default = 10, help = 'Minimum depth required in the matched normal for a SNP to be considered as part of the wgs analysis (Default: 10)', metavar = 'character'),
make_option(c('--min_base_qual'), type = 'numeric', default = 20, help = 'Minimum base quality required for a read to be counted when allele counting (Default: 20)', metavar = 'character'),
make_option(c('--min_map_qual'), type = 'numeric', default = 35, help = 'Minimum mapping quality required for a read to be counted when allele counting (Default: 35)', metavar = 'character'),
make_option(c('--calc_seg_baf_option'), type = 'numeric', default = 3, help = 'Sets way to calculate BAF per segment: 1=mean, 2=median, 3=ifelse median==0 | 1, mean, median (Default: 3)', metavar = 'character'),
make_option(c('--data_type'), type = 'character', default = 'wgs', help = 'String that contains either wgs or snp6 depending on the supplied input data (Default: wgs)', metavar = 'character')
);

opt.parser <- OptionParser(option_list = option.list);
opt <- parse_args(opt.parser);

TUMOURNAME <- opt$tumourname;
NORMALNAME <- opt$normalname;
NORMALBAM <- opt$nb;
TUMOURBAM <- opt$tb;
IS.MALE <- opt$sex == 'male' | opt$sex == 'Male';
RUN.DIR <- opt$output;
SKIP.ALLELECOUNTING <- opt$skip_allelecount;
SKIP.PREPROCESSING <- opt$skip_preprocessing;
SKIP.PHASING <- opt$skip_phasing;
NTHREADS <- opt$cpu;
PRIOR.BREAKPOINTS.FILE <- opt$bp;
MIN.PLOIDY <- opt$min_ploidy;
MAX.PLOIDY <- opt$max_ploidy;
MIN.RHO <- opt$min_rho;
PLATFORM.GAMMA <- opt$platform_gamma;
PHASING.GAMMA <- opt$phasing_gamma;
SEGMENTATION.GAMMA <- opt$segmentation_gamma;
SEGMENTATION.KMIN <- opt$segmentation_kmin;
PHASING.KMIN <- opt$phasing_kmin;
CLONALITY.DIST.METRIC <- opt$clonality_dist_metric;
ASCAT.DIST.METRIC <- opt$ascat_dist_metric;
MIN.GOODNESS.OF.FIT <- opt$min_goodness_of_fit;
BALANCED.THRESHOLD <- opt$balanced_threshold;
MIN.NORMAL.DEPTH <- opt$min_normal_depth;
MIN.BASE.QUAL <- opt$min_base_qual;
MIN.MAP.QUAL <- opt$min_map_qual;
CALC.SEG.BAF.OPTION <- opt$calc_seg_baf_option;
DATA.TYPE <- opt$data_type;

# General static
IMPUTEINFOFILE <- '/opt/battenberg_reference/impute_info.txt';
G1000PREFIX <- '/opt/battenberg_reference/1000_genomes_loci/1000_genomes_allele_index_chr';
G1000PREFIX.AC <- '/opt/battenberg_reference/1000_genomes_loci/1000_genomes_loci_chr';
GCCORRECTPREFIX <- '/opt/battenberg_reference/1000_genomes_gcContent/1000_genomes_GC_corr_chr';
REPLICCORRECTPREFIX <- '/opt/battenberg_reference/battenberg_wgs_replication_timing_correction_1000_genomes/1000_genomes_replication_timing_chr';
IMPUTE.EXE <- 'impute2';

# WGS specific static
ALLELECOUNTER <- 'alleleCounter';
PROBLEMLOCI <- '/opt/battenberg_reference/battenberg_problem_loci/probloci.txt.gz';

# Change to the output (working) directory before running the pipeline
setwd(RUN.DIR);

battenberg(
tumourname = TUMOURNAME,
normalname = NORMALNAME,
tumour_data_file = TUMOURBAM,
normal_data_file = NORMALBAM,
ismale = IS.MALE,
imputeinfofile = IMPUTEINFOFILE,
g1000prefix = G1000PREFIX,
g1000allelesprefix = G1000PREFIX.AC,
gccorrectprefix = GCCORRECTPREFIX,
repliccorrectprefix = REPLICCORRECTPREFIX,
problemloci = PROBLEMLOCI,
data_type = DATA.TYPE,
impute_exe = IMPUTE.EXE,
allelecounter_exe = ALLELECOUNTER,
nthreads = NTHREADS,
platform_gamma = PLATFORM.GAMMA,
phasing_gamma = PHASING.GAMMA,
segmentation_gamma = SEGMENTATION.GAMMA,
segmentation_kmin = SEGMENTATION.KMIN,
phasing_kmin = PHASING.KMIN,
clonality_dist_metric = CLONALITY.DIST.METRIC,
ascat_dist_metric = ASCAT.DIST.METRIC,
min_ploidy = MIN.PLOIDY,
max_ploidy = MAX.PLOIDY,
min_rho = MIN.RHO,
min_goodness = MIN.GOODNESS.OF.FIT,
uninformative_BAF_threshold = BALANCED.THRESHOLD,
min_normal_depth = MIN.NORMAL.DEPTH,
min_base_qual = MIN.BASE.QUAL,
min_map_qual = MIN.MAP.QUAL,
calc_seg_baf_option = CALC.SEG.BAF.OPTION,
skip_allele_counting = SKIP.ALLELECOUNTING,
skip_preprocessing = SKIP.PREPROCESSING,
skip_phasing = SKIP.PHASING,
prior_breakpoints_file = PRIOR.BREAKPOINTS.FILE
);
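
Because the wrapper hard-codes its reference locations under `/opt/battenberg_reference`, a missing or incomplete mount will only surface once `battenberg()` tries to read a file. A pre-flight check along these lines might help; it is a sketch only, `check_battenberg_reference` is a hypothetical name, and the entries are taken from the constants in `battenberg_wgs.R` above (prefix variables are checked at directory level only).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of this PR): verify that the
# reference root contains the files and directories the wrapper points at.
check_battenberg_reference() {
    local root="${1:-/opt/battenberg_reference}"
    local missing=0
    local entry
    for entry in \
        impute_info.txt \
        1000_genomes_loci \
        1000_genomes_gcContent \
        battenberg_wgs_replication_timing_correction_1000_genomes \
        battenberg_problem_loci/probloci.txt.gz
    do
        if [ ! -e "${root}/${entry}" ]; then
            echo "missing: ${root}/${entry}" >&2
            missing=1
        fi
    done
    # Exit status 0 when everything is present, 1 otherwise
    return "$missing"
}
```

Called without arguments inside the container, it checks the default root and lists anything absent on stderr.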
8 changes: 4 additions & 4 deletions metadata.yaml
@@ -4,7 +4,7 @@ description: 'Docker repository for Wedge-lab/battenberg'
maintainers: ['mmootor@mednet.ucla.edu']
languages: ['Dockerfile']
tools: ['battenberg']
-version: ['2.2.9'] # Tool version number
-purpose: 'Whole Genome Sequencing subclonal copy number caller' # Description of what this tool does
-references: 'https://github.com/Wedge-lab/battenberg' # is the tool/dependencies published, is there a confluence page
-image_name: 'battenberg' # name of the new docker image
+version: ['2.2.9']
+purpose: 'Whole Genome Sequencing subclonal copy number caller'
+references: 'https://github.com/Wedge-lab/battenberg'
+image_name: 'battenberg'
12 changes: 0 additions & 12 deletions modify_reference_path.sh

This file was deleted.
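
With `modify_reference_path.sh` removed, the reference paths are fixed inside the custom wrapper, so the reference bundle has to be mounted at `/opt/battenberg_reference` at run time. A possible invocation is sketched below under several assumptions: the image tag `battenberg:dev`, the `/data` and `/output` mount points, the BAM file names, and the sample names are all placeholders, and the image is assumed to have no `ENTRYPOINT` that would intercept the command. Only the reference mount target, the script path, and the option names come from the diff.

```shell
#!/usr/bin/env bash
# Hypothetical launcher (not part of this PR). Placeholders: image tag,
# /data and /output mount points, BAM file names, and sample names.
# Set DOCKER=echo to print the arguments instead of starting a container.
run_battenberg() {
    local ref_dir="$1" data_dir="$2" out_dir="$3"
    "${DOCKER:-docker}" run --rm \
        -v "${ref_dir}:/opt/battenberg_reference:ro" \
        -v "${data_dir}:/data:ro" \
        -v "${out_dir}:/output" \
        battenberg:dev \
        Rscript /usr/local/bin/battenberg_wgs.R \
        --tumourname TUMOUR \
        --normalname NORMAL \
        --tb /data/tumour.bam \
        --nb /data/normal.bam \
        --sex male \
        --output /output \
        --cpu 8
}
```

For a dry run, `DOCKER=echo run_battenberg /refs /bams /results` prints the assembled arguments without touching Docker.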