Generating brass filter file
Keiran Raine edited this page Jun 28, 2021 · 13 revisions
The `brass.pl` script has the following option:
-filter -f bgzip tabix-ed normal panel groups file
If you don't want to filter, this can be a placeholder bgzip-compressed and tabix-indexed file. It cannot be completely empty: it needs a single bedpe record, e.g. 1 0 1.
The process to generate a real filtering normal panel is as follows.
$ brassI_np_in.pl
USAGE: <OUT_DIR> <FILE_INDEX> <FILE_1> <FILE 2>...
The `FILE_INDEX` element is intended for use in a farm environment, so that the same base command can be used for every input file, for example as an LSF job array:
$ bsub -oo brass_np.log -q normal -J 'np[1-20]' -P analysis-cgp -n 1 -R'select[mem>=6000] span[hosts=1] rusage[mem=6000]' -M 6000 'brassI_np_in.pl $BRASS_FILTER_OUT $LSB_JOBINDEX sample1.bam sample2.bam ... sample20.bam'
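As a rough illustration of the indexing idea, the sketch below shows how a job-array index could pick one input from a shared file list. It assumes `FILE_INDEX` is 1-based and selects the matching `<FILE_n>` argument; this is inferred from the usage line and the `np[1-20]` array, not confirmed here. `select_by_index` is a hypothetical helper and not part of BRASS.

```shell
# Hypothetical helper (not part of BRASS): print the file at a 1-based index.
# Assumption: FILE_INDEX is 1-based, mirroring the np[1-20] job array.
select_by_index() {
  index=$1
  shift
  i=1
  for f in "$@"; do
    if [ "$i" -eq "$index" ]; then
      printf '%s\n' "$f"
      return 0
    fi
    i=$((i + 1))
  done
  # Index was larger than the number of files supplied.
  return 1
}

# Every array element receives the same file list but a different index.
select_by_index 2 sample1.bam sample2.bam sample3.bam
```

Under that assumption, each array element works on a different BAM file even though the submitted command line is identical for all of them.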
# expecting biobambam2 to be installed
$ bammerge I=sample1.brm.bam I=sample2.brm.bam ... I=sample20.brm.bam > merged_normals.bam
$ brass-group merged_normals.bam -o brass_np.groups
Depending on the number of samples merged, this step can require a lot of memory.
$ ( grep '^#' ../brass_np.groups;\
cat ../brass_np.groups | perl -ane 'next if ($_=~/^#/); printf "%s%s%s%s\t%s\n", $F[0],$F[1],$F[4],$F[5],join("\t",@F[1..$#F]);' | sort -k1,1 -k3,3n -k4,4n -k7,7n -k8,8n ) > brass_np.srt.groups
$ bgzip -c brass_np.srt.groups > brass_np.groups.gz
$ tabix -s 1 -b 3 -e 4 -0 brass_np.groups.gz
$ (grep '^#' brass_np.groups;\
grep -v '^#' brass_np.groups | sort -k1,1 -k3,3n -k4,4n -k5,5 -k7,7n -k8,8n) > brass_np.srt.groups
$ bgzip -c brass_np.srt.groups > brass_np.groups.gz
$ tabix -s 1 -b 3 -e 4 -0 brass_np.groups.gz
$ ( zgrep '^#' brass_np.groups.gz;\
zcat brass_np.groups.gz | perl -ane 'next if ($_=~/^#/); printf "%s%s%s%s\t%s\n", $F[0],$F[1],$F[4],$F[5],join("\t",@F[1..$#F]);' | sort -k1,1 -k3,3n -k4,4n -k7,7n -k8,8n ) > brass_np.srt.groups
$ bgzip -c brass_np.srt.groups > brass_np.srt.groups.gz
$ tabix -s 1 -b 3 -e 4 -0 brass_np.srt.groups.gz
Column | Description |
---|---|
1 | Composite of Low chr, Low strand, High chr, High strand. Low chr is not retained as a dedicated column |
2 | Low strand |
3 | Low start |
4 | Low end |
5 | High chr |
6 | High strand |
7 | High start |
8 | High end |
9 -> 9+(N-1) | Aberrant read pair count per sample (ordered as in the header) |
9+N -> END | Names of aberrant pairs per sample (ordered as in the header) |
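The per-sample column ranges in the table depend on the number of samples N. A minimal sketch of that arithmetic follows; `sample_columns` is a hypothetical helper introduced here for illustration.

```shell
# Hypothetical helper: print the column ranges implied by the table for N samples.
sample_columns() {
  n=$1
  last_count=$((8 + n))   # 9+(N-1): last aberrant read pair count column
  first_name=$((9 + n))   # 9+N: first read name column
  printf 'counts: 9-%d\nnames: %d-END\n' "$last_count" "$first_name"
}

# For a 20-sample panel, as in the bsub example.
sample_columns 20
```

For a panel of 20 normals this reports counts in columns 9-28 and read names from column 29 onwards.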