
Generating brass filter file

Keiran Raine edited this page Jun 28, 2021 · 13 revisions

The brass.pl script has the following option:

-filter    -f   bgzip tabix-ed normal panel groups file

If you don't want to filter, this can be an effectively empty bgzip-compressed, tabix-indexed file. It must still contain a single bedpe record, e.g. 1 0 1.

The process to generate a real filtering normal panel is as follows.

1. Generate merged aberrant pair BAM from multiple normal BAM files.

$ brassI_np_in.pl
USAGE: <OUT_DIR> <FILE_INDEX> <FILE_1> <FILE 2>...

The FILE_INDEX element is intended for use in a farm environment, so that the same base command can be used to generate the output for every input file, for example as an LSF job array:

$ bsub -oo brass_np.log -q normal -J 'np[1-20]' -P analysis-cgp -n 1 -R'select[mem>=6000] span[hosts=1] rusage[mem=6000]' -M 6000 'brassI_np_in.pl $BRASS_FILTER_OUT $LSB_JOBINDEX sample1.bam sample2.bam ... sample20.bam'
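Outside a farm environment, the same pattern can be reproduced with a plain loop. The following is a minimal dry-run sketch that only prints the commands. It assumes that FILE_INDEX is 1-based and selects which of the listed files a given invocation processes, which is consistent with the job array above but not confirmed here. The function name `print_np_commands`, the output directory `np_out` and the BAM names are placeholders for illustration.

```shell
# Dry run: print one brassI_np_in.pl command per input BAM instead of running it.
# print_np_commands is a hypothetical helper, not part of BRASS.
print_np_commands() {
  out_dir=$1
  shift
  index=1
  for _bam in "$@"; do
    # The file list is identical every time; only FILE_INDEX changes
    # (assumed to be 1-based, as with $LSB_JOBINDEX).
    echo "brassI_np_in.pl $out_dir $index $*"
    index=$((index + 1))
  done
}

print_np_commands np_out sample1.bam sample2.bam sample3.bam
```

Piping the printed lines to `sh` would execute them one after another.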

2. Merge the resulting files found in $BRASS_FILTER_OUT:

# expecting biobambam2 to be installed
$ bammerge I=sample1.brm.bam I=sample2.brm.bam ... I=sample20.brm.bam > merged_normals.bam

3. Run the brass-grouping step

$ brass-group merged_normals.bam -o brass_np.groups

Depending on the number of samples merged, this step can require a lot of memory.

4. Convert to an indexable format

a. Version v6.0.0 and later

$ ( grep '^#' ../brass_np.groups;\
cat ../brass_np.groups | perl -ane 'next if ($_=~/^#/); printf "%s%s%s%s\t%s\n", $F[0],$F[1],$F[4],$F[5],join("\t",@F[1..$#F]);' | sort -k1,1 -k3,3n -k4,4n -k7,7n -k8,8n ) > brass_np.srt.groups
$ bgzip -c brass_np.srt.groups > brass_np.groups.gz
$ tabix -s 1 -b 3 -e 4 -0 brass_np.groups.gz

b. Before version v5.4.1

$ (grep '^#' brass_np.groups;\
 grep -v '^#' brass_np.groups | sort -k1,1 -k3,3n -k4,4n -k5,5 -k7,7n -k8,8n) > brass_np.srt.groups
$ bgzip -c brass_np.srt.groups > brass_np.groups.gz
$ tabix -s 1 -b 3 -e 4 -0 brass_np.groups.gz
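The header-preserving sort used in this step can be tried on a toy file before running it on real data. This is a sketch with made-up records. The file names and the temporary directory are placeholders, and the bgzip and tabix calls are left out so that only standard tools are needed.

```shell
workdir=$(mktemp -d)

# Toy groups file (made-up values): one header line plus three unsorted records.
printf '# toy header\n2\t+\t50\t60\t3\t-\t10\t20\n1\t+\t300\t310\t1\t-\t900\t910\n1\t+\t100\t110\t2\t-\t500\t510\n' > "$workdir/toy.groups"

# Header lines are emitted first, untouched; only the records are sorted,
# using the same keys as the pre v5.4.1 recipe.
( grep '^#' "$workdir/toy.groups"
  grep -v '^#' "$workdir/toy.groups" | sort -k1,1 -k3,3n -k4,4n -k5,5 -k7,7n -k8,8n ) > "$workdir/toy.srt.groups"

cat "$workdir/toy.srt.groups"
```

Keeping the header out of the sort matters because the per-sample columns are interpreted in the order given by the header.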

c. Convert pre v5.4.1 brass_np.groups.gz to v6.0.0 format

$ ( zgrep '^#' brass_np.groups.gz;\
zcat brass_np.groups.gz | perl -ane 'next if ($_=~/^#/); printf "%s%s%s%s\t%s\n", $F[0],$F[1],$F[4],$F[5],join("\t",@F[1..$#F]);' | sort -k1,1 -k3,3n -k4,4n -k7,7n -k8,8n ) > brass_np.srt.groups
$ bgzip -c brass_np.srt.groups > brass_np.srt.groups.gz
$ tabix -s 1 -b 3 -e 4 -0 brass_np.srt.groups.gz

Format of v6.0.0+ filter file

Column Description
1 Composite of Low chr, Low strand, High chr, High strand. Low chr is not retained as a dedicated column
2 Low strand
3 Low start
4 Low end
5 High chr
6 High strand
7 High start
8 High end
9 -> 9+(N-1) Aberrant read pair count per sample (as ordered by header)
9+N -> END Names of aberrant pairs per sample (as ordered by header)
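The column arithmetic above can be illustrated with a short awk sketch that pairs each sample's count column with its name column. Everything in the example is an assumption made for illustration: the record is made up, N is hard-coded to 2 rather than read from a header, and the way names are written (comma-separated, with `.` for none) is a guess, not the documented format.

```shell
n_samples=2   # hypothetical; for a real file this would come from the header

# Toy v6.0.0+ record with made-up values (tab-separated):
# columns 1-8 as in the table, then 2 count columns, then 2 name columns.
line=$(printf '1+2-\t+\t100\t200\t2\t-\t300\t400\t3\t0\tr1,r2,r3\t.')

# Counts sit in columns 9 .. 9+(N-1); names start at column 9+N.
summary=$(printf '%s\n' "$line" | awk -F'\t' -v n="$n_samples" '{
  for (i = 0; i < n; i++)
    printf "sample%d count=%s names=%s\n", i + 1, $(9 + i), $(9 + n + i)
}')

printf '%s\n' "$summary"
```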