Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/mm flag modifier #36

Merged
15 commits merged into from
Mar 7, 2019
Merged
2 changes: 2 additions & 0 deletions .travis.yml
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,8 @@ addons:
- lsof
- libbz2-dev
- liblzma-dev
- libgd-dev
- libdb-dev

install: true

Expand Down
10 changes: 10 additions & 0 deletions CHANGES.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,15 @@
# CHANGES

## feature/mmFlagModifier

* Removed duplicate flag from `BAD_FLAGS` variable in mismatchQC script
* Added `mmFlagModifier` script to remove/reinstate QC_fail flag where `mm:A:Y` tag is found.

## 4.2.7

* scripts generated by Threaded.pm now always sync'ed
* remove need to chmod, reduce issues under docker, less file ops

## 4.2.6

* New bam file added with mini reference for cram formatting test.
Expand Down
30 changes: 13 additions & 17 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,16 +16,12 @@ and subsequent calling algorithms.
The intention is to provide reference implementations and simple to execute wrappers
that are useful for the scientific community who may have little IT/bioinformatic support.

<!-- TOC depthFrom:2 depthTo:6 withLinks:1 updateOnSave:1 orderedList:0 -->

- [General usage](#general-usage)
- [Docker, Singularity and Dockstore](#docker-singularity-and-dockstore)
- [Dependencies/Install](#dependenciesinstall)
- [Creating a release](#creating-a-release)
- [Preparation](#preparation)
- [Cutting the release](#cutting-the-release)

<!-- /TOC -->
* [General usage](#general-usage)
* [Docker, Singularity and Dockstore](#docker-singularity-and-dockstore)
* [Dependencies/Install](#dependenciesinstall)
* [Creating a release](#creating-a-release)
* [Preparation](#preparation)
* [Cutting the release](#cutting-the-release)

## General usage

Expand Down Expand Up @@ -73,13 +69,13 @@ Please see the respective licence for each before use.
### Cutting the release

1. Update `lib/PCAP.pm` to the correct version.
1. Ensure upgrade path for new version number is added to `lib/PCAP.pm`.
1. Update `CHANGES.md` to show major items.
1. Run `./prerelease.sh`
1. Check all tests and coverage reports are acceptable.
1. Commit the updated docs tree and updated module/version.
1. Push commits.
1. Use the GitHub tools to draft a release.
2. Ensure upgrade path for new version number is added to `lib/PCAP.pm`.
3. Update `CHANGES.md` to show major items.
4. Run `./prerelease.sh`
5. Check all tests and coverage reports are acceptable.
6. Commit the updated docs tree and updated module/version.
7. Push commits.
8. Use the GitHub tools to draft a release.

<!-- References -->

Expand Down
7 changes: 5 additions & 2 deletions bin/bwa_mem.pl
Original file line number Diff line number Diff line change
Expand Up @@ -365,11 +365,14 @@ =head2 OPTIONAL parameters
WARNING:
bwa_mem.pl will exclude all QCFAIL reads from mapping. If a BAM/CRAM file has been created using
this option please ensure that you pre-process the file to remove the flag 512 if you intend to
reprocess based on that output.
reprocess based on that output.

The script mmFlagModifier -m (--remove) can process a bam file to remove any occurences of
flag 512 where the read also has the tag mm:A:Y .

e.g.

bammaskflags maskneg=512 auxexists=mm < mmqc.bam > cleaned.bam
mmFlagModifier -m -i mmqc.bam > cleaned.bam

=item B<-mmqcfrac>

Expand Down
11 changes: 8 additions & 3 deletions c/Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -76,6 +76,7 @@ BAM_STATS_TARGET=../bin/bam_stats
SQ_TARGET=../bin/reheadSQ
BAM_DIFF=../bin/diff_bams
MISMATCHQC=../bin/mismatchQc
MMMODIFIER=../bin/mmFlagModifier

#
# The following part of the makefile is generic; it can be used to
Expand All @@ -87,7 +88,7 @@ MISMATCHQC=../bin/mismatchQc

.NOTPARALLEL: test

all: clean pre make_htslib_tmp $(BAM_STATS_TARGET) $(BAM2BG_TARGET) $(BAM2BW_TARGET) $(BAM_DIFF) $(MISMATCHQC) test remove_htslib_tmp $(CAT_TARGET) $(SQ_TARGET)
all: clean pre make_htslib_tmp $(BAM_STATS_TARGET) $(BAM2BG_TARGET) $(BAM2BW_TARGET) $(BAM_DIFF) $(MISMATCHQC) $(MMMODIFIER) test remove_htslib_tmp $(CAT_TARGET) $(SQ_TARGET)
@echo bam_stats and reheadSQ compiled.

$(BAM_STATS_TARGET): $(OBJS)
Expand All @@ -102,6 +103,10 @@ $(BAM_DIFF):
$(MISMATCHQC):
$(CC) $(CFLAGS) $(INCLUDES) $(CAT_INCLUDES) -o $(MISMATCHQC) $(OBJS) $(LFLAGS) $(CAT_LFLAGS) $(LIBS) ./mismatchQc.c

$(MMMODIFIER):
$(CC) $(CFLAGS) $(INCLUDES) $(CAT_INCLUDES) -o $(MMMODIFIER) $(OBJS) $(LFLAGS) $(CAT_LFLAGS) $(LIBS) ./mmFlagModifier.c


#Unit Tests
test: $(BAM_STATS_TARGET)
test: CFLAGS += $(INCLUDES) $(CAT_INCLUDES) $(OBJS) $(LFLAGS) $(CAT_LFLAGS) $(LIBS)
Expand All @@ -123,7 +128,7 @@ remove_htslib_tmp:

copyscript:
cp ./scripts/* ./bin/
chmod a+x $(MISMATCHQC) $(BAM_STATS_TARGET) $(CAT_TARGET) $(SQ_TARGET) $(BAM2BW_TARGET) $(BAM2BG_TARGET) $(BAM_DIFF)
chmod a+x $(MISMATCHQC) $(MMMODIFIER) $(BAM_STATS_TARGET) $(CAT_TARGET) $(SQ_TARGET) $(BAM2BW_TARGET) $(BAM2BG_TARGET) $(BAM_DIFF)

valgrind:
VALGRIND="valgrind --log-file=/tmp/valgrind-%p.log" $(MAKE)
Expand All @@ -138,7 +143,7 @@ valgrind:

clean:
@echo clean
$(RM) ./*.o *~ $(BAM_STATS_TARGET) $(MISMATCHQC) $(SQ_TARGET) $(BAM_DIFF) ./tests/tests_log $(TESTS) ./*.gcda ./*.gcov ./*.gcno *.gcda *.gcov *.gcno ./tests/*.gcda ./tests/*.gcov ./tests/*.gcno
$(RM) ./*.o *~ $(BAM_STATS_TARGET) $(MISMATCHQC) $(MMMODIFIER) $(SQ_TARGET) $(BAM_DIFF) ./tests/tests_log $(TESTS) ./*.gcda ./*.gcov ./*.gcno *.gcda *.gcov *.gcno ./tests/*.gcda ./tests/*.gcov ./tests/*.gcno
-rm -rf $(HTSTMP)

depend: $(SRCS)
Expand Down
107 changes: 107 additions & 0 deletions c/c_tests/test_05_mmFlagModifier.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,107 @@
#!/bin/bash

##########LICENCE##########
# PCAP - NGS reference implementations and helper code for the ICGC/TCGA Pan-Cancer Analysis Project
# Copyright (C) 2014-2019 Genome Research Limited
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not see:
# http://www.gnu.org/licenses/gpl-2.0.html
##########LICENCE##########

compare_sam () {
sam1 = $1;
sam2 = $2;
#Cut out the header and sort from first sam
grep -e '^@' $1 | perl -aple '@F=sort @F;$_=join qq{\t},@F;' > $1.tmphead
#Cut out the header and sort from second sam
grep -e '^@' $2 | perl -aple '@F=sort @F;$_=join qq{\t},@F;' > $2.tmphead
diff $1.tmphead $2.tmphead
if [ "$?" != "0" ];
then
echo "ERROR in "$0": Comparing headers of sam files $1 and $2."
return 1
fi
rm $1.tmphead $2.tmphead
grep -ve '^@' $1 > $1.tmpreads
grep -ve '^@' $2 > $2.tmpreads
diff $1.tmpreads $2.tmpreads
if [ "$?" != "0" ];
then
echo "ERROR in "$0": Comparing reads of sam files $1 and $2."
return 1
fi
rm $1.tmpreads $2.tmpreads
return 0
}

#Ensure valid format produced
../bin/mmFlagModifier -i ../t/data/mismatch_test.bam -m | bamvalidate
if [ "$?" != "0" ];
then
echo "ERROR running ../bin/mmFlagModifier -i ../t/data/mismatch_test.bam -m . Invalid output"
exit 1;
fi

../bin/mmFlagModifier -i ../t/data/mismatch_test.bam -p | bamvalidate
if [ "$?" != "0" ];
then
echo "ERROR running ../bin/mmFlagModifier -i ../t/data/mismatch_test.bam -p. Invalid output"
exit 1;
fi

../bin/mmFlagModifier -i ../t/data/mismatch_test.cram -C -m | bamvalidate inputformat=cram
if [ "$?" != "0" ];
then
echo "ERROR running ../bin/mmFlagModifier -i ../t/data/mismatch_test.bam -C. Invalid output cram compression"
exit 1;
fi

../bin/mmFlagModifier -i ../t/data/mismatch_test.cram -C -p | bamvalidate inputformat=cram
if [ "$?" != "0" ];
then
echo "ERROR running ../bin/mmFlagModifier -i ../t/data/mismatch_test.bam -C. Invalid output cram compression"
exit 1;
fi

../bin/mmFlagModifier -i ../t/data/mismatch_test.bam -m | bamcollate2 inputformat=bam outputformat=sam collate=0 resetaux=0 > ../t/data/mmFlagModifier_test_out.sam;
if [ "$?" != "0" ];
then
echo "ERROR running ../bin/mmFlagModifier -i ../t/data/mismatch_test.bam -m | bamcollate2 inputformat=bam outputformat=sam collate=0 resetaux=0 > ../t/data/mmFlagModifier_test_out.sam"
exit 1;
fi

if [ compare_sam '../t/data/mmFlagModifier_test_out.sam' '../t/data/mmFlagModifier_m_expected_out.sam' != "0" ];
then
echo "ERROR in "$0": Comparing mmFlagModifier to expected result failed."
echo "------"
rm ../t/data/mmFlagModifier_test_out.sam
exit 1
fi

../bin/mmFlagModifier -i ../t/data/mmFlagModifier_p_input.bam -p | bamcollate2 inputformat=bam outputformat=sam collate=0 resetaux=0 > ../t/data/mmFlagModifier_test_out.sam;
if [ "$?" != "0" ];
then
echo "ERROR running ../bin/mmFlagModifier -i ../t/data/mismatch_test.bam -p | bamcollate2 inputformat=bam outputformat=sam collate=0 resetaux=0 > ../t/data/mmFlagModifier_test_out.sam"
exit 1;
fi

if [ compare_sam '../t/data/mmFlagModifier_test_out.sam' '../t/data/mismatch_expected_out.sam' != "0" ];
then
echo "ERROR in "$0": Comparing mmFlagModifier to expected result failed."
echo "------"
rm ../t/data/mmFlagModifier_test_out.sam
exit 1
fi

rm ../t/data/mmFlagModifier_test_out.sam
2 changes: 1 addition & 1 deletion c/mismatchQc.c
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,7 @@ long long int marked_count = 0;
read fails platform/vendor quality checks,
read is PCR or optical duplicate
*/
const int BAD_FLAGS = BAM_FUNMAP | BAM_FMUNMAP | BAM_FQCFAIL | BAM_FDUP | BAM_FSECONDARY | BAM_FSUPPLEMENTARY;
const int BAD_FLAGS = BAM_FUNMAP | BAM_FMUNMAP | BAM_FQCFAIL | BAM_FSECONDARY | BAM_FSUPPLEMENTARY;

enum rw_opts {
W_CRAM = 1,
Expand Down
Loading