--outSAMattrRGline is interpreted incorrectly #725

Open
byb121 opened this issue Aug 28, 2019 · 1 comment
byb121 commented Aug 28, 2019

When running STAR (version 2.7.2a) with the command below:

STAR \
--runMode alignReads \
--runThreadN 8 \
--genomeDir star-ref-2.7.2a \
--outSAMattrRGline "ID:2977244" "LB:EM-2_rnas\"|||eq 1\!100\$5739_52200" \
--twopassMode None \
--readFilesIn test_bad_header.1_1.fastq.gz test_bad_header.1_2.fastq.gz \
--outFileNamePrefix /home/ubuntu/tmpStar/ \
--readFilesCommand zcat \
--outSAMtype BAM Unsorted

The @RG line in the header of the output BAM is:

$ samtools view -H Aligned.out.bam | grep '@RG'
@RG     ID:2977244      LB:EM-2_rnas    |||eq   1\!100$5739_52200"

but we expect an RG line like this:

@RG     ID:2977244      LB:EM-2_rnas"|||eq 1!100$5739_52200

I believe the "LB" tag string in the command has been escaped correctly:

$ echo "LB:EM-2_rnas\"|||eq 1\!100\$5739_52200"
LB:EM-2_rnas"|||eq 1\!100$5739_52200

I know it's a weird tag value, but every character in the string is allowed by the SAM/BAM format specification.
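
A quick way to check the quoting, as a minimal sketch assuming a bash-like shell: inside double quotes, `\"` and `\$` lose their backslash, but the backslash in `\!` is kept, which accounts for the `\!` in the echo output above. Single quotes pass every character through unchanged, so they may be the simpler way to write this value. The variable names `dq` and `sq` are illustrative only.

```shell
# Double quotes: \" and \$ are unescaped, but the backslash before ! is kept.
dq="LB:EM-2_rnas\"|||eq 1\!100\$5739_52200"

# Single quotes: no character is special, so no backslashes are needed.
sq='LB:EM-2_rnas"|||eq 1!100$5739_52200'

# Print both so the difference (the leftover backslash) is visible.
printf '%s\n' "$dq"
printf '%s\n' "$sq"
```

This only concerns what the shell hands to STAR; it does not explain the splitting of the LB value at the quote and the space.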

alexdobin (Owner) commented

Hi Yaobo,

The problem is having both a " and a space in one field; this somehow interferes with the standard C++ text stream input. For now, you would need to use samtools reheader to fix the resulting BAM file.

Cheers
Alex
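
The reheader workaround could look roughly like the sketch below. It assumes the BAM is named Aligned.out.bam as in the report; the helper `fix_header` and the output file names are hypothetical, and the samtools calls are left as comments so the helper can be tried on its own first.

```shell
# Corrected @RG line; header fields must be TAB-separated.
rg_line=$(printf '@RG\tID:%s\tLB:%s' '2977244' 'EM-2_rnas"|||eq 1!100$5739_52200')

# Hypothetical helper: replace any @RG line on stdin with the corrected one
# and pass all other header lines through unchanged.
# The value is passed via the environment so awk does not reinterpret it.
fix_header() {
  RG_LINE="$rg_line" awk '/^@RG\t/ { print ENVIRON["RG_LINE"]; next } { print }'
}

# Usage (file names are assumptions):
#   samtools view -H Aligned.out.bam | fix_header > fixed_header.sam
#   samtools reheader fixed_header.sam Aligned.out.bam > Aligned.reheadered.bam
```

Since the ID field is unchanged, the RG tags on the reads should still match the rewritten header line.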
