
Extract data from selected samples or cells from a BAM file

Buys de Barbanson edited this page Jun 15, 2020 · 3 revisions

The following examples assume that sample names are encoded in the SM tag of each bam/sam record.
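To check whether a file follows this convention, it can help to look at the SM tag of a single record. The following is a minimal sketch in plain Python that reads the tag from one SAM text line (for example a line printed by samtools view). The helper name get_sam_tag and the example record are hypothetical and made up for illustration; they are not part of the toolkit.

```python
def get_sam_tag(sam_line, tag):
    """Return the value of an optional tag (e.g. 'SM') from one SAM text line, or None."""
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 columns are mandatory; optional fields follow as TAG:TYPE:VALUE
    for field in fields[11:]:
        name, _type, value = field.split(":", 2)
        if name == tag:
            return value
    return None


# Hypothetical record, made up for illustration
record = "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF\tSM:Z:my_experiment_cell_314"
print(get_sam_tag(record, "SM"))
```

If this prints None for your own records, the sample name is stored elsewhere and the commands on this page will not find it.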

Extract using a file containing the sample names to extract

In this example we will extract reads from two selected cells from input.bam to cell314and315.bam.

First prepare a file containing the names of the samples/cells you wish to extract from the input bam file:

my_experiment_cell_314
my_experiment_cell_315

Store this list of sample names in a text file, one name per line, for example extract_test.txt.

Then run:

bamExtractSamples.py input.bam extract_test.txt -o cell314and315.bam

This extracts the reads belonging to samples my_experiment_cell_314 and my_experiment_cell_315 from input.bam and writes them to cell314and315.bam.

Extract cells from multiple groups into multiple bam files

To extract multiple groups of cells in one go, add a second column to the extraction text file. In this example we extract cells 20 to 25 to one bam file (the case sample), and cells 26 to 30 to another (the control sample).

extract_test.txt looks like this:

cell_20 CASE
cell_21 CASE
cell_22 CASE
cell_23 CASE
cell_24 CASE
cell_25 CASE
cell_26 CONTROL
cell_27 CONTROL
cell_28 CONTROL
cell_29 CONTROL
cell_30 CONTROL

We then run:

bamExtractSamples.py input.bam extract_test.txt -o output_.bam

This reads input.bam and writes the reads from the samples listed in extract_test.txt to two bam files: output_CASE.bam and output_CONTROL.bam. If required, you can extract more than two groups at once by specifying more group names in the second column.
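The routing from sample name to output file can be pictured with a short sketch. This is not the code of bamExtractSamples.py; it only mirrors the behaviour described above. It assumes the two columns are separated by whitespace and that the group name is inserted just before the .bam extension of the -o argument. The function names read_extraction_table and output_path are hypothetical.

```python
def read_extraction_table(lines):
    """Map each sample name to its group (None when no second column is given)."""
    sample_to_group = {}
    for line in lines:
        parts = line.split()  # assumption: columns are whitespace separated
        if not parts:
            continue  # skip blank lines
        sample_to_group[parts[0]] = parts[1] if len(parts) > 1 else None
    return sample_to_group


def output_path(template, group):
    """Insert the group name before the .bam extension: output_.bam -> output_CASE.bam."""
    if group is None:
        return template
    stem = template[:-len(".bam")] if template.endswith(".bam") else template
    return f"{stem}{group}.bam"


table = read_extraction_table(["cell_20 CASE", "cell_26 CONTROL", ""])
targets = {sample: output_path("output_.bam", group) for sample, group in table.items()}
print(targets)
```

Running a check like this on your own extraction file before starting the real extraction is a cheap way to catch a misspelled group name, which would otherwise show up as an unexpected extra output file.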


Extract ALL samples from a bam file into single-sample bam files:

bamSplitByTag.py input.bam SM -o single_


Using bamFilter.py

Extracting a subset of reads can be done using bamFilter.py.

The following line extracts the reads of the cells with sample names (SM) my_experiment_cell_315 and my_experiment_cell_314, and writes them to cell_315_and_314.bam.

bamFilter.py input.bam "r.get_tag('SM') in ['my_experiment_cell_315','my_experiment_cell_314']" -o cell_315_and_314.bam
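The quoted argument is an ordinary Python expression in which r stands for the current read. The sketch below shows how such an expression behaves, assuming it is evaluated once per read with r bound to a pysam read object; how bamFilter.py evaluates it internally is not documented here, so treat this as an illustration only. FakeRead is a hypothetical stand-in so the example runs without pysam or a bam file. In pysam, get_tag raises a KeyError when the tag is missing, so the expression here adds a has_tag guard for reads without an SM tag.

```python
class FakeRead:
    """Hypothetical stand-in for a pysam read, exposing only has_tag and get_tag."""

    def __init__(self, tags):
        self._tags = dict(tags)

    def has_tag(self, tag):
        return tag in self._tags

    def get_tag(self, tag):
        return self._tags[tag]  # raises KeyError when the tag is absent


# Same test as in the command above, with a guard for reads lacking an SM tag
expression = (
    "r.has_tag('SM') and "
    "r.get_tag('SM') in ['my_experiment_cell_315','my_experiment_cell_314']"
)
keep = eval(f"lambda r: {expression}")  # assumption: one evaluation per read

reads = [
    FakeRead({"SM": "my_experiment_cell_314"}),
    FakeRead({"SM": "my_experiment_cell_1"}),
    FakeRead({}),  # read without an SM tag
]
kept = [keep(r) for r in reads]
print(kept)  # [True, False, False]
```

Because the filter is plain Python, further conditions can be combined with and / or in the same quoted string.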