forked from vadimzalunin/crammer
-
Notifications
You must be signed in to change notification settings - Fork 1
Reference-based compression of SRA data
dozy/crammer
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
CRAMTools is a set of Java tools and APIs for efficient compression of sequence read data. Although this is intended as a stable version the code is released as early access. Parts of the CRAMTools are experimental and may not be supported in the future. http://www.ebi.ac.uk/ena/about/cram_toolkit Version 0.7 New features: Arbitrary tag support PG an CO header records 0x200 bit flag (not passing quality control) support Input files: Reference sequence in fasta format <fasta file> Reference sequence index file <fasta file>.fai created using samtools (samtools faidx <fasta file>) Input BAM file <BAM file> sorted by reference coordinates BAM index file <BAM file>.bai created using samtools (samtools index <BAM file>) Download and run the program: Download the prebuilt runnable jar file from: https://github.com/vadimzalunin/crammer/blob/master/cramtools.jar?raw=true Execute the command line program: java -jar cramtools.jar Usage is printed if no arguments were given To convert a BAM file to CRAM: java -jar cramtools.jar cram --input-bam-file <bam file> --reference-fasta-file <reference fasta file> [--output-cram-file <output cram file>] To convert a CRAM file to BAM: java -jar cramtools.jar bam --input-cram-file <input cram file> --reference-fasta-file <reference fasta file> --output-bam-file <output bam file> To build the program from source: To check out the source code from github you will need git client: http://git-scm.com/ Make sure you have java 1.6 or higher: http://openjdk.java.net/ or http://www.oracle.com/us/technologies/java/index.html Make sure you have ant version 1.7 or higher: http://ant.apache.org/ git clone git://github.com/vadimzalunin/crammer.git ant -f build/build.xml runnable java -jar cramtools.jar To run unit tests: ant -f build/build.xml test Picard integraion Some tools using Picard API should be able to read/write CRAM archives. For example: java -cp cramtools.jar net.sf.picard.sam.ValidateSamFile INPUT=data.cram However the following will not work: java -cp cramtools.jar -jar ValidateSamFile.jar INPUT=data.cram Reference sequence discovery For tools that use Picard API the following rules describe how the reference sequence file is discovered: 1. Given an input file '<some name>.cram' search for a '<some name>.fa' file in the same directory. 2. Given an input file '<some name>.cram' search for a '<some name>.fa' file in the same directory, which should contain a full path to the reference file. 3. Use java property 'reference=<path to ref file>', usage: java -Dreference=<path to ref file> -cp cramtools.jar ... There is a modification of ViewSam tool which allows to query alignment slices, for example: java -Dreference=human_g1k_v37.fasta -cp cramtools.jar net.sf.picard.sam.ViewSam2 INPUT=data.cram chr1:1-50000 java -cp cramtools.jar net.sf.picard.sam.ViewSam2 INPUT=data.bam chr10:10000-100000 Make sure that the corresponding index file exists: for 'data.cram' file the index file name is 'data.cram.crai'. Otherwise the tool will run slow. ViewSam2 tool may disappear (or will be replaced) in the future releases. Check for more on our web site: http://www.ebi.ac.uk/ena/about/cram_toolkit
About
Reference-based compression of SRA data
Resources
Stars
Watchers
Forks
Packages 0
No packages published