0.3.1 release
sigven committed Jul 5, 2018
1 parent 3442d44 commit c96d8ff
Showing 10 changed files with 546 additions and 92 deletions.
40 changes: 23 additions & 17 deletions README.md
@@ -2,24 +2,28 @@

### Overview

The germline variant annotator (*gvanno*) is a simple, Docker-based software package intended for analysis and interpretation of human DNA variants of germline origin. It accepts query files encoded in the VCF format, and can analyze both SNVs and short InDels. The software is largely based on [Ensembl’s Variant Effect Predictor (VEP)](http://www.ensembl.org/info/docs/tools/vep/index.html), and extends this with clinically relevant annotations retrieved flexibly through [vcfanno](https://github.com/brentp/vcfanno). The workflow produces an annotated VCF file and a file of tab-separated values (.tsv), the latter listing all annotations pr. variant record.
The germline variant annotator (*gvanno*) is a simple, Docker-based software package intended for analysis and interpretation of human DNA variants of germline origin. It accepts query files encoded in the VCF format, and can analyze both SNVs and short InDels. The workflow is largely based on [Ensembl’s Variant Effect Predictor (VEP)](http://www.ensembl.org/info/docs/tools/vep/index.html), and [vcfanno](https://github.com/brentp/vcfanno). It produces an annotated VCF file and a file of tab-separated values (.tsv), the latter listing all annotations pr. variant record.

#### Annotation resources included in _gvanno_ - 0.3.0
#### Annotation resources included in _gvanno_ - 0.3.1

* [VEP v92](http://www.ensembl.org/info/docs/tools/vep/index.html) - Variant Effect Predictor release 92 (GENCODE v19/v28 as the gene reference dataset)
* [dBNSFP v3.5](https://sites.google.com/site/jpopgen/dbNSFP) - Database of non-synonymous functional predictions (August 2017)
* [gnomAD r2](http://gnomad.broadinstitute.org/) - Germline variant frequencies exome-wide (October 2017)
* [dbSNP b150](http://www.ncbi.nlm.nih.gov/SNP/) - Database of short genetic variants (February 2017)
* [1000 Genomes Project - phase3](ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/) - Germline variant frequencies genome-wide (May 2013)
* [ClinVar](http://www.ncbi.nlm.nih.gov/clinvar/) - Database of clinically related variants (April 2018)
* [gnomAD r2](http://gnomad.broadinstitute.org/) - Germline variant frequencies exome-wide (February 2017) - from VEP
* [dbSNP b150](http://www.ncbi.nlm.nih.gov/SNP/) - Database of short genetic variants (February 2017) - from VEP
* [1000 Genomes Project - phase3](ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/) - Germline variant frequencies genome-wide (May 2013) - from VEP
* [ClinVar 20180603](http://www.ncbi.nlm.nih.gov/clinvar/) - Database of clinically related variants (June 2018)
* [DisGeNET](http://www.disgenet.org) - Database of gene-disease associations (v5.0, May 2017)
* [UniProt/SwissProt KnowledgeBase 2018_03](http://www.uniprot.org) - Resource on protein sequence and functional information (March 2018)
* [UniProt/SwissProt KnowledgeBase 2018_06](http://www.uniprot.org) - Resource on protein sequence and functional information (June 2018)
* [Pfam v31](http://pfam.xfam.org) - Database of protein families and domains (March 2017)
* [TSGene v2.0](http://bioinfo.mc.vanderbilt.edu/TSGene/) - Tumor suppressor/oncogene database (November 2015)

### News

* April 20th 2018 - 0.3.0 release

* July 5th 2018 - **0.3.1 release**
* Data bundle updates (ClinVar, UniProt)
* Addition of [VEP LofTee plugin](https://github.com/konradjk/loftee) - predicts loss-of-function variants
* April 20th 2018 - **0.3.0 release**
* Runs under Python3
* VEP version 92
* Support for grch38
@@ -47,15 +51,15 @@ An installation of Python (version _3.6_) is required to run *gvanno*. Check tha

#### STEP 2: Download *gvanno* and data bundle

1. Download and unpack the [latest software release (0.3.0)](https://github.com/sigven/gvanno/releases/tag/v0.3.0)
1. Download and unpack the [latest software release (0.3.1)](https://github.com/sigven/gvanno/releases/tag/v0.3.1)
2. Download and unpack the assembly-specific data bundle in the PCGR directory
* [grch37 data bundle](https://drive.google.com/open?id=1M4jUFLk5LwfgiWZOkKXNmQFPhl75Iy4-) (approx 9Gb)
* [grch38 data bundle](https://drive.google.com/file/d/1EfpUlaR8DRwFZjhJAJ8mkbbqlpENIlx5/) (approx 9Gb)
* [grch37 data bundle](https://drive.google.com/file/d/15NbYwwnb8J5IGhL6-RJXpAeQ-xqzjc5F/) (approx 9Gb)
* [grch38 data bundle](https://drive.google.com/file/d/1hr4MShsEh2Xf-_bBgDPi7t-vj32XrWJ0/) (approx 9Gb)
* *Unpacking*: `gzip -dc gvanno.databundle.grch37.YYYYMMDD.tgz | tar xvf -`

A _data/_ folder within the _gvanno-X.X_ software folder should now have been produced
3. Pull the [gvanno Docker image (0.3.0)](https://hub.docker.com/r/sigven/gvanno/) from DockerHub (approx 2.7Gb):
* `docker pull sigven/gvanno:0.3.0` (gvanno annotation engine)
3. Pull the [gvanno Docker image (0.3.1)](https://hub.docker.com/r/sigven/gvanno/) from DockerHub (approx 2.5Gb):
* `docker pull sigven/gvanno:0.3.1` (gvanno annotation engine)

#### STEP 3: Input preprocessing

@@ -84,7 +88,7 @@ Run the workflow with **gvanno.py**, which takes the following arguments and opt

positional arguments:
gvanno_dir gvanno base directory with accompanying data
directory, e.g. ~/gvanno-0.2.0
directory, e.g. ~/gvanno-0.3.1
output_dir Output directory
{grch37,grch38} grch37 or grch38
configuration_file gvanno configuration file (TOML format)
@@ -101,10 +105,10 @@ Run the workflow with **gvanno.py**, which takes the following arguments and opt
--version show program's version number and exit


The _examples_ folder contain an example VCF file. It also contain *gvanno* configuration file. Analysis of the example VCF can be performed by the following command:
The _examples_ folder contains an example VCF file. It also contains a *gvanno* configuration file. Analysis of the example VCF can be performed by the following command:

`python ~/gvanno-0.3.0/gvanno.py --input_vcf ~/gvanno-0.3.0/examples/example.vcf.gz`
` ~/gvanno-0.3.0 ~/gvanno-0.3.0/examples grch37 ~/gvanno-0.3.0/examples/gvanno_config.toml example`
`python ~/gvanno-0.3.1/gvanno.py --input_vcf ~/gvanno-0.3.1/examples/example.vcf.gz`
` ~/gvanno-0.3.1 ~/gvanno-0.3.1/examples grch37 ~/gvanno-0.3.1/examples/gvanno_config.toml example`
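
The example invocation above can also be assembled programmatically, which avoids repeating the release directory by hand. The following is a minimal sketch, assuming a hypothetical helper named `example_command`; the argument order and file names are taken from the command shown in the README, and nothing is executed here.

```python
import os


def example_command(gvanno_dir):
    """Build the README example invocation as an argument list.

    `example_command` is a hypothetical helper for illustration only;
    it is not part of gvanno. The argument order mirrors the README command.
    """
    gvanno_dir = os.path.expanduser(gvanno_dir)
    examples_dir = os.path.join(gvanno_dir, 'examples')
    return [
        'python',
        os.path.join(gvanno_dir, 'gvanno.py'),
        '--input_vcf', os.path.join(examples_dir, 'example.vcf.gz'),
        gvanno_dir,                                       # gvanno base directory
        examples_dir,                                     # output directory
        'grch37',                                         # genome assembly
        os.path.join(examples_dir, 'gvanno_config.toml'),  # configuration file
        'example',                                        # sample identifier
    ]
```

The resulting list could be passed to `subprocess.run`, or joined with spaces for display.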


This command will run the Docker-based *gvanno* workflow and produce the following output files in the _examples_ folder:
@@ -114,6 +118,8 @@ This command will run the Docker-based *gvanno* workflow and produce the followi

Similar files are produced for all variants, not only variants with a *PASS* designation.

Documentation of the various variant and gene annotations should be interrogated from the header of the annotated VCF file.



### Contact
16 changes: 8 additions & 8 deletions gvanno.py
@@ -11,7 +11,7 @@
import platform
import toml

version = '0.3.0'
version = '0.3.1'

def __main__():

@@ -192,7 +192,7 @@ def verify_input_files(input_vcf, configuration_file, gvanno_config_options, bas
f_rel_not = open(rel_notes_file,'r')
compliant_data_bundle = 0
for line in f_rel_not:
version_check = 'GVANNO_DB_VERSION = 20180416'
version_check = 'GVANNO_DB_VERSION = 20180629'
if version_check in line:
compliant_data_bundle = 1
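
The hunk above scans the release notes for a hard-coded data bundle version string. The same check can be written as a small standalone helper; this is only a sketch, assuming a hypothetical function name `data_bundle_is_compliant`. It uses a context manager so the file handle is closed, and it is not the code in gvanno.py.

```python
# Version string taken from the diff above.
EXPECTED_DB_VERSION = 'GVANNO_DB_VERSION = 20180629'


def data_bundle_is_compliant(rel_notes_file, expected=EXPECTED_DB_VERSION):
    """Return True if any line of the release notes contains `expected`.

    Hypothetical helper for illustration; the name and signature are
    assumptions, not part of gvanno.
    """
    with open(rel_notes_file, 'r') as f_rel_not:
        return any(expected in line for line in f_rel_not)
```

Keeping the expected string in a single constant means a future data bundle update only touches one line.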

@@ -294,10 +294,10 @@ def run_gvanno(host_directories, docker_image_version, config_options, sample_id
if not input_vcf_docker == 'None':

## Define input, output and temporary file names
output_vcf = '/workdir/output/' + str(sample_id) + '_gvanno.vcf.gz'
output_tsv = '/workdir/output/' + str(sample_id) + '_gvanno.tsv'
output_pass_vcf = '/workdir/output/' + str(sample_id) + '_gvanno_pass.vcf.gz'
output_pass_tsv = '/workdir/output/' + str(sample_id) + '_gvanno_pass.tsv'
output_vcf = '/workdir/output/' + str(sample_id) + '_gvanno_' + str(genome_assembly) + '.vcf.gz'
output_tsv = '/workdir/output/' + str(sample_id) + '_gvanno_' + str(genome_assembly) + '.tsv'
output_pass_vcf = '/workdir/output/' + str(sample_id) + '_gvanno_pass_' + str(genome_assembly) + '.vcf.gz'
output_pass_tsv = '/workdir/output/' + str(sample_id) + '_gvanno_pass_' + str(genome_assembly) + '.tsv'
input_vcf_gvanno_ready = '/workdir/output/' + re.sub(r'(\.vcf$|\.vcf\.gz$)','.gvanno_ready.vcf.gz',host_directories['input_vcf_basename_host'])
vep_vcf = re.sub(r'(\.vcf$|\.vcf\.gz$)','.gvanno_vep.vcf',input_vcf_gvanno_ready)
vep_vcfanno_vcf = re.sub(r'(\.vcf$|\.vcf\.gz$)','.gvanno_vep.vcfanno.vcf',input_vcf_gvanno_ready)
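
The change above adds the genome assembly to every output file name. A minimal sketch of that naming scheme follows, assuming hypothetical helpers `output_file_names` and `gvanno_ready_name`; the path patterns and the regular expression are copied from the diff, while the function names and the dictionary layout are illustrative.

```python
import re


def output_file_names(sample_id, genome_assembly, output_dir='/workdir/output'):
    """Return the four output paths, following the 0.3.1 naming pattern.

    Hypothetical helper; only the path patterns come from the diff.
    """
    prefix = '{}/{}_gvanno'.format(output_dir, sample_id)
    return {
        'vcf': '{}_{}.vcf.gz'.format(prefix, genome_assembly),
        'tsv': '{}_{}.tsv'.format(prefix, genome_assembly),
        'pass_vcf': '{}_pass_{}.vcf.gz'.format(prefix, genome_assembly),
        'pass_tsv': '{}_pass_{}.tsv'.format(prefix, genome_assembly),
    }


def gvanno_ready_name(input_vcf_basename):
    """Replace a trailing .vcf or .vcf.gz with .gvanno_ready.vcf.gz."""
    return re.sub(r'(\.vcf$|\.vcf\.gz$)', '.gvanno_ready.vcf.gz', input_vcf_basename)
```

With the assembly in the file name, runs of the same sample against grch37 and grch38 no longer overwrite each other in a shared output directory.
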
@@ -310,7 +310,7 @@ def run_gvanno(host_directories, docker_image_version, config_options, sample_id
if genome_assembly == 'grch38':
vep_assembly = 'GRCh38'
fasta_assembly = "/usr/local/share/vep/data/homo_sapiens/92_GRCh38/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz"
vep_options = "--vcf --check_ref --flag_pick_allele --force_overwrite --species homo_sapiens --assembly " + str(vep_assembly) + " --offline --fork " + str(config_options['other']['n_vep_forks']) + " --hgvs --dont_skip --failed 1 --af --af_1kg --af_gnomad --variant_class --regulatory --domains --symbol --protein --ccds --uniprot --appris --biotype --canonical --gencode_basic --cache --numbers --total_length --allele_number --no_escape --xref_refseq --dir /usr/local/share/vep/data"
vep_options = "--vcf --check_ref --flag_pick_allele --force_overwrite --species homo_sapiens --assembly " + str(vep_assembly) + " --offline --fork " + str(config_options['other']['n_vep_forks']) + " --hgvs --dont_skip --failed 1 --af --af_1kg --af_gnomad --variant_class --regulatory --domains --symbol --protein --ccds --uniprot --appris --biotype --canonical --gencode_basic --cache --numbers --total_length --allele_number --no_escape --xref_refseq --plugin LoF --dir /usr/local/share/vep/data"
if config_options['other']['vep_skip_intergenic'] == 1:
vep_options = vep_options + " --no_intergenic"
vep_main_command = str(docker_command_run1) + "vep --input_file " + str(input_vcf_gvanno_ready) + " --output_file " + str(vep_tmp_vcf) + " " + str(vep_options) + " --fasta " + str(fasta_assembly) + "\""
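
The only functional change in this hunk is the added `--plugin LoF` flag, which is easy to miss inside a single long string. One way to make such changes easier to review is to keep the options in a list and join them at the end. This is a sketch under that assumption, with a hypothetical function name `build_vep_options`; the flags themselves are copied from the diff.

```python
def build_vep_options(vep_assembly, n_vep_forks, vep_skip_intergenic=0):
    """Assemble the VEP option string used in the 0.3.1 hunk above.

    Hypothetical helper for illustration; gvanno.py builds the same
    string by concatenation.
    """
    options = [
        '--vcf', '--check_ref', '--flag_pick_allele', '--force_overwrite',
        '--species', 'homo_sapiens',
        '--assembly', str(vep_assembly),
        '--offline',
        '--fork', str(n_vep_forks),
        '--hgvs', '--dont_skip', '--failed', '1',
        '--af', '--af_1kg', '--af_gnomad',
        '--variant_class', '--regulatory', '--domains', '--symbol',
        '--protein', '--ccds', '--uniprot', '--appris', '--biotype',
        '--canonical', '--gencode_basic', '--cache', '--numbers',
        '--total_length', '--allele_number', '--no_escape', '--xref_refseq',
        '--plugin', 'LoF',  # added in 0.3.1 (LOFTEE loss-of-function plugin)
        '--dir', '/usr/local/share/vep/data',
    ]
    if vep_skip_intergenic == 1:
        options.append('--no_intergenic')
    return ' '.join(options)
```

A list-based layout lets a future diff show an added or removed flag as a one-line change.
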
@@ -331,7 +331,7 @@ def run_gvanno(host_directories, docker_image_version, config_options, sample_id
print()
logger = getlogger('gvanno-vcfanno')
logger.info("STEP 2: Clinical/functional variant annotations with gvanno-vcfanno (ClinVar, dbNSFP, UniProtKB)")
gvanno_vcfanno_command = str(docker_command_run2) + "gvanno_vcfanno.py --num_processes " + str(config_options['other']['n_vcfanno_proc']) + " --dbnsfp --clinvar --uniprot --gvanno_xref " + str(vep_vcf) + ".gz " + str(vep_vcfanno_vcf) + " /data/data/" + str(genome_assembly) + "\""
gvanno_vcfanno_command = str(docker_command_run2) + "gvanno_vcfanno.py --num_processes " + str(config_options['other']['n_vcfanno_proc']) + " --dbnsfp --clinvar --uniprot --pcgr_onco_xref " + str(vep_vcf) + ".gz " + str(vep_vcfanno_vcf) + " /data/data/" + str(genome_assembly) + "\""
check_subprocess(gvanno_vcfanno_command)
logger.info("Finished")
