Update plugin docs #674

Merged
merged 4 commits into from
Jan 5, 2024
16 changes: 8 additions & 8 deletions AlphaMissense.pm
Original file line number Diff line number Diff line change
@@ -46,15 +46,15 @@ limitations under the License.

This plugin will add two annotations per missense variant:

'am_pathogenicity', a continuous score between 0 and 1 which can be interpreted as the predicted
- 'am_pathogenicity', a continuous score between 0 and 1 which can be interpreted as the predicted
probability of the variant being pathogenic.

'am_class' is the classification of the variant into one of three discrete categories:
'Likely pathogenic', 'Likely benign', or 'ambiguous'. These are derived using the following
thresholds of am_pathogenicity:
'Likely benign' if am_pathogenicity < 0.34;
'Likely pathogenic' if am_pathogenicity > 0.564;
'ambiguous' otherwise.
- 'am_class' is the classification of the variant into one of three discrete categories:
'likely_pathogenic', 'likely_benign', or 'ambiguous'. These are derived using the following
thresholds of am_pathogenicity:
'likely_benign' if 'am_pathogenicity' < 0.34;
'likely_pathogenic' if 'am_pathogenicity' > 0.564;
'ambiguous' otherwise.
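
As an illustration only, the thresholds above can be written as a small function. This is a Python sketch, not code taken from the plugin: the function name `classify_am` is hypothetical, and the labels follow the lowercase spelling introduced by this change.

```python
def classify_am(am_pathogenicity: float) -> str:
    """Map an am_pathogenicity score to an am_class label (illustrative only)."""
    if not 0.0 <= am_pathogenicity <= 1.0:
        raise ValueError("am_pathogenicity is expected to lie between 0 and 1")
    if am_pathogenicity < 0.34:
        return "likely_benign"
    if am_pathogenicity > 0.564:
        return "likely_pathogenic"
    # Scores from 0.34 to 0.564 inclusive fall through to 'ambiguous'.
    return "ambiguous"
```

Both comparisons are strict, so a score of exactly 0.34 or exactly 0.564 is classed as 'ambiguous'.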

These thresholds were chosen to achieve 90% precision for both pathogenic and benign ClinVar variants.
Note that AlphaMissense was not trained on ClinVar variants. Variants labeled as 'ambiguous' should be
@@ -92,7 +92,7 @@ limitations under the License.
file : (mandatory) Tabix-indexed AlphaMissense data
cols : (optional) Colon-separated columns to print from
AlphaMissense data; if set to 'all', all columns are printed
(default: Missense_pathogenicity:Missense_class)
(default: 'Missense_pathogenicity:Missense_class')
transcript_match : Only print data if transcript identifiers match those from
AlphaMissense data (default: 0)

15 changes: 7 additions & 8 deletions AncestralAllele.pm
@@ -35,11 +35,11 @@ limitations under the License.
A VEP plugin that retrieves ancestral allele sequences from a FASTA file.

Ensembl produces FASTA file dumps of the ancestral sequences of key species.
Data files for GRCh37 are available from https://ftp.ensembl.org/pub/release-75/fasta/ancestral_alleles/
Data files for GRCh38 are available from https://ftp.ensembl.org/pub/current_fasta/ancestral_alleles/
- Data files for GRCh37: https://ftp.ensembl.org/pub/release-75/fasta/ancestral_alleles/
- Data files for GRCh38: https://ftp.ensembl.org/pub/current_fasta/ancestral_alleles/

For optimal retrieval speed, you should pre-process the FASTA files into a single
bgzipped file that can be accessed via Bio::DB::HTS::Faidx (installed by VEP's
bgzipped file that can be accessed via 'Bio::DB::HTS::Faidx' (installed by VEP's
INSTALL.pl):

wget https://ftp.ensembl.org/pub/current_fasta/ancestral_alleles/homo_sapiens_ancestor_GRCh38.tar.gz
@@ -48,16 +48,15 @@ limitations under the License.
rm -rf homo_sapiens_ancestor_GRCh38/ homo_sapiens_ancestor_GRCh38.tar.gz
./vep -i variations.vcf --plugin AncestralAllele,homo_sapiens_ancestor_GRCh38.fa.gz

Data file is only available for GRCh38.
The plugin is also compatible with Bio::DB::Fasta and an uncompressed FASTA file.
The plugin is also compatible with 'Bio::DB::Fasta' and an uncompressed FASTA file.

Note the first time you run the plugin with a newly generated FASTA file it will
spend some time indexing the file. DO NOT INTERRUPT THIS PROCESS, particularly
if you do not have Bio::DB::HTS installed.
if you do not have 'Bio::DB::HTS' installed.

Special cases:
"-" represents an insertion
"?" indicates the chromosome could not be looked up in the FASTA
- '-' represents an insertion
- '?' indicates the chromosome could not be looked up in the FASTA

=cut

18 changes: 9 additions & 9 deletions BayesDel.pm
@@ -51,15 +51,15 @@ limitations under the License.
For GRCh37:
tar zxvf BayesDel_170824_addAF.tgz
rm *.gz.tbi
gunzip *.gz
for f in BayesDel_170824_addAF_chr*; do grep -v "^#" $f >> BayesDel_170824_addAF.txt; done
cat BayesDel_170824_addAF.txt | sort -k1,1 -k2,2n > BayesDel_170824_addAF_sorted.txt
grep "^#" BayesDel_170824_addAF_chr1 > BayesDel_170824_addAF_all_scores.txt
cat BayesDel_170824_addAF_sorted.txt >> BayesDel_170824_addAF_all_scores.txt
bgzip BayesDel_170824_addAF_all_scores.txt
tabix -s 1 -b 2 -e 2 BayesDel_170824_addAF_all_scores.txt.gz
> tar zxvf BayesDel_170824_addAF.tgz
> rm *.gz.tbi
> gunzip *.gz
> for f in BayesDel_170824_addAF_chr*; do grep -v "^#" $f >> BayesDel_170824_addAF.txt; done
> cat BayesDel_170824_addAF.txt | sort -k1,1 -k2,2n > BayesDel_170824_addAF_sorted.txt
> grep "^#" BayesDel_170824_addAF_chr1 > BayesDel_170824_addAF_all_scores.txt
> cat BayesDel_170824_addAF_sorted.txt >> BayesDel_170824_addAF_all_scores.txt
> bgzip BayesDel_170824_addAF_all_scores.txt
> tabix -s 1 -b 2 -e 2 BayesDel_170824_addAF_all_scores.txt.gz
For GRCh38:
Remap GRCh37 file
23 changes: 12 additions & 11 deletions Condel.pm
@@ -43,18 +43,19 @@ limitations under the License.
plugin is based on a script provided by this group and slightly reformatted to
fit into the Ensembl API.
The plugin takes 3 command line arguments, the first is the path to a Condel
configuration directory which contains cutoffs and the distribution files etc.,
the second is either "s", "p", or "b" to output the Condel score, prediction or
both (the default is both), and the third argument is either 1 or 2 to use the
original version of Condel (1), or the newer version (2) - 2 is the default and
is recommended to avoid false positive predictions from Condel in some
circumstances.
The plugin takes 3 command line arguments in this order:
1. Path to a Condel configuration directory which contains cutoffs and the
distribution files, etc.
2. Output: output the Condel score ('s'), prediction ('p') or both ('b');
both ('b') is the default.
3. Version of Condel to use: either 1 (original version) or 2 (newer version);
'2' is the default and is recommended to avoid false positive predictions
from Condel in some circumstances.
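
A minimal sketch of how the three positional arguments could be assembled into a `--plugin` value, assuming the comma-separated form used by the other plugin examples in these docs. The helper name `condel_plugin_arg` and the example path are hypothetical.

```python
def condel_plugin_arg(config_dir: str, output: str = "b", version: int = 2) -> str:
    """Build the value passed to --plugin for Condel (illustrative only)."""
    if output not in ("s", "p", "b"):
        raise ValueError("output must be 's' (score), 'p' (prediction) or 'b' (both)")
    if version not in (1, 2):
        raise ValueError("version must be 1 (original) or 2 (newer)")
    # Order matters: configuration directory, then output mode, then version.
    return ",".join(["Condel", config_dir, output, str(version)])


# Hypothetical path; replace it with the location of the Condel config on your system.
print("--plugin " + condel_plugin_arg("/path/to/config/Condel/config"))
```

The defaults in the sketch mirror the documented ones ('b' and 2), so only the configuration directory has to be supplied.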
An example Condel configuration file and a set of distribution files can be
found in the config/Condel directory in this repository. You should edit the
config/Condel/config/condel_SP.conf file and set the 'condel.dir' parameter to
the full path to the location of the config/Condel directory on your system.
found in the 'config/Condel' directory in this repository. You should edit the
'config/Condel/config/condel_SP.conf' file and set the 'condel.dir' parameter to
the full path to the location of the 'config/Condel' directory on your system.
References:
@@ -76,7 +77,7 @@ limitations under the License.
(4) Flicek P, et al.
Ensembl 2012
Nucleic Acids Research (2011)
doi: 10.1093/nar/gkr991
doi:10.1093/nar/gkr991
=cut

4 changes: 2 additions & 2 deletions DisGeNET.pm
@@ -64,9 +64,9 @@ limitations under the License.
- diseases/phenotype names (optional)
- dbSNP variant Identifier (optional)
This plugin uses file 'all_variant_disease_pmid_associations.tsv.gz'.
File can be downloaded from: https://www.disgenet.org/downloads.
The following steps are necessary before running this plugin (tested with DisGeNET export date 2020-05-26):
This plugin uses file 'all_variant_disease_pmid_associations.tsv.gz'
File can be downloaded from: https://www.disgenet.org/downloads
gunzip all_variant_disease_pmid_associations.tsv.gz
awk '($1 ~ /^snpId/ || $2 ~ /NA/) {next} {print $0}' all_variant_disease_pmid_associations.tsv > all_variant_disease_pmid_associations_clean.tsv
6 changes: 4 additions & 2 deletions Draw.pm
@@ -33,7 +33,9 @@ limitations under the License.
=head1 DESCRIPTION
A VEP plugin that draws pictures of the transcript model showing the
variant location. Can take five optional parameters:
variant location.
Takes five optional parameters:
1) File name stem for images
2) Image width in pixels (default: 1000px)
@@ -45,7 +47,7 @@ limitations under the License.
./vep -i variations.vcf --plugin Draw,myimg,2000,100
Images are written to [file_stem]_[transcript_id]_[variant_id].png
Images are written to '[file_stem]_[transcript_id]_[variant_id].png'
Requires GD library installed to run.
1 change: 1 addition & 0 deletions EVE.pm
@@ -35,6 +35,7 @@ limitations under the License.
This is a plugin for the Ensembl Variant Effect Predictor (VEP) that
adds information from EVE (evolutionary model of variant effect).
This plugin only reports EVE scores for input variants
and does not merge input lines to report on adjacent variants.
It is only available for GRCh38.
3 changes: 2 additions & 1 deletion FlagLRG.pm
@@ -32,8 +32,9 @@ limitations under the License.
=head1 DESCRIPTION
A VEP plugin that retrieves the LRG ID matching either the RefSeq or Ensembl
transcript IDs. You can obtain the 'list_LRGs_transcripts_xrefs.txt' using:
transcript IDs.
You can obtain the 'list_LRGs_transcripts_xrefs.txt' using:
> wget https://ftp.ebi.ac.uk/pub/databases/lrgex/list_LRGs_transcripts_xrefs.txt
=cut
38 changes: 19 additions & 19 deletions G2P.pm
@@ -39,7 +39,7 @@ limitations under the License.

For further information see:
Thormann A, Halachev M, McLaren W, et al. Flexible and scalable diagnostic filtering of genomic variants using G2P with Ensembl VEP.
Nature Communications. 2019 May;10(1):2373. DOI: 10.1038/s41467-019-10016-3. PMID: 31147538; PMCID: PMC6542828.
Nature Communications. 2019 May;10(1):2373. doi:10.1038/s41467-019-10016-3. PMID: 31147538; PMCID: PMC6542828.


Options are passed to the plugin as key=value pairs, (defaults in parentheses):
@@ -64,47 +64,47 @@ limitations under the License.
Setting the value to 1 will overwrite any confidence levels provided with the
confidence_levels option.
af_from_vcf : set value to 1 to include allele frequencies from VCF file.
Specify the list of reference populations to include with --af_from_vcf_keys
Specify the list of reference populations to include with '--af_from_vcf_keys'
af_from_vcf_keys : VCF collections used for annotating variant alleles with observed
allele frequencies. Allele frequencies are retrieved from VCF files. If
af_from_vcf is set to 1 but no VCF collections are specified with --af_from_vcf_keys
af_from_vcf is set to 1 but no VCF collections are specified with '--af_from_vcf_keys'
all available VCF collections are included.
Available VCF collections: topmed, uk10k, gnomADe, gnomADe_r2.1.1, gnomADg, gnomADg_v3.1.2.
Available VCF collections: 'topmed', 'uk10k', 'gnomADe', 'gnomADe_r2.1.1', 'gnomADg', 'gnomADg_v3.1.2'.
Separate multiple values with '&'.
VCF collections contain the following populations:
topmed : TOPMed (available for GRCh37 and GRCh38).
uk10k : ALSPAC, TWINSUK (available for GRCh37 and GRCh38).
gnomADe & gnomADe_r2.1.1 - gnomADe:AFR, gnomADe:ALL, gnomADe:AMR, gnomADe:ASJ, gnomADe:EAS, gnomADe:FIN, gnomADe:NFE, gnomADe:OTH, gnomADe:SAS (for GRCh37 and GRCh38 respectively).
gnomADg & gnomADg_v3.1.2 - gnomADg:AFR, gnomADg:ALL, gnomADg:AMR, gnomADg:ASJ, gnomADg:EAS, gnomADg:FIN, gnomADg:NFE, gnomADg:OTH (for GRCh37 and GRCh38 respectively).
Need to use af_from_vcf paramter to use this option.
* 'topmed' - TOPMed (available for GRCh37 and GRCh38).
* 'uk10k' - ALSPAC, TWINSUK (available for GRCh37 and GRCh38).
* 'gnomADe' & 'gnomADe_r2.1.1' - gnomADe:AFR, gnomADe:ALL, gnomADe:AMR, gnomADe:ASJ, gnomADe:EAS, gnomADe:FIN, gnomADe:NFE, gnomADe:OTH, gnomADe:SAS (for GRCh37 and GRCh38 respectively).
* 'gnomADg' & 'gnomADg_v3.1.2' - gnomADg:AFR, gnomADg:ALL, gnomADg:AMR, gnomADg:ASJ, gnomADg:EAS, gnomADg:FIN, gnomADg:NFE, gnomADg:OTH (for GRCh37 and GRCh38 respectively).
Need to use 'af_from_vcf' parameter to use this option.
default_af : default frequency of the input variant if no frequency data is
found (0). This determines whether such variants are included;
the value of 0 forces variants with no frequency data to be
included as this is considered equivalent to having a frequency
of 0. Set to 1 (or any value higher than af) to exclude them.
of 0. Set to 1 (or any value higher than 'af') to exclude them.
types : SO consequence types to include. Separate multiple values with '&'
(splice_donor_variant,splice_acceptor_variant,stop_gained,
frameshift_variant,stop_lost,initiator_codon_variant,
inframe_insertion,inframe_deletion,missense_variant,
coding_sequence_variant,start_lost,transcript_ablation,
transcript_amplification,protein_altering_variant)
(splice_donor_variant, splice_acceptor_variant, stop_gained,
frameshift_variant, stop_lost, initiator_codon_variant,
inframe_insertion, inframe_deletion, missense_variant,
coding_sequence_variant, start_lost, transcript_ablation,
transcript_amplification, protein_altering_variant)

log_dir : write stats to log files in log_dir

txt_report : write all G2P complete genes and attributes to txt file

html_report : write all G2P complete genes and attributes to html file

filter_by_gene_symbol: set to 1 if filter by gene symbol.
Do not set if filtering by HGNC_id.
This option is set to 1 when using PanelApp files.
filter_by_gene_symbol : set to 1 if filter by gene symbol.
Do not set if filtering by HGNC_id.
This option is set to 1 when using PanelApp files.

only_mane : set to 1 to ignore transcripts that are not MANE
N/B - Information may be lost if this option is used.

For more information - https://www.ebi.ac.uk/gene2phenotype/g2p_vep_plugin

Example:

--plugin G2P,file=G2P.csv,af_monoallelic=0.05,types=stop_gained&frameshift_variant
--plugin G2P,file=G2P.csv,af_monoallelic=0.05,af_from_vcf=1
--plugin G2P,file=G2P.csv,af_from_vcf=1,af_from_vcf_keys='topmed&gnomADe_r2.1.1'
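
The example strings above follow a simple pattern: the plugin name, then comma-separated key=value pairs, with multiple values for one key joined by '&'. A small Python sketch of that pattern follows; the helper name `plugin_arg` is hypothetical and is not part of VEP.

```python
def plugin_arg(name: str, **options) -> str:
    """Join a plugin name and key=value options into one --plugin value."""
    parts = [name]
    for key, value in options.items():
        if isinstance(value, (list, tuple)):
            # Multiple values for one key are separated with '&'.
            value = "&".join(str(item) for item in value)
        parts.append(f"{key}={value}")
    return ",".join(parts)


print(plugin_arg("G2P", file="G2P.csv", af_from_vcf=1,
                 af_from_vcf_keys=["topmed", "gnomADe_r2.1.1"]))
```

When the resulting string is typed on a shell command line, the '&' characters usually need quoting, as in the last example above.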
5 changes: 3 additions & 2 deletions GeneSplicer.pm
@@ -60,15 +60,16 @@ limitations under the License.
Example: diff/donor/621915-621914/Medium-Medium/7.020731-6.988368
Several parameters can be modified by passing them to the plugin string:
Several key=value parameters can be modified in the plugin string:
context : change the amount of sequence added either side of
the variant (default: 100bp)
tmpdir : change the temporary directory used (default: /tmp)
cache_size : change how many sequences' scores are cached in memory
(default: 50)
Example: --plugin GeneSplicer,$GS/bin/linux/genesplicer,$GS/human,context=200,tmpdir=/mytmp
Example:
--plugin GeneSplicer,$GS/bin/linux/genesplicer,$GS/human,context=200,tmpdir=/mytmp
On some systems the binaries provided will not execute, but can be compiled from source:
10 changes: 5 additions & 5 deletions Geno2MP.pm
@@ -42,11 +42,11 @@ limitations under the License.
rare variant genotypes linked to phenotypic information.
Parameters can be set using a key=value system:
file: VCF file containing Geno2MP data
cols: colon-delimited list of Geno2MP columns to return from INFO fields
(by default it only returns the column HPO_CT)
url: build and return URL to Geno2MP variant page (boolean; 0 by default);
the variant location in Geno2MP website is based on GRCh37 coordinates
file : VCF file containing Geno2MP data
cols : colon-delimited list of Geno2MP columns to return from INFO fields
(by default it only returns the column HPO_CT)
url : build and return URL to Geno2MP variant page (boolean; 0 by default);
the variant location in Geno2MP website is based on GRCh37 coordinates
Please cite Geno2MP alongside the VEP if you use this resource:
Geno2MP, NHGRI/NHLBI University of Washington-Center for Mendelian Genomics (UW-CMG), Seattle, WA
28 changes: 14 additions & 14 deletions IntAct.pm
@@ -56,23 +56,23 @@ limitations under the License.

Options are passed to the plugin as key=value pairs:

mapping_file : (mandatory) Path to tabix-indexed genomic location mapped file
mutation_file : (mandatory) Path to IntAct data file
mapping_file : (mandatory) Path to tabix-indexed genomic location mapped file
mutation_file : (mandatory) Path to IntAct data file

By default the output will always contain feature_type and interaction_ac from the IntAct data file. You can also add more fields using the following options -
feature_ac : Set value to 1 to include Feature AC in the output
feature_short_label : Set value to 1 to include Feature short label in the output
feature_annotation : Set value to 1 to include Feature annotation in the output
ap_ac : Set value to 1 to include Affected protein AC in the output
interaction_participants : Set value to 1 to include Interaction participants in the output
pmid : Set value to 1 to include PubMedID in the output

There are also two other options for customizing the output -
all : Set value to 1 to include all the fields
minimal : Set value to 1 to overwrite default behavior and include only interaction_ac
By default the output will always contain feature_type and interaction_ac from the IntAct data file. You can also add more fields using the following key=value options -
feature_ac : Set value to 1 to include Feature AC in the output
feature_short_label : Set value to 1 to include Feature short label in the output
feature_annotation : Set value to 1 to include Feature annotation in the output
ap_ac : Set value to 1 to include Affected protein AC in the output
interaction_participants : Set value to 1 to include Interaction participants in the output
pmid : Set value to 1 to include PubMedID in the output

There are also two other key=value options for customizing the output -
all : Set value to 1 to include all the fields
minimal : Set value to 1 to overwrite default behavior and include only interaction_ac
in the output by default
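
One possible reading of how these options combine is sketched below in Python. It is not the plugin's implementation: the function name is hypothetical, and two points are assumptions rather than documented behaviour, namely that 'minimal' only changes the default fields (so extra fields can still be requested) and that setting both 'all' and 'minimal' is treated as an error.

```python
DEFAULT_FIELDS = ["feature_type", "interaction_ac"]
OPTIONAL_FIELDS = [
    "feature_ac",
    "feature_short_label",
    "feature_annotation",
    "ap_ac",
    "interaction_participants",
    "pmid",
]


def intact_output_fields(options: dict) -> list:
    """Return the IntAct fields reported for a set of key=value options."""
    enabled = {key for key, value in options.items() if str(value) == "1"}
    if "all" in enabled and "minimal" in enabled:
        # Assumption: the docs do not say which option wins, so refuse both.
        raise ValueError("'all' and 'minimal' should not be combined")
    if "all" in enabled:
        return DEFAULT_FIELDS + OPTIONAL_FIELDS
    # Assumption: 'minimal' replaces the defaults with interaction_ac only.
    base = ["interaction_ac"] if "minimal" in enabled else list(DEFAULT_FIELDS)
    return base + [field for field in OPTIONAL_FIELDS if field in enabled]
```

For example, `intact_output_fields({"pmid": 1})` adds `pmid` to the two default fields.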

See what this options mean - https://www.ebi.ac.uk/intact/download/datasets#mutations
See what these options mean - https://www.ebi.ac.uk/intact/download/datasets#mutations

Note that, interaction accession can be used to link to full details on the interaction website. For example,
where the VEP output reports an interaction_ac of EBI-12501485, the URL would be :