Skip to content

Commit

Permalink
Merge pull request #689 from nuno-agostinho/update/GeneSplicer
Browse files Browse the repository at this point in the history
GeneSplicer: improve argument processing
  • Loading branch information
olaaustine authored Feb 19, 2024
2 parents a84ff54 + 2697863 commit 047fc6c
Showing 1 changed file with 82 additions and 44 deletions.
126 changes: 82 additions & 44 deletions GeneSplicer.pm
Original file line number Diff line number Diff line change
Expand Up @@ -28,7 +28,12 @@ limitations under the License.
=head1 SYNOPSIS
mv GeneSplicer.pm ~/.vep/Plugins
./vep -i variants.vcf --plugin GeneSplicer,[path_to_genesplicer_bin],[path_to_training_dir],[option1=value],[option2=value]
./vep -i variants.vcf --plugin GeneSplicer,binary=$GS/bin/linux/genesplicer,training=$GS/human
./vep -i variants.vcf --plugin GeneSplicer,binary=$GS/bin/linux/genesplicer,training=$GS/human,context=200,tmpdir=/mytmp
# VEP Docker/Singularity: if 'genesplicer' is a command available in $PATH,
# there is no need to specify the location of the binary
./vep -i variants.vcf --plugin GeneSplicer,training=$GS/human
=head1 DESCRIPTION
Expand All @@ -41,49 +46,58 @@ limitations under the License.
sequence included either side defaults to 100bp, but can be modified
by passing e.g. "context=50" as a parameter to the plugin.
Any predicted splicing regions that overlap the variant are reported
in the output with one of four states: no_change, diff, gain, loss
You will need to download the GeneSplicer binary and data from
ftp://ftp.ccb.jhu.edu/pub/software/genesplicer/GeneSplicer.tar.gz. Extract the
folder using:
tar -xzf GeneSplicer.tar.gz
There follows a "/"-separated string consisting of the following data:
GeneSplicer comes with precompiled binaries for multiple systems. If the
provided binaries do not run, compile `genesplicer` from source:
1) type (donor, acceptor)
2) coordinates (start-end)
3) confidence (Low, Medium, High)
4) score
```
cd $GS/sources
# if macOS, run this step
[[ $(uname -s) == "Darwin" ]] && perl -pi -e "s/^main /int main /" genesplicer.cpp
make
cd -
./vep [options] --plugin GeneSplicer,$GS/sources/genesplicer,$GS/human
```
Example: loss/acceptor/727006-727007/High/16.231924
Predicted splicing regions that overlap the variant are reported in the output
with a '/'-separated string (e.g., 'loss/acceptor/727006-727007/High/16.231924')
consisting of the following data by this order:
1) state (no_change, diff, gain, loss)
2) type (donor, acceptor)
3) coordinates (start-end)
4) confidence (Low, Medium, High)
5) score
If multiple sites are predicted, their reports are separated by ",".
For diff, the confidence and score for both the reference and alternate
sequences is reported as REF-ALT.
Example: diff/donor/621915-621914/Medium-Medium/7.020731-6.988368
sequences is reported as REF-ALT, such as
'diff/donor/621915-621914/Medium-Medium/7.020731-6.988368'.
Several key=value parameters can be modified in the the plugin string:
training : (mandatory) directory to species-specific training data, such as
`GeneSplicer/human`
binary : path to `genesplicer` binary (default: `genesplicer`)
context : change the amount of sequence added either side of
the variant (default: 100bp)
tmpdir : change the temporary directory used (default: /tmp)
tmpdir : change the temporary directory used (default: `/tmp`)
cache_size : change how many sequences' scores are cached in memory
(default: 50)
Example:
--plugin GeneSplicer,$GS/bin/linux/genesplicer,$GS/human,context=200,tmpdir=/mytmp
On some systems the binaries provided will not execute, but can be compiled from source:
cd $GS/sources
make
cd -
./vep [options] --plugin GeneSplicer,$GS/sources/genesplicer,$GS/human
On Mac OSX the make step is known to fail; the genesplicer.cpp file requires modification:
--plugin GeneSplicer,binary=$GS/bin/linux/genesplicer,training=$GS/human,context=200,tmpdir=/mytmp
cd $GS/sources
perl -pi -e "s/^main /int main /" genesplicer.cpp
make
When using VEP Docker/Singularity, the `binary` argument can be ommitted, as
the `genesplicer` command is exported in the $PATH variable and is thus
automatically detected by the plugin:
--plugin GeneSplicer,training=$GS/human,context=200,tmpdir=/mytmp
=cut

Expand All @@ -106,6 +120,23 @@ our %DEFAULTS = (
cache_size => 50,
);

sub _check_binary {
my $binary_file = shift;
# if $binary_file is the genesplicer command, try to get its path to check if it exists
$binary_file = (`which $binary_file 2>&1` || '') unless -e $binary_file;
chomp $binary_file;
die("ERROR: genesplicer binary not found\n") unless -e $binary_file;

my $test = `$binary_file 2>&1` || '';
die("ERROR: failed to run genesplicer binary:\n$test\n") unless $test =~ /^USAGE/;
}

sub _check_training_dir {
my $training_dir = shift;
die("ERROR: training directory not specified\n") unless $training_dir;
die("ERROR: training directory not found\n") unless -d $training_dir;
}

sub new {
my $class = shift;

Expand All @@ -116,32 +147,39 @@ sub new {

my $params = $self->params;

my $bin = shift @$params;
die("ERROR: genesplicer binary not specified\n") unless $bin;
die("ERROR: genesplicer binary not found\n") unless -e $bin;
my $test = `$bin 2>&1`;
die("ERROR: failed to run genesplicer binary:\n$test\n") unless $test =~ /^USAGE/;
$self->{_bin} = $bin;

my $training_dir = shift @$params;
die("ERROR: training directory not specified\n") unless $training_dir;
die("ERROR: training directory not found\n") unless -d $training_dir;
$self->{_training_dir} = $training_dir;

# defaults
$self->{'_param_'.$_} = $DEFAULTS{$_} for keys %DEFAULTS;

# REST API passes 1 as first param
shift @$params if $params->[0] && $params->[0] eq '1';

# set/override with user params
foreach my $param(@$params) {
next if $param eq '1'; # REST API passes 1 as a param

my ($key, $val) = split('=', $param);
die("ERROR: Failed to parse parameter $param\n") unless defined($key) && defined($val);

$self->{'_param_'.$key} = $val;
if (!defined $val) {
# for positional arguments, set 'binary' and 'training data'
if (!defined $self->{_bin}) {
$self->{_bin} = $param;
next;
} elsif (!defined $self->{_training_dir}) {
$self->{_training_dir} = $param;
next;
}
die("ERROR: Failed to parse parameter $param\n") unless defined($key);
}

if ($key eq 'binary') {
$self->{_bin} = $val;
} elsif ($key eq 'training') {
$self->{_training_dir} = $val;
} else {
$self->{'_param_'.$key} = $val;
}
}

$self->{_bin} ||= 'genesplicer';
_check_binary($self->{_bin});
_check_training_dir($self->{_training_dir});
return $self;
}

Expand Down

0 comments on commit 047fc6c

Please sign in to comment.