ASAFind latest version, with options for graphical output and ppc protein prediction

ASAFind/ASAFind-2

Download ASAFind to install locally

For local installation, a command line version of ASAFind can be downloaded from our GitHub repository:

https://github.com/ASAFind/ASAFind-2

Installation steps

Using GitHub, you can either download the files as a zip archive or clone the repository using the provided URL. After downloading the latest version, follow these installation steps:
  • Change into the directory where you want to install ASAFind, e.g.
    cd /home/marta/asafind
  • Clone the GitHub repository
    git clone https://github.com/ASAFind/ASAFind-2.git
  • Run the following commands from the command line
    python3 -m venv asafind_line_command
    . asafind_line_command/bin/activate
    pip install --upgrade pip
    cd ASAFind-2/
    pip install -r requirements.txt
  • You are now in the virtual environment named asafind_line_command. Here you can ask for help, e.g.
    (asafind_line_command) marta@marta-mini:~/asafind/ASAFind-2$ python3 S0_ASAFind.py --help

These steps create the environment called asafind_line_command, including the subdirectories temp and output, install all required packages, and activate the environment.
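Before installing the requirements, it can help to confirm that the interpreter meets the Python >= 3.10 requirement and that the virtual environment is really active. The following is a minimal sketch using only the standard library; the function names are illustrative and are not part of ASAFind.

```python
import sys


def python_is_supported(version=None, minimum=(3, 10)):
    """Return True if the interpreter version is at least `minimum`."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= tuple(minimum)


def in_virtualenv():
    """Return True when running inside an environment made by `python3 -m venv`."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Python >= 3.10:", python_is_supported())
    print("Virtual environment active:", in_virtualenv())
```

If the second line prints False, the activation step was probably skipped or was run in a different shell.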

ASAFind structure


ASAFind 2.0


The program S1_ASAFind_v3.py is located in the root directory of the environment.
It takes a FASTA file and a companion TargetP v2.0 short-format file as input. The TargetP file must include
the complete TargetP header (two lines starting with '#'). Some versions of SignalP truncate the sequence names:
SignalP 3.0 to 20 characters, and SignalP 4.0 and 4.1 to 58 characters. ASAFind therefore considers only the
corresponding leading characters of the FASTA name (the first 90 in the case of TargetP 2.0), and these must be
unique within the file. Any part of the FASTA name beyond that length is ignored. Additionally, the FASTA name
may not contain a '-' or '|', because SignalP converts characters in sequence names (e.g. '-' is changed to '_').
ASAFind requires at least 7 aa upstream and 22 aa downstream of the cleavage site suggested by
SignalP. The output of this script is a tab-delimited table.
Python >= 3.10 is required.
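The input constraints described above can be checked before running ASAFind. The sketch below is not ASAFind's own code; it only illustrates the stated rules under a few assumptions: the FASTA name is taken to be the first whitespace-delimited token after '>', and `cleavage_position` is taken to be the number of residues that precede the cleavage site. All function names are hypothetical.

```python
TARGETP2_NAME_LENGTH = 90     # leading characters considered for TargetP 2.0
FORBIDDEN_CHARS = ("-", "|")  # characters the FASTA name may not contain


def fasta_names(lines):
    """Yield the name of each FASTA record (assumed: first token after '>')."""
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            yield fields[0] if fields else ""


def check_fasta_names(names, name_length=TARGETP2_NAME_LENGTH):
    """Return a list of human-readable problems found in the FASTA names."""
    problems = []
    seen = {}
    for name in names:
        for char in FORBIDDEN_CHARS:
            if char in name:
                problems.append(f"{name!r} contains forbidden character {char!r}")
        key = name[:name_length]  # only the leading characters are compared
        if key in seen:
            problems.append(
                f"{name!r} and {seen[key]!r} share the same first "
                f"{name_length} characters"
            )
        else:
            seen[key] = name
    return problems


def has_enough_flanking(sequence_length, cleavage_position, upstream=7, downstream=22):
    """Check for >= 7 aa before and >= 22 aa after the cleavage site.

    Assumption: `cleavage_position` counts the residues before the site.
    """
    return (cleavage_position >= upstream
            and sequence_length - cleavage_position >= downstream)


if __name__ == "__main__":
    example = [">seq1 some description", "MKTAYIAK", ">seq-2", "MKV"]
    for problem in check_fasta_names(fasta_names(example)):
        print(problem)
```

For SignalP 3.0 or 4.x input, `name_length` would be 20 or 58 instead of 90, following the truncation lengths given above.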

python S0_ASAFind.py --help
usage: S1_ASAFind_v3.py
usage: S0_ASAFind.py [-h] -f FASTA_FILE -p SIGNALP_FILE
                     [-s SIMPLE_SCORE_CUTOFF] [-t FASTA_FILE_WITH_MOTIFS] [-w]
                     [-v1] [-ppc] [-s_ppc SCORE_CUTOFF_PPC]
                     [-t_ppc FASTA_FILE_WITH_MOTIFS_PPC] [-l]
                     [-my_org MY_ORGANISM] [-v]



-h, --help
    Show this help message and exit.
-f FASTA_FILE, --fasta_file FASTA_FILE
    Specify the input FASTA file.
-p SIGNALP_FILE, --signalp_file SIGNALP_FILE
    Specify the input TargetP file.
-s SIMPLE_SCORE_CUTOFF, --simple_score_cutoff SIMPLE_SCORE_CUTOFF
    Optionally, specify an explicit score cutoff rather than using ASAFind's default algorithm. Not compatible with option -v1. The score given here will not be normalized and should therefore be obtained from a distribution of normalized scores.
-t FASTA_FILE_WITH_MOTIFS, --fasta_file_with_motifs FASTA_FILE_WITH_MOTIFS
    Optionally, specify a custom scoring table. The scoring table will be normalized with the maximum score, which allows for processing of non-normalized as well as normalized scoring tables.
-w, --web_output
    Format output for web display. This is mostly useful when called by a web app.
-v1, --reproduce_ASAFind_1
    Reproduce ASAFind 1.x scores and results (non-normalized scores; if no custom scoring table is specified, the original default scoring table generated without small sample size correction will be used). Not compatible with option -s.
-ppc, --include_ppc_prediction
    Include prediction of proteins that might be targeted to the periplastidic compartment.
-t SCORE_TABLE_FILE, --score_table_file SCORE_TABLE_FILE
    Optionally, specify a custom scoring table. The scoring table will be normalized with the maximum score, which allows for processing of non-normalized as well as normalized scoring tables.
-o OUT_FILE, --out_file OUT_FILE
    Specify the path and name of the output file you wish to create. The default is the same as the fasta_file, but with a ".tab" suffix.
-s_ppc SCORE_CUTOFF_PPC, --score_cutoff_ppc SCORE_CUTOFF_PPC
    Optionally, specify an explicit score cutoff for the ppc protein prediction. If given, ppc protein prediction will be included. The score given here will not be normalized and should therefore be obtained from a distribution of normalized scores.
-t_ppc SCORE_TABLE_FILE_PPC, --score_table_file_ppc SCORE_TABLE_FILE_PPC
    Optionally, specify a custom scoring table for the ppc protein prediction. If given, ppc protein prediction will be included. The scoring table will be normalized with the maximum score, which allows for processing of non-normalized as well as normalized scoring tables.
-l {yes,no}, --logomaker {yes,no}
    Choose "yes" or "no" (default: no). From version 3.0, this option controls whether the program also generates the logomaker pictures in .png and .svg formats. They will be included in the compressed output package.
-my_org MY_ORGANISM, --my_organism MY_ORGANISM
    Specify the name of the organism.
-v, --version
    Show the program's version number and exit.

Example run

python S0_ASAFind.py -f /data/haptophyta/temp/haptophyta_fasta_for_targetp2.fsa -p /data/haptophyta/temp/haptophyta_fasta_for_targetp2.targetp2 -my_org haptophyta -l
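When ASAFind is called from another script, the same command line can be assembled programmatically. The sketch below is a hypothetical wrapper, not part of ASAFind. It uses only the -f, -p, -my_org and -l flags as they appear in the example run, where -l is passed without a value, and it assumes S0_ASAFind.py is run from the ASAFind-2 directory inside the activated environment.

```python
import subprocess
import sys


def build_asafind_command(fasta_file, targetp_file, organism=None,
                          logomaker=False, script="S0_ASAFind.py"):
    """Return the argument list for an ASAFind run (hypothetical helper)."""
    command = [sys.executable, script, "-f", str(fasta_file), "-p", str(targetp_file)]
    if organism is not None:
        command += ["-my_org", organism]
    if logomaker:
        # Passed as a bare flag, as in the example run.
        command.append("-l")
    return command


def run_asafind(command, workdir="."):
    """Run ASAFind in `workdir` and raise CalledProcessError if it fails."""
    return subprocess.run(command, cwd=workdir, check=True)


if __name__ == "__main__":
    cmd = build_asafind_command(
        "/data/haptophyta/temp/haptophyta_fasta_for_targetp2.fsa",
        "/data/haptophyta/temp/haptophyta_fasta_for_targetp2.targetp2",
        organism="haptophyta",
        logomaker=True,
    )
    # Print the command for inspection; call run_asafind(cmd, "ASAFind-2")
    # to actually execute it.
    print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.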
