Example command lines

sarahet edited this page Dec 19, 2023 · 7 revisions

Simple BlastX-like run

  1. Download the pre-formatted UniprotSprot index from Pre‐built Database Indexes and unpack it.
  2. Select your query file or take this example.
  3. Run bin/lambda3 searchp -q /path/to/CAMI_plant_associated_sample0_10Mb.fasta -i /path/to/uniprot_sprot_20230713.lba.gz

You will see something like this:

LAMBDA - the Local Aligner for Massive Biological DatA
======================================================
Version 3.0.0

Reading index properties... done.
Detecting query alphabet... dna5 detected.
Checking memory requirements... met.
Loading Database Index... done.
Loading Query Sequences... done.
Searching and extending hits on-line...progress:
0%  10%  20%  30%  40%  50%  60%  70%  80%  90%  100%
|....:....:....:....:....:....:....:....:....:....|
Number of valid hits:                           120549
Number of Queries with at least one valid hit:   16106

Since you did not specify an output file, the default file name and format were used: output.m8. Browse the output file with your editor of choice, or with less on the command line:

% less output.m8
S0R0/2  sp|G3XD97|PTXS_PSEAE    74.47   47      12      0       141     1       264     310     7e-14   75.9
S0R0/2  sp|Q88HH7|PTXS_PSEPK    78.57   42      9       0       126     1       269     310     5e-12   69.7
S0R0/2  sp|A0A167V873|PTXS_PSEDL        70.73   41      12      0       123     1       270     310     6e-11   66.2
S0R12844/1      sp|I6YDK7|ACCD1_MYCTU   79.07   43      9       0       21      149     78      120     3e-13   73.9
S0R1/1  sp|A1VMB2|KHSE_POLNA    57.14   49      14      1       3       149     190     231     2e-10   64.3
S0R1/1  sp|Q82UL3|KHSE_NITEU    51.02   49      17      2       3       149     186     227     1e-05   48.9
S0R1/1  sp|Q2YBJ8|KHSE_NITMU    46.94   49      19      1       3       149     186     227     2e-05   48.1
S0R1/1  sp|Q0AHY7|KHSE_NITEC    46.00   50      18      2       3       149     186     227     6e-05   46.2
S0R1/1  sp|Q9RAM6|KHSE_METFK    42.86   49      21      2       3       149     186     227     4e-04   43.5
S0R1/1  sp|O32378|KHSE_METGL    40.82   49      22      2       3       149     186     227     4e-04   43.5
S0R1/1  sp|Q9JWE5|KHSE_NEIMA    34.69   49      25      1       3       149     186     227     0.002   41.2
S0R1/1  sp|Q4W557|KHSE_NEIMB    34.69   49      25      1       3       149     186     227     0.002   41.2
[...]
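
The twelve tab-separated columns appear to follow the standard BLAST tabular layout: query ID, subject ID, % identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value and bit score. As a quick sanity check, the sketch below counts hits and distinct queries in such a file. It assumes one hit per line and no comment lines, in which case the two counts should correspond to the totals Lambda printed above. The function name `m8_summary` is made up for this example and is not part of Lambda.

```shell
# m8_summary [FILE]: print the number of hits and of distinct query IDs.
# Assumption: tab-separated, one hit per line, query ID in column 1.
m8_summary() {
    awk -F'\t' '
        NF >= 12 {                      # ignore anything that is not a full record
            hits++
            if (!($1 in seen)) { seen[$1] = 1; queries++ }
        }
        END { printf "hits\t%d\nqueries\t%d\n", hits, queries }
    ' "$@"
}
```

Call it as `m8_summary output.m8`, or pipe the file into it.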

NOTE: Because Lambda uses multiple threads by default, the order in which query sequences appear in the output is not guaranteed to be the same from run to run. However, the matches of one query sequence always appear en bloc and sorted by E-value.
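
Because the hits of one query are contiguous and sorted by E-value, the first line of each block is that query's best hit. A minimal sketch that relies on exactly this property (it would give wrong results on a file where the hits of a query are scattered); `best_hits` is a hypothetical helper name:

```shell
# best_hits [FILE]: keep only the first line of each query's block of hits.
# Assumption: all hits of a query are adjacent, best hit first.
best_hits() {
    awk -F'\t' '
        {
            key = $1 ""                 # force string comparison of query IDs
            if (NR == 1 || key != prev) { print; prev = key }
        }
    ' "$@"
}
```

For example, `best_hits output.m8 > best.m8` writes one line per query. Only the previous query ID is kept in memory, so this also works on very large files.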

SAM output and E-value cutoff

Follow the instructions above, but choose SAM (.sam) as the output format and use an E-value cutoff of 1e-4.

The command line would look like this:

`bin/lambda3 searchp -q /path/to/CAMI_plant_associated_sample0_10Mb.fasta -i /path/to/uniprot_sprot_20230713.lba.gz -o output.sam -e 1e-4`

The program will now print:

LAMBDA - the Local Aligner for Massive Biological DatA
======================================================
Version 3.0.0

Reading index properties... done.
Detecting query alphabet... dna5 detected.
Checking memory requirements... met.
Loading Database Index... done.
Loading Query Sequences... done.
Searching and extending hits on-line...progress:
0%  10%  20%  30%  40%  50%  60%  70%  80%  90%  100%
|....:....:....:....:....:....:....:....:....:....|
Number of valid hits:                           107983
Number of Queries with at least one valid hit:   14349

As you can see, the more stringent cutoff has reduced the number of valid hits from 120549 to 107983 (about 10%), and the number of queries with at least one valid hit from 16106 to 14349.
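
If you already have a tabular (.m8) result, a similar cutoff can be approximated after the fact by filtering on the E-value column. This is only a sketch: it assumes the E-value is in column 11, as in the BLAST-style layout, and its result is not guaranteed to be identical to rerunning Lambda with -e. `filter_evalue` is a hypothetical helper name.

```shell
# filter_evalue CUTOFF [FILE]: print only hits with E-value <= CUTOFF.
# Assumption: tab-separated records with the E-value in column 11.
filter_evalue() {
    max=$1
    shift
    awk -F'\t' -v max="$max" '
        NF >= 12 && ($11 + 0) <= (max + 0)   # "+ 0" forces numeric comparison
    ' "$@"
}
```

For example, `filter_evalue 1e-4 output.m8 | wc -l` gives the number of hits that pass the cutoff.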

View the output again to verify that it is beautiful SAM:

@HD     VN:1.4  GO:query
@PG     ID:lambda       PN:lambda       VN:3.0.0        CL:searchp -i uniprot_sprot_20230713.lba.gz -q CAMI_plant_associated_sample0_10Mb.fasta -o output.sam -e 1e-4
@CO     Lambda is a high performance BLAST compatible local aligner, please see http://seqan.de/lambda for more information.
@CO     SAM/BAM dialect documentation is available here: https://github.com/seqan/lambda/wiki/Output-Formats
@CO     If you use any results found by Lambda, please cite Hauswedell et al. (2014) doi: 10.1093/bioinformatics/btu439
@CO     Optional tags as follow AS:bit score    NM:edit distance (in protein space unless BLASTN)       ae:expect value ai:% identity (in protein space unless BLASTN)  qf:query frame
S0R0/2  16      sp|G3XD97|PTXS_PSEAE    264     255     141M9H  *       0       0       CGTGAACCCAAGTGCAATCTGTTCGATGACGTGGGCTTGATCGCCCTCGATGACCTGGATTGGTACCCGTTGGTGGGCAGCGGCATTACCGCTCTCGCGCAGCCGACCACCGAGATGGGCGCCAGTGCATTTGAGTGTCTG   *       ae:f:7.45803e-14        AS:i:75 ai:i:74 qf:i:-1 NM:i:12
S0R0/2  272     sp|Q88HH7|PTXS_PSEPK    269     255     126M24H *       0       0       AATCTGTTCGATGACGTGGGCTTGATCGCCCTCGATGACCTGGATTGGTACCCGTTGGTGGGCAGCGGCATTACCGCTCTCGCGCAGCCGACCACCGAGATGGGCGCCAGTGCATTTGAGTGTCTG  *       ae:f:5.34478e-12        AS:i:69 ai:i:78 qf:i:-1 NM:i:9
S0R0/2  272     sp|A0A167V873|PTXS_PSEDL        270     255     123M27H *       0       0       CTGTTCGATGACGTGGGCTTGATCGCCCTCGATGACCTGGATTGGTACCCGTTGGTGGGCAGCGGCATTACCGCTCTCGCGCAGCCGACCACCGAGATGGGCGCCAGTGCATTTGAGTGTCTG     *       ae:f:5.90935e-11        AS:i:66 ai:i:70 qf:i:-1 NM:i:12
S0R1/1  0       sp|A1VMB2|KHSE_POLNA    190     255     2H48M21I78M1H   *       0       0       GTGCATGCCGACATGTTCCGCGACAACGTGATGTTCGCCACCGGTGAAGACGCCGGCGCAGCGCCGCGCCTCACCGGCGTTTTCGACTTCTATTTCGCGGGCACCGACACATGGCTGTTCGACCTGGCTGTGTGCCTGTACCACTGG     *       ae:f:2.24555e-10        AS:i:64 ai:i:57 qf:i:3  NM:i:21
S0R1/1  256     sp|Q82UL3|KHSE_NITEU    186     255     2H36M3I18M18I72M1H      *       0       0       *       *       ae:f:9.7631e-06 AS:i:48 ai:i:51 qf:i:3  NM:i:24
S0R1/1  256     sp|Q2YBJ8|KHSE_NITMU    186     255     2H45M21I81M1H   *       0       0       *       *       ae:f:1.66533e-05        AS:i:48 ai:i:46 qf:i:3  NM:i:26
S0R1/1  256     sp|Q0AHY7|KHSE_NITEC    186     255     2H36M3D9M24I78M1H       *       0       0       *       *       ae:f:6.32826e-05        AS:i:46 ai:i:46 qf:i:3  NM:i:27
S0R3/1  16      sp|A0QX20|ACNA_MYCS2    181     255     1H144M5H        *       0       0       GGCATCGTACACCAGGTCAACCTGGAATACCTGGCGCGCGGCGTGCACCGGAAGGACGGCGTCTACTACCCTGACCCGCTGGTCGGCACCGAATCGCACACCACCATGATCAACGGCATCGGCGTGGTCGGCTGGGGCGTCGGC        *       ae:f:1.35917e-15        AS:i:81 ai:i:75 qf:i:-3 NM:i:12
S0R3/1  272     sp|O53166|ACNA_MYCTU    176     255     1H144M5H        *       0       0       *       *       ae:f:5.16485e-15        AS:i:79 ai:i:72 qf:i:-3 NM:i:13
S0R3/1  272     sp|Q92G90|ACNA_RICCN    175     255     1H144M5H        *       0       0       *       *       ae:f:5.16485e-15        AS:i:79 ai:i:72 qf:i:-3 NM:i:13
S0R3/1  272     sp|Q4UK20|ACNA_RICFE    175     255     1H144M5H        *       0       0       *       *       ae:f:5.16485e-15        AS:i:79 ai:i:72 qf:i:-3 NM:i:13
S0R3/1  272     sp|Q9RTN7|ACNA_DEIRA    179     255     1H84M3D6M9D54M5H        *       0       0       *       *       ae:f:6.7455e-15 AS:i:79 ai:i:75 qf:i:-3 NM:i:13
[...]
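
The optional tags listed in the last @CO header line (AS, ae, ai, qf) can be pulled out into a compact table for further processing. The sketch below does this with plain awk rather than a SAM library. It assumes the tags are written as TAG:TYPE:VALUE starting at field 12, as in the records above; `sam_tags` is a made-up helper name.

```shell
# sam_tags [FILE]: print query, subject, E-value (ae), bit score (AS),
# % identity (ai) and query frame (qf) for every alignment record.
sam_tags() {
    awk -F'\t' -v OFS='\t' '
        /^@/    { next }                # skip SAM header lines
        NF < 11 { next }                # skip anything that is not a full record
        {
            evalue = bits = ident = frame = "NA"
            for (i = 12; i <= NF; i++) {
                if (split($i, t, ":") < 3) continue
                if      (t[1] == "ae") evalue = t[3]
                else if (t[1] == "AS") bits   = t[3]
                else if (t[1] == "ai") ident  = t[3]
                else if (t[1] == "qf") frame  = t[3]
            }
            print $1, $3, evalue, bits, ident, frame
        }
    ' "$@"
}
```

For example, `sam_tags output.sam | head` shows the first few hits. Tags missing from a record are reported as NA.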

For more information on the selection of output formats and more fine-grained options, see the article.