These shells generate check list with currently accepted names of species from OBIS database and these names are both compared and matched with check list with currently accepted names of species from BOLD database. The WoRMS database is used for validating species names.
Software requierements:
- anaconda 3
- git
git clone https://github.com/Ulises-Rosas/OBc.git && cd OBc
conda env create -f environment.yml
source activate OBcThere are two mock files available for testing:
head list_*==> list_geo <==
38,Colombia
260,Chile
==> list_invert <==
Acanthocephala
Reptilia
Therefore, the checklists shell can run it with:
checklists -t list_invert -g list_geo
ls *.txtChile_260_Acanthocephala_bold_validated.txt Chile_260_Reptilia_bold_validated.txt Colombia_38_Acanthocephala_bold_validated.txt Colombia_38_Reptilia_bold_validated.txt
Chile_260_Acanthocephala_obis_validated.txt Chile_260_Reptilia_obis_validated.txt Colombia_38_Acanthocephala_obis_validated.txt Colombia_38_Reptilia_obis_validated.txt
* Intermediate files generated up while running this command are the same at each run. Therefore, if this command is running in parallel, specific directory per run must be used in order to avoid intermediate file crashing. Since the following example is a single run, repo directory is used as the working directory.
As its name suggests, this command merge results from checklists command by adding metadata that is already stated on filenames:
joinfiles --matching _obis_valid_name,region,subgroup,group
Dermochelys coriacea,Chile,Reptilia,Reptilia
Lepidochelys olivacea,Chile,Reptilia,Reptilia
Caretta caretta,Colombia,Reptilia,Reptilia
Dermochelys coriacea,Colombia,Reptilia,Reptilia
Eretmochelys imbricata,Colombia,Reptilia,Reptilia
joinfiles --matching _bold_valid_name,synonyms,availability,region,subgroup,group
Dermochelys coriacea,Dermochelys coriacea,public_outside,Chile,Reptilia,Reptilia
Lepidochelys olivacea,Lepidochelys olivacea,public_outside,Chile,Reptilia,Reptilia
Caretta caretta,Caretta caretta,public_outside,Colombia,Reptilia,Reptilia
Dermochelys coriacea,Dermochelys coriacea,public_outside,Colombia,Reptilia,Reptilia
Eretmochelys imbricata,Eretmochelys imbricata,public_inside,Colombia,Reptilia,Reptilia
Default value of --matching option is _bold_. It is, however, stated as a matter of clearness. While default values of column group is the same from subgroup, this can be modified with --as option. This is particularly usefull when merging an entire directory (i.e. using --from option) under a custom group:
joinfiles \
--from data/Invertebrate\
--as Invertebrate\
--matching _bold_ > invertebrate_bold.txt
head -n 5 invertebrate_bold.txtvalid_name,synonyms,availability,region,subgroup,group
Aglaophamus macroura,Aglaophamus macroura,private,Chile,Annelida,Invertebrate
Aglaophamus trissophyllus,Aglaophamus trissophyllus,public_outside,Chile,Annelida,Invertebrate
Amphitrite kerguelensis,Amphitrite kerguelensis,private,Chile,Annelida,Invertebrate
Ancistrosyllis groenlandica,Ancistrosyllis groenlandica,public_outside,Chile,Annelida,Invertebrate
joinfiles \
--from data/Invertebrate\
--as Invertebrate\
--matching _obis_ > invertebrate_obis.txt
head -n 5 invertebrate_obis.txtvalid_name,region,subgroup,group
Abyssoninoe abyssorum,Chile,Annelida,Invertebrate
Aglaophamus foliosus,Chile,Annelida,Invertebrate
Aglaophamus macroura,Chile,Annelida,Invertebrate
Aglaophamus peruana,Chile,Annelida,Invertebrate
Likewise, this command can also join files from different directories while adding corresponding values for group column:
joinfiles \
--from data/Invertebrate data/Actinopterygii data/Elasmobranchii data/Reptilia data/Mammalia\
--as Invertebrate Actinopterygii Elasmobranchii Reptilia Mammalia\
--matching _bold_ > WholeDirectories_bold.txtjoinfiles \
--from data/Invertebrate data/Actinopterygii data/Elasmobranchii data/Reptilia data/Mammalia\
--as Invertebrate Actinopterygii Elasmobranchii Reptilia Mammalia\
--matching _obis_ > WholeDirectories_obis.txtEach file is bigger than 400 KB and these can be found here: WholeDirectories_bold.txt, WholeDirectories_obis.txt
Now, let's suppose we have a species list and not a list of taxonomical ranks instead. This species list may come from any source but OBIS database (e.g. FishBase, WoRMS, etc). checkspps was justly created to take a custom species list and to directly compare species into BOLD by skiping data mining steps from OBIS.
If we have a species list called sl_test.txt, then:
checkspps Reptilia\
--area-name Peru\
--species-list sl_test.txt\
--at PhylumIt will return both: Peru_Reptilia_obis_validated.txt and Peru_Reptilia_bold_validated.txt. Their format are the same from joinfiles.py command. Values of subgroup column are taken from a taxonomical rank of species specified with --at option. Remaining values for both group and country columns are filled with the positional argument (i.e. Reptilia in above case) and --area-name option correspondingly.
* Intermediate files generated up while running this command are the same at each run. Therefore, if this command is running in parallel, specific directory per run must be used in order to avoid intermediate file crashing. Since the following example is a single run, repo directory is used as the working directory.
barplot -i data/invertebrate_bold.txtupsetplot -i data/bold.csvsankeyplot -b data/bold.csv -o data/obis.csvThis command adds both an audition step (Oliveira et al. 2016) and custom taxonomical ranks (i.e. according to WoRMS database)to each selected specimen with public records in BOLD repository:
auditspps -i data/bold.csv --at Phylum Order Family GenusOutput name is based on its input and --at option is used for specifying taxonomical ranks to look for.
head bold_audited.tsvGroup Species Classification sppsOnBins N N_Institutes taxIDs BINs Phylum Order Family Genus
Actinopterygii Ablennes hians C Ablennes hians 14 6 13055 BOLD:AAB9824,BOLD:AAC1231,BOLD:AAC1232,BOLD:AAH7716 Chordata Beloniformes Belonidae Ablennes
Actinopterygii Abudefduf concolor D 3 1 11481 Chordata Perciformes Pomacentridae Abudefduf
Actinopterygii Abudefduf saxatilis E** Abudefduf taurus,Abudefduf saxatilis 58 7 34219 BOLD:AAA7276,BOLD:AAA7275 Chordata Perciformes Pomacentridae Abudefduf
Actinopterygii Abudefduf taurus E* Abudefduf taurus,Abudefduf saxatilis 9 3 60120 BOLD:AAA7276 Chordata Perciformes Pomacentridae Abudefduf
Actinopterygii Abudefduf troschelii D Abudefduf troschelii 3 1 11480 BOLD:AAC8011 Chordata Perciformes Pomacentridae Abudefduf
Actinopterygii Acanthemblemaria balanorum D Acanthemblemaria balanorum 1 1 374819 BOLD:ABU5784 Chordata Perciformes Chaenopsidae Acanthemblemaria
Actinopterygii Acanthemblemaria castroi D Acanthemblemaria castroi 1 1 175464 BOLD:AAJ3429 Chordata Perciformes Chaenopsidae Acanthemblemaria
Actinopterygii Acanthemblemaria exilispinus D 3 1 18107 Chordata Perciformes Chaenopsidae Acanthemblemaria
Actinopterygii Acanthemblemaria hancocki D 2 1 18109 Chordata Perciformes Chaenopsidae Acanthemblemaria
This command uses radar plot to depict composition of audition performed by auditspps
radarplot -i bold_audited.tsv --at Order --n 4 -lThe prior example plot radars in order according to species counts and uses Order category for making inner polygons. Furthermore, --n option indicate the maximum amount of polygons per radar and -l is used to include legends. There are options available to aesthetically enhance radars which can be explored with radarplot -h.




