Check data validity
Vinh Tran edited this page Jul 29, 2020 · 2 revisions
Normally, the data shipped with HaMStR-oneSeq and the data produced by addTaxon1s or addTaxa1s are ready to use. However, if you add taxa to HaMStR manually, you should check their validity by running this command:
checkData1s [-g GENOMEDIR] [-b BLASTDIR] [-w WEIGHTDIR] [--replace] [--delete] [--concat]
GENOMEDIR, BLASTDIR and WEIGHTDIR are only needed if these directories are not located in the same directory as the installed HaMStR-oneSeq.
This script will check for:
- valid folder names (must not contain a pipe "|", spaces, or some other special characters)
- valid FASTA files (no spaces or tabs allowed, no special characters or numbers in the sequences, each sequence written on a single line)
- missing annotations (all taxa present in genome_dir and blast_dir must have annotations in weight_dir)
- missing or duplicated NCBI taxonomy IDs
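The first two checks in the list above can be illustrated with a minimal Python sketch. This is not the code used by checkData1s. The allowed character sets, the function names and the example folder name `HUMAN@9606@3` are assumptions made for illustration, since this page does not state exactly which characters the script rejects.

```python
import re

# Assumption: folder names may contain letters, digits, "@", "_", "." and "-".
# The real script may allow or reject a different set of characters.
VALID_NAME = re.compile(r"[A-Za-z0-9_@.\-]+")
# Assumption: a sequence line may contain letters only.
VALID_SEQ = re.compile(r"[A-Za-z]+")


def is_valid_folder_name(name):
    """Return True if the name has no pipe, space or other disallowed character."""
    return VALID_NAME.fullmatch(name) is not None


def find_fasta_problems(lines):
    """Return a list of (line_number, problem) tuples for the given FASTA lines."""
    problems = []
    seq_lines_in_record = 0
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue
        if " " in line or "\t" in line:
            problems.append((number, "space or tab"))
        if line.startswith(">"):
            # A new header starts a new record.
            seq_lines_in_record = 0
            continue
        seq_lines_in_record += 1
        if seq_lines_in_record > 1:
            problems.append((number, "multi-line sequence"))
        if VALID_SEQ.fullmatch(line) is None:
            problems.append((number, "invalid character in sequence"))
    return problems
```

For example, `find_fasta_problems([">seq1", "MKV", "LLA"])` flags line 3 because the sequence continues on a second line.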
If the FASTA files are not in the right format, you will be given options to fix them: delete special characters from the sequences, replace them with "X", or convert multi-line sequences into single-line sequences.
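To make the three fixes concrete, here is a rough Python sketch of what deleting, replacing and concatenating could look like. It is an illustration only, not the implementation in checkData1s. The function names, the `mode` values and the rule that every non-letter counts as a special character are assumptions.

```python
import re

# Assumption: anything that is not a letter is treated as a special character.
NON_LETTER = re.compile(r"[^A-Za-z]")


def concat_records(lines):
    """Join multi-line sequences so each record is [header, single_line_sequence]."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line, ""])
        elif records:
            # Append this sequence line to the current record.
            records[-1][1] += line
    return records


def clean_sequence(seq, mode):
    """Replace disallowed characters with "X" or delete them, depending on mode."""
    if mode == "replace":
        return NON_LETTER.sub("X", seq)
    if mode == "delete":
        return NON_LETTER.sub("", seq)
    raise ValueError("mode must be 'replace' or 'delete'")
```

With this sketch, `clean_sequence("MK1V*", "replace")` gives `"MKXVX"`, while `clean_sequence("MK1V*", "delete")` gives `"MKV"`.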