Taxonomyreport doesn't work on same database #408

Closed
etowahadams opened this issue Feb 7, 2021 · 4 comments

Comments

@etowahadams

I am trying to generate a taxonomy report for a seqTaxDB using taxonomyreport.
When I try the following, however, it tells me that my seqTaxDB is an amino acid database, which it isn't.

mmseqs taxonomyreport seqTaxDB seqTaxDB report
Input database "seqTaxDB" has the wrong type (Aminoacid)
Allowed input:
- Alignment
- Prefilter
- Bi-directional prefilter
- Clustering
- Taxonomy

I know my seqTaxDB is valid since I can use it with taxonomyreport in other contexts successfully. How should I generate a taxonomy report from an existing seqTaxDB?

@milot-mirdita
Member

This wasn't a use case we thought of for this tool. We expected it to be used to process the output of taxonomy and, since a few days ago, a search result.

I can add support for this use case. However, could you explain why it is useful? Viewing the taxonomic composition of a large database, such as UniProt, will basically result in an enormous tree containing nearly every taxon in existence. I could imagine this being useful for a small database, I guess.

@etowahadams
Author

etowahadams commented Feb 7, 2021

Yes, perhaps I am using taxonomyreport in an unusual way.

I wanted to cluster the sequences of all the proteins that contain a given domain from the NCBI domain database. I got a list of accession numbers of all the proteins containing that domain, made a FASTA file from them (using blastdbcmd), and then created a taxidmapping for those sequences. Using the FASTA file and the taxidmapping, I made a seqTaxDB (createtaxdb with the --tax-mapping-file option).

I then ran cluster on the database and then taxonomyreport on the cluster database, which worked great. I wanted to compare the taxonomic distribution of the clustered sequences to that of the original set of sequences, so I tried taxonomyreport seqTaxDB seqTaxDB report. Perhaps there is a better way to do what I am trying to do?
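
For reference, the steps described above can be sketched as a dry run that only assembles the MMseqs2 command lines and does not execute them. The file and database names (domain_proteins.fasta, taxidmapping, cluDB, tmp, clu_report, seq_report) and the helper workflow_commands are hypothetical placeholders, and the exact arguments I used may have differed.

```python
import shlex


def workflow_commands(fasta, mapping, seq_db="seqTaxDB", clu_db="cluDB", tmp="tmp"):
    """Return the MMseqs2 calls from the description as argument lists.

    Nothing is executed; all names are placeholders (assumptions).
    """
    return [
        # Build a sequence database from the FASTA file.
        ["mmseqs", "createdb", fasta, seq_db],
        # Attach taxonomy using the accession-to-taxid mapping file.
        ["mmseqs", "createtaxdb", seq_db, tmp, "--tax-mapping-file", mapping],
        # Cluster the sequence database.
        ["mmseqs", "cluster", seq_db, clu_db, tmp],
        # Report on the cluster result (this worked).
        ["mmseqs", "taxonomyreport", seq_db, clu_db, "clu_report"],
        # Report on the sequence database itself (the call from this issue).
        ["mmseqs", "taxonomyreport", seq_db, seq_db, "seq_report"],
    ]


if __name__ == "__main__":
    for cmd in workflow_commands("domain_proteins.fasta", "taxidmapping"):
        print(shlex.join(cmd))
```

The last command is the one that produced the "wrong type (Aminoacid)" error.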

@milot-mirdita
Member

I just pushed the changes to allow annotating a sequence database. Note, however, that it counts every cluster member when you pass it a cluster result. So the taxonomyreport of the seqTaxDB before and after the clustering would be the same (or at least should be, if I didn't do anything wrong). They will differ from the original sequence set, however.
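
To illustrate the counting behaviour, here is a minimal sketch in plain Python. It is not the MMseqs2 implementation; the sequence ids, taxon ids and cluster layout are made-up toy data, and it tallies only direct taxon assignments without accumulating counts up the tree. It shows that counting every cluster member reproduces the per-sequence counts, while counting only representatives (shown purely for contrast) would not.

```python
from collections import Counter

# Hypothetical toy data: sequence id -> taxon id.
taxid = {"s1": 9606, "s2": 9606, "s3": 10090, "s4": 562}

# Hypothetical cluster result: representative -> members (representative included).
clusters = {"s1": ["s1", "s2", "s3"], "s4": ["s4"]}

# One count per sequence in the database.
per_sequence = Counter(taxid[s] for s in taxid)

# One count per cluster member, i.e. every sequence is counted again.
per_member = Counter(taxid[m] for members in clusters.values() for m in members)

# For contrast only: one count per cluster representative.
per_representative = Counter(taxid[r] for r in clusters)

if __name__ == "__main__":
    print("per sequence:      ", dict(per_sequence))
    print("per cluster member:", dict(per_member))
    print("per representative:", dict(per_representative))
```

Because every sequence ends up as a member of exactly one cluster, the first two tallies agree.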

@etowahadams
Author

I see, that makes sense. Thank you for these changes!
