taxize::downstream() result is incomplete? #848
A little more digging using the WoRMS API directly reveals that the above issue of dropped taxa is actually two different problems.

Problem 1: Truncated list of octopus spp.

The truncated list seems to be because the children API endpoint returns a maximum of 50 records. But trying the
Also, oddly,
It would be great if

Problem 2: Dropping taxa downstream

Here's the tree from WoRMS, from phylum down to order, for the mammals: Looks like the
And
The same results occur when querying the WoRMS API directly, e.g. for mammals (id 1837) with marine_only = TRUE (the default). When setting marine_only = FALSE, you get the full tree, but also a lot of non-marine taxa. This seems to be something that WoRMS needs to fix on its end in the long term. In the short term, I suppose I could get all the taxa and then somehow figure out which ones to drop later on. However, the
Update: the dropped marine species appear to be due to incorrect environment flags (
…dren in various fxns
- should allow marine_only to be passed through children(), downstream(), worms_downstream
- added tests for marine_only param in the above fxns
- bump version
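Once marine_only can be passed through, the short-term idea from earlier in the thread (fetch everything with marine_only = FALSE, then drop the non-marine taxa afterwards) could look roughly like this. keep_marine is a hypothetical helper. It assumes the records carry the WoRMS AphiaRecord flags isMarine and isBrackish, for example as returned by the AphiaRecordsByAphiaIDs endpoint used in check_status later in the thread. Treating brackish taxa as "marine" is also an assumption, so adjust it to your own definition. Because the dropped taxa were traced back to incorrect environment flags, this filter is only as reliable as the species-level flags themselves.

```r
# Hypothetical post-filter: keep records whose own environment flags mark
# them as marine or brackish. Assumes WoRMS AphiaRecord-style columns
# isMarine / isBrackish with values 1 (yes), 0 (no) or NA (not set).
keep_marine <- function(records) {
  # Treat NA as "not flagged" rather than propagating NA into the index
  flag_set <- function(x) !is.na(x) & x == 1
  records[flag_set(records$isMarine) | flag_set(records$isBrackish), , drop = FALSE]
}

recs <- data.frame(id = 1:4,
                   isMarine   = c(1, 0, NA, NA),
                   isBrackish = c(0, 1, NA, 0))
keep_marine(recs)$id  # 1 2
```

Filtering on each species' own flags, rather than pruning at every intermediate rank, means a missing flag on an order or family no longer hides everything beneath it.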
Thanks for the report @oharac, and thanks for reaching out to the WoRMS folks! I added In
Thanks @sckott, until the WoRMS data is fixed, I'll try using the I'll also give the
…t all results
- both functions now use a factored-out fxn worms_children_all
- re-record fixtures for these affected functions
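That commit routes both functions through worms_children_all to get all results. The same paging idea can be sketched directly against the REST endpoint. all_children and fetch_children_page are hypothetical helpers and are not the taxize code. The sketch assumes that AphiaChildrenByAphiaID accepts a marine_only flag and a 1-based offset, and returns at most 50 records per page, as described at the top of the thread.

```r
# Hypothetical fetcher for one page of children from the WoRMS REST API.
# Assumes the endpoint takes marine_only and a 1-based offset.
fetch_children_page <- function(id, offset, marine_only = TRUE) {
  url <- sprintf(
    "https://www.marinespecies.org/rest/AphiaChildrenByAphiaID/%s?marine_only=%s&offset=%d",
    id, tolower(as.character(marine_only)), offset
  )
  # An empty response (no further children) fails to parse, so return NULL
  tryCatch(jsonlite::fromJSON(url), error = function(e) NULL)
}

# Keep requesting pages until one comes back empty or short (< page_size rows).
# fetch_page is injectable so the loop can be exercised without network access;
# extra arguments (e.g. marine_only) are forwarded to it.
all_children <- function(id, fetch_page = fetch_children_page,
                         page_size = 50, ...) {
  pages <- list()
  offset <- 1
  repeat {
    page <- fetch_page(id, offset, ...)
    if (is.null(page) || NROW(page) == 0) break
    pages[[length(pages) + 1]] <- page
    if (NROW(page) < page_size) break  # a short page is the last one
    offset <- offset + page_size
  }
  if (length(pages) == 0) return(data.frame())
  do.call(rbind, pages)
}
```

In real use, something like all_children(1837, marine_only = FALSE) would forward marine_only to each request. In a test, a fake fetch_page that returns data frames of fixed sizes is enough to check that the pages are stitched together correctly.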
Just committed. I changed the internals of try e.g., note that with the
@sckott Your fixes seem to be working great, thanks so much. However, now I'm seeing the same I emailed these new problems (class Elasmobranchii and some subclasses/infraclasses/orders below it) to the WoRMS folks as well, but it seems like a pretty serious systemic issue with their database, so I wanted to post here to warn people to be very wary of these functions that access the WoRMS API until they can figure it out.
Thanks again for letting them know and posting here for others to see.
@oharac, did you hear back on this yet?
Nothing yet, and still encountering the same error. I'll ping them again to see if there is an estimated timeframe.
Okay, thanks.
Received an email from someone on the WoRMS data management team saying they executed the fix for this issue this morning. A quick check via their API page showed a promising result, though I have yet to do a more thorough investigation or try interfacing via the
The new WoRMS database fix seems to be playing nice with
Confirmed a more reasonable number of valid elasmobranch spp. (1240 instead of 3379) by checking the status of each returned ID and keeping only those spp. whose status is "accepted" (other possible statuses I see: "unaccepted", "alternate representation", "interim unpublished"). Here's a quick function to determine status (for any rank) by accessing the WoRMS API:

```r
library(dplyr)  # provides %>%, select(), distinct(), bind_rows(), filter()

# Look up the WoRMS taxonomic status of each AphiaID, querying the
# AphiaRecordsByAphiaIDs endpoint in chunks of 50 IDs per request
check_status <- function(check_ids) {
  check_ids <- check_ids[!is.na(check_ids)]
  n_chunks <- ceiling(length(check_ids) / 50)
  records_list <- vector('list', length = n_chunks)
  # seq_len() avoids looping over 1:0 when there are no IDs
  for (i in seq_len(n_chunks)) {
    indices <- ((i - 1) * 50 + 1):min(i * 50, length(check_ids))
    ids_chunk <- check_ids[indices]
    # Repeated query parameter: aphiaids[]=id1&aphiaids[]=id2&...
    ids_param <- paste0('aphiaids[]=', ids_chunk, collapse = '&')
    records_url <- paste0('https://www.marinespecies.org/rest/AphiaRecordsByAphiaIDs?', ids_param)
    records_list[[i]] <- jsonlite::fromJSON(records_url) %>%
      select(id = AphiaID, sciname = scientificname, status) %>%
      distinct()
  }
  records_df <- records_list %>%
    bind_rows()
  return(records_df)
}

### example:
x <- taxize::downstream('elasmobranchii', db = 'worms', downto = 'species')[[1]]
y <- check_status(x$id) %>%
  filter(status == 'accepted')
z <- x %>%
  filter(id %in% y$id)
```
Thanks for your work on this! I'm sure lots of people will appreciate it. So, are there any outstanding issues?
Thanks for your work on this as well! I am not seeing any particular issues remaining on this related to the WoRMS API or the
Okay, great. Closing.
A new bug using taxize with the WoRMS database. I noticed that many of the species for which I have trait data (independent of WoRMS) are not being found in the dataset I've created using downstream(). It appears that the downstream() result is incomplete, missing species names for which classification() returns valid results. EDIT: this seems to happen with children() as well.

Reproducible example:
A quick inspection of the downstream() results (x) shows an alphabetical list of octopus species that abruptly ends after Octopus carolinensis and a handful of Octopus followed by a parenthetical (alternate genus). Not sure if that truncation is relevant; other examples didn't seem to truncate like that.

This is not just at the lowest level. Balaena mysticetus is class Mammalia (id 1837) and order Cetartiodactyla, but downstream(1837, db = 'worms', downto = 'order') returns only Didelphimorphia, which makes me wonder whether there are marine opossums. More problematic, it seems like downstream() depends on successful results at each intermediate step downward, so downstream('mammalia', db = 'worms', downto = [anything]) will return an empty set.

Session Info
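The failure mode at the end of the original report, where one empty intermediate level wipes out everything below it, is inherent to a level-by-level descent. The toy sketch below illustrates it. toy_downstream, the lookup, and the made-up tree are all hypothetical and are not taxize's implementation. An in-memory lookup stands in for WoRMS, and one node's lookup "wrongly" returns nothing, as a missing marine flag would.

```r
# Toy top-down descent: each level's children are looked up from the
# previous level's results, so an empty lookup prunes everything below it.
toy_downstream <- function(start, downto, get_children) {
  frontier <- start
  found <- character(0)
  while (length(frontier) > 0) {
    kids <- do.call(rbind, lapply(frontier, get_children))
    if (is.null(kids) || nrow(kids) == 0) break
    # Collect taxa at the target rank; keep descending through the rest
    found <- c(found, kids$name[kids$rank == downto])
    frontier <- kids$name[kids$rank != downto]
  }
  found
}

# Made-up tree: SubB's lookup comes back empty, as if its flag were missing
no_kids <- data.frame(name = character(0), rank = character(0),
                      stringsAsFactors = FALSE)
tree <- list(
  Root = data.frame(name = c("SubA", "SubB"), rank = "Subclass",
                    stringsAsFactors = FALSE),
  SubA = data.frame(name = "Ord1", rank = "Order", stringsAsFactors = FALSE),
  SubB = no_kids  # should have contained "Ord2"
)
lookup <- function(x) if (is.null(tree[[x]])) no_kids else tree[[x]]

toy_downstream("Root", "Order", lookup)  # "Ord1" only; Ord2 is never reached
```

If the lookup for the starting taxon itself is empty, the result is character(0). This matches the empty set reported for downstream('mammalia', ...).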