taxize::downstream() result is length 0 error using WoRMS #847

Closed
oharac opened this issue Sep 28, 2020 · 13 comments

@oharac

oharac commented Sep 28, 2020

Hi,

I'm finding this package to be really useful, but I'm running into a bug. I am using taxize::downstream to access the WoRMS database to get all families related to a set of specific orders. For nearly everything, it works fine, but for decapoda (1130) and amphipoda (1135) it returns this error:

Error in vapply(x$rank, function(z) which_rank(z, zoo = zoo), 1) : 
  values must be length 1,
 but FUN(X[[53]]) result is length 0

EDIT: I see that this is similar to #821 and #824 - those were related to a problem with rank names - perhaps something similar is happening here?

Reproducible example:

library(taxize)
x <- downstream(sci_id = 'decapoda', db = 'worms', downto = 'family', intermediate = FALSE)
x <- downstream(sci_id = 1130, db = 'worms', downto = 'family', intermediate = FALSE)
Session Info
R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] taxize_0.9.98.91

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5        ape_5.4-1         lattice_0.20-41   prettyunits_1.1.1 ps_1.3.3         
 [6] zoo_1.8-8         assertthat_0.2.1  rprojroot_1.3-2   digest_0.6.25     foreach_1.5.0    
[11] R6_2.4.1          plyr_1.8.6        backports_1.1.8   RSQLite_2.2.0     pillar_1.4.6     
[16] rlang_0.4.7       curl_4.3          uuid_0.1-4        rstudioapi_0.11   data.table_1.13.0
[21] callr_3.4.3       blob_1.2.1        worrms_0.4.2      desc_1.2.0        urltools_1.7.3   
[26] devtools_2.3.0    stringr_1.4.0     bit_4.0.4         triebeard_0.3.0   compiler_3.6.3   
[31] xfun_0.14         pkgconfig_2.0.3   pkgbuild_1.0.8    conditionz_0.1.0  tidyselect_1.1.0 
[36] tibble_3.0.3      httpcode_0.3.0    codetools_0.2-16  reshape_0.8.8     fansi_0.4.1      
[41] crayon_1.3.4      dplyr_1.0.2       hoardr_0.5.2      dbplyr_1.4.4      withr_2.2.0      
[46] rappdirs_0.3.1    crul_1.0.0        grid_3.6.3        nlme_3.1-148      jsonlite_1.7.1   
[51] lifecycle_0.2.0   DBI_1.1.0         magrittr_1.5      taxizedb_0.2.2.93 cli_2.0.2        
[56] stringi_1.5.3     fs_1.4.1          remotes_2.1.1     testthat_2.3.2    xml2_1.3.2       
[61] ellipsis_0.3.1    generics_0.0.2    vctrs_0.3.4       iterators_1.0.12  tools_3.6.3      
[66] bold_1.1.0        bit64_4.0.5       glue_1.4.2        purrr_0.3.4       processx_3.4.2   
[71] pkgload_1.1.0     parallel_3.6.3    sessioninfo_1.1.1 memoise_1.1.0     knitr_1.28       
[76] usethis_1.6.1  
@sckott sckott closed this as completed in b8c948c Sep 28, 2020
@sckott
Contributor

sckott commented Sep 28, 2020

Thanks - taxonomy is a deep dark hole from which many weird taxonomic ranks emerge from time to time. One of the taxa had the rank "epifamily", which taxize did not recognize: https://www.marinespecies.org/aphia.php?p=taxdetails&id=1459303

fixed, if you reinstall it should work
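
For readers wondering how one unrecognized rank produces the "result is length 0" error, here is a minimal sketch in base R. The table `rank_tbl` and the helper `lookup_rank()` are made up for illustration and are not taxize's actual reference data or its `which_rank()` implementation; the only assumption is that the lookup returns a zero-length result when a rank is not in the table.

```r
# Toy rank table; an illustrative stand-in, not taxize's real reference data.
rank_tbl <- data.frame(
  rankid = c(10, 20, 30, 40),
  ranks  = c("order", "superfamily", "family", "genus"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: row position of a rank, or integer(0) if it is absent.
lookup_rank <- function(rank, tbl = rank_tbl) {
  which(tbl$ranks == tolower(rank))
}

known   <- c("Superfamily", "Family")
unknown <- c("Superfamily", "Epifamily", "Family")

# Every rank is found, so each call returns exactly one position.
ok_positions <- vapply(known, lookup_rank, 1L)

# "Epifamily" is not in the table, so its lookup returns integer(0) and
# vapply() stops because it requires a length-1 result from every call.
failure <- tryCatch(
  vapply(unknown, lookup_rank, 1L),
  error = function(e) conditionMessage(e)
)
```

In this sketch `failure` holds the error message as text, which makes it easy to see which element of the input tripped the check.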

@sckott sckott modified the milestones: v0.9.99, v1.0 Sep 28, 2020
@oharac
Author

oharac commented Sep 28, 2020 via email

@sckott
Contributor

sckott commented Sep 28, 2020

thanks

@oharac
Author

oharac commented Oct 1, 2020

I encountered this same error with the WoRMS database again, but it seems to be for a different reason. taxize::downstream for family Polynoidae (id 939) returns:

downstream(939, db = 'worms', downto = 'species')[[1]]
# Error in vapply(x$rank, function(z) which_rank(z, zoo = zoo), 1) : 
#   values must be length 1,
#  but FUN(X[[376]]) result is length 0

Knowing that prior issues were due to oddball ranks, I checked the downstream ranks. Here the problem is an NA rank, caused by a null rank listed in the output from the AphiaChildrenByAphiaID API endpoint. The ones I've found so far are children of ID 129496, though I have not done an exhaustive search, so there may be others as well. Here is part of the record for one example, ID 333822, as retrieved from https://www.marinespecies.org/rest/AphiaChildrenByAphiaID/129496?marine_only=true&offset=95:

    "AphiaID": 333822,
    "url": "https://www.marinespecies.org/aphia.php?p=taxdetails&id=333822",
    "scientificname": "Lepidonotus pellucidus",
    "authority": "Dyster in Johnston, 1865",
    "status": "accepted",
    "unacceptreason": null,
    "taxonRankID": 220,
    "rank": null,
    "valid_AphiaID": 333822,
    "valid_name": "Lepidonotus pellucidus",

However, when accessing this species in the other direction, using the AphiaClassificationByAphiaID endpoint (https://www.marinespecies.org/rest/AphiaClassificationByAphiaID/333822), the API seems to return the rank as "Species" as expected. This seems to be an issue on the WoRMS end (and I emailed them to point it out), but in the meantime perhaps there's a graceful way to handle the NA rank value in taxize::downstream() without throwing an error. Thanks!

@sckott sckott modified the milestones: v1.0, v0.9.99 Oct 1, 2020
sckott added a commit that referenced this issue Oct 1, 2020
- ignore_missing_rank=TRUE will set any ranks that are NA to "no rank"
- then they are treated as NCBI no ranks are treated, for the most part dropped
- change worms_downstream to use the new param ignore_missing_rank
@sckott
Contributor

sckott commented Oct 1, 2020

Thanks for the report.
Unfortunately, there's no real way to handle missing ranks, other than perhaps making an additional HTTP request for every single name that does not have a rank, which seems like a mess and I'd rather avoid doing that.
For now, I'm changing the code (reinstall to get the change) so that missing ranks from WoRMS are set to "no rank" (which NCBI has a lot of); the existing code already handles "no rank". "no rank" taxa are dropped in most cases. The errors are coming from the prune_too_low function https://github.com/ropensci/taxize/blob/master/R/downstream-utils.R#L9 where we drop any taxa that have ranks lower than the target rank.
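
A rough sketch of the approach described above, in base R: recode NA ranks to "no rank", drop those rows, then prune anything below the target rank. The helpers `fill_missing_rank()` and `keep_down_to()`, the `rank_order` vector, and the two "Example" rows are all hypothetical and only illustrate the idea; they are not the code in taxize.

```r
# Toy table: the first row mirrors the null-rank record reported above,
# the other two rows are made-up placeholders.
kids <- data.frame(
  name = c("Lepidonotus pellucidus", "Example genus", "Example species"),
  rank = c(NA, "genus", "species"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: recode missing ranks to "no rank".
fill_missing_rank <- function(df) {
  df$rank[is.na(df$rank)] <- "no rank"
  df
}

# Illustrative rank ordering, highest rank first (an assumption).
rank_order <- c("family", "genus", "species")

# Hypothetical helper: drop "no rank" rows, then keep only rows whose rank
# is at or above the target rank in `ord`.
keep_down_to <- function(df, target, ord = rank_order) {
  df <- df[!is.na(df$rank) & df$rank != "no rank", , drop = FALSE]
  pos <- match(df$rank, ord)
  df[!is.na(pos) & pos <= match(target, ord), , drop = FALSE]
}

filled <- fill_missing_rank(kids)
pruned <- keep_down_to(filled, "genus")
```

The trade-off is the one noted above: a taxon whose rank is missing gets dropped rather than looked up again, which avoids an extra request per name.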

@oharac
Author

oharac commented Oct 3, 2020

I emailed the WoRMS folks and their response was that they couldn't replicate the null rank thing - so checking today, I can't replicate it either - I guess it was an intermittent problem (though I could replicate it on the day I posted the issue).

@sckott
Contributor

sckott commented Oct 6, 2020

Thanks for the follow up. Well glad it was an intermittent thing; hopefully it doesn't come back.

@oharac
Author

oharac commented Oct 19, 2020

A new instance of the zero-length error in WoRMS downstream:

downstream(345465, db = 'worms', downto = 'class', marine_only = FALSE)[[1]]

In case this is a similar problem to those noted before, where odd taxonomic ranks would create this error, I checked the children of this sequentially to identify any unusual ranks.

  • children(345465, db = 'worms', marine_only = FALSE)[[1]] returned a couple of "Subphylum" ranks.
  • children(588641, db = 'worms', marine_only = FALSE)[[1]] returned a couple of "Infraphylum" ranks.
  • Below those are a couple of "Superclass" ranks.
  • children(369192, db = 'worms', marine_only = FALSE)[[1]] returned 151 instances where the classification skips from "Subphylum" (369192) all the way down to "Genus" in one step, which seems odd.
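
The "skips a long way down in one step" observation in the list above can be checked mechanically. Below is a small sketch, assuming an ordered vector of ranks; `rank_levels` and `rank_gap()` are made up for illustration and do not reflect taxize's rank_ref_zoo table.

```r
# Illustrative rank ordering, highest first; an assumption, not rank_ref_zoo.
rank_levels <- c("phylum", "subphylum", "infraphylum", "superclass",
                 "class", "order", "family", "genus", "species")

# Hypothetical helper: number of levels between a parent rank and each
# child rank. A gap above 1 means levels were skipped; NA means the child
# rank is not in `levels` at all.
rank_gap <- function(parent_rank, child_ranks, levels = rank_levels) {
  match(tolower(child_ranks), levels) - match(tolower(parent_rank), levels)
}

gaps <- rank_gap("Subphylum", c("Infraphylum", "Genus", "Megaclass"))
```

Applied to the childtaxa_rank column of a children() result, large gaps point at the Subphylum-to-Genus jumps, and NA values point at ranks the ordering does not know about.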

@sckott
Contributor

sckott commented Oct 19, 2020

thanks! will have a look

@sckott
Contributor

sckott commented Oct 20, 2020

@oharac should be fixed now. the missing rank was infraphylum

@oharac
Author

oharac commented Aug 21, 2021

EDITED...

getting back into this project, ran across this error again

Error in vapply(x$rank, function(z) which_rank(z, zoo = zoo), 1) : 
  values must be length 1,
 but FUN(X[[53]]) result is length 0

Reprex:

library(taxize)
downstream(sci_id = 1821, db = 'worms', downto = 'class')
#> Error in vapply(x$rank, function(z) which_rank(z, zoo = zoo), 1): values must be length 1,
#>  but FUN(X[[3]]) result is length 0

Created on 2021-08-20 by the reprex package (v1.0.0)

Sequential calls to children showed where the code seemed to be choking. I wonder if these ranks need to be added to the rank_ref_zoo?

parvphylum, megaclass, gigaclass

More reprex:

library(taxize)
### chokes on 1821:
downstream(1821, db = 'worms', downto = 'class')
#> Error in vapply(x$rank, function(z) which_rank(z, zoo = zoo), 1): values must be length 1,
#>  but FUN(X[[3]]) result is length 0
children(sci_id = 1821, db = 'worms')
#> $`1821`
#> # A tibble: 4 x 3
#>   childtaxa_id childtaxa_name  childtaxa_rank
#>          <int> <chr>           <chr>         
#> 1         1824 Cephalochordata Subphylum     
#> 2       146420 Tunicata        Subphylum     
#> 3         1822 Urochordata     Subphylum     
#> 4       146419 Vertebrata      Subphylum     
#> 
#> attr(,"class")
#> [1] "children"
#> attr(,"db")
#> [1] "worms"

### chokes on subphylum Vertebrata:
downstream(146419, downto = 'class', db = 'worms')
#> Error in vapply(x$rank, function(z) which_rank(z, zoo = zoo), 1): values must be length 1,
#>  but FUN(X[[3]]) result is length 0
children(146419, db = 'worms')
#> $`146419`
#> # A tibble: 2 x 3
#>   childtaxa_id childtaxa_name childtaxa_rank
#>          <int> <chr>          <chr>         
#> 1         1829 Agnatha        Infraphylum   
#> 2         1828 Gnathostomata  Infraphylum   
#> 
#> attr(,"class")
#> [1] "children"
#> attr(,"db")
#> [1] "worms"

### chokes on infraphylum Gnathostomata:
downstream(1828, downto = 'class', db = 'worms')
#> Error in vapply(x$rank, function(z) which_rank(z, zoo = zoo), 1): values must be length 1,
#>  but FUN(X[[1]]) result is length 0
children(1828, db = 'worms')
#> $`1828`
#> # A tibble: 4 x 3
#>   childtaxa_id childtaxa_name childtaxa_rank
#>          <int> <chr>          <chr>         
#> 1      1517375 Chondrichthyes Parvphylum    
#> 2       152352 Osteichthyes   Parvphylum    
#> 3        11676 Pisces         Superclass    
#> 4         1831 Tetrapoda      Megaclass     
#> 
#> attr(,"class")
#> [1] "children"
#> attr(,"db")
#> [1] "worms"

### chokes on parvphylum Osteichthyes
downstream(152352, downto = 'class', db = 'worms')
#> Error in vapply(x$rank, function(z) which_rank(z, zoo = zoo), 1): values must be length 1,
#>  but FUN(X[[1]]) result is length 0
children(152352, db = 'worms')
#> $`152352`
#> # A tibble: 2 x 3
#>   childtaxa_id childtaxa_name childtaxa_rank
#>          <int> <chr>          <chr>         
#> 1        10194 Actinopterygii Gigaclass     
#> 2       163509 Sarcopterygii  Gigaclass     
#> 
#> attr(,"class")
#> [1] "children"
#> attr(,"db")
#> [1] "worms"

### finally is OK at this stage
downstream(10194, downto = 'class', db = 'worms')
#> $`10194`
#>       id        name  rank
#> 1 843664 Actinopteri class
#> 
#> attr(,"class")
#> [1] "downstream"
#> attr(,"db")
#> [1] "worms"
downstream(163509, downto = 'class', db = 'worms')
#> $`163509`
#>       id        name  rank
#> 1 843665 Coelacanthi class
#> 
#> attr(,"class")
#> [1] "downstream"
#> attr(,"db")
#> [1] "worms"
# both OK

Created on 2021-08-21 by the reprex package (v1.0.0)
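
The sequential children() calls above could be automated with a small recursive walk. This is only a sketch: `find_unknown_ranks()` is a hypothetical helper, the `known` vector is an assumed subset of recognized ranks, and the stub below replays two of the tables from the reprex instead of calling WoRMS, so it runs offline.

```r
# Hypothetical walker: descend from `id`, calling `get_children(id)` at each
# node, and collect child ranks that are missing from `known_ranks`.
find_unknown_ranks <- function(id, get_children, known_ranks, max_depth = 10) {
  if (max_depth < 1) return(character(0))
  kids <- get_children(id)
  if (is.null(kids) || nrow(kids) == 0) return(character(0))
  here <- setdiff(tolower(kids$childtaxa_rank), known_ranks)
  below <- unlist(lapply(kids$childtaxa_id, find_unknown_ranks,
                         get_children = get_children,
                         known_ranks = known_ranks,
                         max_depth = max_depth - 1))
  unique(c(here, below))
}

# Offline stub built from the children() output shown in the reprex.
stub_tree <- list(
  "1828" = data.frame(
    childtaxa_id   = c(1517375L, 152352L, 11676L, 1831L),
    childtaxa_rank = c("Parvphylum", "Parvphylum", "Superclass", "Megaclass"),
    stringsAsFactors = FALSE
  ),
  "152352" = data.frame(
    childtaxa_id   = c(10194L, 163509L),
    childtaxa_rank = c("Gigaclass", "Gigaclass"),
    stringsAsFactors = FALSE
  )
)
# Returns NULL for ids that are not in the stub, which ends the recursion.
stub_children <- function(id) stub_tree[[as.character(id)]]

# Assumed set of recognized ranks, for illustration only.
known <- c("subphylum", "infraphylum", "superclass", "class")

found <- find_unknown_ranks(1828L, stub_children, known)
```

Against the live service, `get_children` could presumably be something like `function(id) taxize::children(id, db = "worms")[[1]]`, as in the calls above, though how leaf taxa are returned there is not something this sketch verifies.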

@zachary-foster
Collaborator

Thanks for the info! I will look into this and see about adding those ranks.

@rogmei rogmei mentioned this issue Nov 1, 2021
@zachary-foster
Collaborator

Sorry for the delay. I have added the ranks and made the error message better.

You can try out the change by installing this version, which will hopefully be pushed to CRAN soon, but note that it has many other changes and might break other code.

install.packages("remotes")
remotes::install_github("ropensci/taxize")
