listGenomes() fails to filter by *subset* with Ensembl Genomes instances #107

Closed
almeidasilvaf opened this issue Nov 10, 2023 · 3 comments

@almeidasilvaf

Hi, @HajkD

When the function listGenomes is called on an Ensembl Genomes instance, it returns the genomes from all Ensembl instances, regardless of what the user specifies in the argument subset.

Below you can find an example for Ensembl Fungi, but it's the same for all other instances. Obviously, there are not 33791 species on Ensembl Fungi.

library(biomartr)
fungi <- listGenomes(db = "ensembl", subset = "EnsemblFungi", skip_bacteria = TRUE)
#> Starting information retrieval for: EnsemblVertebrates
#> Starting information retrieval for: EnsemblPlants
#> Starting information retrieval for: EnsemblFungi
#> Starting information retrieval for: EnsemblMetazoa
#> Starting information retrieval for: EnsemblBacteria
#> Starting information retrieval for: EnsemblProtists

length(fungi)
#> [1] 33791
head(fungi)
#> [1] "leptobrachium_leishanense"  "mus_musculus_pwkphj"       
#> [3] "strigops_habroptila"        "sus_scrofa_hampshire"      
#> [5] "struthio_camelus_australis" "latimeria_chalumnae"

Created on 2023-11-10 with reprex v2.0.2

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.1 (2023-06-16)
#>  os       Ubuntu 20.04.6 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Brussels
#>  date     2023-11-10
#>  pandoc   3.1.1 @ /usr/lib/rstudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package          * version    date (UTC) lib source
#>  AnnotationDbi      1.64.1     2023-11-03 [1] Bioconductor
#>  Biobase            2.62.0     2023-10-24 [1] Bioconductor
#>  BiocFileCache      2.10.1     2023-10-26 [1] Bioconductor
#>  BiocGenerics       0.48.1     2023-11-01 [1] Bioconductor
#>  biomaRt            2.58.0     2023-10-24 [1] Bioconductor
#>  biomartr         * 1.0.8.9000 2023-11-10 [1] Github (ropensci/biomartr@201b7f4)
#>  Biostrings         2.70.1     2023-10-25 [1] Bioconductor
#>  bit                4.0.5      2022-11-15 [1] CRAN (R 4.3.0)
#>  bit64              4.0.5      2020-08-30 [1] CRAN (R 4.3.0)
#>  bitops             1.0-7      2021-04-24 [1] CRAN (R 4.3.0)
#>  blob               1.2.4      2023-03-17 [1] CRAN (R 4.3.0)
#>  cachem             1.0.8      2023-05-01 [1] CRAN (R 4.3.0)
#>  cli                3.6.1      2023-03-23 [1] CRAN (R 4.3.0)
#>  crayon             1.5.2      2022-09-29 [1] CRAN (R 4.3.0)
#>  curl               5.1.0      2023-10-02 [1] CRAN (R 4.3.0)
#>  data.table         1.14.8     2023-02-17 [1] CRAN (R 4.3.0)
#>  DBI                1.1.3      2022-06-18 [1] CRAN (R 4.3.0)
#>  dbplyr             2.4.0      2023-10-26 [1] CRAN (R 4.3.1)
#>  digest             0.6.33     2023-07-07 [1] CRAN (R 4.3.0)
#>  dplyr              1.1.3      2023-09-03 [1] CRAN (R 4.3.0)
#>  evaluate           0.23       2023-11-01 [1] CRAN (R 4.3.1)
#>  fansi              1.0.5      2023-10-08 [1] CRAN (R 4.3.0)
#>  fastmap            1.1.1      2023-02-24 [1] CRAN (R 4.3.0)
#>  filelock           1.0.2      2018-10-05 [1] CRAN (R 4.3.0)
#>  fs                 1.6.3      2023-07-20 [1] CRAN (R 4.3.0)
#>  generics           0.1.3      2022-07-05 [1] CRAN (R 4.3.0)
#>  GenomeInfoDb       1.38.0     2023-10-24 [1] Bioconductor
#>  GenomeInfoDbData   1.2.11     2023-11-09 [1] Bioconductor
#>  glue               1.6.2      2022-02-24 [1] CRAN (R 4.3.0)
#>  hms                1.1.3      2023-03-21 [1] CRAN (R 4.3.0)
#>  htmltools          0.5.7      2023-11-03 [1] CRAN (R 4.3.1)
#>  httr               1.4.7      2023-08-15 [1] CRAN (R 4.3.0)
#>  IRanges            2.36.0     2023-10-24 [1] Bioconductor
#>  jsonlite           1.8.7      2023-06-29 [1] CRAN (R 4.3.0)
#>  KEGGREST           1.42.0     2023-10-24 [1] Bioconductor
#>  knitr              1.45       2023-10-30 [1] CRAN (R 4.3.1)
#>  lifecycle          1.0.4      2023-11-07 [1] CRAN (R 4.3.1)
#>  magrittr           2.0.3      2022-03-30 [1] CRAN (R 4.3.0)
#>  memoise            2.0.1      2021-11-26 [1] CRAN (R 4.3.0)
#>  pillar             1.9.0      2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgconfig          2.0.3      2019-09-22 [1] CRAN (R 4.3.0)
#>  png                0.1-8      2022-11-29 [1] CRAN (R 4.3.0)
#>  prettyunits        1.2.0      2023-09-24 [1] CRAN (R 4.3.0)
#>  progress           1.2.2      2019-05-16 [1] CRAN (R 4.3.0)
#>  purrr              1.0.2      2023-08-10 [1] CRAN (R 4.3.0)
#>  R.cache            0.16.0     2022-07-21 [1] CRAN (R 4.3.0)
#>  R.methodsS3        1.8.2      2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo               1.25.0     2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils            2.12.2     2022-11-11 [1] CRAN (R 4.3.0)
#>  R6                 2.5.1      2021-08-19 [1] CRAN (R 4.3.0)
#>  rappdirs           0.3.3      2021-01-31 [1] CRAN (R 4.3.0)
#>  RCurl              1.98-1.13  2023-11-02 [1] CRAN (R 4.3.1)
#>  readr              2.1.4      2023-02-10 [1] CRAN (R 4.3.0)
#>  reprex             2.0.2      2022-08-17 [1] CRAN (R 4.3.0)
#>  rlang              1.1.2      2023-11-04 [1] CRAN (R 4.3.1)
#>  rmarkdown          2.25       2023-09-18 [1] CRAN (R 4.3.0)
#>  RSQLite            2.3.3      2023-11-04 [1] CRAN (R 4.3.1)
#>  rstudioapi         0.15.0     2023-07-07 [1] CRAN (R 4.3.0)
#>  S4Vectors          0.40.1     2023-10-26 [1] Bioconductor
#>  sessioninfo        1.2.2      2021-12-06 [1] CRAN (R 4.3.0)
#>  stringi            1.7.12     2023-01-11 [1] CRAN (R 4.3.0)
#>  stringr            1.5.0      2022-12-02 [1] CRAN (R 4.3.0)
#>  styler             1.10.2     2023-08-29 [1] CRAN (R 4.3.0)
#>  tibble             3.2.1      2023-03-20 [1] CRAN (R 4.3.0)
#>  tidyselect         1.2.0      2022-10-10 [1] CRAN (R 4.3.0)
#>  tzdb               0.4.0      2023-05-12 [1] CRAN (R 4.3.0)
#>  utf8               1.2.4      2023-10-22 [1] CRAN (R 4.3.1)
#>  vctrs              0.6.4      2023-10-12 [1] CRAN (R 4.3.0)
#>  vroom              1.6.4      2023-10-02 [1] CRAN (R 4.3.0)
#>  withr              2.5.2      2023-10-30 [1] CRAN (R 4.3.1)
#>  xfun               0.41       2023-11-01 [1] CRAN (R 4.3.1)
#>  XML                3.99-0.15  2023-11-02 [1] CRAN (R 4.3.1)
#>  xml2               1.3.5      2023-07-06 [1] CRAN (R 4.3.0)
#>  XVector            0.42.0     2023-10-24 [1] Bioconductor
#>  yaml               2.3.7      2023-01-23 [1] CRAN (R 4.3.0)
#>  zlibbioc           1.48.0     2023-10-24 [1] Bioconductor
#> 
#>  [1] /home/faalm/R/x86_64-pc-linux-gnu-library/4.3
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

I know that support for Ensembl Genomes was added recently, so it is still at an experimental stage, but this kind of bug could easily be caught by writing comprehensive unit tests for the functions. Maybe that's something to consider for the future.

I also do not understand why {biomartr} has to download data for all Ensembl instances beforehand, even when users specify that they only want one instance. This could also be fixed to improve efficiency.
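
The two points above (filtering by subset, and downloading only the requested instance) can be illustrated with a minimal sketch. This is not biomartr's implementation: `fetch_division()`, `list_genomes_sketch()`, `all_divisions`, and the toy species lists are all hypothetical names and data made up for illustration, and nothing here touches the network. The closing `stopifnot()` call shows the kind of check a unit test for this behavior might contain.

```r
# Hypothetical stand-in for a per-division download; NOT part of biomartr.
# It records which divisions were requested, so the check at the end can
# confirm that only the requested division was fetched.
fetched <- character(0)
fetch_division <- function(division) {
  fetched <<- c(fetched, division)
  # Toy data for illustration only, not real Ensembl output.
  toy <- list(
    EnsemblFungi   = c("saccharomyces_cerevisiae", "aspergillus_nidulans"),
    EnsemblPlants  = c("arabidopsis_thaliana"),
    EnsemblMetazoa = c("drosophila_melanogaster")
  )
  toy[[division]]
}

# Assumed set of known divisions for this sketch.
all_divisions <- c("EnsemblFungi", "EnsemblPlants", "EnsemblMetazoa")

# Hypothetical helper: query only the divisions named in `subset`,
# falling back to every division when `subset` is NULL.
list_genomes_sketch <- function(subset = NULL, fetch = fetch_division) {
  divisions <- if (is.null(subset)) all_divisions else subset
  unknown <- setdiff(divisions, all_divisions)
  if (length(unknown) > 0) {
    stop("Unknown division(s): ", paste(unknown, collapse = ", "))
  }
  unlist(lapply(divisions, fetch), use.names = FALSE)
}

fungi <- list_genomes_sketch(subset = "EnsemblFungi")

# Regression-style checks: only fungal genomes come back, and no other
# division was fetched along the way.
stopifnot(
  identical(fungi, c("saccharomyces_cerevisiae", "aspergillus_nidulans")),
  identical(fetched, "EnsemblFungi")
)
```

Passing the fetch function as an argument is a design choice for the sketch only: it lets a test swap in a fake and assert on which instances were contacted, without downloading anything.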

Best,
Fabricio

@Roleren
Contributor

Roleren commented Nov 29, 2023

Hello, you are right. These are now fixed in my latest pull request, #108. Either wait for the merge or pull my version.

Let me know if there is anything else :)

@almeidasilvaf
Author

Hi, @Roleren

Amazing! Thank you very much!

Feel free to close this issue once the PR is merged.

Best,
Fabricio

@HajkD
Member

HajkD commented Nov 30, 2023

Hi @almeidasilvaf,

Many thanks for notifying us about this issue, and @Roleren, thank you so much for addressing it.

I have now merged the fix, so biomartr should be in better shape.

Cheers,
Hajk

HajkD closed this as completed on Nov 30, 2023