Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

allodb: An R package for biomass estimation at extratropical forest plots #436

Closed
15 of 27 tasks
gonzalezeb opened this issue Mar 17, 2021 · 54 comments
Closed
15 of 27 tasks

Comments

@gonzalezeb
Copy link

gonzalezeb commented Mar 17, 2021

Date accepted: 2021-09-27
Submitting Author: Erika Gonzalez-Akre (@gonzalezeb)
Other Authors: Camille Piponiot (@cpiponiot), Mauro Lepore (@maurolepore), Kristina Teixeira (@teixeirak)
Repository: https://github.com/forestgeo/allodb
Version submitted: 0.0.0.9000
Editor: @adamhsparks
Reviewers: @jeffreyhanson, @jstillh

Due date for @jeffreyhanson: 2021-06-14

Due date for @jstillh: 2021-06-24
Archive: TBD
Version accepted: TBD


  • Paste the full DESCRIPTION file inside a code block below:
Package: allodb
Title: Tree Biomass Estimation at Extratropical Forest Plots
Version: 0.0.0.9000
Authors@R: 
    c(person(given = "Erika",
             family = "Gonzalez-Akre",
             role = c("aut", "cre", "cph"),
             email = "GonzalezEB@si.edu"),
      person(given = "Camille",
             family = "Piponiot",
             role = "aut",
             email = "camille.piponiot@gmail.com"),
      person(given = "Mauro",
             family = "Lepore",
             role = "aut",
             email = "maurolepore@gmail.com",
             comment = c(ORCID = "0000-0002-1986-7988")),
      person(given = "Kristina",
             family = "Anderson-Teixeira",
             role = "aut",
             email = "TeixeiraK@si.edu"))
Description: Tool to standardize and simplify the tree biomass
    estimation process across globally distributed extratropical forests.
License: GPL-3
URL: https://github.com/forestgeo/allodb
BugReports: https://github.com/forestgeo/allodb/issues
Depends: 
    R (>= 3.5.0)
Imports: 
    data.table (>= 1.12.8),
    ggplot2 (>= 3.3.2),
    kgc (>= 1.0.0.2),
    utils
Suggests: 
    covr (>= 3.2.1),
    knitr (>= 1.21),
    purrr,
    rmarkdown,
    spelling,
    testthat (>= 3.0.0)
VignetteBuilder: 
    knitr
Config/testthat/edition: 3
Encoding: UTF-8
Language: en-US
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1

Scope

  • Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):

    • data retrieval
    • data extraction
    • data munging
    • data deposition
    • workflow automation
    • version control
    • citation management and bibliometrics
    • scientific software wrappers
    • field and lab reproducibility tools
    • database software bindings
    • geospatial data
    • text analysis
  • Explain how and why the package falls under these categories (briefly, 1-2 sentences):
    allodb integrates compiled data and built-in functions to calculate tree above-ground biomass (AGB) which is a critical value used by forest ecologists to estimate the amount of carbon sequester and resealed by forests. allodb falls in the "field and laboratory reproducibility tools" category as our package intents to improve and standardize the calculation of a widely used forest metric.

  • Who is the target audience and what are scientific applications of this package?
    allodb is designed to be used by forest ecologists with basic or advanced knowledge of R. This scientific application is intended to be used by individuals interested in calculating forest metrics such as AGB in extratropical forest (subtropical, temperate, and boreal forests).

  • Are there other R packages that accomplish the same thing? If so, how does yours differ or meet our criteria for best-in-category?
    The BIOMASS package is designed to estimate biomass in tropical forest. The main difference between allodb and BIOMASS is the set of equations used to estimate AGB. There is an existing calibrated equation used globally for tropical environments but not for areas outside the tropics. Our goal is to provide a set of curated equations and the ability to add new and additional equations in order to have a more reliable value of AGB outside the tropics. We do not duplicate any functionality of the BIOMASS package and only agree on the ecological scope.

  • (If applicable) Does your package comply with our guidance around Ethics, Data Privacy and Human Subjects Research?
    N/A

  • If you made a pre-submission enquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
    @melvidoni or issue 427

Technical checks

Confirm each of the following by checking the box.

This package:

Publication options

  • Do you intend for this package to go on CRAN?

  • Do you intend for this package to go on Bioconductor?

  • Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:

MEE Options
  • The package is novel and will be of interest to the broad readership of the journal.
  • The manuscript describing the package is no longer than 3000 words.
  • You intend to archive the code for the package in a long-term repository which meets the requirements of the journal (see MEE's Policy on Publishing Code)
  • (Scope: Do consider MEE's Aims and Scope for your manuscript. We make no guarantee that your manuscript will be within MEE scope.)
  • (Although not required, we strongly recommend having a full manuscript prepared when you submit here.)
  • (Please do not submit your package separately to Methods in Ecology and Evolution)

Code of conduct

@noamross
Copy link
Contributor

@ropensci-review-bot assign @adamhsparks as editor

@ropensci-review-bot
Copy link
Collaborator

Assigned! @adamhsparks is now the editor

@adamhsparks
Copy link
Member

adamhsparks commented Mar 20, 2021

Editor checks:

  • Documentation: The package has sufficient documentation available online (README, pkgdown docs) to allow for an assessment of functionality and scope without installing the package.
  • Fit: The package meets criteria for fit and overlap
  • Automated tests: Package has a testing suite and is tested via a CI service.
  • License: The package has a CRAN or OSI accepted license
  • Repository: The repository link resolves correctly

Editor comments

G'day, Erika,
I'll be your guest editor for allodb.

The package passes all standard local checks and tests with flying colours. But checking the code syntax using lintr::lint_package() (I for some reason can't use goodpractice::gp() locally so I'm falling back to this), I get several suggestions for improving the code. I suggest going through them and addressing where appropriate and possible and perhaps noting in your submission those flags that are not appropriate to address for allodb. This will make the review process easier for the reviewers.

Also, I note that your package's website is incomplete, but once accepted to rOpenSci a website will be generated for you and I note that your current README contains the necessary information on how to install and use the package, so we're good there.

Once you've addressed the issues flagged with goodpractice::gp() or lintr::lint_package(), I'll move on to finding reviewers for you.


@adamhsparks
Copy link
Member

@ropensci-review-bot run goodpractice

@adamhsparks
Copy link
Member

Hi, @gonzalezeb, just checking in to see if there’s anything I can do to help you with your submission process?

@gonzalezeb
Copy link
Author

Hi Adam, thanks for checking in.
I am getting some errors after addressing some issues brought out by lintr::lint_package() and goodpractice::gp(), but my collaborator is not, so I am not sure if everything is good (at least not from my laptop).

When I run devtools::test() to test functionalities, I get that some tests are not running well. When I run goodpractice::goodpractice() on 3/26 I got "Supreme package..." but now I get a few errors. Maybe if you run some checks again you can let me know if you are still seeing issues..

@adamhsparks
Copy link
Member

adamhsparks commented Apr 5, 2021

Yes, it passes both the CRAN checks when I run them with, and devtools::test(), but the code style, etc. as suggested by the rOpenSci guidelines have some suggestions.

This is what I see when I run lint_package().

Some of these can easily be addressed, others like the first hit with names, probably have good reasons in your field to have the object name like that. That's fine, but it should be noted for the reviewers.

> lintr::lint_package()
................
R/est_params.R:46:24: style: Variable and function name style should be snake_case.
                       Nres = 1e4
                       ^~~~
R/est_params.R:64:51: style: Commas should always have a space after.
                       coords = dfobs[i, c("long","lat")],
                                                  ^
R/est_params.R:72:66: style: Put spaces around all infix operators.
    if (length(unique(df$equation_id)) == 1) df[1, 3] <- df[1, 3]*1.01
                                                                ~^~
R/get_biomass.R:56:25: style: Variable and function name style should be snake_case.
                        Nres = 1e4) {
                        ^~~~
R/illustrate_allodb.R:48:31: style: Variable and function name style should be snake_case.
                              Nres = 1e4) {
                              ^~~~
R/imports.R:6:1: style: Variable and function name style should be snake_case.
data.frame <- function(...) {
^~~~~~~~~~
R/new_equations.R:62:1: style: functions should have cyclomatic complexity of less than 15, this has 24.
new_equations <- function(subset_taxa = "all",
^
R/new_equations.R:71:27: style: Variable and function name style should be snake_case.
                          new_minDBH = NULL,
                          ^~~~~~~~~~
R/new_equations.R:72:27: style: Variable and function name style should be snake_case.
                          new_maxDBH = NULL,
                          ^~~~~~~~~~
R/new_equations.R:73:27: style: Variable and function name style should be snake_case.
                          new_sampleSize = NULL,
                          ^~~~~~~~~~~~~~
R/new_equations.R:74:27: style: Variable and function name style should be snake_case.
                          new_unitDBH = "cm",
                          ^~~~~~~~~~~
R/new_equations.R:75:27: style: Variable and function name style should be snake_case.
                          new_unitOutput = "kg",
                          ^~~~~~~~~~~~~~
R/new_equations.R:76:27: style: Variable and function name style should be snake_case.
                          new_inputVar = "DBH",
                          ^~~~~~~~~~~~
R/new_equations.R:77:27: style: Variable and function name style should be snake_case.
                          new_outputVar = "Total aboveground biomass",
                          ^~~~~~~~~~~~~
R/new_equations.R:88:34: style: Variable and function name style should be snake_case.
  suppressWarnings(new_equations$dbh_unit_CF <-
                                 ^~~~~~~~~~~
R/new_equations.R:90:34: style: Variable and function name style should be snake_case.
  suppressWarnings(new_equations$output_units_CF <-
                                 ^~~~~~~~~~~~~~~
R/new_equations.R:112:5: style: Variable and function name style should be snake_case.
    toMerge <- eq_jansen[, c("hsub", "equation_allometry")]
    ^~~~~~~
R/new_equations.R:113:64: style: Variable and function name style should be snake_case.
    eq_jansen$equation_allometry <- apply(toMerge, 1, function(X) {
                                                               ^
R/new_equations.R:181:30: style: There should be a space between right parenthesis and an opening curly brace.
    if (is.matrix(new_coords)){
                             ^~
R/new_equations.R:265:5: style: Variable and function name style should be snake_case.
    equationID <- paste0("new", seq_len(length(new_taxa)))
    ^~~~~~~~~~
R/new_equations.R:266:5: style: Variable and function name style should be snake_case.
    coordsEq <- cbind(long = new_coords[, 1],
    ^~~~~~~~
R/new_equations.R:268:5: style: Variable and function name style should be snake_case.
    rcoordsEq <- round(coordsEq * 2 - 0.5) / 2 + 0.25
    ^~~~~~~~~
R/new_equations.R:270:5: style: Variable and function name style should be snake_case.
    koppenZones <- apply(rcoordsEq, 1, function(X) {
    ^~~~~~~~~~~
R/new_equations.R:270:49: style: Variable and function name style should be snake_case.
    koppenZones <- apply(rcoordsEq, 1, function(X) {
                                                ^
R/new_equations.R:273:5: style: Variable and function name style should be snake_case.
    koppenZones <- as.character(unlist(koppenZones))
    ^~~~~~~~~~~
R/new_equations.R:301:5: style: Variable and function name style should be snake_case.
    dbhCF <-
    ^~~~~
R/new_equations.R:303:5: style: Variable and function name style should be snake_case.
    outputCF <-
    ^~~~~~~~
R/new_equations.R:305:28: style: Variable and function name style should be snake_case.
    suppressWarnings(dbhCF$dbh_unit_CF <-
                           ^~~~~~~~~~~
R/new_equations.R:307:31: style: Variable and function name style should be snake_case.
    suppressWarnings(outputCF$output_units_CF <-
                              ^~~~~~~~~~~~~~~
R/resample_agb.R:41:26: style: Variable and function name style should be snake_case.
                         Nres = 1e4) {
                         ^~~~
R/resample_agb.R:78:47: style: Variable and function name style should be snake_case.
  list_dbh <- apply(dfsub[, 1:3], 1, function(X) {
                                              ^
R/resample_agb.R:89:5: warning: local variablesampled_dbhassigned but may not be used
    sampled_dbh <- list_dbh
    ^~~~~~~~~~~
R/resample_agb.R:97:7: warning: local variablesampled_dbhassigned but may not be used
      sampled_dbh <- list_dbh[[j]]
      ^~~~~~~~~~~
R/weight_allom.R:45:31: style: Variable and function name style should be snake_case.
  suppressWarnings(dfequation$wN <-
                              ^~
R/weight_allom.R:53:3: style: Variable and function name style should be snake_case.
  coordsSite <- t(as.numeric(coords))
  ^~~~~~~~~~
R/weight_allom.R:54:3: style: Variable and function name style should be snake_case.
  rcoordsSite <- round(coordsSite * 2 - 0.5) / 2 + 0.25
  ^~~~~~~~~~~
R/weight_allom.R:56:3: style: Variable and function name style should be snake_case.
  koppenObs <- apply(rcoordsSite, 1, function(X) {
  ^~~~~~~~~
R/weight_allom.R:56:47: style: Variable and function name style should be snake_case.
  koppenObs <- apply(rcoordsSite, 1, function(X) {
                                              ^
R/weight_allom.R:64:14: style: Variable and function name style should be snake_case.
  dfequation$wE <- vapply(dfequation$koppen, compare_koppen, FUN.VALUE = 0.9)
             ^~
R/weight_allom.R:86:14: style: Variable and function name style should be snake_case.
  dfequation$wT <- 1e-6
             ^~
R/weight_allom.R:91:3: style: Variable and function name style should be snake_case.
  eqtaxaG <- data.table::tstrsplit(dfequation$taxa1, " ")[[1]]
  ^~~~~~~
R/weight_allom.R:92:3: style: Variable and function name style should be snake_case.
  eqtaxaS <- data.table::tstrsplit(dfequation$taxa1, " ")[[2]]
  ^~~~~~~
R/weight_allom.R:152:3: style: Variable and function name style should be snake_case.
  vecW <- dfequation$w
  ^~~~
tests/testthat/test-est_params.R:5:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
          {
          ^
tests/testthat/test-est_params.R:10:39: style: Commas should always have a space after.
                         long = c(-78,-85),
                                      ^
tests/testthat/test-get_biomass.R:5:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
          {
          ^
tests/testthat/test-get_biomass.R:25:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
          {
          ^
tests/testthat/test-get_biomass.R:92:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
          {
          ^
tests/testthat/test-get_biomass.R:106:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
          {
          ^
tests/testthat/test-get_biomass.R:122:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
          {
          ^
tests/testthat/test-illustrate_allodb.R:10:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
          {
          ^
tests/testthat/test-illustrate_allodb.R:31:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
          {
          ^
tests/testthat/test-new_equations.R:4:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
          {
          ^
tests/testthat/test-new_equations.R:127:3: style: Opening curly braces should never go on their own line and should always be followed by a new line.
  {
  ^
tests/testthat/test-resample_agb.R:19:3: style: Opening curly braces should never go on their own line and should always be followed by a new line.
  {
  ^
tests/testthat/test-weight_allom.R:5:11: style: Opening curly braces should never go on their own line and should always be followed by a new line.
          {
          ^

@adamhsparks
Copy link
Member

There is a note about UTF-8 strings as well that I'm seeing that you'll need to address as well.

* checking data for non-ASCII characters ... NOTE
  Note: found 156 marked UTF-8 strings

@gonzalezeb
Copy link
Author

Thanks @adamhsparks. Most of the suggestion after running lint_package() have already been applied, see for example here and here. I am not sure why they don't show to you. I still need to work on some thought. Let me know if after installing the package again you still see those.

@adamhsparks
Copy link
Member

adamhsparks commented Apr 5, 2021 via email

@adamhsparks
Copy link
Member

OK, this is what I get now.

> lintr::lint_package()
................
R/new_equations.R:62:1: style: functions should have cyclomatic complexity of less than 15, this has 24.
new_equations <- function(subset_taxa = "all",
^
R/new_equations.R:115:65: style: Variable and function name style should be snake_case.
    eq_jansen$equation_allometry <- apply(to_merge, 1, function(X) {
                                                                ^
R/new_equations.R:273:51: style: Variable and function name style should be snake_case.
    koppen_zones <- apply(rcoords_eq, 1, function(X) {
                                                  ^
R/resample_agb.R:78:47: style: Variable and function name style should be snake_case.
  list_dbh <- apply(dfsub[, 1:3], 1, function(X) {
                                              ^
R/resample_agb.R:89:5: warning: local variablesampled_dbhassigned but may not be used
    sampled_dbh <- list_dbh
    ^~~~~~~~~~~
R/resample_agb.R:97:7: warning: local variablesampled_dbhassigned but may not be used
      sampled_dbh <- list_dbh[[j]]
      ^~~~~~~~~~~
R/weight_allom.R:45:31: style: Variable and function name style should be snake_case.
  suppressWarnings(dfequation$wN <-
                              ^~
R/weight_allom.R:53:3: style: Variable and function name style should be snake_case.
  coordsSite <- t(as.numeric(coords))
  ^~~~~~~~~~
R/weight_allom.R:54:3: style: Variable and function name style should be snake_case.
  rcoordsSite <- round(coordsSite * 2 - 0.5) / 2 + 0.25
  ^~~~~~~~~~~
R/weight_allom.R:56:3: style: Variable and function name style should be snake_case.
  koppenObs <- apply(rcoordsSite, 1, function(X) {
  ^~~~~~~~~
R/weight_allom.R:56:47: style: Variable and function name style should be snake_case.
  koppenObs <- apply(rcoordsSite, 1, function(X) {
                                              ^
R/weight_allom.R:64:14: style: Variable and function name style should be snake_case.
  dfequation$wE <- vapply(dfequation$koppen, compare_koppen, FUN.VALUE = 0.9)
             ^~
R/weight_allom.R:86:14: style: Variable and function name style should be snake_case.
  dfequation$wT <- 1e-6
             ^~
R/weight_allom.R:91:3: style: Variable and function name style should be snake_case.
  eqtaxaG <- data.table::tstrsplit(dfequation$taxa1, " ")[[1]]
  ^~~~~~~
R/weight_allom.R:92:3: style: Variable and function name style should be snake_case.
  eqtaxaS <- data.table::tstrsplit(dfequation$taxa1, " ")[[2]]
  ^~~~~~~
R/weight_allom.R:152:3: style: Variable and function name style should be snake_case.
  vecW <- dfequation$w
  ^~~~
tests/testthat/test-est_params.R:9:39: style: Commas should always have a space after.
                         long = c(-78,-85),

Are there any that I can provide assistance with?

@ropensci ropensci deleted a comment from ropensci-review-bot Apr 5, 2021
@ropensci ropensci deleted a comment from ropensci-review-bot Apr 5, 2021
@gonzalezeb
Copy link
Author

Hi @adamhsparks. Most issues brought out by lint have been addressed, only two remain but I am little stuck on them. Maybe you can give me some guidance.
image

In R/new_equations, the function contains a lot of 'if' because it does a lot of sanity checks on the user-provided data. We could remove these, at the risk of having undetected problems in the user-provided allometries; and it seems that moving the 'ifs' to another function would make much sense.

In R/weight_allom, the problem is with "wE" which is the original col name of a matrix (koppenMatrix) been used.

@adamhsparks
Copy link
Member

Thanks, @gonzalezeb, for addressing these changes. I think that both of these are acceptable given the reasons stated. I'll begin looking for reviewers now.

@adamhsparks
Copy link
Member

@gonzalezeb, can you please update your README badges for allodb using rodev::use_review_badge(436) and add a NEWS.md file, usethis::use_news_md()?

@adamhsparks
Copy link
Member

@gonzalezeb, can you also please address the errors that are encountered when running checks?

==> devtools::check()

ℹ Updating allodb documentationLoading allodb
Writing NAMESPACE
Writing NAMESPACE
── Building ────────────────────────────────────────────────────────── allodb ──
Setting env vars:CFLAGS    : -Wall -pedantic -fdiagnostics-color=alwaysCXXFLAGS  : -Wall -pedantic -fdiagnostics-color=alwaysCXX11FLAGS: -Wall -pedantic -fdiagnostics-color=always
────────────────────────────────────────────────────────────────────────────────
✓  checking for file/Users/adamsparks/Development/GitHub/Reviews/allodb/DESCRIPTION...preparingallodb:checking DESCRIPTION meta-information ...installing the package to build vignettes
E  creating vignettes (5.3s)
   --- re-buildingallodb-vignette.Rmdusing rmarkdown
   Quitting from lines 81-87 (allodb-vignette.Rmd) 
   Error: processing vignette 'allodb-vignette.Rmd' failed with diagnostics:
   arguments imply differing number of rows: 9991, 19
   --- failed re-buildingallodb-vignette.RmdSUMMARY: processing the following file failed:allodb-vignette.RmdError: Vignette re-building failed.
   Execution halted

Error: 'col_ed' is not an exported object from 'namespace:cli'
Execution halted

Exited with status 1.

Once that's dealt with, I'll look for reviewers. 🙏

@gonzalezeb
Copy link
Author

Hi @adamhsparks, allodb is passing all checks, please let me know if you see the same (I hope!)

devtools::check(remote = TRUE, manual = TRUE)
i Updating allodb documentation
i Loading allodb
Writing NAMESPACE
Writing NAMESPACE
-- Building --------------------------------------------------- allodb --
Setting env vars:
* CFLAGS    : -Wall -pedantic -fdiagnostics-color=always
* CXXFLAGS  : -Wall -pedantic -fdiagnostics-color=always
* CXX11FLAGS: -Wall -pedantic -fdiagnostics-color=always
-------------------------------------------------------------------------checking for file 'C:\Users\erikab\Dropbox (Smithsonian)\GitHub\allodb/DESCRIPTION' (349ms)
-  preparing 'allodb': (11.3s)
√  checking DESCRIPTION meta-information ... 
-  installing the package to build vignettescreating vignettes (39.3s)
-  checking for LF line-endings in source and make files and shell scripts (11.1s)
-  checking for empty or unneeded directories
-  looking to see if a 'data/datalist' file should be added
-  building 'allodb_0.0.0.9000.tar.gz'
   
-- Checking --------------------------------------------------- allodb --
Setting env vars:
* _R_CHECK_CRAN_INCOMING_USE_ASPELL_: TRUE
* _R_CHECK_CRAN_INCOMING_REMOTE_    : TRUE
* _R_CHECK_CRAN_INCOMING_           : TRUE
* _R_CHECK_FORCE_SUGGESTS_          : FALSE
* NOT_CRAN                          : true
-- R CMD check ----------------------------------------------------------
-  using log directory 'C:/Users/erikab/AppData/Local/Temp/Rtmp6dUAvx/allodb.Rcheck' (1s)
-  using R version 3.6.3 (2020-02-29)
-  using platform: x86_64-w64-mingw32 (64-bit)
-  using session charset: ISO8859-1
-  using option '--as-cran' (402ms)
√  checking for file 'allodb/DESCRIPTION'
-  this is package 'allodb' version '0.0.0.9000'
-  package encoding: UTF-8
N  checking CRAN incoming feasibility (35.1s)
   Maintainer: 'Erika Gonzalez-Akre <GonzalezEB@si.edu>'
   
   New submission
   
   Version contains large components (0.0.0.9000)
√  checking package namespace information ...checking package dependencies (2.3s)
√  checking if this is a source package ...checking if there is a namespacechecking for executable files (1.2s)
√  checking for hidden files and directories ...checking for portable file names ...checking whether package 'allodb' can be installed (11.2s)
√  checking installed package size ...checking package directory (894ms)
√  checking for future file timestamps (624ms)
√  checking 'build' directory ...checking DESCRIPTION meta-information (718ms)
√  checking top-level files ...checking for left-over files ...checking index information (353ms)
√  checking package subdirectories (392ms)
√  checking R files for non-ASCII characters ...checking R files for syntax errors ...checking whether the package can be loaded (355ms)
√  checking whether the package can be loaded with stated dependencies ...checking whether the package can be unloaded cleanly (336ms)
√  checking whether the namespace can be loaded with stated dependencies (360ms)
√  checking whether the namespace can be unloaded cleanly (459ms)
√  checking loading without being on the library search path (780ms)
√  checking use of S3 registration (2.5s)
√  checking dependencies in R code (2s)
√  checking S3 generic/method consistency (783ms)
√  checking replacement functions (335ms)
√  checking foreign function calls (802ms)
√  checking R code for possible problems (5.4s)
√  checking Rd files (681ms)
√  checking Rd metadata ...checking Rd line widths ...checking Rd cross-references (343ms)
√  checking for missing documentation entries (473ms)
√  checking for code/documentation mismatches (1.5s)
√  checking Rd \usage sections (1.1s)
√  checking Rd contents ...checking for unstated dependencies in examples ...checking contents of 'data' directory (696ms)
√  checking data for non-ASCII characters (1s)
√  checking data for ASCII and uncompressed saves (588ms)
√  checking R/sysdata.rda (478ms)
√  checking installed files from 'inst/doc' ...checking files in 'vignettes' ...checking examples (4.7s)
√  checking for unstated dependencies in 'tests' ... 
-  checking tests (429ms)
√  Running 'testthat.R' (8.7s)
√  checking for unstated dependencies in vignettes (9.1s)
√  checking package vignettes in 'inst/doc' ...checking re-building of vignette outputs (23.6s)
√  checking PDF version of manual (3.6s)
√  checking for detritus in the temp directory
   
   See
     'C:/Users/erikab/AppData/Local/Temp/Rtmp6dUAvx/allodb.Rcheck/00check.log'
   for details.
   
   
-- R CMD check results --------------------------- allodb 0.0.0.9000 ----
Duration: 2m 0.2s

> checking CRAN incoming feasibility ... NOTE
  Maintainer: 'Erika Gonzalez-Akre <GonzalezEB@si.edu>'
  
  New submission
  
  Version contains large components (0.0.0.9000)

0 errors| 0 warnings| 1 note x

@adamhsparks
Copy link
Member

Hi @gonzalezeb, I'm getting a NOTE when I run R CMD check.

Note: found 156 marked UTF-8 strings

Can you check into this?

As it's just a note, not an error in the checks, I'll start looking for reviewers now.

@adamhsparks
Copy link
Member

@ropensci-review-bot assign @jeffreyhanson

@gonzalezeb
Copy link
Author

Hi @adamhsparks. Sorry, I am a little behind with this and have solved only minor issues. There are some comments on the conceptual aspects of the package (i.e. here) where I get the point but changing our approach will require some (lot's!) of work, so I am giving it a careful thought. How should I tackle issues like those?

@adamhsparks
Copy link
Member

adamhsparks commented Jul 19, 2021

Hi @gonzalezeb, which specific aspect were you linking to and asking for guidance on? Your link only links to the whole issue here, not one of the reviewers' comments or issue that they may have opened in your repository?

@adamhsparks
Copy link
Member

Hi @gonzalezeb, do you need more time to work on this? We can put a holding tag on it and revisit it after three months to see where things are for up to a year. Would that be useful to you?

@gonzalezeb
Copy link
Author

Hi @adamhsparks, thanks for checking. We have made some substantial progress but I still need to work in some minor stuff. The plan is to finish this before Sep 15 as we want to submit our revised manuscript to MEE also by then. Will that be ok with you all?

@adamhsparks
Copy link
Member

adamhsparks commented Aug 4, 2021

Sounds good, Erika! Thanks for updating me.

@gonzalezeb
Copy link
Author

Hi @adamhsparks, I wasn’t sure how to respond to the multiples reviews so I hope the following is sufficient. We are very grateful for the thoughtful suggestions by reviewers, and have implemented most.

Initial suggestions by @jeffreyhanson (5/19/21), have been addressed:

  1. The package website is working (https://forestgeo.github.io/allodb/)
  2. Documentation for multiple functions has been improved by using code formatting (eg. using ‘numeric vector’ instead of ‘numerical vector’, added (tibble::tibble() object), etc).
  3. Typos have been corrected
  4. Unnecessary R and non-R script files have been removed (eg. R/sysdata.rda)
  5. We agree with the reviewer. If a user passes a bad argument, it is best to fail fast with an informative message so that the user can quickly correct their mistake and move on. But the number of possible mistakes may be infinite; so we have particularly focused on adding an internal function validate_equations()to tests that all equations in equations$equation_allometry are a valid R expression of the dbh or h variables (since this could be one of the main mistakes on calculating biomass). This function can be called from inside any function within allodb and allows users to pass a character vector of equations. We also added consistency checks in the get_biomass function of dbh and coordinates provided by the user.
  6. Unused Travis badge in REAMDE has been removed
  7. The .buildignore folder has been removed
  8. We changed set.seed with withr::with_seed in the resample_agb function.
  9. About eval(parse(…)), see next comment.
  10. Parameters for many functions have been reordered.

Comprehensive review by @jeffreyhanson (6/17/21), has also been addressed:
About code:

  1. See note number 5 above. If the suggested change is truly necessary (validation of function arguments), then we may take the time to implement.
  2. All dataframes now have the tibble subclass.
    About documentation:
  3. Documentation has been improved for all functions (specially the get_biomass and weight_allom functions), added @seealso to help users navigate btw documentations, use quotation marks for character vectors, added example sections to the documentation for the built-in datasets, etc.
  4. We are now using \pkg{allodb} in documentation when referring to this package, and equations commands (eg. \deqn{}).
  5. Removed library(allodb) from all test scripts.

Suggestions by @jstillh (6/14/21), have been addressed:

  • Documentation for both the get_biomass and weight_allom core functions have been improved so the weighting concept is more transparent, specifically we give more details in the Details section. Regarding adding additional outputs to the get_biomass function, we think that would make the function too slow.
  • We improved the error message within the weight_allom function (i.e. now it deals with coordinates that are not associated with any koppen climate zone) to address the specific comments on the fact that the get_biomass function throws an error when the equation table does not contain equations corresponding to the Koppen climate at the provided coordinates (argument coords). The proposed solution is to change koppenMatrix, replacing climate combinations with weight we=0 (2 completely different climates) to we = 1e-6, similar to what is done with the taxonomic weight. This way, equations with we = 1e-6 will only be significantly used in the recalibrated equations when no better climate combination is found, therefore avoid getting an error. In regards to Forrester et al 2017 equations, we revised them and now many coordinates correspond to max or min values given in original publication (not the mean except for one site).
  • Vignette has been improved.
  • About eval(parse(…)), also see next comment.

@gonzalezeb
Copy link
Author

The reviewers advice to avoid the call eval(parse(text = ...)) (item 9 here, and “Code and documentation” here), mainly because (a) it can be unsafe, and (b) error prone.

Instead they suggest storing the equations as objects such as expressions or functions, and manipulate them with substitute().

We thank the advice and, after considering it carefully, we would like to share our updated opinion. We first discuss the problems of eval(parse()) and then the alternative solutions.

To make our discussion concrete we offer some code examples.

library(dplyr, warn.conflicts = FALSE)
library(allodb)

tl;dr

  • The use of eval(parse()) as its problems seem to be balanced by the benefits we see in this particular use case, and the alternative solutions seem too complex. This is in line with @jeffreyhanson’s comment on on Jun
    17
    :

I previously suggested that it would be good to avoid using eval(parse(...)) if possible. (…) However, [the alternatives] seem
needlessly complex compared to the current implementation? Maybe this could be one instance where eval(parse(...)) is not bad idea?

  • We are now testing the equations are correct or will trigger an informative error message.

Unsafe

Online we found that parsing text would be a risk if we hosted an application on a server we own and allow users to input arbitrary text. A malicious user could damage our server.

But this is not our case. The allodb package is to be used as an analysis tool, directly on the users’s computers. And the input text comes mainly not from users but from a dataset we built.

Error prone

We acknowledge that bugs can hide in the unevaluated text, as invalid code is still a valid text object.

valid <- "dbh + 1"
eval(parse(text = valid), envir = list(dbh = 1))
#> [1] 2

invalid <- "1 <- dbh"
try(eval(parse(text = invalid), envir = list(dbh = 1)))
#> Error in 1 <- dbh : invalid (do_set) left-hand side to assignment

But bugs can also hide in unevaluated expressions. Using substitute() to manipulate expressions does not ensure an expression is valid.

invalid <- quote(1 <- dbh)
substitute(x, list(x = invalid))
#> 1 <- dbh

try(eval(substitute(x, list(x = invalid))))
#> Error in 1 <- dbh : invalid (do_set) left-hand side to assignment

We are comfortable storing and manipulating text. We can inspect equations stored as text directly from the dataframe, and we can manipulate them with familiar tools.

# Storing text
equation <- "1.2927 * (dbh^2)^1.36723"
dbh <- "(sampled_dbh * 0.393701)"

# + We can inspect the equations directly from the dataframe
tibble(equation, dbh)
#> # A tibble: 1 x 2
#>   equation                 dbh                     
#>   <chr>                    <chr>                   
#> 1 1.2927 * (dbh^2)^1.36723 (sampled_dbh * 0.393701)

# - To manipulate the equations we use familiar tools
modified <- gsub("dbh", dbh, equation)
modified
#> [1] "1.2927 * ((sampled_dbh * 0.393701)^2)^1.36723"

eval(parse(text = modified), list(sampled_dbh = 1))
#> [1] 0.1010408

In contrast, we are not comfortable storing or manipulating expressions with substitute(). We found great help in the first edition of Advanced R but we still think the approach is difficult to understand, which makes us more likely to introduce bugs than with eval(parse()).

# We need to create an escape hatch because substitute() doesn't work if we
# already have an expression saved in a variable
# http://adv-r.had.co.nz/Computing-on-the-language.html#substitute
substitute_q <- function(x, env) {
  call <- substitute(substitute(y, env), list(y = x))
  eval(call)
}

equation <- quote(1.2927 * (dbh^2)^1.36723)
dbh <- quote((sampled_dbh * 0.393701))

# - We can't inspect the equations directly from the dataframe :-(
tibble(equation = list(equation), dbh = list(dbh))
#> # A tibble: 1 x 2
#>   equation   dbh       
#>   <list>     <list>    
#> 1 <language> <language>

# - To manipulate the equations we need unfamiliar tools
modified <- substitute_q(equation, list(dbh = dbh))
modified
#> 1.2927 * ((sampled_dbh * 0.393701)^2)^1.36723

eval(modified, list(sampled_dbh = 1))
#> [1] 0.1010408

As @jeffreyhanson advised, instead of using expressions and substitute() we prefer to ensure all text-equations are valid or throw an informative error message.

validate_equation <- function(text, envir = parent.frame()) {
  tryCatch(
    eval(parse(text = text), envir = envir),
    error = function(e) {
      stop("`text` must be valid R code.\nx Invalid: ", text, call. = FALSE)
    }
  )
  
  invisible(text)
}

dbh <- 99
validate_equation(valid)
try(validate_equation(invalid))
#> Error : `text` must be valid R code.
#> x Invalid: <-1dbh

f <- function(text) {
  validate_equation(text, envir = list(dbh = 1))
  # Some useful code
  
  text
}

f(valid)
#> [1] "dbh + 1"
try(f(invalid))
#> Error : `text` must be valid R code.
#> x Invalid: <-1dbh

@adamhsparks
Copy link
Member

@gonzalezeb, thank you for your comprehensive overview of the changes that you have made to the allodb package.

@jeffreyhanson and @jstillh could you please let me know if you are satisfied with the changes to the package or have any further suggestions that may improve it?

@jeffreyhanson
Copy link

Awesome - thank you very much @gonzalezeb. @adamhsparks, I'll try and take get back to you next week.

@jstillh
Copy link

jstillh commented Sep 10, 2021

@gonzalezeb thanks a lot for the overview of the changes - i'll try to go through them in the next week as well and get back to you, @adamhsparks .

@jeffreyhanson
Copy link

Thank you @gonzalezeb for providing such detailed responses to the points I raised earlier. I especially found your discussion on the costs and benefits of eval(parse(...)) really useful. I think the R package is looking absolutely brilliant! Most of the points I raised earlier have been addressed. I've made a couple of suggestions below that I think could help improve package.1 All things considered, these are pretty minor - just some documentation tweaks. If those could be addressed, I'd say that all my suggestions have been satisfied. I've also listed several additional minor comments too. Based on my understanding of rOpenSci guidelines, they won't/shouldn't influence whether the package should be accepted or not (but please correct me if I am wrong). I've listed them just to be thorough, and in case the package authors think they are worth addressing.

Suggestions

  • Package checks throw the following warning on my computer (i.e. running devtools::check()). Could you please fix it? Let me know if you're unable to reproduce this error and I can try to provide more information?

    Undocumented code objects:
    ‘equations’
    Undocumented data sets:
    ‘equations’
    All user-level objects in a package should have documentation entries.
    See chapter ‘Writing R documentation files’ in the ‘Writing R
    Extensions’ manual.

  • Thank you very much for looking into potential issues with using eval(parse(...)) and potential alternatives. I agree that the main issue would be if the code was run inside a web application that allowed individuals to enter in arbitrary text. Although additional issues could arise on standard desktop/laptop computers too (e.g. a malicious actor could potentially update the library files), I guess at this point malicious actors would have easier ways to accomplish their goals then modifying R library files. So, given the costs and benefits outlined by @gonzalezeb, I think it's perfectly reasonable for the package to use eval(paste(...)). Maybe it would be worth adding some warning text to the documentation for the function(s) that use eval(parse(...)) so that users are aware of potential issues? This way it is clearly the responsibility of the user to ensure that they take appropriate steps to avoid security issues.2

    Also, I noticed one of the links provided by @gonzalezeb mentioned that RAppArmor::eval.secure() can prevent system calls with administrator privileges. If it's not too much hassle, would it be worth using that where possible? It seems that this R package is only available on Linux though - so perhaps it's not worth the hassle (because then some additional code would be needed to conditionally use RAppArmor::eval.secure() if it's available, and otherwise use eval(parse(...))). I just thought I'd mention it just in case.

  • The README.md file might need re-knitting because the output displayed for est_params() is a data.table object (instead of a tibble() object per the current version).

Minor comments

  • The documentation doesn't seem to format code correctly (i.e. using back ticks). For example, NULL is referred to as "NULL" instead of NULL (e.g., see here, or here). Personally, I find syntax formatting helpful when reading documentation (e.g., https://readr.tidyverse.org/reference/read_delim.html). I don't think rOpensci guidelines have style mandates for code in package documentation, so I don't think this is critically important.

  • The main functions (e.g.est_params(), get_biomass(), resample_agb()) don't contain code to validate all arguments. Although @gonzalezeb mention that "the number of possible mistakes may be infinite", I would suggest that perhaps the number of expectations (e.g. properties or characteristics) for a valid input is much lower. For example, in order to verify that an argument is a single integer (e.g. 5), one does not need to verify that the parameter is (i) not a character, (ii) not a NULL, (iii) not a double, (Inf) etc. Instead, one could verify that it conforms to a set of expectations (e.g. inherits(x, "integer") && identical(length(x), 1) && all(is.finite(x))). This is made much easier using existing R packages developed for this purpose though. However, I don't think rOpensci or CRAN require argument validation for published packages, so I don't think this issue critically important. I'm just making this comment to be thorough :)

  • This is a super minor suggestion. The equation in the documentation for the est_params() function has the text " with " formatted as a maths term (rather than text), which makes it slightly difficult to read on the package website. To address this, I think you could change this line to define the equation as:

\deqn{AGB = a * dbh ^ b + e, \space \mathit{with} \space e ~ N(0, sigma^2)}{AGB = a * dbh ^ b + e, with e ~ N(0, sigma^2)}

1 Please note that I still (unsurprisingly) lack the forestry experience needed to evaluate the novelty and methodology of this package, and so continue to limit my reviews to code quality and documentation.

2 Just to be clear, I do think that -- where possible, feasible, and reasonable -- R package developers should ideally -- if only in principle -- minimize security risks, even if software licenses absolve developers of responsibility. It's just I think that eval(paste(..)) is not so terrible here, given the noted trade-offs that are specific to this package. Perhaps other will disagree, and that's fine. Please note that I do not have any experience with cybersecurity. If you want an informed view point on such matters, I suggest you talk to someone else :)

@gonzalezeb
Copy link
Author

Thanks @jeffreyhanson, I have adopted most of your comments, they are very helpful!

@jstillh
Copy link

jstillh commented Sep 19, 2021

Thank you @gonzalezeb for the very detailed response to the reviews. As @jeffreyhanson, I like your detailed response regarding the use of eval(parse(...)). In general, i am very pleased with the revisions and the changes you implemented in the functions. I do not have any more suggestions beyond the ones by @jeffreyhanson.

  • I appreciate the changes / improvements in the documentation of both the weight_allom and the get_biomass function and that you implemented an informative error message if coordinates are not covered by the koppen data.
  • Your proposition regarding the weighting of the climatic combinations sounds like a good idea.

Thanks for this contribution.

@adamhsparks
Copy link
Member

@ropensci-review-bot approve

@ropensci-review-bot
Copy link
Collaborator

Could not approve. Please, specify the name of the package.

@adamhsparks
Copy link
Member

@ropensci-review-bot approve allodb

@ropensci-review-bot
Copy link
Collaborator

ropensci-review-bot commented Sep 27, 2021

Approved! Thanks @gonzalezeb for submitting and @jeffreyhanson, @jstillh for your reviews! 😁

To-dos:

  • Transfer the repo to rOpenSci's "ropensci" GitHub organization under "Settings" in your repo. I have invited you to a team that should allow you to do so. You'll be made admin once you do.
  • Fix all links to the GitHub repo to point to the repo under the ropensci organization.
  • Delete your current code of conduct file if you had one since rOpenSci's default one will apply, see https://devguide.ropensci.org/collaboration.html#coc-file
  • If you already had a pkgdown website and are ok relying only on rOpenSci central docs building and branding,
    • deactivate the automatic deployment you might have set up
    • remove styling tweaks from your pkgdown config but keep that config file
    • replace the whole current pkgdown website with a redirecting page
    • replace your package docs URL with https://docs.ropensci.org/package_name
    • In addition, in your DESCRIPTION file, include the docs link in the URL field alongside the link to the GitHub repository, e.g.: URL: https://docs.ropensci.org/foobar (website) https://github.com/ropensci/foobar
  • Fix any links in badges for CI and coverage to point to the ropensci URL. We no longer transfer Appveyor projects to ropensci Appveyor account so after transfer of your repo to rOpenSci's "ropensci" GitHub organization the badge should be [![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/github/ropensci/pkgname?branch=master&svg=true)](https://ci.appveyor.com/project/individualaccount/pkgname). If Appveyor does not pick up new commits after transfer, you might need to delete and re-create the Appveyor project. (Repo transfers are smoother with GitHub Actions)
  • Please check you have updated the package version to a post-review version and that you documented all changes in NEWS.md
  • We're starting to roll out software metadata files to all ropensci packages via the Codemeta initiative, see https://github.com/ropensci/codemetar/#codemetar for how to include it in your package, after installing the package - should be easy as running codemetar::write_codemeta() in the root of your package.

Should you want to acknowledge your reviewers in your package DESCRIPTION, you can do so by making them "rev"-type contributors in the Authors@R field (with their consent). More info on this here.

Welcome aboard! We'd love to host a post about your package - either a short introduction to it with an example for a technical audience or a longer post with some narrative about its development or something you learned, and an example of its use for a broader readership. If you are interested, consult the blog guide, and tag @stefaniebutland in your reply. She will get in touch about timing and can answer any questions.

We've put together an online book with our best practice and tips, this chapter starts the 3d section that's about guidance for after onboarding. Please tell us what could be improved, the corresponding repo is here.

Last but not least, you can volunteer as a reviewer via filling a short form.

maurolepore added a commit to forestgeo/forestgeo.github.io that referenced this issue Sep 30, 2021
maurolepore added a commit to ropensci/allodb that referenced this issue Oct 2, 2021
>  Delete your current code of conduct file if you
had one since rOpenSci's default one will apply, see
https://devguide.ropensci.org/collaboration.html#coc-file

--ropensci/software-review#436 (comment)
maurolepore added a commit to ropensci/allodb that referenced this issue Oct 2, 2021
>  We're starting to roll out software metadata files
to all ropensci packages via the Codemeta initiative, see
https://github.com/ropensci/codemetar/#codemetar for how to include
it in your package, after installing the package - should be easy as
running codemetar::write_codemeta() in the root of your package.
--ropensci/software-review#436 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants