dataset: Create Data Frames that are Easier to Exchange and Reuse #553
Comments
Thanks for submitting to rOpenSci, our editors and @ropensci-review-bot will reply soon. Type |
🚀 The following problem was found in your submission template:
👋 |
Hi, @antaldaniel, could you please fix the repo URL by providing a link to the package’s repository, please? 🙏 |
@adamhsparks Apologies for the original issue problem, I hope all is fine now. I added both the github repo and the package website url |
@antaldaniel Then you can start the checks yourself by calling |
@ropensci-review-bot check package |
Thanks, about to send the query. |
🚀 Editor check started 👋 |
Checks for dataset (v0.1.7)
git hash: 2eb439b5
Important: All failing checks above must be addressed prior to proceeding
Package License: GPL (>= 3)
1. Package Dependencies
Details of Package Dependency Usage (click to open)
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.
base: names (26), data.frame (14), class (12), paste (9), rep (7), sapply (7), unlist (6), which (6), attr (5), lapply (5), length (5), ncol (5), subset (4), as.character (3), attributes (3), c (3), logical (3), seq_along (3), vapply (3), as.data.frame (2), as.numeric (2), cbind (2), file (2), inherits (2), matrix (2), nrow (2), round (2), args (1), date (1), deparse (1), for (1), gsub (1), ifelse (1), is.null (1), paste0 (1), rbind (1), tolower (1), union (1), unique (1), url (1), UseMethod (1)
dataset: dimensions (6), attributes_measures (5), measures (5), all_unique (3), dataset_title (3), related_item (3), creator (2), datacite (2), dataset (2), dataset_source (2), description (2), geolocation (2), identifier (2), language (2), metadata_header (2), publication_year (2), publisher (2), related_item_identifier (2), resource_type (2), add_date (1), add_relitem (1), arg.names (1), attributes_names (1), bibentry_dataset (1), datacite_add (1), dataset_download (1), dataset_download_csv (1), dataset_export (1), dataset_export_csv (1), dataset_local_id (1), dataset_title_create (1), dataset_uri (1), dimensions_names (1), document_package_used (1), dot.names (1), dublincore (1), dublincore_add (1), extract_year (1), is.dataset (1), measures_names (1), print (1), print.dataset (1), resource_type_general (1), rights (1), subject (1), time_var_guess (1), version (1)
stats: df (2), time (2)
utils: citation (1), object.size (1), read.csv (1), sessionInfo (1)
rlang: get_expr (1)
NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.
2. Statistical Properties
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
Details of statistical properties (click to open)
The package has:
Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the The final measure (
2a. Network visualisationClick to see the interactive network visualisation of calls between objects in package 3.
|
file | coverage |
---|---|
R/creator.R | 64.29% |
R/datacite_attributes.R | 0% |
R/datacite.R | 46.88% |
R/dataset_uri.R | 0% |
R/dataset.R | 48.36% |
R/document_package_used.R | 0% |
R/dublincore.R | 67.74% |
R/publication_year.R | 55.56% |
R/related_item.R | 66.67% |
Cyclocomplexity with cyclocomp
The following functions have cyclocomplexity >= 15:
function | cyclocomplexity |
---|---|
datacite_add | 24 |
dublincore_add | 23 |
Static code analyses with lintr
lintr found the following 383 potential issues:
message | number of times |
---|---|
Avoid 1:ncol(...) expressions, use seq_len. | 4 |
Avoid library() and require() calls in packages | 20 |
Avoid using sapply, consider vapply instead, that's type safe | 4 |
Lines should not be more than 80 characters. | 352 |
Use <-, not =, for assignment. | 3 |
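Two of the lintr messages above, about `1:ncol(...)` and `sapply`, point at well-known base R pitfalls. The following is a minimal sketch of why the suggested replacements are safer. It uses a made-up zero-column data frame (`empty_df`), not code from the package under review.

```r
# Hypothetical zero-column data frame; not taken from the dataset package.
empty_df <- data.frame()

# 1:ncol(x) counts DOWN from 1 to 0 when there are no columns,
# so a loop over it would run twice with invalid indices.
bad_index  <- 1:ncol(empty_df)         # 1 0
good_index <- seq_len(ncol(empty_df))  # integer(0): a loop body never runs

# sapply() returns whatever shape the results happen to have;
# on empty input it falls back to an empty list.
loose <- sapply(empty_df, class)

# vapply() enforces the declared type and length, even on empty input.
strict <- vapply(empty_df, function(col) class(col)[1], character(1))

print(bad_index)
print(good_index)
print(class(loose))
print(class(strict))
```

The difference only shows up on edge cases such as empty inputs, which is why these patterns tend to pass ordinary tests and still get flagged by static analysis.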
4. Other Checks
Details of other checks (click to open)
✖️ The following 10 function names are duplicated in other packages:
-
dataset
from assemblerr, febr, robis
-
description
from dataMaid, dataPreparation, dataReporter, dcmodify, memisc, metaboData, PerseusR, ritis, rmutil, rsyncrosim, stream, synchronicity, timeSeries, tis, validate
-
dimensions
from gdalcubes, openeo, sp, tiledb
-
identifier
from Ramble
-
is.dataset
from crunch
-
language
from sylly, wakefield
-
measures
from greybox, mlr3measures, tsibble
-
size
from acrt, BaseSet, container, crmPack, CVXR, datastructures, deal, disto, easyVerification, EFA.MRFA, flifo, gdalcubes, gWidgets2, hrt, iemisc, InDisc, kernlab, matlab2r, multiverse, optimbase, PopED, pracma, ramify, rEMM, rmonad, simplegraph, siren, tcltk2, UComp, unival, vampyr
-
subject
from DGM, emayili, gmailr, sendgridr
-
version
from BiocManager, garma, geoknife, mice, R6DS, rerddap, rsyncrosim, shiny.info, SMFilter
Package Versions
package | version |
---|---|
pkgstats | 0.1.1.20 |
pkgcheck | 0.1.0.3 |
Editor-in-Chief Instructions:
Processing may not proceed until the items marked with ✖️ have been resolved.
Hi again, @antaldaniel. If you could please address the issues that the bot flagged with the ✖️, then I can proceed with your submission. |
Hi @adamhsparks I hope I managed to address these things, with the following exception. ✔️ does not have a 'codemeta.json' file -> added with codemetar. I tried to avoid duplications while keeping in mind the rOpenSci duplication guidelines, and at this point, I do not see which are the duplications and whether there is any sensible way to resolve them. Your guidelines state "Avoid function name conflicts with base packages or other popular ones (e.g. ggplot2, dplyr, magrittr, data.table)". The package currently has no name conflict with any packages that I was thinking of using together with it, and I do not know how to test for this. (Apologies if this is somewhere in the 1.3 Package API.) ✔️ Package has no continuous integration checks -> added. I do not see a sensible way to achieve 75%+ codecov coverage with a metadata package that is in an early development stage and still has development questions open (see Motivation: Make Tidy Datasets Easier to Release Exchange and Reuse, hence the submission here before the first CRAN release). For example, in the target category, other metadata management packages have lower coverage: codemetar has 42% and EML has 65%, both below the current coverage of dataset before its first release. |
@antaldaniel You may indeed ignore the "Function names are duplicated in other packages." That will soon be changed from a failing check (:heavy_multiplication_x:) to an advisory note only. Sorry for any confusion there. @adamhsparks will comment further on the code coverage. |
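On the question of how to test for name conflicts: a minimal base R sketch, offered as one possible check rather than the method the bot uses. The candidate names are taken from this thread. The check only looks at base R, so it says nothing about conflicts with CRAN packages.

```r
# Candidate function names mentioned in this thread.
candidates <- c("dataset", "subject", "description", "attributes")

# TRUE when base R already defines a function with that name.
# inherits = FALSE keeps the lookup inside the base environment.
clashes_with_base <- vapply(
  candidates,
  function(nm) {
    exists(nm, envir = baseenv(), mode = "function", inherits = FALSE)
  },
  logical(1)
)

print(clashes_with_base)
```

For packages that are attached in a session, base R's `conflicts(detail = TRUE)` lists the names that mask each other along the search path.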
@mpadge I do not seem to find the output where this information is coming from, but I think that it is nevertheless a very useful reminder, and it would be good to see what conflicts your bot has found. Again, apologies if I ask the obvious, but where can I check what duplicates were flagged by your bot? |
It's in the check results. Under "4. Other Checks", you'll see a "Details of other checks (click to open)". You can also generate those yourself by running:
library(pkgcheck)
checks <- pkgcheck("/<path>/<to>/<dataset-pkg>")
checks_md <- checks_to_markdown(checks, render = TRUE)
That will automatically open a HTML-rendered version of the checks, just like the above. You can use that repeatedly as you work through the issues highlighted above. |
@mpadge Oh, really, sorry for asking the obvious. I would like to comment here on the issue in substance. The main development question of the package, which aims to make R objects standard datasets (as defined by W3C and SDMX) by adding structural and referential metadata, is whether the best way to do this is to create an S3 object or not (see the dilemma here). At the current stage, it is a pseudo object inherited from data.frame, but it can also be seen as a utility for any data.frame, tibble, or data.table (or similar tabular format) R object. The functions that have duplicates in other packages follow a very simple naming convention. I think that this is the cleanest API that I can think of; for example, subject() gets the corresponding metadata attribute. All these functions are lowercase and manipulate a camelCase standard attribute, except for the SDMX attribute 'attribute', which would create a conflict with the base R 'attributes()' function. |
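The naming convention described above can be sketched in base R as a lowercase getter and setter pair that reads and writes a metadata attribute on a data frame. This is an illustrative assumption, not the package's implementation. The attribute name "Subject" and the function bodies are hypothetical.

```r
# Hypothetical getter: a lowercase function reading a metadata attribute.
subject <- function(x) attr(x, "Subject")

# Hypothetical replacement function, so that subject(x) <- value works.
`subject<-` <- function(x, value) {
  attr(x, "Subject") <- value
  x
}

# Any data.frame-like object can carry the attribute.
flowers <- data.frame(sepal_length = c(5.1, 4.9, 4.7))
subject(flowers) <- "Irises"

print(subject(flowers))
print(inherits(flowers, "data.frame"))
```

Because the sketch relies only on `attr()`, it attaches metadata without changing the class of the object, which corresponds to the "utility" reading of the design question rather than the S3 one.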
Hi @antaldaniel, For instance, Lines 40-43 are covered but Lines 44-45 aren't. These are seemingly the same except for checking on 2 or 3 letter ISO codes, unless I'm mistaken. Or the message response within the Could I ask that you have another look and see if you can't further improve the coverage a bit more? |
Hi @adamhsparks I went up to 71.27%, but further changes are not very productive. I did not extensively cover two areas: one is the constructor for dataset() itself, where I expect potentially breaking changes, and the other is file I/O, where I would like to come up with a more general solution and also avoid tests being run on CRAN later. As the overwrite function and its messages create the most branches, this is a bit of a play with %, as the very same copied test is run again and again. Do you have a good solution to include download and file I/O tests that run fast enough or cause no disruption when later run on CRAN? |
@adamhsparks I am now well above your threshold, and apologies for the trivial error. I wanted to omit some issues in the dataset() constructor, but I did not realize that it had some old code that had been rewritten. The tests were omitting it, of course, but it sat at the bottom of the file. It is now 81.2% covered. I know that it has to improve, but I'd prefer to do it when some issues are resolved in a clear direction (see my comment above). |
Hi @antaldaniel, that's great to see. Thank you for rechecking everything and updating. If you have tests that you feel are unconducive for CRAN, I'd just use (and do liberally use) |
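The helper the editor had in mind is not visible in this thread. As a hedged illustration of the general idea, the base R sketch below keeps a file I/O check inside a temporary file and gates slower work behind the `NOT_CRAN` environment variable, which testthat documents as the switch behind `skip_on_cran()`. The function `roundtrip_ok()` is hypothetical, and `write.csv()`/`read.csv()` stand in for the package's own export and import functions.

```r
# Hypothetical round-trip check on a temporary file.
roundtrip_ok <- function(df) {
  path <- tempfile(fileext = ".csv")
  on.exit(unlink(path), add = TRUE)  # clean up even if the check fails
  write.csv(df, path, row.names = FALSE)
  isTRUE(all.equal(read.csv(path), df))
}

# Reading NOT_CRAN directly is only an illustration of the gating idea;
# inside a testthat suite, skip_on_cran() would normally do this.
run_slow_tests <- identical(Sys.getenv("NOT_CRAN"), "true")

small <- data.frame(id = 1:3, value = c(0.5, 1.5, 2.5))
print(roundtrip_ok(small))
if (run_slow_tests) message("download tests would run here")
```

Writing only to `tempfile()` paths and removing them afterwards keeps such checks fast and leaves no files behind on the check machine.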
@ropensci-review-bot check package |
Thanks, about to send the query. |
🚀 Editor check started 👋 |
Checks for dataset (v0.1.7.0002)
git hash: 93c03c54
Important: All failing checks above must be addressed prior to proceeding
Package License: GPL (>= 3)
1. Package Dependencies
Details of Package Dependency Usage (click to open)
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.
base: names (21), class (12), data.frame (10), paste (9), vapply (9), rep (7), character (6), unlist (6), attr (5), lapply (5), length (5), ncol (5), subset (4), as.character (3), c (3), seq_along (3), as.data.frame (2), as.numeric (2), attributes (2), cbind (2), file (2), inherits (2), logical (2), matrix (2), nrow (2), round (2), which (2), date (1), for (1), ifelse (1), is.null (1), paste0 (1), rbind (1), seq_len (1), tolower (1), union (1), unique (1), url (1), UseMethod (1)
dataset: attributes_measures (5), dimensions (4), all_unique (3), dataset_title (3), measures (3), creator (2), datacite (2), dataset (2), dataset_source (2), description (2), geolocation (2), identifier (2), language (2), metadata_header (2), publication_year (2), publisher (2), related_item_identifier (2), resource_type (2), bibentry_dataset (1), datacite_add (1), dataset_download (1), dataset_download_csv (1), dataset_export (1), dataset_export_csv (1), dataset_local_id (1), dataset_title_create (1), dataset_uri (1), dublincore (1), dublincore_add (1), extract_year (1), is.dataset (1), print (1), print.dataset (1), related_item (1), resource_type_general (1), resource_type_general_allowed (1), rights (1), subject (1), time_var_guess (1), version (1)
stats: df (2)
utils: object.size (1), read.csv (1)
NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.
2. Statistical Properties
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
Details of statistical properties (click to open)
The package has:
Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the The final measure (
2a. Network visualisationClick to see the interactive network visualisation of calls between objects in package 3.
|
id | name | conclusion | sha | run_number | date |
---|---|---|---|---|---|
2891146042 | pkgcheck | failure | 93c03c | 17 | 2022-08-19 |
2891146050 | test-coverage | success | 93c03c | 20 | 2022-08-19 |
3b. goodpractice
results
R CMD check
with rcmdcheck
R CMD check generated the following check_fail:
- no_description_date
Test coverage with covr
Package coverage: 82.12
Cyclocomplexity with cyclocomp
The following functions have cyclocomplexity >= 15:
function | cyclocomplexity |
---|---|
datacite_add | 24 |
dublincore_add | 23 |
Static code analyses with lintr
lintr found the following 370 potential issues:
message | number of times |
---|---|
Avoid library() and require() calls in packages | 20 |
Lines should not be more than 80 characters. | 350 |
4. Other Checks
Details of other checks (click to open)
✖️ The following 10 function names are duplicated in other packages:
-
dataset
from assemblerr, febr, robis
-
description
from dataMaid, dataPreparation, dataReporter, dcmodify, memisc, metaboData, PerseusR, ritis, rmutil, rsyncrosim, stream, synchronicity, timeSeries, tis, validate
-
dimensions
from gdalcubes, openeo, sp, tiledb
-
identifier
from Ramble
-
is.dataset
from crunch
-
language
from sylly, wakefield
-
measures
from greybox, mlr3measures, tsibble
-
size
from acrt, BaseSet, container, crmPack, CVXR, datastructures, deal, disto, easyVerification, EFA.MRFA, flifo, gdalcubes, gWidgets2, hrt, iemisc, InDisc, kernlab, matlab2r, multiverse, optimbase, PopED, pracma, ramify, rEMM, rmonad, simplegraph, siren, tcltk2, UComp, unival, vampyr
-
subject
from DGM, emayili, gmailr, sendgridr
-
version
from BiocManager, garma, geoknife, mice, R6DS, rerddap, rsyncrosim, shiny.info, SMFilter
Package Versions
package | version |
---|---|
pkgstats | 0.1.1.20 |
pkgcheck | 0.1.0.3 |
Editor-in-Chief Instructions:
Processing may not proceed until the items marked with ✖️ have been resolved.
@ropensci-review-bot assign @melvidoni as editor |
Assigned! @melvidoni is now the editor |
Hi @annakrystalli , just wanted to give a short update. The small changes suggested in this thread were implemented, and the early version of the package was released on CRAN. I am devising a 2-year development plan for the package and have a clear overview of planned milestones. When done, I will contact the other mentioned package owners/maintainers with this plan. With the main developers, who are not software engineers, but statisticians with statistical software development expertise, we will have a kick-off meeting in the last week of January. |
Ok great! Thanks for the update @antaldaniel |
Hello @antaldaniel ! Was wondering whether you had any updates on progress on the package? |
Hi @annakrystalli , there has been very little change, only in documentation; I have secured development funding and will publish a more detailed development concept and look for paid and volunteer contributors in the coming weeks. I would like to ask you what would be an excellent way to do so; apart from adding this as a vignette to this early-stage package, would it be possible to raise attention by a blog post or something similar? |
Great to hear you have secured development funding! You are always welcome to advertise on the rOpenSci slack, especially in the |
After a very long time, here is a conceptual working paper on the development, with a far more detailed specification than before and some code ideas: Making Datasets Truly Interoperable in R is a working paper to accompany the development of the package. The working paper can be referenced with: I am also looking for volunteer and potentially paid contributors to the package. The source file is usually more recent: dataset-working-paper.qmd |
Thank you for the update @antaldaniel ! Good to hear you are making progress with the plans. Ultimately I feel the package will still remain on hold until it has been developed enough to be considered, if not ready, pretty close to release. That's when feedback from reviewers will be most useful, and it is also more aligned with what reviewers are expected to contribute. Let us know when you feel you have reached that stage! |
@annakrystalli I think that the review would be useful now, because I am implementing this working paper, Making Datasets Truly Interoperable, now. I just sent a new version to CRAN, but there is still room to review. Also, if somebody wants to get involved in the development, I do have a public grant for it and could take on a co-developer. The new version (which is an entire rewrite since the first review) is on the dataset.dataobservatory.eu/ website with the connecting GitHub repo. I see a problem though with your CI attached to the package: it throws errors which to me look like configuration errors and not real errors, as the package builds fine on appveyor and r_hub. |
@ropensci-review-bot check package |
Thanks, about to send the query. |
🚀 The following problem was found in your submission template:
👋 |
Checks for dataset (v0.3.1)
git hash: b1dca41e
Important: All failing checks above must be addressed prior to proceeding (Checks marked with 👀 may be optionally addressed.)
Package License: GPL (>= 3)
1. Package Dependencies
Details of Package Dependency Usage (click to open)
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.
base: as.character (40), ifelse (40), is.null (38), list (30), c (16), data.frame (14), names (10), lapply (8), attr (7), paste0 (7), inherits (6), class (5), col (5), drop (4), invisible (4), seq_along (4), which (4), as.POSIXct (3), character (3), date (3), for (3), format (3), length (3), ncol (3), Sys.time (3), unlist (3), vapply (3), all (2), args (2), as.data.frame (2), as.numeric (2), dim (2), paste (2), rbind (2), round (2), substitute (2), t (2), url (2), with (2), apply (1), as.Date (1), cbind (1), comment (1), do.call (1), environment (1), get (1), if (1), max (1), nchar (1), new.env (1), range (1), rep (1), substr (1), switch (1), Sys.Date (1)
dataset: dataset_bibentry (28), dataset_title (10), dataset (8), rights (8), subject (8), creator (7), description (6), publisher (6), identifier (5), language (5), new_Subject (5), provenance (5), xsd_convert (5), DataStructure (4), convert_column (3), publication_year (3), as_bibentry (2), as_dublincore (2), dots_number (2), geolocation (2), get_type (2), getdata (2), idcol_find (2), is_person (2), is.dataset (2), provenance_add (2), related_item_identifier (2), size (2), subject_create (2), version (2), as_datacite (1), as_dataset (1), as_dataset.data.frame (1), datacite (1), dataset_download (1), dataset_download_csv (1), dataset_prov (1), dataset_title_create (1), dataset_to_triples (1), dataset_ttl_write (1), datasource_get (1), datasource_set (1), DataStructure_update (1), describe (1), describe.dataset (1), dublincore (1), get_prefix (1), get_resource_identifier (1), head.dataset (1), id_to_column (1), initialise_dsd (1), is.datacite (1), is.datacite.datacite (1), is.dublincore (1), is.dublincore.dublincore (1), is.subject (1), new_datacite (1), new_dataset (1), new_dublincore (1), old_function (1), print.dataset (1), related_item (1), set_var_labels (1), set_var_labels.dataset (1)
assertthat: assert_that (22)
utils: bibentry (3), data (2), person (2), citation (1), object.size (1), read.csv (1), tail (1)
stats: df (5), var (3), ar (1), family (1)
graphics: title (6)
NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.
2. Statistical Properties
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
Details of statistical properties (click to open)
The package has:
Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the The final measure (
2a. Network visualisationClick to see the interactive network visualisation of calls between objects in package 3.
|
id | name | conclusion | sha | run_number | date |
---|---|---|---|---|---|
7677839674 | pkgcheck | failure | b1dca4 | 126 | 2024-01-27 |
7677839676 | R-CMD-check | failure | b1dca4 | 46 | 2024-01-27 |
7677839673 | test-coverage | failure | b1dca4 | 129 | 2024-01-27 |
3b. goodpractice
results
R CMD check
with rcmdcheck
R CMD check generated the following check_fail:
- no_description_date
Test coverage with covr
Package coverage: 78.97
Cyclocomplexity with cyclocomp
The following function have cyclocomplexity >= 15:
function | cyclocomplexity |
---|---|
[[.dataset | 17 |
Static code analyses with lintr
lintr found the following 417 potential issues:
message | number of times |
---|---|
Avoid 1:length(...) expressions, use seq_len. | 1 |
Avoid 1:ncol(...) expressions, use seq_len. | 2 |
Avoid 1:nrow(...) expressions, use seq_len. | 3 |
Avoid library() and require() calls in packages | 23 |
Lines should not be more than 80 characters. | 384 |
unexpected symbol | 2 |
Use <-, not =, for assignment. | 2 |
4. Other Checks
Details of other checks (click to open)
✖️ The following 12 function names are duplicated in other packages:
-
dataset
from assemblerr, febr, robis
-
describe
from AzureVision, Bolstad2, describer, dlookr, explore, Hmisc, iBreakDown, ingredients, lambda.r, MSbox, onewaytests, prettyR, psych, psych, psyntur, questionr, radiant.data, RCPA3, Rlab, scan, scorecard, sylly, tidycomm
-
description
from dataMaid, dataPreparation, dataReporter, dcmodify, memisc, metaboData, PerseusR, ritis, rmutil, rsyncrosim, stream, synchronicity, timeSeries, tis, validate
-
identifier
from Ramble
-
is.dataset
from crunch
-
language
from sylly, wakefield
-
provenance
from provenance
-
set_var_labels
from xpose
-
size
from acrt, BaseSet, container, crmPack, CVXR, datastructures, deal, disto, easyVerification, EFA.MRFA, flifo, gdalcubes, gWidgets2, hrt, iemisc, InDisc, kernlab, matlab2r, multiverse, optimbase, PopED, pracma, ramify, rEMM, rmonad, simplegraph, siren, tcltk2, UComp, unival, vampyr
-
subject
from DGM, emayili, gmailr, sendgridr
-
var_labels
from formatters, sjlabelled
-
version
from BiocManager, garma, geoknife, mice, R6DS, rerddap, rsyncrosim, shiny.info, SMFilter
Package Versions
package | version |
---|---|
pkgstats | 0.1.3.11 |
pkgcheck | 0.1.2.15 |
Editor-in-Chief Instructions:
Processing may not proceed until the items marked with ✖️ have been resolved.
Hi @antaldaniel Since you mentioned "The new version (which is an entire rewrite since the first review) ", we're going to treat this as a new submission and get new reviewers. Can you work on the 2 outstanding issues above while I look for a new editor? Thanks @annakrystalli for the initial work! |
@ldecicco-USGS thank you for the heads-up, and indeed, I will fix those issues. |
Let me know when you've updated the package (or go ahead and rerun the "bot" command to check the package). Once we've got that taken care of I'll assign a new editor. Thanks! |
@ropensci-review-bot check package |
Thanks, about to send the query. |
🚀 Editor check started 👋 |
Checks for dataset (v0.3.1)
git hash: b1dca41e
Important: All failing checks above must be addressed prior to proceeding (Checks marked with 👀 may be optionally addressed.)
Package License: GPL (>= 3)
1. Package Dependencies
Details of Package Dependency Usage (click to open)
The table below tallies all function calls to all packages ('ncalls'), both internal (r-base + recommended, along with the package itself), and external (imported and suggested packages). 'NA' values indicate packages to which no identified calls to R functions could be found. Note that these results are generated by an automated code-tagging system which may not be entirely accurate.
Click below for tallies of functions used in each package. Locations of each call within this package may be generated locally by running 's <- pkgstats::pkgstats(<path/to/repo>)', and examining the 'external_calls' table.
base: as.character (40), ifelse (40), is.null (38), list (30), c (16), data.frame (14), names (10), lapply (8), attr (7), paste0 (7), inherits (6), class (5), col (5), drop (4), invisible (4), seq_along (4), which (4), as.POSIXct (3), character (3), date (3), for (3), format (3), length (3), ncol (3), Sys.time (3), unlist (3), vapply (3), all (2), args (2), as.data.frame (2), as.numeric (2), dim (2), paste (2), rbind (2), round (2), substitute (2), t (2), url (2), with (2), apply (1), as.Date (1), cbind (1), comment (1), do.call (1), environment (1), get (1), if (1), max (1), nchar (1), new.env (1), range (1), rep (1), substr (1), switch (1), Sys.Date (1)
dataset: dataset_bibentry (28), dataset_title (10), dataset (8), rights (8), subject (8), creator (7), description (6), publisher (6), identifier (5), language (5), new_Subject (5), provenance (5), xsd_convert (5), DataStructure (4), convert_column (3), publication_year (3), as_bibentry (2), as_dublincore (2), dots_number (2), geolocation (2), get_type (2), getdata (2), idcol_find (2), is_person (2), is.dataset (2), provenance_add (2), related_item_identifier (2), size (2), subject_create (2), version (2), as_datacite (1), as_dataset (1), as_dataset.data.frame (1), datacite (1), dataset_download (1), dataset_download_csv (1), dataset_prov (1), dataset_title_create (1), dataset_to_triples (1), dataset_ttl_write (1), datasource_get (1), datasource_set (1), DataStructure_update (1), describe (1), describe.dataset (1), dublincore (1), get_prefix (1), get_resource_identifier (1), head.dataset (1), id_to_column (1), initialise_dsd (1), is.datacite (1), is.datacite.datacite (1), is.dublincore (1), is.dublincore.dublincore (1), is.subject (1), new_datacite (1), new_dataset (1), new_dublincore (1), old_function (1), print.dataset (1), related_item (1), set_var_labels (1), set_var_labels.dataset (1)
assertthat: assert_that (22)
utils: bibentry (3), data (2), person (2), citation (1), object.size (1), read.csv (1), tail (1)
stats: df (5), var (3), ar (1), family (1)
graphics: title (6)
NOTE: Some imported packages appear to have no associated function calls; please ensure with author that these 'Imports' are listed appropriately.
2. Statistical Properties
This package features some noteworthy statistical properties which may need to be clarified by a handling editor prior to progressing.
Details of statistical properties (click to open)
The package has:
Statistical properties of package structure as distributional percentiles in relation to all current CRAN packages
All parameters are explained as tooltips in the locally-rendered HTML version of this report generated by the The final measure (
2a. Network visualisationClick to see the interactive network visualisation of calls between objects in package 3.
|
id | name | conclusion | sha | run_number | date |
---|---|---|---|---|---|
7677839674 | pkgcheck | failure | b1dca4 | 126 | 2024-01-27 |
7677839676 | R-CMD-check | failure | b1dca4 | 46 | 2024-01-27 |
7677839673 | test-coverage | failure | b1dca4 | 129 | 2024-01-27 |
3b. goodpractice
results
R CMD check
with rcmdcheck
R CMD check generated the following check_fail:
- no_description_date
Test coverage with covr
Package coverage: 78.97
Cyclocomplexity with cyclocomp
The following function have cyclocomplexity >= 15:
function | cyclocomplexity |
---|---|
[[.dataset | 17 |
Static code analyses with lintr
lintr found no issues with this package!
4. Other Checks
Details of other checks (click to open)
✖️ The following 12 function names are duplicated in other packages:
-
dataset
from assemblerr, febr, robis
-
describe
from AzureVision, Bolstad2, describer, dlookr, explore, Hmisc, iBreakDown, ingredients, lambda.r, MSbox, onewaytests, prettyR, psych, psych, psyntur, questionr, radiant.data, RCPA3, Rlab, scan, scorecard, sylly, tidycomm
-
description
from dataMaid, dataPreparation, dataReporter, dcmodify, memisc, metaboData, PerseusR, ritis, rmutil, rsyncrosim, stream, synchronicity, timeSeries, tis, validate
-
identifier
from Ramble
-
is.dataset
from crunch
-
language
from sylly, wakefield
-
provenance
from provenance
-
set_var_labels
from xpose
-
size
from acrt, BaseSet, container, crmPack, CVXR, datastructures, deal, disto, easyVerification, EFA.MRFA, flifo, gdalcubes, gWidgets2, hrt, iemisc, InDisc, kernlab, matlab2r, multiverse, optimbase, PopED, pracma, ramify, rEMM, rmonad, simplegraph, siren, tcltk2, UComp, unival, vampyr
-
subject
from DGM, emayili, gmailr, sendgridr
-
var_labels
from formatters, sjlabelled
-
version
from BiocManager, garma, geoknife, mice, R6DS, rerddap, rsyncrosim, shiny.info, SMFilter
Package Versions
package | version |
---|---|
pkgstats | 0.2.0 |
pkgcheck | 0.1.2.61 |
Editor-in-Chief Instructions:
Processing may not proceed until the items marked with ✖️ have been resolved.
Hi @antaldaniel, just checking in, the editor checks indicate a few minor issues that could be addressed fairly easily I think. Are you in a position to fix these issues so we can resume this review? |
Yes, I am. In the last few days I created a plan to improve this package and to add a derived package for a specific use, because I think that what was missing was the mass use that would have created interest in and contributions to the package. I will make these small changes, but also include for review a new conceptual vignette to better explain the mission statement. |
Great, thank you for the update, @antaldaniel! |
Submitting Author Name: Daniel Antal
Due date for @msperlin: 2022-09-19
Submitting Author Github Handle: @antaldaniel
Repository: https://github.com/dataobservatory-eu/dataset/
Version submitted: 0.1.7
Submission type: Standard
Editor: @annakrystalli
Reviewers: @msperlin, @romanflury
Due date for @romanflury: 2022-09-21
Archive: TBD
Version accepted: TBD
Language: en
You can find the package website on dataset.dataobservatory.eu. The article Motivation: Make Tidy Datasets Easier to Release Exchange and Reuse will eventually be condensed into a JOSS paper. It has a major development dilemma.
Scope
Please indicate which category or categories from our package fit policies this package falls under: (Please check an appropriate box below. If you are unsure, we suggest you make a pre-submission inquiry.):
Explain how and why the package falls under these categories (briefly, 1-2 sentences):
Open science repositories and analysts' computers are full of datasets that have no provenance, structural or referential data. We believe that metadata should be machine-recorded whenever possible, and should not be detached from an R object.
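As a rough illustration of what keeping metadata attached to an R object can look like, here is a minimal sketch that uses only base R attributes. It does not use the dataset package's own API; the table, the attribute names (title, creator) and their values are hypothetical and chosen only for this example.

```r
# Hypothetical example data, not taken from the dataset package.
gdp <- data.frame(
  geo = c("AT", "BE"),
  year = c(2020L, 2020L),
  value = c(100.5, 120.3)
)

# Record descriptive metadata directly on the object as base R attributes.
# The attribute names are assumptions made for this sketch.
attr(gdp, "title") <- "Example GDP table"
attr(gdp, "creator") <- "Jane Doe"

# Serialising to RDS keeps the attributes in the same file as the data.
path <- tempfile(fileext = ".rds")
saveRDS(gdp, path)
restored <- readRDS(path)

# The metadata can be read back from the restored object.
attr(restored, "title")
attr(restored, "creator")
```

The point of the sketch is only that the metadata travels inside the same object and file as the data, rather than in a separate side file that can get lost.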
There are several R packages that have overlapping goals or functionality to dataset, but they use a different philosophy. When exporting to different files, the metadata should be written at export, but no sooner, and preferably into the file that contains the data.
Who is the target audience and what are scientific applications of this package?
This package is intended to give a common foundation to the rOpenGov reproducible research packages. It mainly serves communities that want to reuse statistical data (using the SDMX statistical (meta)data exchange sources, like Eurostat, IMF, World Bank, OECD...) or release new datasets from primary social sciences data that can be integrated into an SDMX compatible API or placed on a knowledge graph. Our main aim is to provide a clear publication workflow to the European open science repository Zenodo, and clear serialization strategies for RDF applications.
The dataspice package aims to create well-defined and referenced datasets, but follows a different schema and a different publication strategy. The dataset package follows the more restrictive W3C/SDMX "DataSet" definition within the datacube model, which is better suited to synchronize with statistical data sources. Unlike dataset, dataspice relies on manual metadata entry from CSV files. (See the documentation of the dataspice package.)
The dataset package aims for a higher level of reproducibility, and does not detach the metadata from the R object's attributes (it is meant to be used in other reproducible research packages that will directly record provenance and other transactional metadata into the attributes). We aim to bind together dataspice and dataset by creating export functions to csv files that contain the same metadata that dataspice records. Generally, dataspice seems to be better suited to raw, observational data, while dataset is better suited to statistically processed data.
The intended use of dataset is to start correctly recording referential, structural and provenance metadata retrieved by various reproducible science packages that interact with statistical data (such as the rOpenGov packages eurostat and iotables, or the oecd package).
Neither dataset nor dataspice is very suitable for documenting social sciences survey data, which are usually held in datasets. Our aim is to connect dataset, declared and DDIwR to create such datasets with DDI codebook metadata. They will create a stable new foundation for the retroharmonize package to create new, well-documented and harmonized statistical datasets from the observational datasets of social sciences surveys.
The zen4R package provides reproducible export functionality to the Zenodo open science repository. Interacting with zen4R may be intimidating for the casual R user as it uses R6 classes. Our aim is to provide an export function that completely wraps the workings of zen4R when releasing the dataset.
In our experience, while the tidy data standards make reuse more efficient by eliminating unnecessary data processing steps before analysis or placement in a relational database, applying the DataSet definition and the datacube model together with information science metadata standards makes reuse more efficient when exchanging and combining the data with other data in different datasets.
Yes
If you made a pre-submission inquiry, please paste the link to the corresponding issue, forum post, or other discussion, or @tag the editor you contacted.
Explain reasons for any pkgcheck items which your package is unable to pass.
Technical checks
Confirm each of the following by checking the box.
This package:
Publication options
[x] Do you intend for this package to go on CRAN? -> Yes, I started the CRAN publication process, but opted to stop and get feedback from rOpenSci first
Do you intend for this package to go on Bioconductor? -> Don't know.
Do you wish to submit an Applications Article about your package to Methods in Ecology and Evolution? If so:
MEE Options
Code of conduct