
[REVIEW]: An R Companion for Introduction to Data Mining #223

Open
editorialbot opened this issue Feb 13, 2024 · 78 comments

@editorialbot
Collaborator

editorialbot commented Feb 13, 2024

Submitting author: @mhahsler (Michael Hahsler)
Repository: https://github.com/mhahsler/Introduction_to_Data_Mining_R_Examples
Branch with paper.md (empty if default branch):
Version: 1.0.1
Editor: @stats-tgeorge
Reviewers: @rudeboybert, @stats-tgeorge
Archive: 10.6084/m9.figshare.26750404.v1
Paper kind: learning module

Status


Status badge code:

HTML: <a href="https://jose.theoj.org/papers/1c21e289cb9c887b65cb58740f947f07"><img src="https://jose.theoj.org/papers/1c21e289cb9c887b65cb58740f947f07/status.svg"></a>
Markdown: [![status](https://jose.theoj.org/papers/1c21e289cb9c887b65cb58740f947f07/status.svg)](https://jose.theoj.org/papers/1c21e289cb9c887b65cb58740f947f07)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@hughshanahan & @rudeboybert, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review.
First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://openjournals.readthedocs.io/en/jose/reviewer_guidelines.html. Any questions/concerns please let @stats-tgeorge know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest.

Checklists

📝 Checklist for @rudeboybert

📝 Checklist for @stats-tgeorge

@editorialbot
Collaborator Author

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

@editorialbot
Collaborator Author

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.21105/joss.01686 is OK
- 10.18637/jss.v091.i01 is OK
- 10.18637/jss.v025.i03 is OK
- 10.32614/RJ-2017-047 is OK
- 10.18637/jss.v028.i05 is OK

MISSING DOIs

- None

INVALID DOIs

- None

@editorialbot
Collaborator Author

Software report:

github.com/AlDanial/cloc v 1.88  T=5.92 s (11.8 files/s, 4540.8 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
HTML                            19            695             45          11004
JavaScript                      18            853           1223           6185
Rmd                              9           1116           2249           1284
CSS                             10             96             81            856
TeX                              3             81              0            665
SVG                              1              0              0            288
Markdown                         3             28              0             93
YAML                             3              1              0             19
R                                2              1              3              5
Bourne Shell                     1              1              0              2
JSON                             1              0              0              1
-------------------------------------------------------------------------------
SUM:                            70           2872           3601          20402
-------------------------------------------------------------------------------


gitinspector failed to run statistical information for the repository

@editorialbot
Collaborator Author

Wordcount for paper.md is 492

@stats-tgeorge

Thank you both @hughshanahan and @rudeboybert for agreeing to review this submission. Let's aim to complete your checklists by March 15th, 2024.

@editorialbot
Collaborator Author

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@stats-tgeorge

Hello @hughshanahan and @rudeboybert, this is a friendly reminder to review this submission. Thank you!

@stats-tgeorge

Hello @hughshanahan, are you able to review this?

@stats-tgeorge

Hello @rudeboybert, are you able to review this?

@rudeboybert

rudeboybert commented May 22, 2024

So sorry @stats-tgeorge to have dropped the ball on this. It was a brutal semester. Do you still need this? If so, I can get it to you by the evening of Thu 6/30.

@stats-tgeorge

Hello @rudeboybert. I am sorry to hear of the rough semester. I'm sure you are excited about summer then! I am still looking for people to review this. This is also my area so I can be a reviewer if necessary. I may see if others are available now that it is summer. Thank you for following up!

@rudeboybert

rudeboybert commented Jun 15, 2024

Review checklist for @rudeboybert

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source for this learning module available at the https://github.com/mhahsler/Introduction_to_Data_Mining_R_Examples?
  • License: Does the repository contain a plain-text LICENSE file with the contents of a standard license? (OSI-approved for code, Creative Commons for content)
  • Version: Does the release version given match the repository release?
  • Authorship: Has the submitting author (@mhahsler) made visible contributions to the module? Does the full list of authors seem appropriate and complete?

Documentation

  • A statement of need: Do the authors clearly state the need for this module and who the target audience is?
  • Installation instructions: Is there a clearly stated list of dependencies?
  • Usage: Does the documentation explain how someone would adopt the module, and include examples of how to use it?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the module 2) Report issues or problems with the module 3) Seek support

Pedagogy / Instructional design (Work-in-progress: reviewers, please comment!)

  • Learning objectives: Does the module make the learning objectives plainly clear? (We don't require explicitly written learning objectives; only that they be evident from content and design.)
  • Content scope and length: Is the content substantial for learning a given topic? Is the length of the module appropriate?
  • Pedagogy: Does the module seem easy to follow? Does it observe guidance on cognitive load? (working memory limits of 7 +/- 2 chunks of information)
  • Content quality: Is the writing of good quality, concise, engaging? Are the code components well crafted? Does the module seem complete?
  • Instructional design: Is the instructional design deliberate and apparent? For example, exploit worked-example effects; effective multi-media use; low extraneous cognitive load.

JOSE paper

  • Authors: Does the paper.md file include a list of authors with their affiliations?
  • A statement of need: Does the paper clearly state the need for this module and who the target audience is?
  • Description: Does the paper describe the learning materials and sequence?
  • Does it describe how it has been used in the classroom or other settings, and how someone might adopt it?
  • Could someone else teach with this module, given the right expertise?
  • Does the paper tell the "story" of how the authors came to develop it, or what their expertise is?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

@rudeboybert

@mhahsler a few outstanding items need to be addressed in my checklist above

@mhahsler

@rudeboybert Thank you for the review. I have addressed the three issues:

  1. Added a section called "Software Requirements" in README.md with installation instructions (see the sketch after this list).
  2. Added a section called "Statement of Need" in README.md
  3. Created a release with version number 1.0.0 for the GitHub repository.
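
For item 1, a minimal sketch of what such installation instructions could look like in R; the package list below is an assumption chosen for illustration, not a copy of the actual "Software Requirements" section:

# Illustrative only: install the CRAN packages needed to run the book's R code chunks.
pkgs <- c("tidyverse", "caret", "arules", "dbscan")

# Install only the packages that are not yet available on this machine.
missing <- pkgs[!pkgs %in% installed.packages()[, "Package"]]
if (length(missing) > 0) install.packages(missing)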

I am not sure if creating the release is sufficient to address "Version: Does the release version given match the repository release?".

Best regards,
Michael

@stats-tgeorge

@editorialbot remove @hughshanahan as reviewer

@editorialbot
Collaborator Author

@hughshanahan removed from the reviewers list!

@stats-tgeorge

@editorialbot add @stats-tgeorge as reviewer

@editorialbot
Collaborator Author

@stats-tgeorge added to the reviewers list!

@stats-tgeorge

@editorialbot commands

@editorialbot
Collaborator Author

Hello @stats-tgeorge, here are the things you can ask me to do:


# List all available commands
@editorialbot commands

# Add to this issue's reviewers list
@editorialbot add @username as reviewer

# Remove from this issue's reviewers list
@editorialbot remove @username from reviewers

# Get a list of all editors' GitHub handles
@editorialbot list editors

# Assign a user as the editor of this submission
@editorialbot assign @username as editor

# Remove the editor assigned to this submission
@editorialbot remove editor

# Remind an author, a reviewer or the editor to return to a review after a 
# certain period of time (supported units days and weeks)
@editorialbot remind @reviewer in 2 weeks

# Adds a checklist for the reviewer using this command
@editorialbot generate my checklist

# Set a value for version
@editorialbot set v1.0.0 as version

# Set a value for branch
@editorialbot set jose-paper as branch

# Set a value for repository
@editorialbot set https://github.com/organization/repo as repository

# Set a value for the archive DOI
@editorialbot set 10.5281/zenodo.6861996 as archive

# Mention the EiCs for the correct track
@editorialbot ping track-eic

# Run checks and provide information on the repository and the paper file
@editorialbot check repository

# Check the references of the paper for missing DOIs
@editorialbot check references

# Generates the pdf paper
@editorialbot generate pdf

# Recommends the submission for acceptance
@editorialbot recommend-accept

# Generates a LaTeX preprint file
@editorialbot generate preprint

# Flag submission with questionable scope
@editorialbot query scope

# Get a link to the complete list of reviewers
@editorialbot list reviewers

# Creates a post-review checklist with editor and authors tasks
@editorialbot create post-review checklist

# Open the review issue
@editorialbot start review

@stats-tgeorge

stats-tgeorge commented Jun 21, 2024

Review checklist for @stats-tgeorge

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source for this learning module available at the https://github.com/mhahsler/Introduction_to_Data_Mining_R_Examples?
  • License: Does the repository contain a plain-text LICENSE file with the contents of a standard license? (OSI-approved for code, Creative Commons for content)
  • Version: Does the release version given match the repository release?
  • Authorship: Has the submitting author (@mhahsler) made visible contributions to the module? Does the full list of authors seem appropriate and complete?

Documentation

  • A statement of need: Do the authors clearly state the need for this module and who the target audience is?
  • Installation instructions: Is there a clearly stated list of dependencies?
  • Usage: Does the documentation explain how someone would adopt the module, and include examples of how to use it?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the module 2) Report issues or problems with the module 3) Seek support

Pedagogy / Instructional design (Work-in-progress: reviewers, please comment!)

  • Learning objectives: Does the module make the learning objectives plainly clear? (We don't require explicitly written learning objectives; only that they be evident from content and design.)
  • Content scope and length: Is the content substantial for learning a given topic? Is the length of the module appropriate?
  • Pedagogy: Does the module seem easy to follow? Does it observe guidance on cognitive load? (working memory limits of 7 +/- 2 chunks of information)
  • Content quality: Is the writing of good quality, concise, engaging? Are the code components well crafted? Does the module seem complete?
  • Instructional design: Is the instructional design deliberate and apparent? For example, exploit worked-example effects; effective multi-media use; low extraneous cognitive load.

JOSE paper

  • Authors: Does the paper.md file include a list of authors with their affiliations?
  • A statement of need: Does the paper clearly state the need for this module and who the target audience is?
  • Description: Does the paper describe the learning materials and sequence?
  • Does it describe how it has been used in the classroom or other settings, and how someone might adopt it?
  • Could someone else teach with this module, given the right expertise?
  • Does the paper tell the "story" of how the authors came to develop it, or what their expertise is?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

@stats-tgeorge

@editorialbot check repository

@editorialbot
Collaborator Author

Software report:

github.com/AlDanial/cloc v 1.90  T=0.18 s (383.6 files/s, 147397.0 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
HTML                            19            695             45          11004
JavaScript                      18            853           1223           6185
Rmd                              9           1116           2249           1284
CSS                             10             96             81            856
TeX                              3             81              0            665
SVG                              1              0              0            288
Markdown                         3             34              0            108
YAML                             3              1              0             19
R                                2              1              3              5
Bourne Shell                     1              1              0              2
JSON                             1              0              0              1
-------------------------------------------------------------------------------
SUM:                            70           2878           3601          20417
-------------------------------------------------------------------------------

Commit count by author:

   144	Michael Hahsler
    41	mhahsler
     2	vz-ai
     1	Juanjo Bazán

@editorialbot
Collaborator Author

Paper file info:

📄 Wordcount for paper.md is 492

✅ The paper includes a Statement of need section

@mhahsler

mhahsler commented Aug 15, 2024

@stats-tgeorge Additional Author Tasks After Review is Complete

  • Double check authors and affiliations (including ORCIDs)
  • Make a release of the software with the latest changes from the review and post the version number here. This is the version that will be used in the JOSE paper.
  • Archive the release on Zenodo/figshare/etc and post the DOI here.
  • Make sure that the title and author list (including ORCIDs) in the archive match those in the JOSE paper.
  • Make sure that the license listed for the archive is the same as the software license.

Version Number: 1.0.1 (https://github.com/mhahsler/Introduction_to_Data_Mining_R_Examples/releases/tag/1.0.1)
Archive: figshare DOI: 10.6084/m9.figshare.26750404.v1
Note on license: figshare only gives a choice for CC BY 4.0, while the book uses CC BY-NC 4.0. I can change everything to CC BY 4.0, if that is necessary.

@mhahsler

@editorialbot generate pdf

@editorialbot
Collaborator Author

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@stats-tgeorge

@editorialbot set 10.6084/m9.figshare.26750404.v1 as archive

@editorialbot
Collaborator Author

Done! archive is now 10.6084/m9.figshare.26750404.v1

@stats-tgeorge

@editorialbot set 1.0.1 as version

@editorialbot
Collaborator Author

Done! version is now 1.0.1

@stats-tgeorge

@editorialbot generate pdf

@editorialbot
Collaborator Author

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@stats-tgeorge

@mhahsler the DOI needs updating on the JOSE paper. Once that is fixed I will recommend accepting.

@mhahsler

@stats-tgeorge That is great news! Thanks for helping me through this process.

About the DOI: I assume you are asking for this missing DOI

MISSING DOIs

- 10.1002/0471687545.ch1 may be a valid DOI for title: Introduction to Data Mining

The referenced book is Tan, P.-N., Steinbach, M. S., Karpatne, A., & Kumar, V. (2017). Introduction to Data Mining (2nd Edition). Pearson. ISBN: 978-0133128901.

The suggested DOI is for a different book: Discovering Knowledge in Data: An Introduction to Data Mining by Daniel T. Larose, Ph.D.

It seems like not all books have assigned DOIs. There is no DOI on the publisher's page
https://www.pearson.com/en-us/subject-catalog/p/introduction-to-data-mining/P200000003204/9780137506286

I also cannot find one registered with crossref
https://search.crossref.org/search/works?q=%22Introduction+to+Data+Mining%22+Tan+Steinbach+Kumar&from_ui=yes

As far as I can tell, the book does not have a DOI at this point.

@stats-tgeorge

Good morning! It's what I'm here for - I am sorry this took so long!

I saw you mentioned that DOI before. I am referring to the DOI on the left side of the rendered JOSE paper. Looks like you had a placeholder there.

@stats-tgeorge

@mhahsler This is my misunderstanding. That DOI is assigned once accepted. Working on moving forward!

@stats-tgeorge

stats-tgeorge commented Aug 28, 2024

@openjournals/jose-eics I believe we are ready to move forward to publish. TY!

@mhahsler

@openjournals/jose-eics Hi. Is there anything that I need to do?

@mhahsler

@stats-tgeorge Hi George. Is the @openjournals/jose-eics handle correct?

@stats-tgeorge

@mhahsler It links to our EIC so it appears to work correctly. I know she is very backed up at the moment.

@mhahsler

Thanks for letting me know.

@stats-tgeorge

@editorialbot recommend-accept

@editorialbot
Collaborator Author

Attempting dry run of processing paper acceptance...

@editorialbot
Collaborator Author

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

✅ OK DOIs

- 10.18637/jss.v014.i15 is OK
- 10.21105/joss.01686 is OK
- 10.18637/jss.v091.i01 is OK
- 10.18637/jss.v025.i03 is OK
- 10.32614/RJ-2017-047 is OK
- 10.18637/jss.v028.i05 is OK
- 10.32614/CRAN.package.tidymodels is OK

🟡 SKIP DOIs

- No DOI given, and none found for title: Introduction to Data Mining

❌ MISSING DOIs

- None

❌ INVALID DOIs

- None

@editorialbot
Collaborator Author

👋 @openjournals/jose-eics, this paper is ready to be accepted and published.

Check final proof 👉📄 Download article

If the paper PDF and the deposit XML files look good in openjournals/jose-papers#160, then you can now move forward with accepting the submission by compiling again with the command @editorialbot accept

editorialbot added the recommend-accept (Papers recommended for acceptance in JOSE) label on Nov 11, 2024
@stats-tgeorge

@stats-tgeorge Hi George. Is the @openjournals/jose-eics handle correct?

You were correct. I was following our guide (which suggested what I did) and not my checklist. My mistake.

@labarba
Member

labarba commented Nov 11, 2024

Should the archive DOI be updated to v2?

@stats-tgeorge

stats-tgeorge commented Nov 11, 2024

It looks like version 2 was created after the DOI was updated here. Yes, it now needs to be updated again.

@labarba
Member

labarba commented Nov 11, 2024

I'm confused. This archive:

Hahsler, Michael (2024). An R Companion for Introduction to Data Mining. figshare. Book. https://doi.org/10.6084/m9.figshare.26750404.v2

...contains a 357-page PDF.

v1 of that archive seems to hold the file contents of a website.

In the JOSE paper, it says that the materials are available at https://mhahsler.github.io/Introduction_to_Data_Mining_R_Examples/book/

The "View book source" button sends me to this GitHub repo: https://github.com/mhahsler/Introduction_to_Data_Mining_R_Examples

So why don't we have as the archive a deposit of the GitHub repository, which is in fact the source for the book? If a user wants to fork and modify, they need the source.

I also notice that the Figshare deposit (both versions) shows a CC-BY license, while the website and GitHub repo indicate CC-BY-SA. Shouldn't the licenses be the same? (I realize these are different objects, per my inquiry above, but still…)

@stats-tgeorge
Copy link

@labarba should @mhahsler then create a V3 of the archive that has the appropriate items (an archive of the GitHub repo)? Then pick a single license?

@labarba
Member

labarba commented Nov 12, 2024

I notice the comment above where the author says:

Note on license: figshare only gives a choice for CC BY 4.0, while the book uses CC BY-NC 4.0. I can change everything to CC BY 4.0, if that is necessary.

The online book shows a CC-BY license, so this matches. I had seen a CC-BY-SA license on the repository, but the author has just changed it. Indeed, the license should match everywhere.

The issue remains that the archive on Figshare is a collection of website files (v1) or a PDF (v2). What JOSE wants is an archive of the source, which is the only thing that guarantees future reuse and derivative works. Having a Figshare deposit of the PDF is nice (users can read it immediately), but it is not conducive to derivative works.

I would suggest a Zenodo deposit of the GitHub repo. If the author prefers Figshare, then a zip of the repo is the only way.
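
As a minimal sketch of the zip route in R (GitHub's standard source-archive URL; the default branch name and output file name below are assumptions):

# Download a zip snapshot of the repository source, which can then be
# uploaded to Figshare alongside the rendered book PDF.
repo_zip <- "https://github.com/mhahsler/Introduction_to_Data_Mining_R_Examples/archive/refs/heads/master.zip"
download.file(repo_zip,
              destfile = "Introduction_to_Data_Mining_R_Examples-source.zip",
              mode = "wb")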

@mhahsler

@labarba and @stats-tgeorge: I agree this isn't very clear.

I have now done the following:

  1. Figshare does not give the option of using CC-BY-SA, only CC-BY, so I have changed the license for all the material to the more permissive CC-BY license. This needs to be updated in the paper.
  2. I have made a new version on Figshare (v3) which contains the book's PDF and an archived version of the complete GitHub repository, plus a README with a link to GitHub. The paper should only use https://github.com/mhahsler/Introduction_to_Data_Mining_R_Examples and https://doi.org/10.6084/m9.figshare.26750404.v3

@labarba Please let me know if I should update the paper in my GitHub repository.

BTW: I tried Zenodo first, but it did not like the repository with so many non-code files and died in the process of importing the repository.

Thanks for your help,
Michael
