can reviewers ask for best practices in the installation of packages? #680

Open
pdebuyl opened this issue Feb 12, 2020 · 3 comments

@pdebuyl
Contributor

pdebuyl commented Feb 12, 2020

My question stems from the recommendation, for Python packages, to use

pip install package

instead of

pip install --user package

Outside a virtual environment, the first version will typically fail with a permissions error. Googling that error turns up the bad advice to just use sudo, which can damage the system installation of Python.

May reviewers ask authors to change the instructions, either to use the --user flag or to suggest using a virtual environment?
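
As a rough illustration of the two alternatives, here is a minimal Python sketch that chooses between the two commands depending on whether the interpreter is running inside a virtual environment. The helper names `in_virtual_environment` and `pip_install_command` are hypothetical and only serve the example, and `package` is the same placeholder as above. This is one possible way to express the recommendation, not an official pip or JOSS guideline.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # The venv module points sys.prefix at the environment and keeps the
    # original installation in sys.base_prefix; the legacy virtualenv tool
    # set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


def pip_install_command(package: str, in_venv: bool) -> list:
    """Build a pip command that does not need sudo (hypothetical helper)."""
    # Calling pip through the current interpreter avoids picking up a pip
    # that belongs to a different Python installation.
    command = [sys.executable, "-m", "pip", "install"]
    if not in_venv:
        # Outside a virtual environment, install into the per-user site directory.
        command.append("--user")
    command.append(package)
    return command


if __name__ == "__main__":
    # "package" is the placeholder name used in the instructions above.
    print(" ".join(pip_install_command("package", in_virtual_environment())))
```

The `--user` flag is left out inside a virtual environment because pip generally refuses user installs there, and the environment itself is already writable without elevated permissions.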

The issue of best practices was already discussed in #469.

I realize that this is a Python-specific question, but the general principle should be useful for other tools as well.

@arfon
Member

arfon commented Jun 4, 2020

I think this is probably a good idea. Are there existing best practices documented somewhere that we could reference rather than having to write (and maintain) our own here?

@pdebuyl
Contributor Author

pdebuyl commented Jun 4, 2020

From https://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001745

The paper "Best Practices for Scientific Computing"

Box 1. Summary of Best Practices

Write programs for people, not computers.
    A program should not require its readers to hold more than a handful of facts in memory at once.
    Make names consistent, distinctive, and meaningful.
    Make code style and formatting consistent.
Let the computer do the work.
    Make the computer repeat tasks.
    Save recent commands in a file for re-use.
    Use a build tool to automate workflows.
Make incremental changes.
    Work in small steps with frequent feedback and course correction.
    Use a version control system.
    Put everything that has been created manually in version control.
Don't repeat yourself (or others).
    Every piece of data must have a single authoritative representation in the system.
    Modularize code rather than copying and pasting.
    Re-use code instead of rewriting it.
Plan for mistakes.
    Add assertions to programs to check their operation.
    Use an off-the-shelf unit testing library.
    Turn bugs into test cases.
    Use a symbolic debugger.
Optimize software only after it works correctly.
    Use a profiler to identify bottlenecks.
    Write code in the highest-level language possible.
Document design and purpose, not mechanics.
    Document interfaces and reasons, not implementations.
    Refactor code in preference to explaining how it works.
    Embed the documentation for a piece of software in that software.
Collaborate.
    Use pre-merge code reviews.
    Use pair programming when bringing someone new up to speed and when tackling particularly tricky problems.
    Use an issue tracking tool.

or https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005510

the paper "Good enough practices in scientific computing"

Box 1. Summary of practices

Data management
    Save the raw data.
    Ensure that raw data are backed up in more than one location.
    Create the data you wish to see in the world.
    Create analysis-friendly data.
    Record all the steps used to process data.
    Anticipate the need to use multiple tables, and use a unique identifier for every record.
    Submit data to a reputable DOI-issuing repository so that others can access and cite it.
Software
    Place a brief explanatory comment at the start of every program.
    Decompose programs into functions.
    Be ruthless about eliminating duplication.
    Always search for well-maintained software libraries that do what you need.
    Test libraries before relying on them.
    Give functions and variables meaningful names.
    Make dependencies and requirements explicit.
    Do not comment and uncomment sections of code to control a program's behavior.
    Provide a simple example or test data set.
    Submit code to a reputable DOI-issuing repository.
Collaboration
    Create an overview of your project.
    Create a shared "to-do" list for the project.
    Decide on communication strategies.
    Make the license explicit.
    Make the project citable.
Project organization
    Put each project in its own directory, which is named after the project.
    Put text documents associated with the project in the doc directory.
    Put raw data and metadata in a data directory and files generated during cleanup and analysis in a results directory.
    Put project source code in the src directory.
    Put external scripts or compiled programs in the bin directory.
    Name all files to reflect their content or function.
Keeping track of changes
    Back up (almost) everything created by a human being as soon as it is created.
    Keep changes small.
    Share changes frequently.
    Create, maintain, and use a checklist for saving and sharing changes to the project.
    Store each project in a folder that is mirrored off the researcher's working machine.
    Add a file called CHANGELOG.txt to the project's docs subfolder.
    Copy the entire project whenever a significant change has been made.
    Use a version control system.
Manuscripts
    Write manuscripts using online tools with rich formatting, change tracking, and reference management.
    Write the manuscript in a plain text format that permits version control.

@cmaimone

Similar question with respect to R packages. While not every package needs to be submitted to a package repository, being part of CRAN or Bioconductor (or maybe others I'm unaware of?) generally makes it easier for a package to be 1) discovered and 2) installed by a wide set of users. Is there any position on R packages submitted to JOSS being available in such repositories or, alternatively, on acknowledging why the package is not in a major repository?

A smaller issue: for R packages on GitHub, recommending remotes::install_git() instead of devtools::install_git() can also save folks from trouble getting devtools installed. remotes has fewer dependencies and is less likely to cause installation issues on systems not set up for software development.
