Skip to content
This repository has been archived by the owner on Jun 21, 2023. It is now read-only.

Add all Python packages (versioned!) to Dockerfile #1008

Merged
merged 17 commits into from
Apr 16, 2021

Conversation

jashapiro
Copy link
Member

Purpose/implementation Section

This PR addresses the problems we were having where EPN subtyping scripts were failing due to an underlying update to jupyter notebook packages (we think). Unfortunately, we had not specified versions of every python package.

What was your approach?

Luckily, I had an older version of the OpenPBTA image on my machine, so I was able to use pip3 freeze to get the list of packages installed with versions, and I copied that into the Dockerfile. This also resulted in consolidating most of the python installation steps, though because of the way that pip3 does its installation, we still need to have a separate step for some of the low-level packages.

I am including the output of pip3 freeze in a requirements.txt file for reference, but I have not yet added any instructions on how to keep that up to date.

What GitHub issue does your pull request address?

Closes #1006
Closes #991

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Just hoping everything passes CI!

Is there anything that you want to discuss further?

I may add a little script to run in CI to check that the packages installed match requirements.txt; does this seem like it might be useful?

Reproducibility Checklist

  • The dependencies required to run the code in this pull request have been added to the project Dockerfile.
  • This analysis has been added to continuous integration.

Documentation Checklist

  • This analysis module has a README and it is up to date.
  • This analysis is recorded in the table in analyses/README.md and the entry is up to date.
  • The analytical code is documented and contains comments.

@jashapiro
Copy link
Member Author

I added a CI script to check that the packages match the requirements.txt file, which should catch any future problems like the one we ran into in #991, by enforcing that all dependencies are installed and versioned if any new python packages are added. With that, and a small addition to docs, I think this should be ready to go.

@jashapiro jashapiro marked this pull request as ready for review April 16, 2021 11:44
@jashapiro
Copy link
Member Author

I added a CI script to check that the packages match the requirements.txt file, which should catch any future problems like the one we ran into in #991, by enforcing that all dependencies are installed and versioned if any new python packages are added. With that, and a small addition to docs, I think this should be ready to go.

@jashapiro jashapiro requested a review from jaclyn-taroni April 16, 2021 11:44
Copy link
Member

@jaclyn-taroni jaclyn-taroni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with the approach! One comment about the docs though. No need for me to re-review unless desired.

README.md Outdated
@@ -292,6 +292,7 @@ To add dependencies that are required for your analysis to the project Docker im
* Installing most packages, from CRAN or Bioconductor, should be done with our `install_bioc.R` script, which will ensure that the proper MRAN snapshot is used. `BiocManager::install()` should *not* be used, as it will not install from MRAN.
* R packages that are not available in the MRAN snapshot can be installed via github with the `remotes::install_github()` function, with the commit specified by the `ref` argument.
* Python packages should be installed with `pip3 install` with version numbers for all packages and dependencies specified.
* As a secondary check, we maintain a `requirements.txt` file to check versions of all python packages and dependencies.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is someone supposed to do if this fails? Their knee-jerk reaction could be to roll forward requirements.txt, which is not necessarily what we want them to do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought was that this will cause some discussion & be evident in the review. The idea is really to make sure that all added packages are in both the Docker & requirements file, mostly because it is easy to add a python package and not realize that it is also adding a dependency that may not be versioned. So a reviewer would hopefully catch the discrepancy... Also, if rolling forward requirements.txt results in all tests passing, maybe that is okay?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are on the same page, but I was wondering if we wanted to note that the best course of action for updating requirements will need to be evaluated on a case-by-case basis.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a bit more to the comment here: 4d3fc46
I hope that we do not have any trouble with this at this stage, but maybe I am too optimistic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works for me!

@jashapiro jashapiro merged commit 51733f4 into AlexsLemonade:master Apr 16, 2021
@jashapiro jashapiro deleted the jashapiro/pip-freeze branch April 16, 2021 15:57
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Better pin Python dependencies in project Docker image CI failure: Molecular Subtyping - Ependymoma
2 participants