Add all Python packages (versioned!) to Dockerfile (AlexsLemonade#1008)
* Add requirements.txt for reference

* consolidate python and add version numbers for everything

* Add back setuptools at start

* Add Cython to starting set

* Add back EPN subtyping back to CI

* print python package list for debugging

* No really, use the right packages

* Test a python check script

* Test making the python package test fail

* Revert "Test making the python package test fail"

This reverts commit 8a221e6.

* Make the python package test better

* Add one more error check

* Fix filename second time

* Use tmp and clean up

* One more failure check

* Revert test & add docs

* One more doc comment
jashapiro authored Apr 16, 2021
1 parent 84a3247 commit 51733f4
Showing 5 changed files with 204 additions and 26 deletions.
10 changes: 7 additions & 3 deletions .circleci/config.yml
@@ -16,6 +16,10 @@ jobs:
name: List Data Directory Contents
command: ./scripts/run_in_ci.sh ls data/testing

- run:
name: Check python packages
command: ./scripts/run_in_ci.sh bash scripts/check-python.sh

- run:
name: High level histology grouping for plot labels
command: ./scripts/run_in_ci.sh Rscript -e "rmarkdown::render('figures/mapping-histology-labels.Rmd', clean = TRUE)"
@@ -56,9 +60,9 @@ jobs:
name: Molecular subtyping Chordoma
command: OPENPBTA_SUBSET=0 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-chordoma/run-molecular-subtyping-chordoma.sh

# - run:
# name: Molecular subtyping - Ependymoma
# command: OPENPBTA_SUBSET=0 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-EPN/run-molecular-subtyping-EPN.sh
- run:
name: Molecular subtyping - Ependymoma
command: OPENPBTA_SUBSET=0 ./scripts/run_in_ci.sh bash analyses/molecular-subtyping-EPN/run-molecular-subtyping-EPN.sh

- run:
name: Molecular Subtyping - LGAT
99 changes: 77 additions & 22 deletions Dockerfile
@@ -27,10 +27,14 @@ RUN apt-get -y --no-install-recommends install \
RUN apt-get -y --no-install-recommends install \
libpoppler-cpp-dev

# Install pip3 and instalation tools
# Install pip3 and low-level python installation reqs
RUN apt-get -y --no-install-recommends install \
python3-pip python3-dev
RUN pip3 install "setuptools==46.3.0" "six==1.14.0" "wheel==0.34.2"
RUN pip3 install \
"Cython==0.29.15" \
"setuptools==46.3.0" \
"six==1.14.0" \
"wheel==0.34.2"

# Install java
RUN apt-get -y --no-install-recommends install \
@@ -237,40 +241,94 @@ RUN R -e "remotes::install_github('wilkox/treemapify', ref = 'e70adf727f4d13223d
# Need this specific version of circlize so it has hg38
RUN R -e "remotes::install_github('jokergoo/circlize', ref = 'b7d86409d7f893e881980b705ba1dbc758df847d', dependencies = TRUE)"

# Install python libraries
# Install python packages
##########################

# Install python3 data science tools
# Install python3 tools and ALL dependencies
RUN pip3 install \
"cycler==0.10.0" "kiwisolver==1.1.0" "pyparsing==2.4.5" "python-dateutil==2.8.1" "pytz==2019.3" \
"cython==0.29.15" \
"appdirs==1.4.4" \
"attrs==20.3.0" \
"backcall==0.2.0" \
"bleach==3.3.0" \
"bx-python==0.8.8" \
"certifi==2020.12.5" \
"chardet==4.0.0" \
"ConfigArgParse==1.4" \
"CrossMap==0.3.9" \
"cycler==0.10.0" \
"datrie==0.8.2" \
"decorator==4.4.2" \
"defusedxml==0.7.1" \
"docutils==0.16" \
"entrypoints==0.3" \
"gitdb==4.0.7" \
"GitPython==3.1.14" \
"idna==2.10" \
"importlib-metadata==2.1.1" \
"ipykernel==4.8.1" \
"ipython==7.9.0" \
"ipython-genutils==0.2.0" \
"jedi==0.17.2" \
"Jinja2==2.11.3" \
"jsonschema==3.2.0" \
"jupyter-client==6.1.12" \
"jupyter-core==4.6.3" \
"kiwisolver==1.1.0" \
"MarkupSafe==1.1.1" \
"matplotlib==3.0.3" \
"mistune==0.8.4" \
"mizani==0.5.4" \
"nbconvert==5.6.1" \
"nbformat==5.1.2" \
"notebook==6.0.0" \
"numpy==1.17.3" \
"packaging==20.9" \
"palettable==3.3.0" \
"pandas==0.25.3" \
"pandocfilters==1.4.3" \
"parso==0.7.1" \
"patsy==0.5.1" \
"pexpect==4.8.0" \
"pickleshare==0.7.5" \
"plotnine==0.3.0" \
"prometheus-client==0.9.0" \
"prompt-toolkit==2.0.10" \
"psutil==5.8.0" \
"ptyprocess==0.7.0" \
"pyarrow==0.16.0" \
"pybedtools==0.8.1" \
"pyBigWig==0.3.17" \
"Pygments==2.8.1" \
"pyparsing==2.4.5" \
"pyreadr==0.2.1" \
"pyrsistent==0.17.3" \
"pysam==0.15.4" \
"python-dateutil==2.8.1" \
"pytz==2019.3" \
"PyYAML==5.3.1" \
"pyzmq==20.0.0" \
"ratelimiter==1.2.0.post0" \
"requests==2.25.1" \
"rpy2==2.9.3" \
"scikit-learn==0.19.1" \
"scipy==1.3.2" \
"seaborn==0.8.1" \
"Send2Trash==1.5.0" \
"six==1.14.0" \
"smmap==4.0.0" \
"snakemake==5.8.1" \
"statsmodels==0.10.2" \
"tzlocal==2.0" \
"terminado==0.8.3" \
"testpath==0.4.4" \
"tornado==6.1" \
"traitlets==4.3.3" \
"tzlocal==2.0.0" \
"urllib3==1.26.4" \
"wcwidth==0.2.5" \
"webencodings==0.5.1" \
"widgetsnbextension==2.0.0" \
&& rm -rf /root/.cache/pip/wheels

# Install Rpy2
RUN pip3 install "rpy2==2.9.3" \
&& rm -rf /root/.cache/pip/wheels

# Install CrossMap for liftover
RUN pip3 install \
"bx-python==0.8.8" \
"pybigwig==0.3.17" \
"pysam==0.15.4" \
"CrossMap==0.3.9" \
"wrapt==1.12.1" \
"zipp==1.2.0" \
&& rm -rf /root/.cache/pip/wheels


@@ -312,9 +370,6 @@ RUN ./install_bioc.r \
multipanelfigure \
gplots

# pybedtools for D3B TMB analysis
RUN pip3 install "pybedtools==0.8.1"

# Molecular subtyping MB
RUN R -e "remotes::install_github('d3b-center/medullo-classifier-package', ref = 'e3d12f64e2e4e00f5ea884f3353eb8c4b612abe8', dependencies = TRUE, upgrade = FALSE)" \
&& ./install_bioc.r MM2S
4 changes: 3 additions & 1 deletion README.md
@@ -192,7 +192,7 @@ Artifacts include both vector or high-resolution figures sufficient for inclusio

#### Software Dependencies

Analyses should be performed within the project's [Docker container](https://github.com/AlexsLemonade/OpenPBTA-analysis#docker-container).
Analyses should be performed within the project's [Docker container](https://github.com/AlexsLemonade/OpenPBTA-analysis#docker-image).
We use a single monolithic container in these analyses for ease of use.
If you need software that is not included, please edit the Dockerfile to install the relevant software or file a [new issue on this repository](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/new) requesting assistance.

@@ -292,6 +292,8 @@ To add dependencies that are required for your analysis to the project Docker im
* Installing most packages, from CRAN or Bioconductor, should be done with our `install_bioc.R` script, which will ensure that the proper MRAN snapshot is used. `BiocManager::install()` should *not* be used, as it will not install from MRAN.
* R packages that are not available in the MRAN snapshot can be installed via github with the `remotes::install_github()` function, with the commit specified by the `ref` argument.
* Python packages should be installed with `pip3 install` with version numbers for all packages and dependencies specified.
* As a secondary check, we maintain a `requirements.txt` file to check versions of all python packages and dependencies.
* When adding a new package, make sure that all dependencies are also added; every package should appear with a specified version **both** in the `Dockerfile` and `requirements.txt`.
* Other software can be installed with `apt-get`, but this should *never* be used for R packages.

If you need assistance adding a dependency to the Dockerfile, [file a new issue on this repository](https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/new) to request help.
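
The rule in the bullets above (every Python package pinned in both the `Dockerfile` and `requirements.txt`) can also be checked without building the image. The following is a minimal sketch and is not part of this commit: the function names `dockerfile_pins`, `requirements_pins`, and `compare` are hypothetical, and it assumes pins are written as double-quoted `"name==version"` strings in the Dockerfile, as they are in this diff. Package names are normalized in the style of PEP 503, so `cython` and `Cython` are treated as the same package.

```python
import re

# Matches double-quoted pins such as "numpy==1.17.3" (assumed Dockerfile style).
PIN_RE = re.compile(r'"([A-Za-z0-9][A-Za-z0-9._-]*)==([^"\s]+)"')


def normalize(name):
    """Lowercase the name and collapse runs of '-', '_', '.' (PEP 503 style)."""
    return re.sub(r"[-_.]+", "-", name).lower()


def dockerfile_pins(text):
    """Map normalized package name -> set of versions pinned in the Dockerfile."""
    pins = {}
    for name, version in PIN_RE.findall(text):
        pins.setdefault(normalize(name), set()).add(version)
    return pins


def requirements_pins(text):
    """Map normalized package name -> set of versions in requirements.txt."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        name, sep, version = line.partition("==")
        # Lines without '==' are recorded as unpinned so they show up as problems.
        pins.setdefault(normalize(name), set()).add(version if sep else "(unpinned)")
    return pins


def compare(dockerfile_text, requirements_text):
    """Return a sorted list of human-readable mismatches (empty if consistent)."""
    docker = dockerfile_pins(dockerfile_text)
    reqs = requirements_pins(requirements_text)
    problems = []
    for name in sorted(set(docker) | set(reqs)):
        d, r = docker.get(name), reqs.get(name)
        if d is None:
            problems.append(f"{name}: only in requirements.txt")
        elif r is None:
            problems.append(f"{name}: only in Dockerfile")
        elif d != r:
            problems.append(
                f"{name}: Dockerfile {sorted(d)} vs requirements.txt {sorted(r)}"
            )
    return problems


# Small inline example; in practice the two texts would be read from the files.
example_dockerfile = r'''
RUN pip3 install \
  "Cython==0.29.15" \
  "tzlocal==2.0"
'''
example_requirements = """
Cython==0.29.15
tzlocal==2.0.0
zipp==1.2.0
"""
for problem in compare(example_dockerfile, example_requirements):
    print(problem)
```

A package that is pinned twice in the Dockerfile with different versions is reported as a mismatch, because all versions found for a name are collected into a set before comparison.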
84 changes: 84 additions & 0 deletions requirements.txt
@@ -0,0 +1,84 @@
appdirs==1.4.4
attrs==20.3.0
backcall==0.2.0
bleach==3.3.0
bx-python==0.8.8
certifi==2020.12.5
chardet==4.0.0
ConfigArgParse==1.4
CrossMap==0.3.9
cycler==0.10.0
Cython==0.29.15
datrie==0.8.2
decorator==4.4.2
defusedxml==0.7.1
docutils==0.16
entrypoints==0.3
gitdb==4.0.7
GitPython==3.1.14
idna==2.10
importlib-metadata==2.1.1
ipykernel==4.8.1
ipython==7.9.0
ipython-genutils==0.2.0
jedi==0.17.2
Jinja2==2.11.3
jsonschema==3.2.0
jupyter-client==6.1.12
jupyter-core==4.6.3
kiwisolver==1.1.0
MarkupSafe==1.1.1
matplotlib==3.0.3
mistune==0.8.4
mizani==0.5.4
nbconvert==5.6.1
nbformat==5.1.2
notebook==6.0.0
numpy==1.17.3
packaging==20.9
palettable==3.3.0
pandas==0.25.3
pandocfilters==1.4.3
parso==0.7.1
patsy==0.5.1
pexpect==4.8.0
pickleshare==0.7.5
plotnine==0.3.0
prometheus-client==0.9.0
prompt-toolkit==2.0.10
psutil==5.8.0
ptyprocess==0.7.0
pyarrow==0.16.0
pybedtools==0.8.1
pyBigWig==0.3.17
Pygments==2.8.1
pyparsing==2.4.5
pyreadr==0.2.1
pyrsistent==0.17.3
pysam==0.15.4
python-dateutil==2.8.1
pytz==2019.3
PyYAML==5.3.1
pyzmq==20.0.0
ratelimiter==1.2.0.post0
requests==2.25.1
rpy2==2.9.3
scikit-learn==0.19.1
scipy==1.3.2
seaborn==0.8.1
Send2Trash==1.5.0
six==1.14.0
smmap==4.0.0
snakemake==5.8.1
statsmodels==0.10.2
terminado==0.8.3
testpath==0.4.4
tornado==6.1
traitlets==4.3.3
tzlocal==2.0.0
urllib3==1.26.4
wcwidth==0.2.5
webencodings==0.5.1
widgetsnbextension==2.0.0
wrapt==1.12.1
zipp==1.2.0
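
Because `scripts/check-python.sh` compares this file line by line against `pip3 freeze`, both the exact `name==version` form and the line order matter. The sketch below is illustrative only and not part of this commit: `check_requirements` is a hypothetical helper, and the ordering rule it enforces (case-insensitive alphabetical, which is the order the file above follows) is an assumption about what `pip3 freeze` emits.

```python
import re

# An exact pin: a package name, '==', and a version with no spaces or ranges.
PIN_LINE = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*==[A-Za-z0-9.!+*_-]+$")


def check_requirements(text):
    """Return a list of problems found in a requirements.txt-style string."""
    problems = []
    names = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        if not PIN_LINE.match(line):
            problems.append(
                f"line {lineno}: not an exact 'name==version' pin: {line!r}"
            )
            continue
        names.append(line.split("==", 1)[0])
    # Assumed ordering: alphabetical, ignoring case.
    if names != sorted(names, key=str.lower):
        problems.append("entries are not in case-insensitive alphabetical order")
    return problems


sample = "appdirs==1.4.4\nConfigArgParse==1.4\nCrossMap==0.3.9\ncycler==0.10.0\n"
print(check_requirements(sample))
```

Running a check like this before pushing can catch an unpinned or misplaced entry without waiting for the CI image to build.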
33 changes: 33 additions & 0 deletions scripts/check-python.sh
@@ -0,0 +1,33 @@
#!/bin/bash

set -e
set -o pipefail

# This script checks that all python packages in the docker image match
# requirements.txt

# run from this file location but move up to root
cd "$(dirname "${BASH_SOURCE[0]}")"
cd ..

req_diff=/tmp/package_diffs.txt

## diff exits with code 1 when there are differences, so we append `|| true`
pip3 freeze | diff requirements.txt - > $req_diff || true

# check if there are any differences in the file
if [ -s $req_diff ]
then
  cat $req_diff && rm $req_diff
  echo "Python packages do not match requirements.txt, please check."
  exit 1
fi

# if the diffs file was not produced for some reason, we should be sure to fail the same way
if [ ! -e $req_diff ]
then
  pip3 freeze | diff requirements.txt -
fi

# clean up
rm $req_diff
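
The core of the script above is an exact, line-by-line comparison of `requirements.txt` with the output of `pip3 freeze`. The following sketch reproduces that comparison in Python using the standard-library `difflib` module. It is an illustration under stated assumptions, not the project's implementation: `check_freeze` is a hypothetical name, and the two inputs are passed in as strings rather than read from the file and from `pip3 freeze`.

```python
import difflib


def check_freeze(requirements_text, freeze_text):
    """Compare two package listings line by line.

    Returns (exit_code, diff_lines): exit_code is 1 if there is any
    difference and 0 otherwise, mirroring how the shell script fails
    when the diff output is non-empty.
    """
    expected = requirements_text.splitlines()
    installed = freeze_text.splitlines()
    diff_lines = list(
        difflib.unified_diff(
            expected,
            installed,
            fromfile="requirements.txt",
            tofile="pip3 freeze",
            lineterm="",
        )
    )
    return (1 if diff_lines else 0), diff_lines


# The comparison is textual, so a version written as 2.0 instead of 2.0.0
# (or a name with different capitalization) counts as a difference.
code, lines = check_freeze("six==1.14.0\ntzlocal==2.0.0\n", "six==1.14.0\ntzlocal==2.0\n")
print(code)
for line in lines:
    print(line)
```

Since the comparison is purely textual, entries in `requirements.txt` need to match the spelling and capitalization that `pip3 freeze` prints, not just refer to the same package.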
