split testing libraries from run libraries in dependencies.yaml#5447
split testing libraries from run libraries in dependencies.yaml#5447jayavenkatesh19 wants to merge 2 commits intorapidsai:mainfrom
Conversation
Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>
There was a problem hiding this comment.
Thanks for looking into this @jayavenkatesh19 and I'm so sorry it took so long to review!
I think the run_python_common name is a bit confusing, and I'm worried it'll be hard for contributors to understand which lists to update when (and that that'll lead to testing dependencies getting back into the images accidentally).
I put up an alternative proposal, happy to pair on it if you have questions.
It'd also be helpful for this PR to show the exact contents of the environment. You can get that by running the same rapids-dependency-file-generator call that the docker repo does (code link), like this:
rapids-dependency-file-generator \
--file-key run_notebooks \
--matrix "cuda=13.1;arch=x86_64;py=3.13" \
--output condaOn the current state of this branch, that produces:
# This file is generated by `rapids-dependency-file-generator`.
# To make changes, edit ../../dependencies.yaml and run `rapids-dependency-file-generator`.
channels:
- rapidsai-nightly
- rapidsai
- conda-forge
dependencies:
- certifi
- cuda-version=13.1
- cugraph==26.4.*,>=0.0.0a0
- ipython
- libcugraph==26.4.*,>=0.0.0a0
- networkx>=2.5.1
- notebook>=0.5.0
- numpy>=1.23,<3.0
- packaging
- pandas
- pylibcugraph==26.4.*,>=0.0.0a0
- python-louvain
- python=3.13
- scikit-learn>=0.23.1
- scipy
name: run_notebooks_cuda-131_arch-x86_64_py-313
I can see there some outputs we don't need. scikit-learn isn't used at all in this project and packaging isn't imported in the notebooks, for example.
| - test_python_cugraph | ||
| - depends_on_libcugraph | ||
| - depends_on_pylibcugraph | ||
| - depends_on_cugraph |
There was a problem hiding this comment.
I do like creating a new top-level list called just run_notebooks, but think it's confusing to see anything named test_* showing up in this list, especially to see something called test_notebook showing up in run_notebooks when there is also a test_notebooks.
I have an alternative proposal that I think might make this a bit easier to maintain and understand, let me know what you think:
- leave
test_python_commoncompletely unchanged - create a new list under
dependencies:calledrun_notebooks. Add to that list only packages that are needed for the notebooks and aren't already runtime dependencies ofcugraph,pylibcugraph, andlibcugraph. Use YAML anchors to reference other things already independencies.yaml. Add a code comment above it explaining its purpose. - make this top-level
run_notebookslist here containcuda_version,py_version,run_notebooks, and thedepends_on_*cugraphgroups - pass
--file-key test_notebooks --file-key run_notebookshere https://github.com/rapidsai/docker/blob/2637556074567056aa69a7cb1850ac844ef89748/context/notebooks.sh#L34 (multiple--file-keyare allowed)
That way, we could fine-tune this to exactly the libraries that are needed and nothing else, and the purpose will be clear.
Towards rapidsai/docker#835
Splits
test_python_commondependency group into runtime and testing parts:run_python_common: pandas and scipytest_python_common: pytest and its dependentsAdds
run_python_commonto all 5 file-keys that containtest_python_commonto keep output ofdependency-file-generatorunchanged.Adds a new
run_notebooksfile-key that includesrun_python_commonbut NOTtest_python_commonwhich givesrapidsai/dockera clean set of runtime-only notebook dependencies without the testing libraries.