
Add controls for numpy version #678

Merged (5 commits) Jul 29, 2024

Conversation

@robertbartel (Contributor) commented Jul 17, 2024

(Note: the original idea, environment-config-controlled constraints on the numpy version, was scrapped in favor of a different approach.)

Add support for pip-constraints-file-based control over the version of numpy installed by the image build process for ngen-related Docker images.

Slightly related to NOAA-OWP/ngen#837.

Note that this blocks #675, as that revision of the image can't be built for testing due to the problem being fixed here.

Additions

  • New environment config variable for numpy constraint, with default value and comments in example.env
  • New constraints.txt file defining pip constraints for ngen (and related) image builds.

Removals

  • Removes a deprecated and unused Dockerfile ARG MIN_NUMPY for clarity.

Changes

  • New Dockerfile ARG for the numpy constraint, applied where numpy gets installed during the image build
  • Setting of the aforementioned ARG in the Compose build YAML configs, for all ngen-based images/services (e.g., model exec, calibration, etc.)
  • COPY and ENV statements in the ngen image Dockerfile to bring in the new pip constraints file and set the PIP_CONSTRAINT environment variable so that pip uses it
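The COPY/ENV wiring described above might look roughly like this (a minimal sketch; the stage name and paths are illustrative, not the real Dockerfile's):

```dockerfile
# Sketch only: bring the constraints file into the image and point pip at it.
FROM rockylinux:9 AS ngen-deps

# Make the constraints file available inside the image
COPY constraints.txt /dmod/constraints.txt

# pip honors PIP_CONSTRAINT for every subsequent `pip install`,
# so later commands need no per-command --constraint flags
ENV PIP_CONSTRAINT=/dmod/constraints.txt

RUN pip3 install --no-cache-dir numpy   # resolves within the constrained range
```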

Testing

  1. ngen worker image builds successfully, whereas previously it would fail due to an ngen unit test failure in part of the Python BMI integration

Updating Dockerfile and Compose build config file for ngen-based worker
images (calibration, partitioning, testing, etc.) to utilize ARG sourced
from environment config to control the version of numpy that is
installed.
Updating example for DMOD deployment environment config with section for
numpy, new numpy constraint environment variable used by worker image
builds, safe default value for new variable, and comments explaining.
Updating project requirements.txt and GUI service requirements file
(python/gui/dependencies.txt) to constrain numpy version consistent with
default used by worker image builds set up in example.env.
@robertbartel robertbartel added enhancement New feature or request maas MaaS Workstream Blocks Another issue is blocked until this is resolved labels Jul 17, 2024
docker/main/docker-build.yml (outdated review thread, resolved):
@@ -59,6 +60,7 @@ services:
context: ./ngen
target: rocky_ngen_build_testing
args:
NUMPY_CONSTRAINT: ${NUMPY_INSTALL_CONSTRAINT:?Please set global NUMPY_INSTALL_CONSTRAINT in environment config}
Contributor:
This is an abuse of the args parameter. Do not use this to inject python versions. Stick to setup.py and requirements.txt.

Additional outdated, resolved review threads: docker/main/docker-build.yml (3), docker/main/ngen/Dockerfile (4), example.env
@aaraney (Member) left a comment:

I think we should instead add a requirements.txt for the ngen image and put things like this there. I think that will be easiest to maintain and understand in the long run. I think it also results in a simpler solution. Thoughts?

Outdated, resolved review threads: python/gui/dependencies.txt, requirements.txt
@robertbartel (Contributor, Author):

I think we should instead add a requirements.txt for the ngen image and put things like this there. I think that will be easiest to maintain and understand in the long run. I think it also results in a simpler solution. Thoughts?

@aaraney, @christophertubbs, there are subtle implications to this in the context of building a Docker image. While there may be others as well, at minimum:

  • Combining all the installed Python packages in the Dockerfile into a single requirements file is going to inevitably mess up build caching for some earlier stages and make the build less efficient
  • Combining everything inside the Dockerfile into a single requirements file would force things to be installed into stages (and therefore, images) when they aren't actually needed; e.g., ngen-cal

We can, of course, add subtle complexity to the way we design what goes into requirements files, whether we use different ones for different build stages, etc. But if we start going down that road, I don't think what we end up with is any better of a solution.

Honestly, if allowing configurable control of the version really seems like a problem, what are both of your thoughts on just hard-coding the numpy version constraints into the Dockerfile in the prior places where it was being installed?

@christophertubbs (Contributor):

If the issue is with a single requirements file... there's nothing keeping us from having more than one requirements file. We can also produce new build 'hooks' where extra files may be COPYed in if behavior needs to be altered in different stages.

What we have, right now, is already too complicated. We need to start moving in another direction to find an easier to follow build pipeline. If not making something configurable makes the build process prohibitively resource intensive, something is wrong and we need to take a step back and evaluate our process(es).

For now, hard code the value since it would be global anyways. If a user needs to color outside the lines and use a different value, they may change it themselves.

Ideally, DMOD needs to get out of the game of setting up compilation rules for another product. We don't set compilation flags or environment variables just to prepare pandas for use. DMOD is a consumer of NextGen and compatible formulations. Instructions and constraints for use need to be provided from them, not developed by us. If the DMOD image creation process is responsible for setting this up we're putting ourselves in a position of eternal catch up.

@robertbartel (Contributor, Author):

What we have, right now, is already too complicated. We need to start moving in another direction to find an easier to follow build pipeline.

You aren't wrong, but that's out of scope for what is essentially a bug fix.

If not making something configurable makes the build process prohibitively resource intensive, something is wrong and we need to take a step back and evaluate our process(es).

Just to be clear, the lack of configurability doesn't make the builds less efficient. But artificially tethering together the installation of all the Python packages used throughout the worker images would.

For now, hard code the value since it would be global anyways. If a user needs to color outside the lines and use a different value, they may change it themselves.

Sure, I can make that change for here, and we can separate off the idea of adding (or not) configurability to numpy versioning for workers.

Ideally, DMOD needs to get out of the game of setting up compilation rules for another product. We don't set compilation flags or environment variables just to prepare pandas for use. DMOD is a consumer of NextGen and compatible formulations.

Providing compatible, consistent, reproducible compute environments (whether these be Docker images/containers or some other mechanism) for model execution is a fundamental feature of DMOD, and that means it has to be concerned with compilation flags and environment details and anything else users would have to deal with to set up an environment if they were just trying to use ngen directly. There is certainly room to discuss which of these DMOD opens up via control/configure mechanisms versus what DMOD just handles automagically behind the scenes. And if/when ngen can add its own automation for installing its dependencies like numpy or begins distributing installation packages, great; DMOD can leverage those instead. Until it does though, for DMOD to be able to run ngen, there’s just no way around dealing with this kind of stuff.

@aaraney (Member) commented Jul 17, 2024

I think NOAA-OWP/ngen#856 will solve our issues and obsolete this PR. We will still at some point need to revisit how we handle external BMI module dependency versioning, but that is a whole can of worms that I think we should kick down the road!

@robertbartel (Contributor, Author) commented Jul 17, 2024

@aaraney, no, I think NOAA-OWP/ngen#856 will just cause DMOD image builds to fail for a different reason (the CMake error instead of the unit test failure). Unless ngen actually handles installing numpy itself, we are going to have to address what version of numpy gets installed in the image.

As discussed above, I'll make changes so that we hard code this for now to the latest 1.x.

@aaraney (Member) commented Jul 17, 2024

Extra context: there is already a check in ngen that throws an exception if the version of numpy used at build time does not match the one present at runtime. So, if ngen is built with numpy<=2 and numpy in the environment later gets updated, we are still covered.
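The guard described above could look something like this (a hypothetical sketch, not ngen's actual check; the function name and major-version comparison policy are assumptions for illustration):

```python
# Hypothetical sketch of a build-time vs. runtime numpy version guard.
def check_numpy_compat(built_with: str, runtime: str) -> None:
    """Raise if the runtime numpy's major version differs from the one
    recorded when the extension was built (a proxy for ABI compatibility)."""
    built_major = int(built_with.split(".")[0])
    runtime_major = int(runtime.split(".")[0])
    if built_major != runtime_major:
        raise RuntimeError(
            f"numpy mismatch: built against {built_with}, running {runtime}"
        )

check_numpy_compat("1.26.4", "1.26.0")  # same major: no error
```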

@aaraney (Member) commented Jul 17, 2024

Alright, so after a side-channel conversation with @robertbartel: at least IMO, I think we can achieve the goals of this PR using a pip constraints.txt file (very similar to a requirements.txt). More specifically, the path to the constraints.txt file would be set as the PIP_CONSTRAINT environment variable in any build stages that build/install Python dependencies. Any packages installed via pip would then be constrained to the version specifiers in constraints.txt.

This lets us control which versions, or version ranges, of certain packages are installed in the image. Likewise, incompatibilities will result in a build failure.
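As a concrete illustration of the mechanism (the version bounds here are illustrative, not the project's actual pins):

```shell
# Write a pip constraints file; syntax matches requirements.txt specifiers,
# but constraints only bound versions -- they never cause installation.
cat > constraints.txt <<'EOF'
numpy>=1.24,<2.0
pip>=23.0,<23.1
EOF

# Point pip at it for the rest of this shell (or Docker build stage);
# every subsequent `pip install X` must satisfy these bounds or fail.
export PIP_CONSTRAINT="$PWD/constraints.txt"
echo "constraints active: $PIP_CONSTRAINT"
```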

@@ -290,14 +292,15 @@ RUN if [[ "${NETCDF_CXX_VERSION}" == "latest" ]]; then \
FROM rocky-base as rocky-ngen-packaged-deps

ARG ROCKY_NGEN_DEPS_REQUIRED
ARG NUMPY_CONSTRAINT

# TODO: later, go back and change all pip3/python3 to just pip/python (but leave for now to limit scope)
# Note that this includes numpy, which is needed for Python BMI support, regardless of BMI module
USER root
Member:

I figure we could move the required python packages to a requirements.txt file (so, pip, wheel, packaging, and numpy) and then create a constraints.txt that we use to enforce the allowed versions of those required packages and others. Thoughts?

Suggested change
USER root
USER root
COPY requirements.txt ${WORKDIR}/requirements.txt
COPY constraints.txt ${WORKDIR}/constraints.txt

Contributor:

Good find! I think it's a good solution.

It's been a while since I've had to worry about this, but I believe that editing constraints.txt changes the file's content hash, which invalidates the Docker build cache from the COPY line for constraints.txt onward, so I think we can be sure that a change there will just about always be reflected after a build.
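The caching behavior described above can be sketched like this (stage name and paths are hypothetical, not the actual Dockerfile's):

```dockerfile
# Layers above the COPY are keyed only on their own inputs, so editing
# constraints.txt leaves them cached.
FROM rockylinux:9 AS deps
RUN dnf install -y python3-pip && dnf clean all   # cached across constraint edits

# Docker hashes the file content here; a changed constraints.txt
# invalidates this layer and everything after it.
COPY constraints.txt /tmp/constraints.txt
ENV PIP_CONSTRAINT=/tmp/constraints.txt
RUN pip3 install --no-cache-dir numpy   # re-runs whenever constraints change
```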

Contributor (Author):

I still don't like the idea of coupling all the Python packages together in a requirements.txt, as I don't think that fits with how the Dockerfile is set up. It's totally fair to discuss whether that needs to change - e.g., should we really support building the image both with and without Python support (we don't really need numpy in the without-Python case)? - but it needs to be discussed separately.

But the PIP_CONSTRAINT and constraints.txt settings sound fine. Those alone should be sufficient to resolve the immediate issue.

@robertbartel (Contributor, Author) commented Jul 17, 2024:

@christophertubbs, you raise a good point I hadn't considered: depending on when the constraints.txt file is set up in the image, there will once again be implications for build caching and efficiency. That isn't an absolute blocker, but I want to think on it a little more to see if there's any way to avoid that tradeoff. For the sake of getting the immediate issue fixed, I'm going to go ahead and prepare the changes to go this route, but I'm not sure I want it to be permanent just yet. That can be spun off to a separate issue and dealt with later.

@aaraney (Member) commented Jul 18, 2024:

I still don't like the idea of coupling all the Python packages together in a requirements.txt, as I don't think that fits with how the Dockerfile is set up.

Yeah, I agree. I'm just talking about the packages required to build ngen with Python support. So numpy, pip, wheel, and packaging for now. It just seems like a nicer place to keep those baseline required packages than inlining them in the Dockerfile.


# TODO: later, go back and change all pip3/python3 to just pip/python (but leave for now to limit scope)
# Note that this includes numpy, which is needed for Python BMI support, regardless of BMI module
USER root
RUN dnf update -y && dnf install -y ${ROCKY_NGEN_DEPS_REQUIRED} && dnf clean -y all \
&& ln -s $(which python3) $(which python3 | sed 's/python3/python/') \
&& pip install --no-cache-dir "pip>=23.0,<23.1" wheel packaging \
&& if [ "${NGEN_WITH_PYTHON}" == "ON" ]; then pip install --no-cache-dir numpy; fi
&& if [ "${NGEN_WITH_PYTHON}" == "ON" ]; then pip install --no-cache-dir numpy${NUMPY_CONSTRAINT}; fi
Member:

This would become:

Suggested change
&& if [ "${NGEN_WITH_PYTHON}" == "ON" ]; then pip install --no-cache-dir numpy${NUMPY_CONSTRAINT}; fi
&& if [ "${NGEN_WITH_PYTHON}" == "ON" ]; then pip install --no-cache-dir --requirement requirements.txt --constraint constraints.txt ; fi

Rolling back initial changes in favor of a different approach.
Adding constraints file to ngen (and related) image build to control
numpy and pip versions as needed for compatibility, using this instead
of doing directly within the Dockerfile itself in pip commands.
@@ -0,0 +1,5 @@
### Constrain pip
pip>=23.0,<23.1
Member:

I don't think this is necessary after NOAA-OWP/t-route#804 was merged.

Member:

Doesn't have to be addressed here, just putting it on your radar @robertbartel!

@aaraney (Member) left a comment:

This looks good to me! Thanks, @robertbartel!

@robertbartel (Contributor, Author):

@christophertubbs, any other concerns before this moves forward? I might either need explicit approval from you or to dismiss your previous review, in order to comply with branch protections (the latter of which I don't want to do until you've had a chance to raise any remaining discussion items).

@robertbartel robertbartel dismissed christophertubbs’s stale review July 29, 2024 17:29

Changes reengineered using different approach based on previous review discussion.

@robertbartel robertbartel merged commit b8dcc03 into NOAA-OWP:master Jul 29, 2024
4 of 8 checks passed
robertbartel added a commit that referenced this pull request Jul 29, 2024
Rolling back initial changes in favor of a different approach.