Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Circumvent GWLF-E SegFaults due to NumPy #3090

Merged
merged 2 commits into from
Feb 26, 2019

Conversation

rajadain
Copy link
Member

@rajadain rajadain commented Feb 22, 2019

Overview

A recent release of NumPy introduced a new module _multiarray_umath. Unfortunately, there are some tools that depend on NumPy and pull in the latest version as a build dependency. This latest version is then replaced with the version specified in our requirements.txt, thus the module is not available at run-time causing the segfaults.

By installing NumPy separately before requirements.txt is gathered, we ensure that whatever was using the most recent version of NumPy in its installation now uses the pre-installed one.

We'll have to ensure that the version of NumPy in requirements.txt matches the one in Ansible.

For details see:

numpy/numpy#11871
https://stackoverflow.com/q/54153886

Connects #3089

Demo

image

Testing Instructions

  • Download and extract this file: celery-segfault.zip

  • Check out this branch and destroy and rebuild the Worker VM

  • Transfer the files into the Worker VM, and execute test.py

    $ python test.py
    
    • Ensure there are no segfaults
  • Compare the generated output.json with the included output.verify.json:

    $ diff output.json output.verify.json
    
    • Ensure they match
  • Go to :8000 and create a MapShed project.

    • Ensure the modeling stages complete successfully

@rajadain rajadain added NSF Funding Source: National Science Foundation in progress labels Feb 22, 2019
Copy link
Contributor

@mmcfarland mmcfarland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the numpy version where this module was introduced? I'm curious, if it within a compatible release range, that we should just upgrade? Are you concerned with introducing breaking changes?

If we stick with this approach, rather than keep the numpy version in sync with requirements.txt, can numpy just be removed from that file altogether since ansible is now installing it?

@rajadain
Copy link
Member Author

What is the numpy version where this module was introduced?

According to https://stackoverflow.com/a/54153887, it was numpy-1.16.0rc2 in January. The current latest release of NumPy is 1.16.1.

I'm curious, if it within a compatible release range, that we should just upgrade? Are you concerned with introducing breaking changes?

I know that GWLF-E and Ulmo both require NumPy, and may transitively bring in their versions. I think the amount of testing required to ensure that the new version works would exhaust the available budget, and this seems the most light-weight solution.

If we stick with this approach, rather than keep the numpy version in sync with requirements.txt, can numpy just be removed from that file altogether since ansible is now installing it?

Good observation. Currently Ansible only installs NumPy for the Worker. In App, it is installed via requirements.txt. I can update the App's role definition to install NumPy and remove it from requirements.txt so we don't have to keep track.

A recent release of NumPy introduced a new module _multiarray_umath.
Unfortunately, there are some tools that depend on NumPy and pull
in the latest version as a build dependency. This latest version is
then replaced with the version specified in our requirements.txt,
thus the module is not available at run-time causing the segfaults.

By installing NumPy separately before requirements.txt is gathered,
we ensure that whatever was using the most recent version of NumPy
in its installation now uses the pre-installed one.

We'll have to ensure that the version of NumPy in requirements.txt
matches the one in Ansible.

For details see:

numpy/numpy#11871
https://stackoverflow.com/q/54153886
These are installed separately due to build issues. Having them
here was redundant, and a cause of potential confusion if the
version numbers in Ansible were to change away from these.
@rajadain rajadain force-pushed the tt/circumvent-gwlfe-segfaults branch from 4352ee5 to 8aa7b57 Compare February 22, 2019 17:30
@rajadain
Copy link
Member Author

Applied the suggested fixes, confirmed that it still works:

image

Copy link
Contributor

@hectcastro hectcastro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had pretty much the same two questions. Since I didn't have to write them out, I used that bandwidth to create an issue to remember to upgrade NumPy (#3093). 👍

Copy link
Contributor

@mmcfarland mmcfarland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates. I ran through the test script and also was able to multi-year model after rebuilding.

@mmcfarland mmcfarland assigned rajadain and unassigned mmcfarland Feb 22, 2019
@rajadain rajadain merged commit 48d5ca0 into release/1.24.2 Feb 26, 2019
@rajadain rajadain deleted the tt/circumvent-gwlfe-segfaults branch February 26, 2019 22:48
caseycesari added a commit that referenced this pull request Aug 14, 2019
Undos work done in #3090 to get around an issue with numpy,
and installs an updated version of numpy that is compatible
with ulmo and other dependencies.

Refs #3093
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
NSF Funding Source: National Science Foundation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants