[MXNET-311] Change the docker image for Installation Guide Test - needs sudo #10510

mbaijal · 2018-04-11T18:51:15Z

Description

The installation guide test used to pull a Cuda docker image from dockerhub and run tests inside this container. Earlier this test was running in cuda7.5 containers which had 'sudo' installed inside it. I changed this to cuda9 a couple of days back.

However, the latest cuda docker images - nvidia/cuda:9.0-cudnn7-devel and nvidia/cuda:8.0-cudnn5-devel do not have sudo installed which is needed by the scripts in the mxnet installation page.

Hence I have created a simple docker image on the host called mxnet/base-cuda9 which only has the following -

nvidia/cuda:9.0-cudnn7-devel
sudo

Please Note:

These tests are going to run on the official MXNet CI soon and these docker images will not be needed at that point since I plan to fix the script itself.
The dockerfile and docker Image are only present on the Jenkins setup where this nightly test is running. Please add a comment below if you would like me to add this image to the MXNet dockerhub repo.

@anirudh2290 This needs to go into 1.2 to make a nightly test pass.
@gautamkmr @b0noI Can you please review!

Checklist

Essentials

Changes are complete (i.e. I finished coding on this PR)
All changes have test coverage:
Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
Code is well-documented:
For user-facing API changes, API doc string has been updated.
For new C++ functions in header files, their functionalities and arguments are documented.
For new examples, README.md is added to explain the what the example does, the source of the dataset, expected performance on test set and reference to the original paper if applicable
Check the API doc at http://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
To the my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

Created a new docker image on the slave where these nightly tests run, I can publish this image to dockerhub if needed.
Changed the installation guide GPU tests to run in this new docker container

Comments

If this change is a backward incompatible change, why must this change be made.
Interesting edge cases to note here

goswamig

If you are putting the docker in official mxnet docker hub. People might get confused, with base-cuda9.

Try changing the name but its up to you.

mbaijal · 2018-04-11T19:03:19Z

I don't think pushing to dockerhub is needed but if there is a request, I will change the name and create a new PR.

anirudh2290 · 2018-04-11T19:55:48Z

tests/jenkins/run_test_installation_docs.sh

 echo
 echo "### Testing Virtualenv ###"
 echo "${virtualenv_commands}"
 echo
-nvidia-docker run --rm nvidia/cuda:9.0-cudnn7-devel bash -c "${virtualenv_commands}"
+nvidia-docker run --rm mxnet/base-cuda9 bash -c "${virtualenv_commands}"


I think it would be more maintainable if we push this docker image for nightly tests to the docker registry (maybe with a different name as @gautamkmr suggested). The release doesn't have to block on this though.

Where can I see the contents of this image? Also it would be good to keep consistency with nvidia images (I assume they are taken as base anyway) and name and branch them with minor versions of cuda: mxnet/base:cuda9.0 mxnet/base:cuda9.1

marcoabreu

Could you elaborate why we need to maintain two parallel solutions and environments? We've already taken care of problems like these in the docker images that are running in our CI - why not just use those?

marcoabreu · 2018-04-11T21:57:59Z

Ah, is it because you're trying to replicate what we're basically doing in our containers, just from a users point of view and that's why a pretty much empty docker container is required?

marcoabreu · 2018-04-11T22:01:02Z

I think pushing to dockerhub could cause confusion since this is basically an empty cuda9 container and is not really related to mxnet itself. Why can't we just create a simple Dockerfile (exactly what you did in order to create that image locally), put it here and re-use it instead of relying on the local cache or the dockerhub?

non-blocking

mbaijal · 2018-04-11T22:04:54Z

Currently, this change is to run the installation guide test locally for the 1.2 release.
Yes as you pointed out, it should be an (almosyt) empty container for this test similar to the user and hence I do not want to test it on the existing containers.

As a part of migration to new CI, all of these tests will use the docker images available in this repo and I will add a new one to the ci folder if needed.

mbaijal · 2018-04-11T22:12:50Z

I will add a dockerfile to the repo for this test but only during migration. Doing so now would require temporary changes to the test script to build the docker image. There is no point of doing that when we move to using the pipeline soon.

marcoabreu · 2018-04-11T22:14:30Z

I see, thanks for elaborating. So just to clarify: This means if a new slave for the internal CI gets deployed or if the docker cache gets cleaned, this task is going to fail, right?

Fine with me as a temporary solution, we just have to be aware of that risk.

mbaijal · 2018-04-11T22:18:25Z

Correct, it is far from ideal. But hoping that this CI setup will be deprecated soon and 1.2 is the last release happening on this one. So we just need to run this test once to validate the release branch.

marcoabreu · 2018-04-11T22:38:10Z

Alrighty, fine with me :)

Could you please add a jira ticket before we merge?

anirudh2290 · 2018-04-12T00:30:24Z

@marcoabreu jira is added. Can we merge this ?

…apache#10510)

* Remove Fermi from cmake (#10486) * updated R docs (#10473) * [MXNET-120] Float16 support for distributed training (#10183) * send as char * fix bug on pull response, and rowsparse on worker side * three modes * default to mode 0 and add support for row sparse * refactor sparse * rowsparse numbytes fixes * WIP tests * update test sync * remove prints * refactoring * Revert "refactoring" This reverts commit 05ffa1b. * undo refactoring to keep PR simple * add wait to stored in pull default * lint fixes * undo static cast for recvblob * lint fixes * mode 1 changes * sparse bug fix dtype * mshadow default * remove unused var * remove debug statements * clearer variables, reduced multiplication, const vars * add const for more vars, comments * comment syntax, code watcher, test default val * remove unnecessary print in test * trigger ci * multi precision mode (debugging race condition) * working rsp pushes * finish multiprecision for row sparse * rename num-bytes * fix bug due to rename of numbytes, and remove debug logs * address comments * add integration test * trigger ci * integration test * integration test * fix path of script * update mshadow * disable f16c for amalgamation * fix amalgamation build * trigger ci * disable f16c for jetson * Fix rat excludes (#10499) * MXNET-308 added missing license (#10497) * refactored example (#10484) * [MXNET-298] Scala Infer API docs landing page (#10474) * changed url references from dmlc to apache/incubator-mxnet * prepping scala landing pages * infer api info added * Fix infer storage type (#10507) * Fix infer_storage_type * Add test * Fix lint * Trigger CI * [MXNET-306] Add slice_like operator (#10491) * add slice_like and doc * pass unittest and lint * Minor simplifications in ci/build.py (#10496) * [MXNET-305] Scala tutorial table fix (#10488) * initial update on setting up scala ide with mxnet * moving images to web-data project * updated links to images; added readme for root folder * scala hello world feature added * workaround for make transitive error * fixed systempath * minor updates * table fix * added some spacing * more spacing * added ability to set search path for Accelerate library * [MXNET-311] change test needs a docker with sudo, hence image changed (#10510) * added new metric * changed back from branch to upstream * lint changed * Fixed typo * Clarified interpretation * Changes for variable names * fixed variable names * replay the unit tests * changed comment

…e#10524) * Remove Fermi from cmake (apache#10486) * updated R docs (apache#10473) * [MXNET-120] Float16 support for distributed training (apache#10183) * send as char * fix bug on pull response, and rowsparse on worker side * three modes * default to mode 0 and add support for row sparse * refactor sparse * rowsparse numbytes fixes * WIP tests * update test sync * remove prints * refactoring * Revert "refactoring" This reverts commit 05ffa1b. * undo refactoring to keep PR simple * add wait to stored in pull default * lint fixes * undo static cast for recvblob * lint fixes * mode 1 changes * sparse bug fix dtype * mshadow default * remove unused var * remove debug statements * clearer variables, reduced multiplication, const vars * add const for more vars, comments * comment syntax, code watcher, test default val * remove unnecessary print in test * trigger ci * multi precision mode (debugging race condition) * working rsp pushes * finish multiprecision for row sparse * rename num-bytes * fix bug due to rename of numbytes, and remove debug logs * address comments * add integration test * trigger ci * integration test * integration test * fix path of script * update mshadow * disable f16c for amalgamation * fix amalgamation build * trigger ci * disable f16c for jetson * Fix rat excludes (apache#10499) * MXNET-308 added missing license (apache#10497) * refactored example (apache#10484) * [MXNET-298] Scala Infer API docs landing page (apache#10474) * changed url references from dmlc to apache/incubator-mxnet * prepping scala landing pages * infer api info added * Fix infer storage type (apache#10507) * Fix infer_storage_type * Add test * Fix lint * Trigger CI * [MXNET-306] Add slice_like operator (apache#10491) * add slice_like and doc * pass unittest and lint * Minor simplifications in ci/build.py (apache#10496) * [MXNET-305] Scala tutorial table fix (apache#10488) * initial update on setting up scala ide with mxnet * moving images to web-data project * updated links to images; added readme for root folder * scala hello world feature added * workaround for make transitive error * fixed systempath * minor updates * table fix * added some spacing * more spacing * added ability to set search path for Accelerate library * [MXNET-311] change test needs a docker with sudo, hence image changed (apache#10510) * added new metric * changed back from branch to upstream * lint changed * Fixed typo * Clarified interpretation * Changes for variable names * fixed variable names * replay the unit tests * changed comment

Docs test needs a docker with sudo, hence image changed

bd409a5

mbaijal changed the title ~~[WIP] Change the docker image for Installation Guide Test - needs sudo~~ Change the docker image for Installation Guide Test - needs sudo Apr 11, 2018

b0noI approved these changes Apr 11, 2018

View reviewed changes

goswamig approved these changes Apr 11, 2018

View reviewed changes

anirudh2290 reviewed Apr 11, 2018

View reviewed changes

marcoabreu previously requested changes Apr 11, 2018

View reviewed changes

anirudh2290 approved these changes Apr 11, 2018

View reviewed changes

marcoabreu approved these changes Apr 11, 2018

View reviewed changes

mbaijal changed the title ~~Change the docker image for Installation Guide Test - needs sudo~~ [MXNET-311] Change the docker image for Installation Guide Test - needs sudo Apr 12, 2018

marcoabreu merged commit ceb810c into apache:master Apr 12, 2018

dabraude pushed a commit to dabraude/incubator-mxnet that referenced this pull request Apr 12, 2018

[MXNET-311] change test needs a docker with sudo, hence image changed (…

e8d6b8d

…apache#10510)

rahul003 pushed a commit to rahul003/mxnet that referenced this pull request Jun 4, 2018

[MXNET-311] change test needs a docker with sudo, hence image changed (…

42955d6

…apache#10510)

zheng-da pushed a commit to zheng-da/incubator-mxnet that referenced this pull request Jun 28, 2018

[MXNET-311] change test needs a docker with sudo, hence image changed (…

61a9e36

…apache#10510)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[MXNET-311] Change the docker image for Installation Guide Test - needs sudo #10510

[MXNET-311] Change the docker image for Installation Guide Test - needs sudo #10510

mbaijal commented Apr 11, 2018 •

edited

Loading

goswamig left a comment

mbaijal commented Apr 11, 2018

anirudh2290 Apr 11, 2018 •

edited

Loading

lebeg Apr 12, 2018

marcoabreu left a comment

marcoabreu commented Apr 11, 2018

marcoabreu commented Apr 11, 2018 •

edited

Loading

mbaijal commented Apr 11, 2018

mbaijal commented Apr 11, 2018

marcoabreu commented Apr 11, 2018

mbaijal commented Apr 11, 2018

marcoabreu commented Apr 11, 2018

anirudh2290 commented Apr 12, 2018

[MXNET-311] Change the docker image for Installation Guide Test - needs sudo #10510

[MXNET-311] Change the docker image for Installation Guide Test - needs sudo #10510

Conversation

mbaijal commented Apr 11, 2018 • edited Loading

Description

Checklist

Essentials

Changes

Comments

goswamig left a comment

Choose a reason for hiding this comment

mbaijal commented Apr 11, 2018

anirudh2290 Apr 11, 2018 • edited Loading

Choose a reason for hiding this comment

lebeg Apr 12, 2018

Choose a reason for hiding this comment

marcoabreu left a comment

Choose a reason for hiding this comment

marcoabreu commented Apr 11, 2018

marcoabreu commented Apr 11, 2018 • edited Loading

mbaijal commented Apr 11, 2018

mbaijal commented Apr 11, 2018

marcoabreu commented Apr 11, 2018

mbaijal commented Apr 11, 2018

marcoabreu commented Apr 11, 2018

anirudh2290 commented Apr 12, 2018

mbaijal commented Apr 11, 2018 •

edited

Loading

anirudh2290 Apr 11, 2018 •

edited

Loading

marcoabreu commented Apr 11, 2018 •

edited

Loading