Docker Updates #308

Merged Mar 1, 2017 (13 commits)

@jssmith (Contributor) commented Feb 22, 2017

This change helps improve Docker usability:

  • Fixes broken paths in the Dockerfile
  • Uses git archive to export Ray sources to the container
  • Keeps start_ray.sh from exiting when used within Docker
  • Updates the "Install on Docker" instructions
  • Adds "Using Ray on a Docker Cluster" instructions

Note that this change does not yet link to the Docker instructions from the top-level README. We should add that link as soon as we are confident the instructions work smoothly for people.

@robertnishihara
Collaborator

We should link to these instructions from README.md, right?

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/69/
Test PASSed.

@jssmith (Contributor Author) commented Feb 22, 2017

@robertnishihara, ah, I was just updating the description. I'd like to see a few people try the instructions before we link to them from README.md, but perhaps we can reach that point in this code review cycle.

@robertnishihara (Collaborator) left a comment

Looks great! I haven't had a chance to try out the cluster version yet.

-You can install Ray on any platform that runs Docker. We do not presently publish Docker images for Ray, but you can build them yourself using the Ray distribution. Using Docker can provide a reliable way to get up and running quickly.
+You can install Ray on any platform that runs Docker. We do not presently publish Docker images for Ray, but you can build them yourself using the Ray distribution.
+
+Using Docker can streamline the build process reliable way to get up and running quickly.

Collaborator

"the build process in a reliable way"?

-The Docker Platform release is available for Mac, Windows, and Linux platforms. Please download the appropriate version from the [Docker website](https://www.docker.com/products/overview#/install_the_platform).
+### Mac, Linux, Windows platforms
+
+The Docker Platform release is available for Mac, Windows, and Linux platforms. Please download the appropriate version from the [Docker website](https://www.docker.com/products/overview#/install_the_platform) and follow the corresponding installation instructions.

Collaborator

I remember thinking/hearing that the official docker installation instructions for Ubuntu are bad and that the ones to follow are here https://www.digitalocean.com/community/tutorials/how-to-install-and-use-docker-on-ubuntu-16-04

Have you ever had any trouble with the official docker instructions?

Contributor Author

Yeah, the official Linux instructions are not very relevant to our use case. I'll add a link to the alternate instructions.


+The Docker Platform release is available for Mac, Windows, and Linux platforms. Please download the appropriate version from the [Docker website](https://www.docker.com/products/overview#/install_the_platform) and follow the corresponding installation instructions.
+
+### Docker installation on EC2 with Ubuntu

Collaborator

These instructions are different from the ones on the Docker website. For example, the official Docker instructions do not run `sudo apt-get -y dist-upgrade`.

Why the difference?

Contributor Author

The official Linux instructions seem to focus on connecting to Dockerhub and getting a repository set up. These are simple instructions to get Docker installed and to get it to run without sudo.

The dist-upgrade command here is likely optional. It's something I always do just so I'm running the latest patched version of the OS. I know things have at times broken without it, but I don't remember whether it is necessary right now. I think it's probably safest to leave it in here.
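
As a rough illustration of the simple install path described in this comment, the sketch below prints (rather than runs) the commands involved. The `docker.io` package name and the `ubuntu` login user are assumptions that are not stated in this thread, and `DRY_RUN`, `DOCKER_USER`, `run`, and `install_docker` are hypothetical names used only for this example.

```shell
#!/bin/sh
# Sketch: install Docker on a fresh Ubuntu host and allow running it without sudo.
# ASSUMPTIONS: the `docker.io` package and an `ubuntu` login user.
# DRY_RUN defaults to 1, so the commands are only printed, not executed.
DRY_RUN="${DRY_RUN:-1}"
DOCKER_USER="${DOCKER_USER:-ubuntu}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

install_docker() {
  run sudo apt-get update
  run sudo apt-get -y dist-upgrade                 # likely optional, per the comment above
  run sudo apt-get install -y docker.io
  run sudo usermod -a -G docker "$DOCKER_USER"     # group change applies at next login
}

install_docker
```

Setting `DRY_RUN=0` would execute the commands for real; the group membership change only takes effect after logging out and back in.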


## Launch Ray in Docker

Start out by launching the deployment container.

```
-docker run --shm-size=1024m -t -i ray-project/ray:deploy
+docker run --shm-size=<shm-size> -t -i ray-project/ray:deploy
```

Collaborator

This command doesn't work. I think you need to remove the `ray:` and just use `ray-project/deploy`.
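
To illustrate why the two spellings differ: in a Docker image reference, a colon in the final path component separates the repository from the tag, and a reference without a tag defaults to `latest`. So `ray-project/ray:deploy` names the repository `ray-project/ray`, while `ray-project/deploy` names a different repository. The helper `split_image_ref` below is hypothetical and simplified (it ignores digests such as `@sha256:...`).

```shell
#!/bin/sh
# Split a Docker image reference into "<repository> <tag>".
# Simplified sketch: digests (@sha256:...) are not handled.
split_image_ref() {
  ref="$1"
  last="${ref##*/}"            # final path component, e.g. "ray:deploy"
  case "$last" in
    *:*) tag="${last##*:}"; repo="${ref%:*}" ;;
    *)   tag="latest";      repo="$ref" ;;
  esac
  printf '%s %s\n' "$repo" "$tag"
}

split_image_ref "ray-project/ray:deploy"   # → ray-project/ray deploy
split_image_ref "ray-project/deploy"       # → ray-project/deploy latest
```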

## Test if the installation succeeded

-To test if the installation was successful, try running some tests.
+To test if the installation was successful, try running some tests. Within the container shell enter the following commands:

```
python test/runtest.py # This tests basic functionality.
```

Collaborator

I just tried this and it told me

Traceback (most recent call last):
  File "test/runtest.py", line 127, in <module>
    raise Exception("You have an older version of cloudpickle that is not able to serialize namedtuples. Try running \n\n{}\n\n".format(cloudpickle_command))
Exception: You have an older version of cloudpickle that is not able to serialize namedtuples. Try running 

pip install --upgrade cloudpickle

We should make sure that we're installing the correct version of cloudpickle in the docker image. For that it should be sufficient to do pip install cloudpickle somewhere.

```
FROM ray-project/deploy
RUN git clone https://github.com/my-user/my-project.git
RUN ./my-project/install.sh
```

Collaborator

I don't understand this example. What is my-project supposed to be? And where would the installation of Ray happen?

<repository-uri> \
/ray/scripts/start_ray.sh --head \
--redis-port=6379 \
--num-workers=<num-workers>

Collaborator

Shall we also have people use a fixed object store port?

Collaborator

Both here and on the worker nodes

Contributor Author

Is there a port number we have standardized on?

Collaborator

Not yet. Maybe just choose one? E.g., 7183.
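
A minimal sketch of what pinning the port on both the head node and the worker nodes might look like. The flag name `--object-manager-port` is an assumption made for illustration and should be checked against `start_ray.sh` before use; `OBJECT_MANAGER_PORT`, `head_args`, and `worker_port_flag` are hypothetical names. The value 7183 is only the one suggested in this thread, not a project standard.

```shell
#!/bin/sh
# Port suggested in the review thread; not a standardized value.
OBJECT_MANAGER_PORT=7183

# Print the head-node command line with a fixed object store port.
# ASSUMPTION: --object-manager-port is an illustrative flag name.
head_args() {
  num_workers="$1"
  printf '%s\n' "/ray/scripts/start_ray.sh --head --redis-port=6379 --object-manager-port=${OBJECT_MANAGER_PORT} --num-workers=${num_workers}"
}

# Worker nodes would pass the same port flag so every node agrees on it.
worker_port_flag() {
  printf '%s\n' "--object-manager-port=${OBJECT_MANAGER_PORT}"
}

head_args 4
worker_port_flag
```

Keeping the port in one variable means the head and worker commands cannot drift apart if the value changes later.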

@@ -59,7 +126,7 @@ docker run --shm-size=1024m -t -i ray-project/ray:examples

Collaborator

the command above

docker run --shm-size=1024m -t -i ray-project/ray:examples

should be

docker run --shm-size=1024m -t -i ray-project/examples

@@ -10,5 +10,5 @@ RUN echo 'export PATH=/opt/conda/bin:$PATH' > /etc/profile.d/conda.sh \
&& /bin/bash /tmp/anaconda.sh -b -p /opt/conda \
&& rm /tmp/anaconda.sh
ENV PATH "/opt/conda/bin:$PATH"
-RUN conda install libgcc
+RUN conda install -y libgcc
RUN pip install --upgrade pip

Collaborator

We probably want to pip install some packages (at least cloudpickle) here.

Contributor Author

Will add cloudpickle.

@@ -68,7 +135,7 @@ See the [Hyperparameter optimization documentation](../examples/hyperopt/README.
### Batch L-BFGS

```
-cd ~/ray/examples/lbfgs/
+cd /ray/examples/lbfgs/
```

Collaborator

When I run this example I get a lot of errors like the following.

W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
[INFO] (/ray/src/photon/photon_scheduler.c:202) Started worker with pid 4463
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.

Contributor Author

I looked at forums a bit and these warnings seem to come from some recent changes in TensorFlow. I don't believe this is related to Ray or Docker, and it is something we may want to come back to.
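
Until that is revisited, one way to read the example's output past these warnings is to filter them out of the log stream. This is only a sketch: the helper name `filter_tf_cpu_warnings` is hypothetical, and it hides the messages without addressing their cause.

```shell
#!/bin/sh
# Drop TensorFlow cpu_feature_guard warnings from a stream of log lines.
filter_tf_cpu_warnings() {
  grep -v 'cpu_feature_guard\.cc'
}

# Demo on two lines taken from the output quoted above.
printf '%s\n' \
  "W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations." \
  "[INFO] (/ray/src/photon/photon_scheduler.c:202) Started worker with pid 4463" \
  | filter_tf_cpu_warnings
```

In practice the filter would be applied to the combined stdout and stderr of the example (for instance by appending `2>&1 | filter_tf_cpu_warnings` to the command that runs it).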

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/123/
Test PASSed.

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/124/
Test PASSed.

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/127/
Test FAILed.

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/128/
Test FAILed.

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/130/
Test FAILed.

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/132/
Test FAILed.

@robertnishihara
Collaborator

Btw, I ran the docker instructions without the sudo apt-get -y dist-upgrade and it worked fine.

@robertnishihara robertnishihara merged commit ad4b03b into ray-project:master Mar 1, 2017
@robertnishihara robertnishihara deleted the fixdocker2 branch March 1, 2017 02:57
3 participants