Merge branch 'master' of https://github.com/ray-project/ray into trajectory_view_api_enable_by_default_for_some_tf
sven1977 committed Nov 9, 2020
2 parents 03b558c + 407a212 commit 2786d7c
Showing 147 changed files with 4,057 additions and 1,392 deletions.
80 changes: 80 additions & 0 deletions .github/stale.yml
@@ -0,0 +1,80 @@
# Configuration for probot-stale - https://github.com/probot/stale

# Number of days of inactivity before an Issue or Pull Request becomes stale
daysUntilStale: 120

# Number of days of inactivity before an Issue or Pull Request with the stale label is closed.
# Set to false to disable. If disabled, issues still need to be closed manually, but will remain marked as stale.
daysUntilClose: 14

# Only issues or pull requests with all of these labels are checked for staleness. Defaults to `[]` (disabled)
onlyLabels: []

# Issues or Pull Requests with these labels will never be considered stale. Set to `[]` to disable
exemptLabels:
- P0
- P1
- P2
- P3
- good first issue
- release-blocker
- fix-docs
- regression

# Set to true to ignore issues in a project (defaults to false)
exemptProjects: false

# Set to true to ignore issues in a milestone (defaults to false)
exemptMilestones: true

# Set to true to ignore issues with an assignee (defaults to false)
exemptAssignees: false

# Label to use when marking as stale
staleLabel: stale

# Comment to post when marking as stale. Set to `false` to disable
markComment: >
  Hi, I'm a bot from the Ray team :)
  To help human contributors focus on more relevant issues, I will automatically add the stale label to issues that have had no activity for more than 4 months.
  If there is no further activity in the next 14 days, the issue will be closed!
  - If you'd like to keep the issue open, just leave any comment, and the stale label will be removed!
  - If you'd like to get more attention to the issue, please tag one of Ray's contributors.
  You can always ask for help on our `discussion forum <https://discuss.ray.io/>`_ or `Ray's public slack channel <https://github.com/ray-project/ray#getting-involved>`_.
# Comment to post when removing the stale label.
# unmarkComment: >
# Your comment here.

# Comment to post when closing a stale Issue or Pull Request.
closeComment: >
  Hi again! The issue will be closed because there has been no more activity in the 14 days since the last message.
  Please feel free to reopen or open a new issue if you'd still like it to be addressed.
  Again, you can always ask for help on our `discussion forum <https://discuss.ray.io/>`_ or `Ray's public slack channel <https://github.com/ray-project/ray#getting-involved>`_.
  Thanks again for opening the issue!
# Limit the number of actions per hour, from 1-30. Default is 30
# It will check 120 issues per day.
limitPerRun: 5

# Limit to only `issues` or `pulls`
only: issues

# Optionally, specify configuration settings that are specific to just 'issues' or 'pulls':
# pulls:
# daysUntilStale: 30
# markComment: >
# This pull request has been automatically marked as stale because it has not had
# recent activity. It will be closed if no further activity occurs. Thank you
# for your contributions.

# issues:
# exemptLabels:
# - confirmed
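
To make the timing in this configuration concrete, here is a minimal Python sketch of the schedule it implies. It only does the date arithmetic and is not part of probot-stale; the function and constant names are hypothetical, and it assumes the bot runs once per hour, as the comment on ``limitPerRun`` suggests.

```python
from datetime import date, timedelta
from typing import Tuple

# Values copied from the configuration above.
DAYS_UNTIL_STALE = 120
DAYS_UNTIL_CLOSE = 14
LIMIT_PER_RUN = 5


def stale_schedule(last_activity: date) -> Tuple[date, date]:
    """Return the earliest (stale, close) dates for an issue with no further activity."""
    stale_on = last_activity + timedelta(days=DAYS_UNTIL_STALE)
    close_on = stale_on + timedelta(days=DAYS_UNTIL_CLOSE)
    return stale_on, close_on


def actions_per_day(limit_per_run: int = LIMIT_PER_RUN) -> int:
    """Upper bound on bot actions per day, assuming one run per hour (an assumption)."""
    return limit_per_run * 24
```

Under these assumptions, an issue last touched on the date of this commit would be labeled stale no earlier than 120 days later and closed no earlier than 14 days after that.
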
1 change: 1 addition & 0 deletions BUILD.bazel
@@ -1861,6 +1861,7 @@ filegroup(
"//src/ray/protobuf:gcs_py_proto",
"//src/ray/protobuf:gcs_service_py_proto",
"//src/ray/protobuf:node_manager_py_proto",
"//src/ray/protobuf:ray_client_py_proto",
"//src/ray/protobuf:reporter_py_proto",
],
)
2 changes: 1 addition & 1 deletion ci/travis/install-bazel.sh
@@ -1,5 +1,5 @@
#!/usr/bin/env bash

set -x
set -euo pipefail

ROOT_DIR=$(cd "$(dirname "${BASH_SOURCE:-$0}")"; pwd)
57 changes: 53 additions & 4 deletions doc/source/cluster/index.rst
@@ -102,6 +102,13 @@ The command will print out the address of the Redis server that was started
``<address>`` with the value printed by the command on the head node (it
should look something like ``123.45.67.89:6379``).

Note that if your compute nodes are on their own subnetwork behind Network
Address Translation (NAT), the command printed by the head node will not work
from a machine outside that subnetwork. Instead, find an address at which that
machine can reach the head node. If the head node has a domain name such as
``compute04.berkeley.edu``, you can use that in place of an IP address and rely
on DNS.

.. code-block:: bash

  $ ray start --address=<address> --redis-password='<password>'
@@ -115,11 +122,53 @@ should look something like ``123.45.67.89:6379``).
If you wish to specify that a machine has 10 CPUs and 1 GPU, you can do this
with the flags ``--num-cpus=10`` and ``--num-gpus=1``. See the :ref:`Configuration <configuring-ray>` page for more information.

If you see ``Unable to connect to Redis. If the Redis instance is on a
different machine, check that your firewall is configured properly.``,
this means the ``--port`` is inaccessible at the given IP address (because, for
example, the head node is not actually running Ray, or you have the wrong IP
address).

If you see ``Ray runtime started.``, then the node successfully connected to
the IP address at the ``--port``. You should now be able to connect to the
cluster with ``ray.init(address='auto')``.

If ``ray.init(address='auto')`` keeps repeating
``redis_context.cc:303: Failed to connect to Redis, retrying.``, then the node
is failing to connect to some other port(s) besides the main port.

.. code-block:: bash

  If connection fails, check your firewall settings and network configuration.

If the connection fails, you can check whether each port is reachable from a
node with a tool such as ``nmap`` or ``nc``.

.. code-block:: bash

  $ nmap -sV --reason -p $PORT $HEAD_ADDRESS
  Nmap scan report for compute04.berkeley.edu (123.456.78.910)
  Host is up, received echo-reply ttl 60 (0.00087s latency).
  rDNS record for 123.456.78.910: compute04.berkeley.edu
  PORT     STATE SERVICE REASON         VERSION
  6379/tcp open  redis   syn-ack ttl 60 Redis key-value store
  Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
  $ nc -vv -z $HEAD_ADDRESS $PORT
  Connection to compute04.berkeley.edu 6379 port [tcp/*] succeeded!

If the node cannot access that port at that IP address, you might see:

.. code-block:: bash

  $ nmap -sV --reason -p $PORT $HEAD_ADDRESS
  Nmap scan report for compute04.berkeley.edu (123.456.78.910)
  Host is up (0.0011s latency).
  rDNS record for 123.456.78.910: compute04.berkeley.edu
  PORT     STATE  SERVICE REASON       VERSION
  6379/tcp closed redis   reset ttl 60
  Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .
  $ nc -vv -z $HEAD_ADDRESS $PORT
  nc: connect to compute04.berkeley.edu port 6379 (tcp) failed: Connection refused

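
If neither ``nmap`` nor ``nc`` is installed on a node, roughly the same reachability check can be sketched with Python's standard library. This is a minimal illustration and not part of Ray; the function name ``port_is_reachable`` is hypothetical, and the check only tells you that something accepts TCP connections on the port, not that it is Redis.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the host name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, ``port_is_reachable("compute04.berkeley.edu", 6379)`` would correspond to the ``nc -vv -z`` calls above, with the host name and port taken from the sample output.
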
Stopping Ray
~~~~~~~~~~~~
42 changes: 41 additions & 1 deletion doc/source/configure.rst
@@ -135,9 +135,49 @@ Head Node
In addition to ports specified above, the head node needs to open several more ports.

- ``--port``: Port of GCS. Default: 6379.
- ``--dashboard-port``: Port for accessing the dashboard. Default: 8265
- ``--redis-shard-ports``: Comma-separated list of ports for non-primary Redis shards. Default: Random values.
- ``--gcs-server-port``: GCS Server port. GCS server is a stateless service that is in charge of communicating with the GCS. Default: Random value.

- If ``--include-dashboard`` is true (the default), then the head node must open ``--dashboard-port``. Default: 8265.

If ``--include-dashboard`` is true but the ``--dashboard-port`` is not open on
the head node, you will repeatedly get:

.. code-block:: bash

  WARNING worker.py:1114 -- The agent on node <hostname of node that tried to run a task> failed with the following error:
  Traceback (most recent call last):
    File "/usr/local/lib/python3.8/dist-packages/grpc/aio/_call.py", line 285, in __await__
      raise _create_rpc_error(self._cython_call._initial_metadata,
  grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
    status = StatusCode.UNAVAILABLE
    details = "failed to connect to all addresses"
    debug_error_string = "{"description":"Failed to pick subchannel","file":"src/core/ext/filters/client_channel/client_channel.cc","file_line":4165,"referenced_errors":[{"description":"failed to connect to all addresses","file":"src/core/ext/filters/client_channel/lb_policy/pick_first/pick_first.cc","file_line":397,"grpc_status":14}]}"

(Also, you will not be able to access the dashboard.)

If you see that error, check whether the ``--dashboard-port`` is accessible
with ``nc`` or ``nmap`` (or your browser).

.. code-block:: bash

  $ nmap -sV --reason -p 8265 $HEAD_ADDRESS
  Nmap scan report for compute04.berkeley.edu (123.456.78.910)
  Host is up, received reset ttl 60 (0.00065s latency).
  rDNS record for 123.456.78.910: compute04.berkeley.edu
  PORT     STATE SERVICE REASON         VERSION
  8265/tcp open  http    syn-ack ttl 60 aiohttp 3.7.2 (Python 3.8)
  Service detection performed. Please report any incorrect results at https://nmap.org/submit/ .

Note that the dashboard runs as a separate subprocess which can crash invisibly
in the background, so even if you checked port 8265 earlier, the port might be
closed *now* (for the prosaic reason that there is no longer a service running
on it). This also means that if the port is unreachable, running ``ray stop``
and then ``ray start`` may make it reachable again, because the dashboard
restarts.

If you don't want the dashboard, set ``--include-dashboard=false``.

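
Because the dashboard process can exit after an earlier check succeeded, it may help to re-check the dashboard port from a script before assuming it is up. The following is a minimal sketch using only the Python standard library; ``dashboard_responds`` is a hypothetical helper, not a Ray API, and it only confirms that some HTTP server answers on the port.

```python
import http.client
import urllib.error
import urllib.request


def dashboard_responds(host: str, port: int = 8265, timeout: float = 3.0) -> bool:
    """Return True if an HTTP server answers at http://host:port/."""
    url = f"http://{host}:{port}/"
    # An empty ProxyHandler bypasses any proxy set in the environment,
    # so the request goes directly to the head node.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (OSError, http.client.HTTPException):
        # Refused, timed out, unresolvable, or not speaking HTTP.
        return False
```

The default port of 8265 matches the ``--dashboard-port`` default stated above; pass a different value if the head node was started with another port.
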
Redis Port Authentication
-------------------------
1 change: 1 addition & 0 deletions doc/source/index.rst
@@ -292,6 +292,7 @@ Papers
:caption: Ray Observability
ray-metrics.rst
ray-debugging.rst
.. toctree::
:hidden:
11 changes: 11 additions & 0 deletions doc/source/package-ref.rst
@@ -196,6 +196,13 @@ Histogram
.. autoclass:: ray.util.metrics.Histogram
   :members:

.. _package-ref-debugging-apis:

Debugger APIs
-------------

.. autofunction:: ray.util.pdb.set_trace

Experimental APIs
-----------------

@@ -271,3 +278,7 @@ The Ray Command Line API
.. click:: ray.scripts.scripts:timeline
   :prog: ray timeline
   :show-nested:

.. click:: ray.scripts.scripts:debug
   :prog: ray debug
   :show-nested: