Add password authentication to Redis ports #2952

Merged on Oct 17, 2018 (27 commits; changes shown from 23 commits)
2 changes: 2 additions & 0 deletions .travis.yml
@@ -133,6 +133,7 @@ matrix:

- python -m pytest -v python/ray/test/test_global_state.py
- python -m pytest -v python/ray/test/test_queue.py
- python -m pytest -v python/ray/test/test_ray_init.py
- python -m pytest -v test/xray_test.py

- python -m pytest -v test/runtest.py
@@ -208,6 +209,7 @@ script:

- python -m pytest -v python/ray/test/test_global_state.py
- python -m pytest -v python/ray/test/test_queue.py
- python -m pytest -v python/ray/test/test_ray_init.py
- python -m pytest -v test/xray_test.py

- python -m pytest -v test/runtest.py
1 change: 1 addition & 0 deletions doc/source/index.rst
@@ -135,6 +135,7 @@ Ray comes with libraries that accelerate deep learning and reinforcement learnin

troubleshooting.rst
user-profiling.rst
security.rst
development.rst
profiling.rst
contact.rst
55 changes: 55 additions & 0 deletions doc/source/security.rst
@@ -0,0 +1,55 @@
Security
========

This document describes security best practices for using Ray.

Intended Use and Threat Model
-----------------------------

Ray instances should run on a secure network without public-facing ports.
The most common threat to Ray instances is unauthorized access to Redis,
which can be exploited to gain shell access and run arbitrary code.
The best defense is to run Ray instances on a secure, trusted network.

Running Ray on a secured network is not always feasible, so Ray
provides some basic security features:


Redis Port Authentication
-------------------------

To prevent exploits via unauthorized Redis access, Ray provides the option to
password-protect Redis ports. While this is not a replacement for running Ray
behind a firewall, this feature is useful for instances exposed to the internet
where configuring a firewall is not possible. Because Redis serves queries
very quickly, an attacker can try many passwords per second, so the chosen
password should be long.
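
One way to obtain a long password is to generate it rather than choose it by hand. The following is a minimal sketch using Python's standard ``secrets`` module; the helper name ``generate_redis_password`` and the 32-byte default are assumptions made for illustration and are not part of Ray.

```python
import secrets


def generate_redis_password(num_bytes=32):
    """Return a URL-safe random string built from num_bytes of randomness.

    Hypothetical helper for illustration; not part of the Ray API.
    """
    return secrets.token_urlsafe(num_bytes)


# The generated value would then be supplied as the redis_password
# argument described in this document.
redis_password = generate_redis_password()
```

The same value has to be available to every node and driver that connects to the cluster, so it would typically be generated once and distributed through whatever secret-handling mechanism the deployment already uses.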

Redis authentication is only supported on the raylet code path.

To add authentication via the Python API, start Ray using:

.. code-block:: python

  ray.init(redis_password="password")

To add authentication via the CLI, or connect to an existing Ray instance with
password-protected Redis ports:

.. code-block:: bash

  ray start [--head] --redis-password="password"
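
When the command is assembled by a script, a password containing spaces or shell metacharacters needs quoting. The sketch below builds the command string with ``shlex.quote``; the function ``ray_start_command`` is a hypothetical helper, and only the ``--head``, ``--redis-address`` and ``--redis-password`` flags are taken from this change.

```python
import shlex


def ray_start_command(redis_password, head=False, redis_address=None):
    """Build a shell-safe `ray start` command line (hypothetical helper)."""
    args = ["ray", "start"]
    if head:
        args.append("--head")
    if redis_address is not None:
        # Non-head nodes connect to an existing cluster by its Redis address.
        args += ["--redis-address", redis_address]
    args += ["--redis-password", redis_password]
    # Quote each argument so the shell treats the password as a single word.
    return " ".join(shlex.quote(arg) for arg in args)


head_cmd = ray_start_command("password", head=True)
worker_cmd = ray_start_command("pass word", redis_address="10.0.0.1:6379")
```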

While Redis port authentication may protect against external attackers,
Ray does not encrypt traffic between nodes, so man-in-the-middle attacks are
possible for clusters on untrusted networks.

Cloud Security
--------------

Launching Ray clusters on AWS or GCP using the ``ray up`` command
automatically configures security groups that prevent external Redis access.

References
----------

- The `Redis security documentation <https://redis.io/topics/security>`_
9 changes: 7 additions & 2 deletions python/ray/experimental/state.py
@@ -78,6 +78,7 @@ def _check_connected(self):
def _initialize_global_state(self,
redis_ip_address,
redis_port,
redis_password=None,
timeout=20):
"""Initialize the GlobalState object by connecting to Redis.

@@ -89,9 +90,10 @@ def _initialize_global_state(self,
redis_ip_address: The IP address of the node that the Redis server
lives on.
redis_port: The port that the Redis server is listening on.
redis_password: The password of the Redis server.
"""
self.redis_client = redis.StrictRedis(
host=redis_ip_address, port=redis_port)
host=redis_ip_address, port=redis_port, password=redis_password)

start_time = time.time()

@@ -143,7 +145,10 @@ def _initialize_global_state(self,
for ip_address_port in ip_address_ports:
shard_address, shard_port = ip_address_port.split(b":")
self.redis_clients.append(
redis.StrictRedis(host=shard_address, port=shard_port))
redis.StrictRedis(
host=shard_address,
port=shard_port,
password=redis_password))

def _execute_command(self, key, *args):
"""Execute a Redis command on the appropriate Redis shard based on key.
21 changes: 17 additions & 4 deletions python/ray/log_monitor.py
@@ -35,11 +35,15 @@ class LogMonitor(object):
handle for that file.
"""

def __init__(self, redis_ip_address, redis_port, node_ip_address):
def __init__(self,
redis_ip_address,
redis_port,
node_ip_address,
redis_password=None):
"""Initialize the log monitor object."""
self.node_ip_address = node_ip_address
self.redis_client = redis.StrictRedis(
host=redis_ip_address, port=redis_port)
host=redis_ip_address, port=redis_port, password=redis_password)
self.log_files = {}
self.log_file_handles = {}
self.files_to_ignore = set()
@@ -130,6 +134,12 @@ def run(self):
required=True,
type=str,
help="The IP address of the node this process is on.")
parser.add_argument(
"--redis-password",
required=False,
type=str,
default=None,
help="the password to use for Redis")
parser.add_argument(
"--logging-level",
required=False,
@@ -151,6 +161,9 @@ def run(self):
redis_ip_address = get_ip_address(args.redis_address)
redis_port = get_port(args.redis_address)

log_monitor = LogMonitor(redis_ip_address, redis_port,
args.node_ip_address)
log_monitor = LogMonitor(
redis_ip_address,
redis_port,
args.node_ip_address,
redis_password=args.redis_password)
log_monitor.run()
27 changes: 22 additions & 5 deletions python/ray/monitor.py
@@ -70,13 +70,18 @@ class Monitor(object):
managers that were up at one point and have died since then.
"""

def __init__(self, redis_address, redis_port, autoscaling_config):
def __init__(self,
redis_address,
redis_port,
autoscaling_config,
redis_password=None):
# Initialize the Redis clients.
self.state = ray.experimental.state.GlobalState()
self.state._initialize_global_state(redis_address, redis_port)
self.state._initialize_global_state(
redis_address, redis_port, redis_password=redis_password)
self.use_raylet = self.state.use_raylet
self.redis = redis.StrictRedis(
host=redis_address, port=redis_port, db=0)
host=redis_address, port=redis_port, db=0, password=redis_password)
# Setup subscriptions to the primary Redis server and the Redis shards.
self.primary_subscribe_client = self.redis.pubsub(
ignore_subscribe_messages=True)
@@ -118,7 +123,9 @@ def __init__(self, redis_address, redis_port, autoscaling_config):
else:
addr_port = addr_port[0].split(b":")
self.redis_shard = redis.StrictRedis(
host=addr_port[0], port=addr_port[1])
host=addr_port[0],
port=addr_port[1],
password=redis_password)
try:
self.redis_shard.execute_command("HEAD.FLUSH 0")
except redis.exceptions.ResponseError as e:
@@ -773,6 +780,12 @@ def run(self):
required=False,
type=str,
help="the path to the autoscaling config file")
parser.add_argument(
"--redis-password",
required=False,
type=str,
default=None,
help="the password to use for Redis")
parser.add_argument(
"--logging-level",
required=False,
@@ -798,7 +811,11 @@ def run(self):
else:
autoscaling_config = None

monitor = Monitor(redis_ip_address, redis_port, autoscaling_config)
monitor = Monitor(
redis_ip_address,
redis_port,
autoscaling_config,
redis_password=args.redis_password)

try:
monitor.run()
39 changes: 29 additions & 10 deletions python/ray/scripts/scripts.py
@@ -89,6 +89,11 @@ def cli(logging_level, logging_format):
type=int,
help=("If provided, attempt to configure Redis with this "
"maximum number of clients."))
@click.option(
"--redis-password",
required=False,
type=str,
help="If provided, secure Redis ports with this password")
@click.option(
"--redis-shard-ports",
required=False,
@@ -190,10 +195,11 @@ def cli(logging_level, logging_format):
default=None,
help="manually specify the root temporary dir of the Ray process")
def start(node_ip_address, redis_address, redis_port, num_redis_shards,
redis_max_clients, redis_shard_ports, object_manager_port,
object_store_memory, num_workers, num_cpus, num_gpus, resources,
head, no_ui, block, plasma_directory, huge_pages, autoscaling_config,
use_raylet, no_redirect_worker_output, no_redirect_output,
redis_max_clients, redis_password, redis_shard_ports,
object_manager_port, object_store_memory, num_workers, num_cpus,
num_gpus, resources, head, no_ui, block, plasma_directory,
huge_pages, autoscaling_config, use_raylet,
no_redirect_worker_output, no_redirect_output,
plasma_store_socket_name, raylet_socket_name, temp_dir):
# Convert hostnames to numerical IP address.
if node_ip_address is not None:
@@ -205,6 +211,11 @@ def start(node_ip_address, redis_address, redis_port, num_redis_shards,
# This environment variable is used in our testing setup.
logger.info("Detected environment variable 'RAY_USE_XRAY'.")
use_raylet = True
if not use_raylet and redis_password is not None:
raise Exception("Setting the 'redis-password' argument is not "
"supported in legacy Ray. To run Ray with "
"password-protected Redis ports, pass "
"the '--use-raylet' flag.")

try:
resources = json.loads(resources)
@@ -269,6 +280,7 @@ def start(node_ip_address, redis_address, redis_port, num_redis_shards,
num_redis_shards=num_redis_shards,
redis_max_clients=redis_max_clients,
redis_protected_mode=False,
redis_password=redis_password,
include_webui=(not no_ui),
plasma_directory=plasma_directory,
huge_pages=huge_pages,
@@ -281,16 +293,20 @@ def start(node_ip_address, redis_address, redis_port, num_redis_shards,
logger.info(
"\nStarted Ray on this node. You can add additional nodes to "
"the cluster by calling\n\n"
" ray start --redis-address {}\n\n"
" ray start --redis-address {}{}{}\n\n"
"from the node you wish to add. You can connect a driver to the "
"cluster from Python by running\n\n"
" import ray\n"
" ray.init(redis_address=\"{}\")\n\n"
" ray.init(redis_address=\"{}{}{}\")\n\n"
"If you have trouble connecting from a different machine, check "
"that your firewall is configured properly. If you wish to "
"terminate the processes that have been started, run\n\n"
" ray stop".format(address_info["redis_address"],
address_info["redis_address"]))
" ray stop".format(
address_info["redis_address"], " --redis-password "
if redis_password else "", redis_password if redis_password
else "", address_info["redis_address"], "\", redis_password=\""
if redis_password else "", redis_password
if redis_password else ""))
else:
# Start Ray on a non-head node.
if redis_port is not None:
@@ -315,10 +331,12 @@ def start(node_ip_address, redis_address, redis_port, num_redis_shards,

# Wait for the Redis server to be started. And throw an exception if we
# can't connect to it.
services.wait_for_redis_to_start(redis_ip_address, int(redis_port))
services.wait_for_redis_to_start(
redis_ip_address, int(redis_port), password=redis_password)

# Create a Redis client.
redis_client = services.create_redis_client(redis_address)
redis_client = services.create_redis_client(
redis_address, password=redis_password)

# Check that the version information on this node matches the version
# information that the cluster was started with.
@@ -339,6 +357,7 @@ def start(node_ip_address, redis_address, redis_port, num_redis_shards,
object_manager_ports=[object_manager_port],
num_workers=num_workers,
object_store_memory=object_store_memory,
redis_password=redis_password,
cleanup=False,
redirect_worker_output=not no_redirect_worker_output,
redirect_output=not no_redirect_output,