[autoscaler] Kubernetes autoscaler backend #5492

Merged: 48 commits, Oct 3, 2019
Commits
86d9c6f
Add Kubernetes NodeProvider to autoscaler
edoakes Aug 12, 2019
be1b96b
Split off SSHCommandRunner
edoakes Aug 19, 2019
ffe00a6
Add KubernetesCommandRunner
edoakes Aug 20, 2019
0915df9
Cleanup
edoakes Aug 22, 2019
4558a76
More config options
edoakes Aug 22, 2019
15d1856
Check if auth present
edoakes Aug 22, 2019
8ed1ed9
More auth checks
edoakes Aug 22, 2019
8ec5f23
Better output
edoakes Aug 22, 2019
3c454c2
Always bootstrap config
edoakes Aug 22, 2019
56e3bf0
All working
edoakes Aug 23, 2019
4379355
Add k8s-rsync comment
edoakes Aug 23, 2019
44327bb
Clean up manual k8s examples
edoakes Aug 23, 2019
2d6fcc3
Fix up submit.yaml
edoakes Aug 23, 2019
9c98da7
Automatically configure permissisons
edoakes Aug 24, 2019
a7265ad
Fix get_node_provider arg
edoakes Aug 24, 2019
3174aa1
Fix permissions
edoakes Aug 24, 2019
aa13c22
Fill in empty auth
edoakes Aug 26, 2019
158876f
Merge branch 'master' into k8s
edoakes Aug 26, 2019
6ff3c2a
Remove ray-cluster from this PR
edoakes Aug 26, 2019
c218965
No hard dep on kubernetes library
edoakes Aug 26, 2019
f904982
Move permissions into autoscaler config
edoakes Aug 26, 2019
2452e1b
lint
edoakes Aug 26, 2019
a4e16d6
Fix indentation
edoakes Aug 26, 2019
e13f4a1
namespace validation
edoakes Aug 28, 2019
0fc7a0f
Use cluster name tag
edoakes Aug 28, 2019
9dd310f
Remove kubernetes from setup.py
edoakes Aug 28, 2019
da0ad01
Comment in example configs
edoakes Aug 28, 2019
34b6592
Same default autoscaling config as aws
edoakes Aug 28, 2019
8a0bb24
Add Kubernetes quickstart
edoakes Aug 28, 2019
6df676c
lint
edoakes Aug 28, 2019
15d3573
Revert changes to submit.yaml (other PR)
edoakes Aug 28, 2019
c610f34
Install kubernetes in travis
edoakes Aug 29, 2019
0cc97d6
address comments
edoakes Aug 29, 2019
e1c6fa2
Improve autoscaling doc
edoakes Aug 29, 2019
3bb44d6
Merge remote-tracking branch 'upstream/master' into k8s
edoakes Aug 29, 2019
2e449ea
kubectl command in setup
edoakes Sep 3, 2019
8fa6126
Force use_internal_ips
edoakes Sep 3, 2019
6559635
Merge remote-tracking branch 'upstream/master' into k8s
edoakes Sep 3, 2019
3fdb850
comments
edoakes Sep 3, 2019
c01d471
backend env in docs
edoakes Sep 4, 2019
670d819
Merge remote-tracking branch 'upstream/master' into k8s
edoakes Sep 4, 2019
86d5cc4
Change namespace config
edoakes Sep 4, 2019
1b80017
comments
edoakes Sep 18, 2019
16adb26
Merge remote-tracking branch 'upstream/master' into k8s
edoakes Sep 18, 2019
e88ec5f
comments
edoakes Sep 18, 2019
fee41e4
Merge branch 'master' into k8s
edoakes Sep 20, 2019
734320a
Merge branch 'master' into k8s
edoakes Oct 2, 2019
b29d172
Fix yaml test
edoakes Oct 3, 2019
Merge branch 'master' into k8s
edoakes committed Aug 26, 2019
commit 158876f8d679d8fce98186cc723f99d9694deb61
1 change: 1 addition & 0 deletions .bazelrc
@@ -9,3 +9,4 @@ build --per_file_copt='\\.pb\\.cc$@-w'
build --per_file_copt='external*@-w'
# This workaround is needed due to https://github.com/bazelbuild/bazel/issues/4341
build --per_file_copt="external/com_github_grpc_grpc/.*@-DGRPC_BAZEL_BUILD"
build --http_timeout_scaling=5.0
4 changes: 2 additions & 2 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -1,8 +1,8 @@
<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. -->

## What do these changes do?

## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this solves. -->

## Related issue number

6 changes: 0 additions & 6 deletions .travis.yml
@@ -172,12 +172,6 @@ script:
# `cluster_tests.py` runs on Jenkins, not Travis.
- if [ $RAY_CI_TUNE_AFFECTED == "1" ]; then python -m pytest --durations=10 --timeout=300 --ignore=python/ray/tune/tests/test_cluster.py --ignore=python/ray/tune/tests/test_tune_restore.py --ignore=python/ray/tune/tests/test_actor_reuse.py python/ray/tune/tests; fi

# ray rllib tests
- if [ $RAY_CI_RLLIB_AFFECTED == "1" ]; then ./ci/suppress_output python python/ray/rllib/tests/test_catalog.py; fi
- if [ $RAY_CI_RLLIB_AFFECTED == "1" ]; then ./ci/suppress_output python python/ray/rllib/tests/test_filters.py; fi
- if [ $RAY_CI_RLLIB_AFFECTED == "1" ]; then ./ci/suppress_output python python/ray/rllib/tests/test_optimizers.py; fi
- if [ $RAY_CI_RLLIB_AFFECTED == "1" ]; then ./ci/suppress_output python python/ray/rllib/tests/test_evaluators.py; fi

# ray tests
# Python3.5+ only. Otherwise we will get `SyntaxError` regardless of how we set the tester.
- if [ $RAY_CI_PYTHON_AFFECTED == "1" ]; then python -c 'import sys;exit(sys.version_info>=(3,5))' || python -m pytest -v --durations=5 --timeout=300 python/ray/experimental/test/async_test.py; fi
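A note on the version gate in the `.travis.yml` hunk above: the `python -c '...exit(sys.version_info>=(3,5))' || python -m pytest ...` idiom relies on a boolean exit status. `True` becomes exit status 1, so the `||` branch (the pytest run) executes only when the interpreter is 3.5 or newer. The following is a minimal sketch of that behavior, not part of the PR; the helper name `gate_exit_code` is hypothetical, and it uses `sys.exit` rather than the `exit` builtin from the original line.

```python
import subprocess
import sys


def gate_exit_code(minimum):
    """Run the version gate in a child interpreter and return its exit status.

    `minimum` is a version tuple such as (3, 5). The child exits with the
    boolean result of the comparison: True -> 1, False -> 0.
    """
    snippet = "import sys; sys.exit(sys.version_info >= {!r})".format(minimum)
    return subprocess.run([sys.executable, "-c", snippet]).returncode


if __name__ == "__main__":
    # Status 1 means "new enough", so a shell `||` would run its right-hand side.
    print("gate for (3, 5):", gate_exit_code((3, 5)))
    # A threshold no current interpreter meets yields status 0, so `||` is skipped.
    print("gate for (99, 0):", gate_exit_code((99, 0)))
```

The inverted status is what lets the check be written with `||` instead of an explicit `if`, at the cost of readability.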
69 changes: 24 additions & 45 deletions BUILD.bazel
@@ -65,20 +65,6 @@ cc_proto_library(
deps = [":object_manager_proto"],
)

proto_library(
name = "raylet_proto",
srcs = ["src/ray/protobuf/raylet.proto"],
deps = [
":common_proto",
":gcs_proto",
],
)

cc_proto_library(
name = "raylet_cc_proto",
deps = [":raylet_proto"],
)

proto_library(
name = "worker_proto",
srcs = ["src/ray/protobuf/worker.proto"],
@@ -182,34 +168,6 @@ cc_library(
],
)

# Raylet gRPC lib.
cc_grpc_library(
name = "raylet_cc_grpc",
srcs = [":raylet_proto"],
grpc_only = True,
deps = [":raylet_cc_proto"],
)

# Raylet rpc server and client.
cc_library(
name = "raylet_rpc",
srcs = glob([
"src/ray/rpc/raylet/*.cc",
]),
hdrs = glob([
"src/ray/rpc/raylet/*.h",
"src/ray/raylet/*.h",
]),
copts = COPTS,
deps = [
":grpc_common_lib",
":ray_common",
":raylet_cc_grpc",
"@boost//:asio",
"@com_github_grpc_grpc//:grpc++",
],
)

# Worker gRPC lib.
cc_grpc_library(
name = "worker_cc_grpc",
@@ -263,6 +221,7 @@ cc_library(
copts = COPTS,
deps = [
":common_cc_proto",
":node_manager_fbs",
":ray_util",
"@boost//:asio",
"@com_github_grpc_grpc//:grpc++",
@@ -361,11 +320,11 @@ cc_library(
deps = [
":common_cc_proto",
":gcs",
":node_manager_fbs",
":node_manager_rpc",
":object_manager",
":ray_common",
":ray_util",
":raylet_rpc",
":stats_lib",
":worker_rpc",
"@boost//:asio",
@@ -461,6 +420,7 @@ cc_test(
srcs = ["src/ray/raylet/lineage_cache_test.cc"],
copts = COPTS,
deps = [
":node_manager_fbs",
":raylet_lib",
"@com_google_googletest//:gtest_main",
],
@@ -471,6 +431,7 @@ cc_test(
srcs = ["src/ray/raylet/reconstruction_policy_test.cc"],
copts = COPTS,
deps = [
":node_manager_fbs",
":object_manager",
":raylet_lib",
"@com_google_googletest//:gtest_main",
@@ -670,6 +631,7 @@ cc_library(
deps = [
":gcs_cc_proto",
":hiredis",
":node_manager_fbs",
":node_manager_rpc",
":ray_common",
":ray_util",
@@ -678,6 +640,7 @@ cc_library(
],
)

# TODO(micafan) Replace cc_binary with cc_test for GCS test.
cc_binary(
name = "redis_gcs_client_test",
testonly = 1,
@@ -700,6 +663,17 @@ cc_binary(
],
)

cc_binary(
name = "subscription_executor_test",
testonly = 1,
srcs = ["src/ray/gcs/subscription_executor_test.cc"],
copts = COPTS,
deps = [
":gcs",
"@com_google_googletest//:gtest_main",
],
)

cc_binary(
name = "asio_test",
testonly = 1,
@@ -725,6 +699,13 @@ flatbuffer_cc_library(
out_prefix = "src/ray/common/",
)

flatbuffer_cc_library(
name = "node_manager_fbs",
srcs = ["src/ray/raylet/format/node_manager.fbs"],
flatc_args = FLATC_ARGS,
out_prefix = "src/ray/raylet/format/",
)

flatbuffer_cc_library(
name = "object_manager_fbs",
srcs = ["src/ray/object_manager/format/object_manager.fbs"],
@@ -748,8 +729,6 @@ cc_binary(
srcs = glob([
"src/ray/core_worker/lib/java/*.h",
"src/ray/core_worker/lib/java/*.cc",
"src/ray/raylet/lib/java/*.h",
"src/ray/raylet/lib/java/*.cc",
]) + [
"@bazel_tools//tools/jdk:jni_header",
] + select({
2 changes: 2 additions & 0 deletions bazel/BUILD.plasma
@@ -132,13 +132,15 @@ cc_library(
"cpp/src/plasma/eviction_policy.cc",
"cpp/src/plasma/external_store.cc",
"cpp/src/plasma/plasma_allocator.cc",
"cpp/src/plasma/quota_aware_policy.cc",
"cpp/src/plasma/thirdparty/ae/ae.c",
],
hdrs = [
"cpp/src/plasma/events.h",
"cpp/src/plasma/eviction_policy.h",
"cpp/src/plasma/external_store.h",
"cpp/src/plasma/plasma_allocator.h",
"cpp/src/plasma/quota_aware_policy.h",
"cpp/src/plasma/store.h",
"cpp/src/plasma/thirdparty/ae/ae.h",
"cpp/src/plasma/thirdparty/ae/ae_epoll.c",
2 changes: 1 addition & 1 deletion bazel/ray_deps_setup.bzl
@@ -67,7 +67,7 @@ def ray_deps_setup():
new_git_repository(
name = "plasma",
build_file = "@//bazel:BUILD.plasma",
commit = "f976629a54f5518f6285a311c45c5957281b1ee7",
commit = "141a213a54f4979ab0b94b94928739359a2ee9ad",
remote = "https://github.com/apache/arrow",
)

2 changes: 1 addition & 1 deletion build.sh
@@ -102,7 +102,7 @@ pushd "$BUILD_DIR"
# the commit listed in the command.
$PYTHON_EXECUTABLE -m pip install -q \
--target="$ROOT_DIR/python/ray/pyarrow_files" pyarrow==0.14.0.RAY \
--find-links https://s3-us-west-2.amazonaws.com/arrow-wheels/50f14adecbb83228599a2dc57859e4ecbe054b92/index.html
--find-links https://s3-us-west-2.amazonaws.com/arrow-wheels/516e15028091b5e287200b5df77d77f72d9a6c9a/index.html
export PYTHON_BIN_PATH="$PYTHON_EXECUTABLE"

if [ "$RAY_BUILD_JAVA" == "YES" ]; then
25 changes: 0 additions & 25 deletions ci/jenkins_tests/perf_integration_tests/run_perf_integration.sh

This file was deleted.

27 changes: 0 additions & 27 deletions ci/jenkins_tests/run_asv.sh

This file was deleted.

6 changes: 6 additions & 0 deletions ci/jenkins_tests/run_multi_node_tests.sh
@@ -35,3 +35,9 @@ $SUPPRESS_OUTPUT docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \

$SUPPRESS_OUTPUT docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
python /ray/python/ray/experimental/sgd/examples/train_example.py

$SUPPRESS_OUTPUT docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
python /ray/python/ray/experimental/sgd/examples/train_example.py --num-replicas=2

$SUPPRESS_OUTPUT docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
python /ray/python/ray/experimental/sgd/examples/train_example.py --tune
29 changes: 0 additions & 29 deletions ci/jenkins_tests/run_rllib_asv.sh

This file was deleted.

27 changes: 24 additions & 3 deletions ci/jenkins_tests/run_rllib_tests.sh
100644 → 100755
@@ -1,3 +1,18 @@
docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
/ray/ci/suppress_output python /ray/rllib/tests/test_catalog.py

docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
/ray/ci/suppress_output python /ray/rllib/tests/test_optimizers.py

docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
/ray/ci/suppress_output python /ray/rllib/tests/test_filters.py

docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
/ray/ci/suppress_output python /ray/rllib/tests/test_evaluators.py

docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
/ray/ci/suppress_output python /ray/rllib/tests/test_eager_support.py

docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
/ray/ci/suppress_output /ray/rllib/train.py \
--env PongDeterministic-v0 \
@@ -386,9 +401,6 @@ docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
/ray/ci/suppress_output python /ray/rllib/examples/multiagent_cartpole.py --num-iters=2

docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
/ray/ci/suppress_output python /ray/rllib/examples/multiagent_cartpole.py --num-iters=2 --simple

docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
/ray/ci/suppress_output python /ray/rllib/examples/multiagent_two_trainers.py --num-iters=2

@@ -428,6 +440,12 @@ docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
/ray/ci/suppress_output python /ray/rllib/contrib/random_agent/random_agent.py

docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
/ray/ci/suppress_output python /ray/rllib/examples/centralized_critic.py --stop=2000

docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
/ray/ci/suppress_output python /ray/rllib/examples/centralized_critic_2.py --stop=2000

docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
/ray/ci/suppress_output python /ray/rllib/examples/twostep_game.py --stop=2000 --run=contrib/MADDPG

@@ -440,6 +458,9 @@ docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
/ray/ci/suppress_output python /ray/rllib/examples/twostep_game.py --stop=2000 --run=APEX_QMIX

docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
/ray/ci/suppress_output python /ray/rllib/examples/autoregressive_action_dist.py --stop=150

docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
/ray/ci/suppress_output /ray/rllib/train.py \
--env PongDeterministic-v4 \
14 changes: 14 additions & 0 deletions ci/jenkins_tests/run_tune_tests.sh
@@ -70,10 +70,19 @@ $SUPPRESS_OUTPUT docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
python /ray/python/ray/tune/examples/tune_mnist_async_hyperband.py \
--smoke-test

$SUPPRESS_OUTPUT docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
python /ray/python/ray/tune/examples/lightgbm_example.py

$SUPPRESS_OUTPUT docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
python /ray/python/ray/tune/examples/xgboost_example.py

$SUPPRESS_OUTPUT docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
python /ray/python/ray/tune/examples/logging_example.py \
--smoke-test

$SUPPRESS_OUTPUT docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
python /ray/python/ray/tune/examples/mlflow_example.py

$SUPPRESS_OUTPUT docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
python /ray/python/ray/tune/examples/bayesopt_example.py \
--smoke-test
@@ -109,3 +118,8 @@ $SUPPRESS_OUTPUT docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
$SUPPRESS_OUTPUT docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
python /ray/python/ray/tune/examples/skopt_example.py \
--smoke-test

# uncomment once statsmodels is updated.
# $SUPPRESS_OUTPUT docker run --rm --shm-size=${SHM_SIZE} --memory=${MEMORY_SIZE} $DOCKER_SHA \
# python /ray/python/ray/tune/examples/bohb_example.py \
# --smoke-test
You are viewing a condensed version of this merge commit. You can view the full changes here.