Connecting a node after removal will fail #2878

@richardliaw

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux 16.04
  • Ray installed from (source or binary): Source
  • Ray version: Master
  • Python version: 3.6

Describe the problem

Run the following commands, pausing briefly between them:

ray start --head --redis-port 6311 --num-cpus 4 --use-raylet
ray start --redis-address localhost:6311 --num-cpus 4 --use-raylet
ps -ax | grep /ray/python/ray/core/src/ray/raylet/raylet
kill 1234 # kill the worker raylet (substitute its PID from the ps output)

ray start --redis-address localhost:6311 --num-cpus 4 --use-raylet
# Soon, this raylet will die by itself.

In a separate shell, run the following in a Python interpreter:

import ray
ray.init(redis_address="localhost:6311")

Source code / logs

(With duplicate client table entries removed in the source code):

In [11]: ray.global_state.client_table()
Out[11]:
[{'ClientID': '3eb3c8e2d0f6b134a0df9fff457e624280fda4a4',
  'IsInsertion': True,
  'NodeManagerAddress': '169.229.49.172',
  'NodeManagerPort': 40619,
  'ObjectManagerPort': 45861,
  'ObjectStoreSocketName': '/tmp/plasma_store92988567',
  'RayletSocketName': '/tmp/raylet64899235',
  'Resources': {'GPU': 1.0, 'CPU': 4.0}},
 {'ClientID': '58bd566ebb1da7d2b8f938c96d509dcd48f771b2',
  'IsInsertion': False,
  'NodeManagerAddress': '',
  'NodeManagerPort': 0,
  'ObjectManagerPort': 0,
  'ObjectStoreSocketName': '',
  'RayletSocketName': '',
  'Resources': {}},
 {'ClientID': 'a74807e0d1a3a4064baf88c734641a07751505cc',
  'IsInsertion': False,
  'NodeManagerAddress': '',
  'NodeManagerPort': 0,
  'ObjectManagerPort': 0,
  'ObjectStoreSocketName': '',
  'RayletSocketName': '',
  'Resources': {}}]
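
To make the table above easier to read, here is a minimal sketch that splits the entries by their IsInsertion flag. It assumes, from the output above rather than from any documented contract, that IsInsertion: False marks a removal entry. The helper name split_client_table is hypothetical, and the sample entries are abbreviated copies of the ones shown.

```python
def split_client_table(entries):
    """Group client-table entries by their IsInsertion flag.

    Assumption: IsInsertion=False marks a removal entry.
    """
    inserted = [e["ClientID"] for e in entries if e["IsInsertion"]]
    removed = [e["ClientID"] for e in entries if not e["IsInsertion"]]
    return inserted, removed


# Entries abbreviated from the client_table() output above.
sample = [
    {"ClientID": "3eb3c8e2d0f6b134a0df9fff457e624280fda4a4",
     "IsInsertion": True, "NodeManagerPort": 40619},
    {"ClientID": "58bd566ebb1da7d2b8f938c96d509dcd48f771b2",
     "IsInsertion": False, "NodeManagerPort": 0},
    {"ClientID": "a74807e0d1a3a4064baf88c734641a07751505cc",
     "IsInsertion": False, "NodeManagerPort": 0},
]

inserted, removed = split_client_table(sample)
print(len(inserted), "inserted,", len(removed), "removed")
```

Under that assumption, only one entry is an insertion. Both the killed raylet and the one started afterwards would then be the two removal entries, though the table alone does not confirm which ClientID belongs to which process.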

Output in /tmp/raylogs/raylet*:

E0914 21:35:56.401674 12738 io.cc:119] Connection to socket failed for pathname /tmp/raylet71207356
WARNING: Logging before InitGoogleLogging() is written to STDERR
E0914 21:35:56.401732 12739 io.cc:119] Connection to socket failed for pathname /tmp/raylet71207356
F0914 21:36:01.397866 12737 io.cc:127] Could not connect to socket /tmp/raylet71207356
*** Check failure stack trace: ***
F0914 21:36:01.406622 12740 io.cc:127] Could not connect to socket /tmp/raylet71207356
*** Check failure stack trace: ***
F0914 21:36:01.407189 12739 io.cc:127] Could not connect to socket /tmp/raylet71207356
F0914 21:36:01.407200 12738 io.cc:127] Could not connect to socket /tmp/raylet71207356
*** Check failure stack trace: ***
*** Check failure stack trace: ***
F0914 21:46:24.084065 26213 node_manager.cc:349]  Check failed: _s.ok() Bad status: IOError: Connection refused
*** Check failure stack trace: ***
    @           0x56fbea  google::LogMessage::Fail()
    @           0x56fb2e  google::LogMessage::SendToLog()
    @           0x56f470  google::LogMessage::Flush()
    @           0x56f26b  google::LogMessage::~LogMessage()
    @           0x4cbd00  ray::RayLog::~RayLog()
    @           0x4fb40f  ray::raylet::NodeManager::ClientAdded()
    @           0x49d805  ray::gcs::ClientTable::HandleNotification()
    @           0x49e1cb  _ZNSt17_Function_handlerIFvPN3ray3gcs14AsyncGcsClientERKNS0_8UniqueIDERKSt6vectorI16ClientTableDataTSaIS8_EEEZZNS1_11ClientTable7ConnectERKS8_ENKUlS3_S6_SG_E_clES3_S6_SG_EUlS3_S6_SC_E_E9_M_invokeERKSt9_Any_dataOS3_S6_SC_
    @           0x4ad364  _ZZN3ray3gcs3LogINS_8UniqueIDE15ClientTableDataE9SubscribeERKS2_S6_RKSt8functionIFvPNS0_14AsyncGcsClientES6_RKSt6vectorI16ClientTableDataTSaISB_EEEERKS7_IFvS9_EEENKUlRKSsE_clESP_
    @           0x4c8359  (anonymous namespace)::ProcessCallback()
    @           0x4c9b83  ray::gcs::SubscribeRedisCallback()
    @           0x514ded  redisProcessCallbacks
    @           0x4cb75d  RedisAsioClient::handle_read()
    @           0x4cbbf5  boost::asio::detail::reactive_null_buffers_op<>::do_complete()
    @           0x48211d  boost::asio::detail::epoll_reactor::descriptor_state::do_complete()
    @           0x482a27  boost::asio::detail::task_io_service::run()
    @           0x47a4bc  main
    @     0x7f66a8681830  __libc_start_main
    @           0x47e449  _start
    @              (nil)  (unknown)
WARNING: Logging before InitGoogleLogging() is written to STDERR
E0914 21:46:25.054061 26271 io.cc:119] Connection to socket failed for pathname /tmp/raylet48552207
WARNING: Logging before InitGoogleLogging() is written to STDERR
E0914 21:46:25.105113 26270 io.cc:119] Connection to socket failed for pathname /tmp/raylet48552207
WARNING: Logging before InitGoogleLogging() is written to STDERR
WARNING: Logging before InitGoogleLogging() is written to STDERR
E0914 21:46:25.127391 26268 io.cc:119] Connection to socket failed for pathname /tmp/raylet48552207
E0914 21:46:25.127399 26269 io.cc:119] Connection to socket failed for pathname /tmp/raylet48552207
F0914 21:46:30.060098 26271 io.cc:127] Could not connect to socket /tmp/raylet48552207
*** Check failure stack trace: ***
F0914 21:46:30.111260 26270 io.cc:127] Could not connect to socket /tmp/raylet48552207
*** Check failure stack trace: ***
F0914 21:46:30.133474 26269 io.cc:127] Could not connect to socket /tmp/raylet48552207
F0914 21:46:30.133491 26268 io.cc:127] Could not connect to socket /tmp/raylet48552207
*** Check failure stack trace: ***
*** Check failure stack trace: ***
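
The fatal lines above are interleaved from several processes, so a small script can help count where they come from. This is a rough sketch, assuming the usual glog prefix layout (severity letter, date, time, thread or process id, source file and line). The names GLOG_LINE and fatal_sources are hypothetical, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Assumed glog prefix: <severity><mmdd> <hh:mm:ss.usec> <id> <file>:<line>] <message>
GLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<ident>\d+) (?P<src>[^:\s]+:\d+)\] (?P<msg>.*)$"
)


def fatal_sources(lines):
    """Count fatal (F) log lines per source location, e.g. 'io.cc:127'."""
    counts = Counter()
    for line in lines:
        match = GLOG_LINE.match(line.strip())
        if match and match.group("sev") == "F":
            counts[match.group("src")] += 1
    return counts


# Lines copied from the raylet log output above.
sample_log = [
    "E0914 21:35:56.401674 12738 io.cc:119] Connection to socket failed for pathname /tmp/raylet71207356",
    "F0914 21:36:01.397866 12737 io.cc:127] Could not connect to socket /tmp/raylet71207356",
    "F0914 21:46:24.084065 26213 node_manager.cc:349]  Check failed: _s.ok() Bad status: IOError: Connection refused",
    "*** Check failure stack trace: ***",
]

counts = fatal_sources(sample_log)
for source, count in sorted(counts.items()):
    print(source, count)
```

Run over the full log, this would separate the single node_manager.cc:349 check failure in NodeManager::ClientAdded() from the io.cc:127 failures logged by processes that could not connect to the raylet socket.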
