[Core] ray job stop Fails with 500 Error in CLI #49889

Open
@robbypambudi

Description

What happened + What you expected to happen

Description:

Attempting to stop a Ray job with the ray job stop command fails with a 500 error. The server-side traceback ends in an aiohttp ServerDisconnectedError, raised while the job head forwards the stop request to the job agent.

Command Executed:

ray job submit --runtime-env-json='{"pip": ["requests==2.26.0"]}' --working-dir ./ -- python script.py
ray job stop raysubmit_MFG6KpQRRaGKBawC

Output

Job submission server address: http://**.**.73.122:8265
Attempting to stop job 'raysubmit_VYbBGNJeavjmHdyy'
Traceback (most recent call last):
  File "/home/ray/anaconda3/bin/ray", line 8, in <module>
    sys.exit(main())
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/scripts/scripts.py", line 2668, in main
    return cli()
  File "/home/ray/anaconda3/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ray/anaconda3/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ray/anaconda3/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/job/cli_utils.py", line 54, in wrapper
    return func(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/autoscaler/_private/cli_logger.py", line 823, in wrapper
    return f(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/job/cli.py", line 389, in stop
    client.stop_job(job_id)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/job/sdk.py", line 284, in stop_job
    self._raise_error(r)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/dashboard_sdk.py", line 283, in _raise_error
    raise RuntimeError(
RuntimeError: Request failed with status code 500: Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/job/job_head.py", line 390, in stop_job
    resp = await job_agent_client.stop_job_internal(job.submission_id)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/job/job_head.py", line 88, in stop_job_internal
    async with self._session.post(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aiohttp/client.py", line 1197, in __aenter__
    self._resp = await self._coro
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aiohttp/client.py", line 608, in _request
    await resp.start(conn)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 976, in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aiohttp/streams.py", line 640, in read
    await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected
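
For anyone hitting the same traceback, here is a minimal sketch of retrying the stop request from Python instead of the CLI. It assumes the failure is transient, which this report does not establish, so treat it as a diagnostic aid rather than a fix. The helper names stop_job_with_retry and stop_via_sdk, the retry counts, and the address passed in are hypothetical; JobSubmissionClient.stop_job is the SDK call that appears in the traceback above.

```python
import time
from typing import Callable, Optional


def stop_job_with_retry(
    stop_job: Callable[[str], bool],
    job_id: str,
    attempts: int = 3,
    delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call stop_job(job_id), retrying when it raises RuntimeError.

    RuntimeError is the exception the SDK raises for the 500 response in
    the traceback above. The retry policy itself is an assumption.
    """
    last_error: Optional[RuntimeError] = None
    for attempt in range(1, attempts + 1):
        try:
            return stop_job(job_id)
        except RuntimeError as err:
            last_error = err
            if attempt < attempts:
                # Linear backoff: wait a little longer after each failure.
                sleep(delay_s * attempt)
    raise RuntimeError(
        f"could not stop {job_id} after {attempts} attempts"
    ) from last_error


def stop_via_sdk(address: str, job_id: str) -> bool:
    """Hypothetical wrapper; needs Ray installed and a reachable dashboard."""
    # Imported lazily so the retry helper stays usable without Ray.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(address)
    return stop_job_with_retry(client.stop_job, job_id)
```

Calling stop_via_sdk with the dashboard address and the submission ID would then attempt the stop up to three times. If every attempt fails with the same 500 response, the problem is more likely on the job agent side than in the client.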

Issue Details:

Job Submission Server Address: http://**.**.73.122:8265

Versions / Dependencies

Ray Version: 2.40.0

Reproduction script


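import ray

# Install the emoji package into the runtime environment of every worker.
runtime_env = {"pip": ["emoji"]}

ray.init(runtime_env=runtime_env)

@ray.remote
def f():
    # Imported inside the task so it resolves in the worker's runtime env.
    import emoji
    return emoji.emojize('Python is :thumbs_up:')

print(ray.get(f.remote()))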

Issue Severity

High: It blocks me from completing my task.

Labels: bug, core, jobs, triage
