[Core] ray job stop Fails with 500 Error in CLI #49889
Open
What happened + What you expected to happen
Description:
Attempting to stop a Ray job with the ray job stop command fails with a 500 error. The detailed traceback shows that the dashboard head's request to the job agent (stop_job_internal in job_head.py) fails with aiohttp's ServerDisconnectedError during job termination.
Command Executed:
ray job submit --runtime-env-json='{"pip": ["requests==2.26.0"]}' --working-dir ./ -- python script.py
ray job stop raysubmit_MFG6KpQRRaGKBawC
Output:
Job submission server address: http://**.**.73.122:8265
Attempting to stop job 'raysubmit_VYbBGNJeavjmHdyy'
Traceback (most recent call last):
  File "/home/ray/anaconda3/bin/ray", line 8, in <module>
    sys.exit(main())
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/scripts/scripts.py", line 2668, in main
    return cli()
  File "/home/ray/anaconda3/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ray/anaconda3/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/ray/anaconda3/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/job/cli_utils.py", line 54, in wrapper
    return func(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/autoscaler/_private/cli_logger.py", line 823, in wrapper
    return f(*args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/job/cli.py", line 389, in stop
    client.stop_job(job_id)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/job/sdk.py", line 284, in stop_job
    self._raise_error(r)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/dashboard_sdk.py", line 283, in _raise_error
    raise RuntimeError(
RuntimeError: Request failed with status code 500: Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/job/job_head.py", line 390, in stop_job
    resp = await job_agent_client.stop_job_internal(job.submission_id)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/ray/dashboard/modules/job/job_head.py", line 88, in stop_job_internal
    async with self._session.post(
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aiohttp/client.py", line 1197, in __aenter__
    self._resp = await self._coro
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aiohttp/client.py", line 608, in _request
    await resp.start(conn)
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aiohttp/client_reqrep.py", line 976, in start
    message, payload = await protocol.read()  # type: ignore[union-attr]
  File "/home/ray/anaconda3/lib/python3.10/site-packages/aiohttp/streams.py", line 640, in read
    await self._waiter
aiohttp.client_exceptions.ServerDisconnectedError: Server disconnected
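The traceback above shows the CLI calling client.stop_job(job_id) in the job SDK, which raises RuntimeError on the 500 response. The same call can be made from Python, which makes it easier to catch that error and retry. Below is a minimal sketch, not a confirmed workaround: stop_with_retry and stop_job_via_sdk are hypothetical helper names, the retry count and delay are arbitrary, and whether a retry actually helps with this disconnect has not been verified. It only helps distinguish a transient disconnect from a persistent one.

```python
import time
from typing import Callable


def stop_with_retry(
    stop_fn: Callable[[str], bool],
    job_id: str,
    attempts: int = 3,
    delay_s: float = 2.0,
) -> bool:
    """Call stop_fn(job_id), retrying when it raises RuntimeError.

    Hypothetical helper. RuntimeError is what the job SDK raises for a
    non-200 response, as seen in the traceback.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return stop_fn(job_id)
        except RuntimeError as err:
            last_error = err
            print(f"stop attempt {attempt}/{attempts} failed: {err}")
            if attempt < attempts:
                time.sleep(delay_s)
    # Every attempt failed: surface the last error to the caller.
    raise last_error


def stop_job_via_sdk(address: str, job_id: str) -> None:
    """Stop a job through the Python SDK and print its status afterward.

    Requires Ray and a reachable dashboard; the import is kept local so the
    retry helper can be used without Ray installed.
    """
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(address)
    stop_with_retry(client.stop_job, job_id)
    print(client.get_job_status(job_id))
```

Calling stop_job_via_sdk with the dashboard address (for example the one printed as "Job submission server address") and the submission ID would issue the same request as ray job stop. If every attempt fails with the same ServerDisconnectedError, the problem is likely persistent rather than transient.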
Issue Details:
Job Submission Server Address: http://**.**.73.122:8265
Versions / Dependencies
Ray Version: 2.40.0
Reproduction script
import ray

runtime_env = {"pip": ["emoji"]}
ray.init(runtime_env=runtime_env)

@ray.remote
def f():
    # Imported inside the task so it resolves from the runtime_env on the worker.
    import emoji
    return emoji.emojize('Python is :thumbs_up:')

print(ray.get(f.remote()))
Issue Severity
High: It blocks me from completing my task.