CLI: kill actor via ray kill actors *actor_id* or via dashboard #39240
Labels
core: Issues that should be addressed in Ray Core
dashboard: Issues specific to the Ray Dashboard
enhancement: Request for new feature and/or capability
P2: Important issue, but not time-critical
Description
Allow killing an actor from the CLI and/or the dashboard. I know we can call ray.kill(handle) from within a driver script or from within a job submitted to a cluster, but what about killing a misbehaving actor when the job itself has no kill logic?
I previously mentioned this on the message board and was asked to open this issue.
Link to message board post: https://discuss.ray.io/t/how-to-kill-actor-from-cli-or-dashboard/11952
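For reference, the ray.kill(handle) route mentioned above can also be used from a separate driver, but only if a handle can be obtained there. A minimal sketch, assuming the actor was created with a name (for example a detached named actor) and that the script is already connected to the same cluster and namespace: ray.get_actor and ray.kill are existing Ray calls, while kill_named_actor and its ray_module parameter are hypothetical names introduced here so the logic can be exercised without a cluster. Unnamed actors cannot be looked up this way, which is the gap this issue is about.

```python
def kill_named_actor(name, namespace=None, *, ray_module=None):
    """Look up a named actor and forcefully terminate it.

    `ray_module` is a hypothetical seam for testing; when omitted,
    the real `ray` package is imported and used.
    """
    if ray_module is None:
        import ray as ray_module  # requires Ray to be installed

    # Raises ValueError if no actor with this name exists in the namespace.
    handle = ray_module.get_actor(name, namespace=namespace)

    # no_restart=True prevents Ray from restarting the actor afterwards.
    ray_module.kill(handle, no_restart=True)
    return handle


# Typical use from a separate script (names are placeholders):
#   import ray
#   ray.init(address="auto", namespace="my_namespace")
#   kill_named_actor("my_detached_actor")
```

Note that ray.kill terminates the actor immediately, so any cleanup method on the actor would have to be invoked explicitly before this call.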
Use case
I'd like to be able to kill an actor from the Dashboard via a Kill button, which would terminate the actor and, if defined, first call a function with a conventional name (def exit, or def exit) within the Actor.
Following the same pattern, I'd like to be able to terminate an Actor from the CLI. We already have 'ray list actors', so I think it also makes sense to have 'ray kill actors actor_id actor_id actor_id'. If called from the CLI, we would also call the same def kill or def kill function on the Actor.
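To make the proposed command shape concrete, here is a rough sketch of how 'kill actors actor_id actor_id actor_id' could be parsed and dispatched. This is not an existing Ray command: the program name ray-sketch, build_parser, main, and the kill_actor callback are all hypothetical, and the callback stands in for whatever mechanism would actually terminate an actor by ID.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the proposed `kill actors <id>...` syntax.
    parser = argparse.ArgumentParser(prog="ray-sketch")
    commands = parser.add_subparsers(dest="command", required=True)
    kill = commands.add_parser("kill", help="terminate cluster resources")
    resources = kill.add_subparsers(dest="resource", required=True)
    actors = resources.add_parser("actors", help="terminate actors by ID")
    actors.add_argument("actor_ids", nargs="+", metavar="ACTOR_ID")
    return parser


def main(argv, kill_actor):
    """Parse argv and call `kill_actor(actor_id)` once per ID.

    Returns a dict mapping each actor ID to "killed" or a failure message,
    so one bad ID does not stop the remaining ones from being processed.
    """
    args = build_parser().parse_args(argv)
    results = {}
    for actor_id in args.actor_ids:
        try:
            kill_actor(actor_id)
            results[actor_id] = "killed"
        except Exception as exc:  # report the failure and keep going
            results[actor_id] = f"failed: {exc}"
    return results
```

For example, main(["kill", "actors", "a1", "a2"], kill_actor=some_function) would call some_function("a1") and then some_function("a2"). Accepting several IDs in one call matches the multi-ID form suggested above.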
I'm working on an application that will leverage detached named actors as well as non-named Actors in many ActorPools. While working on some operational documentation for the application, I came across this question and realized there is no way to kill an actor other than from within the job code. Of course, we could always figure out the PID of the actor and kill that, but that solution is not very user-friendly and in certain environments might require a sysadmin.
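The PID workaround could look roughly like the sketch below, assuming a POSIX host, that the PID of the actor's worker process is already known, and that the code runs on the node hosting that process with sufficient permissions. terminate_pid and pid_exists are hypothetical helper names; only standard-library calls are used. How Ray reacts to the lost process (for example, whether the actor is restarted) depends on the actor's configuration and is not covered here.

```python
import os
import signal
import time


def pid_exists(pid):
    """Return True if a process with this PID is still present."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def terminate_pid(pid, grace_seconds=5.0, poll_interval=0.1):
    """Send SIGTERM, then SIGKILL if the process outlives the grace period.

    Returns the name of the last signal that was sent.
    """
    os.kill(pid, signal.SIGTERM)
    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        if not pid_exists(pid):
            return "SIGTERM"
        time.sleep(poll_interval)
    try:
        os.kill(pid, signal.SIGKILL)
    except ProcessLookupError:
        return "SIGTERM"  # it exited just before the escalation
    return "SIGKILL"
```

This illustrates why the approach is awkward operationally: it needs shell access to the right node and the right PID, which is exactly what a CLI or dashboard action would avoid.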
How does the community handle terminating hung/runaway actors when the job itself isn't smart enough to recognize an actor is hung and perform the cleanup without user interaction? Or when you don't want to kill the entire job?
If we did go down the path of allowing the termination of an actor from the dashboard and/or CLI, we would also want to clean up the ActorPool's references to that actor so the pool does not keep handing work to a terminated Actor.
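One possible shape for that cleanup, as a sketch only: drain the pool's idle actors, drop the ones known to be terminated, and return the rest. It relies on a pool object exposing pop_idle() (returning None when nothing idle is available) and push(actor), which is how I understand ray.util.ActorPool to behave. prune_idle_actors, the is_dead predicate, and StandInPool are hypothetical; StandInPool exists only so the example runs without Ray. Actors that are busy with in-flight tasks are not handled here.

```python
def prune_idle_actors(pool, is_dead):
    """Remove idle actors for which `is_dead(actor)` is true.

    Returns the list of actors that were dropped from the pool.
    """
    survivors, removed = [], []
    while True:
        actor = pool.pop_idle()
        if actor is None:  # nothing idle left to inspect
            break
        if is_dead(actor):
            removed.append(actor)
        else:
            survivors.append(actor)
    for actor in survivors:
        pool.push(actor)  # hand live actors back to the pool
    return removed


class StandInPool:
    """Minimal stand-in with pop_idle()/push(); not Ray's ActorPool."""

    def __init__(self, actors):
        self.idle = list(actors)

    def pop_idle(self):
        return self.idle.pop() if self.idle else None

    def push(self, actor):
        self.idle.append(actor)


pool = StandInPool(["actor-a", "actor-b", "actor-c"])
dropped = prune_idle_actors(pool, is_dead=lambda a: a == "actor-b")
```

A built-in kill command would ideally make this unnecessary by letting the pool learn about the terminated actor on its own.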