TaskCancellation #7669

Merged
merged 50 commits into from
Apr 25, 2020

ijrsvt
Contributor

@ijrsvt commented Mar 20, 2020

Task Cancellation for locally submitted tasks. Cancellation for remote tasks and the full API will be added in follow-up PRs.

Why are these changes needed?

Related issue number

Closes #854

Checks

@ijrsvt changed the title from "Local Mode is in C++" to "TaskCancellation" on Mar 20, 2020
@AmplabJenkins

Can one of the admins verify this patch?

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/23407/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/23709/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/23717/

@simon-mo self-assigned this Mar 27, 2020
@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/23979/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/23986/

@ijrsvt marked this pull request as ready for review April 7, 2020 03:23
@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24331/

worker = ray.worker.global_worker
worker.check_connected()
worker.core_worker.kill_actor(actor._ray_actor_id, False)

if isinstance(id, ray.actor.ActorHandle):
Contributor Author

@ijrsvt Apr 7, 2020

I'll move this to a new PR later

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24341/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/24344/

Contributor

@simon-mo left a comment

Awesome work!

auto task_id = object_id.TaskId();
if (task_manager_->IsTaskPending(task_id)) {
  auto task_spec = task_manager_->GetTaskSpec(object_id.TaskId());
  if (!task_spec.IsActorCreationTask())
Contributor

Let's use a RAY_CHECK instead of an if statement here, because this is definitely checked by the Python code.

Contributor Author

I don't think it actually is.

boost::bind(&CoreWorker::TryKillTask, this, task_id, num_tries - 1));
}

void CoreWorker::HandleKillTask(const rpc::KillTaskRequest &request,
Contributor

What happens if the task is still in the worker's scheduling queue when the timeout expires?

Contributor Author

For direct tasks, the task never enters the scheduling queue; it is executed immediately.

Contributor

What about for actor tasks? Is that not supported at all?

Contributor Author

Correct.
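The review fragment above re-binds CoreWorker::TryKillTask with num_tries - 1, which suggests the kill attempt reschedules itself with a shrinking retry budget. Below is a minimal Python sketch of that bounded, self-rescheduling retry, using the standard library's sched module in place of the C++ event loop. The names try_kill_task, attempt_kill, and make_attempt and the fixed delay are hypothetical, and the sketch does not claim to reproduce what Ray checks on each attempt.

```python
import sched
import time


def try_kill_task(scheduler, task_id, num_tries, attempt_kill, results, delay_s=0.01):
    """Hypothetical bounded-retry kill that reschedules itself.

    attempt_kill(task_id) returns True once the kill takes effect. The outcome
    ("killed" or "gave up") is recorded in results. Not Ray's actual logic.
    """
    if attempt_kill(task_id):
        results[task_id] = "killed"
        return
    if num_tries <= 1:
        # Retry budget exhausted: stop instead of rescheduling forever.
        results[task_id] = "gave up"
        return
    # Re-post this function with one fewer try, mirroring `num_tries - 1`.
    scheduler.enter(delay_s, 1, try_kill_task,
                    argument=(scheduler, task_id, num_tries - 1,
                              attempt_kill, results, delay_s))


def make_attempt(succeeds_on_call):
    """Return a fake kill attempt that only succeeds on the Nth call."""
    calls = {"n": 0}

    def attempt(task_id):
        calls["n"] += 1
        return calls["n"] >= succeeds_on_call

    return attempt, calls


def demo(num_tries, succeeds_on_call):
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    results = {}
    attempt, calls = make_attempt(succeeds_on_call)
    try_kill_task(scheduler, "task-1", num_tries, attempt, results)
    scheduler.run()  # drain all rescheduled attempts
    return results["task-1"], calls["n"]


if __name__ == "__main__":
    print(demo(num_tries=5, succeeds_on_call=3))
    print(demo(num_tries=3, succeeds_on_call=10))
```

With five tries and a kill that succeeds on the third attempt, demo returns ("killed", 3); with only three tries against a kill that would need ten, it returns ("gave up", 3). The explicit lower bound on num_tries is what keeps the self-rescheduling call from looping forever.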

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25075/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25107/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25108/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25109/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25139/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25142/

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25143/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25146/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25151/

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/Ray-PRB/25156/

@stephanie-wang merged commit 69ff7e3 into ray-project:master Apr 25, 2020
@ijrsvt mentioned this pull request May 19, 2020
@simon-mo
Contributor

simon-mo commented May 26, 2020 via email

@mitar
Member

mitar commented May 26, 2020

Not sure why there are two calls here, though. Wouldn't it be pretty easy to expose only one to the user? API proliferation?

@ijrsvt
Contributor Author

ijrsvt commented May 28, 2020

@mitar We may eventually combine them into one API. The main reason for separating them was semantic: killing a scheduled task doesn't make sense, but canceling it does. Similarly, canceling an actor makes less sense than killing it.

@mitar
Member

mitar commented May 28, 2020

But canceling a running task still kills it, no?

@edoakes
Contributor

edoakes commented May 28, 2020

ray.kill kills the target process; ray.cancel doesn't by default.
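The distinction in the comment above can be modelled outside Ray with the standard library. This sketch is not Ray's implementation: it assumes, per our understanding of ray.cancel's default behavior, that a non-forced cancel surfaces inside the running task as a KeyboardInterrupt while the worker process survives, and it models ray.kill as terminating the worker process outright. The names worker_loop and run_demo are hypothetical, and the fork start method makes it POSIX-only.

```python
import multiprocessing as mp
import os
import signal
import time


def worker_loop(conn):
    """Run "tasks" (sleep durations) sent over conn until the process dies.

    Illustrative model only, not Ray code: a task is just a sleep, and a
    non-forced cancel is modelled as SIGINT, which becomes KeyboardInterrupt.
    """
    # Make sure SIGINT raises KeyboardInterrupt even if the parent ignored it.
    signal.signal(signal.SIGINT, signal.default_int_handler)
    while True:
        duration = conn.recv()
        try:
            conn.send("started")
            time.sleep(duration)  # stand-in for a long-running task
            conn.send("finished")
        except KeyboardInterrupt:
            # The task is interrupted, but this worker process keeps running.
            conn.send("cancelled")


def run_demo():
    ctx = mp.get_context("fork")  # POSIX-only; keeps the sketch self-contained
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=worker_loop, args=(child_conn,))
    proc.start()

    # Model of a non-forced cancel: interrupt the running task only.
    parent_conn.send(60)
    parent_conn.recv()  # wait until the task reports "started"
    os.kill(proc.pid, signal.SIGINT)
    cancel_outcome = parent_conn.recv()

    # The same worker process can still execute another task afterwards.
    parent_conn.send(0.01)
    parent_conn.recv()  # "started"
    next_outcome = parent_conn.recv()

    # Model of a kill: terminate the worker process and whatever it runs.
    parent_conn.send(60)
    parent_conn.recv()  # "started"
    proc.kill()  # sends SIGKILL
    proc.join()
    parent_conn.close()
    return cancel_outcome, next_outcome, proc.exitcode


if __name__ == "__main__":
    print(run_demo())
```

In this model the interrupted worker goes on to finish a second task, while the killed one exits with a negative exit code equal to the signal number, which is the behavioral gap the two API names are meant to signal.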

@mitar
Member

mitar commented May 28, 2020

Yeah, so we could have a single ray.cancel(id, force) for both actors and tasks, where calling it without force would send an interrupt to the actor as well?

Anyway, it works for me. I am just surprised by this design decision. :-) Keep up the good work. :-) Ray is amazing.
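The single-call design proposed above amounts to one entry point that dispatches on the type of its target and on the force flag. Below is a minimal sketch of that shape using functools.singledispatch. TaskRef, ActorRef, and the returned action strings are hypothetical placeholders rather than Ray types, and the function only reports which action would be taken instead of performing it.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass(frozen=True)
class TaskRef:
    """Hypothetical stand-in for a reference to a submitted task."""
    task_id: str


@dataclass(frozen=True)
class ActorRef:
    """Hypothetical stand-in for an actor handle."""
    actor_id: str


@singledispatch
def cancel(target, force=False):
    # Fallback for unsupported targets.
    raise TypeError(f"cannot cancel {type(target).__name__}")


@cancel.register
def _(target: TaskRef, force=False):
    # Without force, interrupt the running task; with force, kill its worker.
    action = "kill worker" if force else "interrupt task"
    return (action, target.task_id)


@cancel.register
def _(target: ActorRef, force=False):
    # As proposed in the thread: the unforced path interrupts the actor too.
    action = "kill actor" if force else "interrupt actor"
    return (action, target.actor_id)


if __name__ == "__main__":
    print(cancel(TaskRef("t1")))
    print(cancel(ActorRef("a1"), force=True))
```

The trade-off ijrsvt describes is visible here: one name covers every case, but the caller has to know that cancel(actor, force=True) means "kill", whereas two separately named calls make the intent explicit.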

@ijrsvt deleted the TaskCancellation branch August 26, 2020 00:24

Development

Successfully merging this pull request may close these issues.

Ray kill remote tasks
7 participants