
[core] Add bundle id as a label; #16819

Merged (28 commits) into ray-project:master on Jul 15, 2021

Conversation

@fishbone (Contributor) commented Jul 1, 2021

Why are these changes needed?

Right now, we have a bug where waiting jobs fail due to resource capacity.
This PR fixes it by adding the bundle id as an implicit label.

TODO: add this label to all jobs when the bundle is assigned.

Related issue number

Closes #15842
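
For context, here is a minimal sketch of the failure mode described in #15842 (an editorial illustration based on the issue title and the regression test added in this PR, not code from the PR itself; the Worker actor is a made-up name):

```python
import ray

# A single-CPU cluster whose only CPU is reserved by a one-bundle placement group.
ray.init(num_cpus=1)
pg = ray.util.placement_group(bundles=[{"CPU": 1}])
ray.get(pg.ready())  # the internal ready task used to occupy the bundle's resources


@ray.remote
class Worker:  # hypothetical actor, for illustration only
    def ping(self):
        return "ok"


# Before this PR, this actor could stay pending because the bundle looked full;
# with the bundle id added as an implicit label, it schedules into the bundle.
a = Worker.options(num_cpus=1, placement_group=pg).remote()
print(ray.get(a.ping.remote()))
```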

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@fishbone changed the title from "Pl fix" to "[core] Add bundle id as a label;" on Jul 2, 2021
@fishbone fishbone marked this pull request as ready for review July 2, 2021 06:58
@@ -775,6 +773,9 @@ bool ClusterResourceScheduler::AllocateTaskResourceInstances(
}
}
} else {
// Allocation failed. Restore node's local resources by freeing the resources
// of the failed allocation.
FreeTaskResourceInstances(task_allocation);
Contributor Author:
@wuisawesome I think there's a bug here, but I'm not sure; could you confirm?

Contributor Author:
@wuisawesome in case you missed this one

Contributor:
@wuisawesome please verify if this makes sense. @iycheng is this related to this PR? Can we make a separate PR for the fix?

Contributor Author:
I think it's just a fix for this one, but it feels worth separating it out. I'm not sure whether it's easy to add a unit test for this as well; I lack the bandwidth for that if it's too difficult. I'm OK with rolling it back and creating an issue if you prefer.

Contributor:
So it seems the scenario you're thinking of is when the allocation only contains a subset of the custom resources, right?

e.g.,

Task: custom_1: 1.0, custom_2: 1.0

Node: custom_1: 1.0

and this won't free custom_1.

I think this should be easy to unit test. If we add this fix in this PR, I think we should at least add a unit test.

Contributor Author:
OK, then let me roll it back for now and create a P1 ticket for this.

Contributor Author:
@rkooo567 I think this code path will only be reached in this case:

  • a placement group is created
  • a task is scheduled with that pg (before running this function)
  • the placement group is returned
  • allocating the resources for the placement group fails

The case you mentioned won't trigger this error, since such a task won't even be scheduled to this node.

@@ -313,13 +316,6 @@ std::string NodeResources::DebugString(StringIdMap string_to_in_map) const {
return buffer.str();
}

const std::string format_resource(std::string resource_name, double quantity) {
Contributor Author:
Remove duplicate function in scheduling_resources.h

@rkooo567 rkooo567 self-assigned this Jul 2, 2021
@rkooo567 (Contributor) commented Jul 6, 2021

I will leave the label on until the tests are actually passing. If you'd like us to look at the PR before that, let us know.

@rkooo567 added the @author-action-required label (the PR author is responsible for the next step; remove the tag to send it back to the reviewer) on Jul 6, 2021
@fishbone (Contributor, Author) commented Jul 7, 2021

> I will leave the label on until the tests are actually passing. If you'd like us to look at the PR before that, let us know.

@rkooo567 It would be good if you could check whether this solution is a reasonable one, since it changes the behavior of the current implementation.

Before: if the resource requirement is 0, the task will be scheduled on any node that fits, even when a placement group bundle is given.

After: even when the resource requirement is 0, if a placement group is given, the task will only be scheduled on the node allocated for that PG.

Due to the special handling of actor creation jobs, I only added this for the ready task; the rest can be added later.
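
To make the before/after concrete, here is a hedged editorial sketch (zero_resource_task is a made-up example; per the comment above, this PR only wires the new constraint up for the internal ready task, so this illustrates the intended semantics rather than behavior this PR guarantees for arbitrary tasks):

```python
import ray

ray.init()

# A placement group whose single bundle reserves 1 CPU somewhere in the cluster.
pg = ray.util.placement_group(bundles=[{"CPU": 1}])
ray.get(pg.ready())


@ray.remote(num_cpus=0)
def zero_resource_task():
    # Hypothetical zero-resource task, used only to illustrate scheduling semantics.
    return "ok"


# Before: a zero-resource task given placement_group=pg could run on any node that
# fits, ignoring where the bundle actually lives.
# After: when a placement group is given, it should only run on the node allocated
# for that PG.
ray.get(zero_resource_task.options(placement_group=pg).remote())
```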

@fishbone removed the @author-action-required label on Jul 12, 2021
@rkooo567 (Contributor) commented:
I think the general approach makes sense to me. BTW, did you verify with the following script?

import ray
import time

ray.init()

@ray.remote
class ChildActor:
    def __init__(self):
        self.val = 1


@ray.remote
class RemoteActor:
    def __init__(self):
        self.val = 1

main_actors = []
child_actors = []

for num_children in [10, 1, 20, 10, 5] * 4:
    pg = ray.util.placement_group(bundles=[{"CPU": 1}] + [{"CPU": 0.01}] * num_children)
    while not ray.get(pg.ready()):
        pass

    main_actor = RemoteActor.options(num_cpus=1, placement_group=pg, placement_group_bundle_index=0).remote()
    main_actors.append(main_actor)

    for j in range(num_children):
        actor = ChildActor.options(
            placement_group=pg,
            num_cpus=0.01).remote()
        child_actors.append(actor)
    pg.ready()

    print("Created actor with", num_children, "children")
    time.sleep(1)

@@ -28,6 +28,30 @@ def method(self, x):
return x + 2


@pytest.mark.parametrize("connect_to_client", [False, True])
def test_placement_ready(ray_start_cluster, connect_to_client):
Contributor:
Suggested change
def test_placement_ready(ray_start_cluster, connect_to_client):
def test_placement_ready(ray_start_regular_shared, connect_to_client):

Contributor:
(And remove the corresponding cluster utils stuff)

@fishbone (Contributor, Author):
@rkooo567 it runs fine with this script.

Resolved review threads: src/ray/common/task/scheduling_resources.h, src/ray/raylet/placement_group_resource_manager_test.cc
with connect_to_client_or_not(connect_to_client):
    pg = ray.util.placement_group(bundles=[{"CPU": 1}])
    ray.get(pg.ready())
    a = Actor.options(num_cpus=1, placement_group=pg).remote()
Contributor:
This isn't the right test, is it? Shouldn't we test with the default actor that requires 0 CPU?

Contributor Author:
No, this PR only fixes issue #15842 here.

Contributor:
Hmm, I see. Can you add a comment here to explain what the test is doing? I think you can say:

# Test that when the whole bundle is occupied by the placement group, the ready task is still schedulable.

Also, I'd prefer to test the 0-CPU case, because that's the behavior change this PR introduces.

Contributor:
Oh actually, I saw #16819 (comment).

Does that mean this still won't work for the 0-CPU actor case?

Contributor:
Last follow-up after our sync:

  • Comment what the test is doing.
  • Make the bundle resource private?

@fishbone (Contributor, Author):
Rolled it back; let's track the bug in a separate ticket: #17044.

@rkooo567 (Contributor) left a comment:

Can you follow up on #16819 (comment)? That's the last thing I'd like to discuss. The other parts LGTM.

@rkooo567 added the @author-action-required label on Jul 13, 2021
@rkooo567 rkooo567 self-requested a review July 14, 2021 17:42
@rkooo567 (Contributor) left a comment:
LGTM once the last comments are addressed https://github.com/ray-project/ray/pull/16819/files#r669822319

@fishbone removed the @author-action-required label on Jul 15, 2021
@rkooo567 rkooo567 merged commit 1386762 into ray-project:master Jul 15, 2021
rkooo567 pushed a commit that referenced this pull request on Oct 19, 2023 (#39946):

Before, actors scheduled with no resources and placement groups would not be placed with the placement group or bundle. Fix this by using the phantom bundle resource that is also used for the Ready() call as an additional placement group constraint. (See #16819 for context)
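
As a hedged, editorial sketch of the user-visible scenario that commit addresses (ZeroResourceActor is a made-up name, and the PlacementGroupSchedulingStrategy shown is the newer scheduling-strategy API; this is not code from that commit):

```python
import ray
from ray.util.scheduling_strategies import PlacementGroupSchedulingStrategy

ray.init()

pg = ray.util.placement_group(bundles=[{"CPU": 1}])
ray.get(pg.ready())


@ray.remote(num_cpus=0)
class ZeroResourceActor:  # hypothetical actor, for illustration only
    def ping(self):
        return "ok"


# An actor that requests no resources but is attached to the placement group.
# Per the commit message, it previously was not guaranteed to be placed with the
# PG's bundle; with the phantom bundle resource it is constrained to that bundle.
a = ZeroResourceActor.options(
    scheduling_strategy=PlacementGroupSchedulingStrategy(
        placement_group=pg, placement_group_bundle_index=0)
).remote()
ray.get(a.ping.remote())
```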
Development

Successfully merging this pull request may close these issues.

[core/placement groups] placement group ready() occupies resources, blocks usage
3 participants