
[Dashboard]Don't set node actors when node_id of actor is Nil #13573

Merged
merged 2 commits into from
Jan 22, 2021


WangTaoTheTonic
Contributor

Why are these changes needed?

When an actor is not ALIVE, its raylet ID is normally not set when the actor info is published. In that case the raylet ID is Nil ("ffffffffffffffffffffffffffffffffffffffffffffffffffffffff", i.e. "f" * 56).

We should not update node_actors when the raylet ID is Nil.
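To make the Nil value concrete, here is a minimal sketch. It assumes, as the 56-character "f" * 56 string implies, that a node ID is 28 bytes and that the Nil ID has every byte set to 0xff. The `NODE_ID_SIZE` constant and the `is_nil_node_id` helper are hypothetical names for illustration, not part of the dashboard code.

```python
# Minimal sketch (assumption): a node ID is 28 bytes and the Nil ID has
# every byte set to 0xff, so its hex form is 56 "f" characters.
NODE_ID_SIZE = 28  # assumed byte length of a node ID

# bytes.hex() yields two lowercase hex digits per byte.
NIL_NODE_ID = (b"\xff" * NODE_ID_SIZE).hex()


def is_nil_node_id(node_id_hex: str) -> bool:
    """Hypothetical helper: True if the hex-encoded node ID is Nil."""
    return node_id_hex == NIL_NODE_ID


print(len(NIL_NODE_ID))         # 56
print(NIL_NODE_ID == "f" * 56)  # True
```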

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

WangTaoTheTonic changed the title from "Don't set node actors when node_id of actor is Nil" to "[Dashboard]Don't set node actors when node_id of actor is Nil" on Jan 20, 2021
@rkooo567
Contributor

Doesn't this mean we cannot get information about dead actors?

@WangTaoTheTonic
Contributor Author

WangTaoTheTonic commented Jan 20, 2021


We can still get dead actors, both before and after this change: even when an actor is dead, its raylet ID is still valid (not Nil).

This PR fixes a bug: a newly created actor carries an empty (Nil) raylet ID because it has not been allocated to a node yet. The dashboard then puts this actor's information into node_actors under the Nil node ID key, so every new actor is added to that entry unnecessarily, causing a memory leak.
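The leak can be reproduced with a hypothetical simulation: a plain dict stands in for `DataSource.node_actors`, and the two functions mirror the bookkeeping without and with the Nil check (both copy the per-node dict before updating it, as the original snippet does). The function names and the actor payload are illustrative assumptions, not the dashboard's API, and the simulation only covers actors that have not been placed on a node yet.

```python
# Hypothetical simulation of the node_actors bookkeeping. A plain dict
# stands in for DataSource.node_actors; names are illustrative only.
NIL_NODE_ID = "f" * 56  # Nil raylet ID of an actor not yet placed on a node


def record_actor_unfiltered(node_actors, node_id, actor_id, actor_data):
    """Without the check: every actor is filed under its node_id, even Nil."""
    per_node = dict(node_actors.get(node_id, {}))
    per_node[actor_id] = actor_data
    node_actors[node_id] = per_node


def record_actor_filtered(node_actors, node_id, actor_id, actor_data):
    """With the check: actors whose node_id is still Nil are skipped."""
    if node_id != NIL_NODE_ID:
        per_node = dict(node_actors.get(node_id, {}))
        per_node[actor_id] = actor_data
        node_actors[node_id] = per_node


before, after = {}, {}
# Simulate 1000 newly created actors, none of which has been scheduled yet.
for i in range(1000):
    actor_id = f"actor-{i}"
    payload = {"state": "PENDING_CREATION"}  # illustrative payload
    record_actor_unfiltered(before, NIL_NODE_ID, actor_id, payload)
    record_actor_filtered(after, NIL_NODE_ID, actor_id, payload)

print(len(before.get(NIL_NODE_ID, {})))  # 1000
print(NIL_NODE_ID in after)              # False
```

Without the check, the Nil entry keeps growing with every new actor; with it, only actors that already have a real node ID are indexed by node.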

```diff
-node_actors = dict(DataSource.node_actors.get(node_id, {}))
-node_actors[actor_id] = actor_table_data
-DataSource.node_actors[node_id] = node_actors
+if node_id != "f" * 56:
```
Contributor

fyrestone commented Jan 20, 2021

It would be better to define a constant in stats_collector_consts.py:

```python
import ray

NIL_NODE_ID = ray.NodeID.nil().hex()
```

and then check `if node_id != stats_collector_consts.NIL_NODE_ID:`.

Contributor Author

Sounds like a good idea!

Contributor

fyrestone commented Jan 20, 2021

Can you add this check to the GetAllActorInfo processing logic, too? Then the following assertion could be added to test_stats_collector.py:

```python
import requests

# Start a cluster and an actor; use resource constraints to prevent the
# actor from becoming ALIVE. webui_url and stats_collector_consts are
# assumed to come from the surrounding test setup.
response = requests.get(webui_url + "/test/dump?key=node_actors")
response.raise_for_status()
result = response.json()
assert stats_collector_consts.NIL_NODE_ID not in result["data"]["node_actors"]
```
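A hypothetical sketch of the same filter applied when the full actor list is processed (for example after a GetAllActorInfo call), so that no Nil key is ever created and an assertion like the one proposed for the test holds. The record layout with `actorId` and `address.rayletId` keys is an assumption about the dict form of the actor table data, not something confirmed by this PR, and `build_node_actors` is an illustrative name.

```python
# Hypothetical sketch: rebuild node_actors from a full actor listing while
# skipping Nil node IDs. The record layout below is an assumption.
NIL_NODE_ID = "f" * 56


def build_node_actors(all_actor_info):
    """Group actor records by node ID, ignoring actors with a Nil node ID."""
    node_actors = {}
    for actor_table_data in all_actor_info:
        actor_id = actor_table_data["actorId"]
        node_id = actor_table_data["address"]["rayletId"]
        if node_id == NIL_NODE_ID:
            # Actor not placed on a node yet; don't index it by node.
            continue
        node_actors.setdefault(node_id, {})[actor_id] = actor_table_data
    return node_actors


actors = [
    {"actorId": "a1", "address": {"rayletId": "1" * 56}},
    {"actorId": "a2", "address": {"rayletId": NIL_NODE_ID}},
    {"actorId": "a3", "address": {"rayletId": "1" * 56}},
]
node_actors = build_node_actors(actors)
assert NIL_NODE_ID not in node_actors  # mirrors the proposed test assertion
print(sorted(node_actors["1" * 56]))   # ['a1', 'a3']
```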

@fyrestone
Contributor


Good catch! In that case the value of DataSource.node_actors[NIL_NODE_ID] leaks. In fact, no actor info is ever deleted from Redis, so the system leaks anyway, but we can at least avoid leaking DataSource.node_actors[NIL_NODE_ID].

@rkooo567
Contributor

Ping me after addressing @fyrestone's comment!

@rkooo567
Contributor

I will merge upon his approval.

Contributor

fyrestone left a comment

LGTM

@WangTaoTheTonic
Contributor Author

@rkooo567 It's good to go!

rkooo567 merged commit aa5d7a5 into ray-project:master on Jan 22, 2021
WangTaoTheTonic deleted the filter_nil_node branch on Jan 22, 2021 05:47
fishbone pushed a commit to fishbone/ray that referenced this pull request Feb 16, 2021
…oject#13573)

* Don't set node actors when node_id of actor is Nil

* add test per comment
fishbone added a commit to fishbone/ray that referenced this pull request Feb 16, 2021