Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cluster dump utilities #5920

Merged
merged 29 commits into from
Mar 23, 2022
Merged

Cluster dump utilities #5920

merged 29 commits into from
Mar 23, 2022

Conversation

sjperkins
Copy link
Member

@sjperkins sjperkins commented Mar 9, 2022

Based on #5863 by @gjoseph92 which this PR will track further.

@sjperkins sjperkins marked this pull request as draft March 9, 2022 15:09
@gjoseph92
Copy link
Collaborator

Are you using much code from #5863, or just the cluster_dump.py file? If just the file, it might be worth suffering some easy merge conflicts to have a much smaller diff here.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 9, 2022

Unit Test Results

       12 files  ±  0         12 suites  ±0   7h 45m 57s ⏱️ + 25m 21s
  2 669 tests +  3    2 587 ✔️ +  2    80 💤 ±0  2 +1 
15 926 runs  +18  15 061 ✔️ +11  862 💤 +5  3 +2 

For more details on these failures, see this check.

Results for commit 9f81cfc. ± Comparison against base commit 4866615.

♻️ This comment has been updated with latest results.

@sjperkins
Copy link
Member Author

Are you using much code from #5863, or just the cluster_dump.py file? If just the file, it might be worth suffering some easy merge conflicts to have a much smaller diff here.

I'm just using cluster_dump.py. I'll go with your suggestion.

@fjetter
Copy link
Member

fjetter commented Mar 10, 2022

I just merged #5863. Could you rebase/merge? This would make a review much easier.

@sjperkins sjperkins marked this pull request as ready for review March 11, 2022 13:42
@sjperkins
Copy link
Member Author

I've done some manual testing with the dataset from #5675 as the default added tests are not very representative of real world cluster deadlocks.

distributed/cluster_dump.py Outdated Show resolved Hide resolved
distributed/cluster_dump.py Outdated Show resolved Hide resolved
distributed/cluster_dump.py Outdated Show resolved Hide resolved
distributed/tests/test_client.py Outdated Show resolved Hide resolved
distributed/tests/test_client.py Outdated Show resolved Hide resolved
distributed/cluster_dump.py Outdated Show resolved Hide resolved
@fjetter
Copy link
Member

fjetter commented Mar 14, 2022

A method that would likely be helpful is to split an existing dump artifact up into multiple files in case we need to look into the raw data. I've been using the below snippet to do this in the past.

@sjperkins would you mind adding something like this?

I see you are having a bunch of merge conflicts. This is due to us changing imports to absolute. They should be easy enough to fix, see #5924

logs = load_cluster_dump()


top_level = "workers"
workers = logs[top_level]
for ix, (worker, info) in enumerate(workers.items()):
    print(f"Processing worker {worker} {ix+1}/{len(workers)}")
    try:
        for name, _logs in info.items():

            directory = os.path.join(main_dir, top_level, worker)
            if not os.path.exists(directory):
                os.makedirs(directory)
            with open(os.path.join(directory, f"{name}.yaml"), "w") as fd:
                dump(_logs, fd, Dumper=Dumper)
    except:
        pass


import json
top_level = "scheduler"
info = logs[top_level]
for name, logs in info.items():
    directory = os.path.join(main_dir, top_level)
    if not os.path.exists(directory):
        os.makedirs(directory)
    with open(os.path.join(directory, f"{name}.yaml"), "w") as fd:
        dump(logs, fd, Dumper=Dumper)

@sjperkins
Copy link
Member Author

A method that would likely be helpful is to split an existing dump artifact up into multiple files in case we need to look into the raw data. I've been using the below snippet to do this in the past.

@sjperkins would you mind adding something like this?

Yes, I'll add it to the DumpInspector class.

I see you are having a bunch of merge conflicts. This is due to us changing imports to absolute. They should be easy enough to fix, see #5924

I've resolved these locally.

@gjoseph92 gjoseph92 self-requested a review March 15, 2022 16:05
distributed/cluster_dump.py Show resolved Hide resolved
distributed/cluster_dump.py Outdated Show resolved Hide resolved
distributed/cluster_dump.py Outdated Show resolved Hide resolved
distributed/cluster_dump.py Outdated Show resolved Hide resolved
distributed/cluster_dump.py Outdated Show resolved Hide resolved
distributed/cluster_dump.py Outdated Show resolved Hide resolved
distributed/tests/test_client.py Outdated Show resolved Hide resolved
distributed/tests/test_client.py Outdated Show resolved Hide resolved
distributed/tests/test_client.py Outdated Show resolved Hide resolved
distributed/tests/test_client.py Outdated Show resolved Hide resolved
distributed/cluster_dump.py Outdated Show resolved Hide resolved
@gjoseph92
Copy link
Collaborator

@fjetter or @crusaderky this just needs a merge. Also @sjperkins maybe you should merge main, see if it helps CI at all?

@fjetter
Copy link
Member

fjetter commented Mar 23, 2022

Test failures appear to be unrelated, thank you @sjperkins !

@fjetter fjetter merged commit b872d45 into dask:main Mar 23, 2022
@sjperkins sjperkins deleted the cluster_dump_utilities branch March 23, 2022 10:58
@sjperkins
Copy link
Member Author

Test failures appear to be unrelated, thank you @sjperkins !

First Coiled commit. Woop woop!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Utility functions to debug cluster_dump
3 participants