@milesgranger
Contributor

@milesgranger milesgranger commented Jun 2, 2023

Closes #7875

  • Tests added / passed
  • Passes pre-commit run --all-files

@milesgranger milesgranger requested a review from fjetter as a code owner June 2, 2023 10:59
@milesgranger milesgranger requested a review from crusaderky June 2, 2023 12:06
@github-actions
Contributor

github-actions bot commented Jun 2, 2023

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    20 files   ±0       20 suites  ±0   11h 45m 3s ⏱️ +16m 55s
 3 676 tests   +1    3 562 ✔️ -2     108 💤 ±0   6 ❌ +3
35 548 runs    +8   33 778 ✔️ +5   1 764 💤 ±0   6 ❌ +3

For more details on these failures, see this check.

Results for commit d3f9b58. ± Comparison against base commit b52024b.

♻️ This comment has been updated with latest results.

@crusaderky
Collaborator

Better than before, but there's still a problem.

  1. start the cluster:
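import dask.array as da
import distributed

# Local cluster: 4 single-threaded workers, 2 GiB memory limit each
client = distributed.Client(n_workers=4, threads_per_worker=1, memory_limit="2 GiB")
# Lazy graph only; nothing runs until b.compute() is called
a = da.random.random((14_000, 14_000))
b = (a @ a.T).sum()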
  2. open the fine performance metrics in the dashboard
  3. while the browser is open on the widget, call b.compute()
  4. as soon as you start spilling, you'll get a traceback on stderr:
2023-06-02 14:53:43,795 - bokeh.server.protocol_handler - ERROR - error handling message
 message: Message 'PATCH-DOC' content: {'events': [{'kind': 'ModelChanged', 'model': {'id': 'p6560'}, 'attr': 'inner_width', 'new': 575}, {'kind': 'ModelChanged', 'model': {'id': 'p6560'}, 'attr': 'inner_height', 'new': 433}, {'kind': 'ModelChanged', 'model': {'id': 'p6560'}, 'attr': 'outer_width', 'new': 615}, {'kind': 'ModelChanged', 'model': {'id': 'p6560'}, 'attr': 'outer_height', 'new': 512}]} 
 error: DeserializationError("can't resolve reference 'p6560'")
Traceback (most recent call last):
  File "/home/crusaderky-nocrypt/mambaforge/envs/distributed/lib/python3.10/site-packages/bokeh/server/protocol_handler.py", line 97, in handle
    work = await handler(message, connection)
  File "/home/crusaderky-nocrypt/mambaforge/envs/distributed/lib/python3.10/site-packages/bokeh/server/session.py", line 94, in _needs_document_lock_wrapper
    result = func(self, *args, **kwargs)
  File "/home/crusaderky-nocrypt/mambaforge/envs/distributed/lib/python3.10/site-packages/bokeh/server/session.py", line 288, in _handle_patch
    message.apply_to_document(self.document, self)
  File "/home/crusaderky-nocrypt/mambaforge/envs/distributed/lib/python3.10/site-packages/bokeh/protocol/messages/patch_doc.py", line 104, in apply_to_document
    invoke_with_curdoc(doc, lambda: doc.apply_json_patch(self.payload, setter=setter))
  File "/home/crusaderky-nocrypt/mambaforge/envs/distributed/lib/python3.10/site-packages/bokeh/document/callbacks.py", line 443, in invoke_with_curdoc
    return f()
  File "/home/crusaderky-nocrypt/mambaforge/envs/distributed/lib/python3.10/site-packages/bokeh/protocol/messages/patch_doc.py", line 104, in <lambda>
    invoke_with_curdoc(doc, lambda: doc.apply_json_patch(self.payload, setter=setter))
  File "/home/crusaderky-nocrypt/mambaforge/envs/distributed/lib/python3.10/site-packages/bokeh/document/document.py", line 369, in apply_json_patch
    patch: PatchJson = deserializer.deserialize(patch_json)
  File "/home/crusaderky-nocrypt/mambaforge/envs/distributed/lib/python3.10/site-packages/bokeh/core/serialization.py", line 501, in deserialize
    return self.decode(obj.content, obj.buffers)
  File "/home/crusaderky-nocrypt/mambaforge/envs/distributed/lib/python3.10/site-packages/bokeh/core/serialization.py", line 516, in decode
    return self._decode(obj)
  File "/home/crusaderky-nocrypt/mambaforge/envs/distributed/lib/python3.10/site-packages/bokeh/core/serialization.py", line 557, in _decode
    return {key: self._decode(val) for key, val in obj.items()}
  File "/home/crusaderky-nocrypt/mambaforge/envs/distributed/lib/python3.10/site-packages/bokeh/core/serialization.py", line 557, in <dictcomp>
    return {key: self._decode(val) for key, val in obj.items()}
  File "/home/crusaderky-nocrypt/mambaforge/envs/distributed/lib/python3.10/site-packages/bokeh/core/serialization.py", line 559, in _decode
    return [self._decode(entry) for entry in obj]
  File "/home/crusaderky-nocrypt/mambaforge/envs/distributed/lib/python3.10/site-packages/bokeh/core/serialization.py", line 559, in <listcomp>
    return [self._decode(entry) for entry in obj]
  File "/home/crusaderky-nocrypt/mambaforge/envs/distributed/lib/python3.10/site-packages/bokeh/core/serialization.py", line 557, in _decode
    return {key: self._decode(val) for key, val in obj.items()}
  File "/home/crusaderky-nocrypt/mambaforge/envs/distributed/lib/python3.10/site-packages/bokeh/core/serialization.py", line 557, in <dictcomp>
    return {key: self._decode(val) for key, val in obj.items()}
  File "/home/crusaderky-nocrypt/mambaforge/envs/distributed/lib/python3.10/site-packages/bokeh/core/serialization.py", line 555, in _decode
    return self._decode_ref(cast(Ref, obj))
  File "/home/crusaderky-nocrypt/mambaforge/envs/distributed/lib/python3.10/site-packages/bokeh/core/serialization.py", line 569, in _decode_ref
    self.error(f"can't resolve reference '{id}'")
  File "/home/crusaderky-nocrypt/mambaforge/envs/distributed/lib/python3.10/site-packages/bokeh/core/serialization.py", line 717, in error
    raise DeserializationError(message)
bokeh.core.serialization.DeserializationError: can't resolve reference 'p6560'

The error disappears after you reload the page in your browser.
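
To check for this error without watching stderr by hand, one option is to attach a handler to the logger named in the traceback before calling b.compute(). This is a minimal sketch using only the standard library. It assumes the dashboard runs in the same process as the client (as with the local cluster above) and that the exception name appears in the formatted log message. `ErrorCollector` and `deserialization_errors` are hypothetical helpers, not part of bokeh or distributed.

```python
import logging


class ErrorCollector(logging.Handler):
    """Hypothetical helper: keeps ERROR-level records for later inspection."""

    def __init__(self):
        super().__init__(level=logging.ERROR)
        self.records = []

    def emit(self, record):
        self.records.append(record)


def deserialization_errors(handler):
    """Return formatted messages that mention DeserializationError."""
    return [
        r.getMessage()
        for r in handler.records
        if "DeserializationError" in r.getMessage()
    ]


# Logger name taken from the log line in the traceback above.
bokeh_logger = logging.getLogger("bokeh.server.protocol_handler")
collector = ErrorCollector()
bokeh_logger.addHandler(collector)
```

After running the reproduction steps, an empty list from `deserialization_errors(collector)` would suggest the error was not logged in this process.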

@milesgranger
Contributor Author

@crusaderky I really appreciate the feedback and the reproduction steps. I think this is ready for another look when time permits.

@crusaderky
Collaborator

@milesgranger now it hangs with 100% CPU usage...

@crusaderky crusaderky merged commit fdeb6b3 into dask:main Jun 7, 2023
@milesgranger milesgranger deleted the milesgranger/7875-fine-perf-metrics-spill-crash branch June 7, 2023 13:54

Development

Successfully merging this pull request may close these issues.

Fine performance metrics dashboard crashes w/ spill activity

2 participants