KeyError when looking at durations of executing tasks #4587

Closed
@mrocklin

When stopping an in-flight computation I get the following traceback:

Traceback (most recent call last):
  File "/home/mrocklin/workspace/tornado/tornado/ioloop.py", line 741, in _run_callback
    ret = callback()
  File "/home/mrocklin/workspace/tornado/tornado/ioloop.py", line 765, in _discard_future_result
    future.result()
  File "/home/mrocklin/workspace/distributed/distributed/worker.py", line 942, in heartbeat
    response = await retry_operation(
  File "/home/mrocklin/workspace/distributed/distributed/utils_comm.py", line 384, in retry_operation
    return await retry(
  File "/home/mrocklin/workspace/distributed/distributed/utils_comm.py", line 369, in retry
    return await coro()
  File "/home/mrocklin/workspace/distributed/distributed/core.py", line 861, in send_recv_from_rpc
    result = await send_recv(comm=comm, op=key, **kwargs)
  File "/home/mrocklin/workspace/distributed/distributed/core.py", line 660, in send_recv
    raise exc.with_traceback(tb)
  File "/home/mrocklin/workspace/distributed/distributed/core.py", line 496, in handle_comm
    result = handler(comm, **msg)
  File "/home/mrocklin/workspace/distributed/distributed/scheduler.py", line 3653, in heartbeat_worker
    ws._executing = {
  File "/home/mrocklin/workspace/distributed/distributed/scheduler.py", line 3654, in <dictcomp>
    parent._tasks[key]: duration for key, duration in executing.items()
KeyError: "('random_sample-qr-2c558c0a248d6692297cf169f1415126', 921, 0)"
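
For illustration, here is a minimal sketch of the failure mode the traceback points at: the heartbeat reports a duration for a key that is no longer in the scheduler's task mapping, so the dict comprehension raises. Plain dicts stand in for the scheduler state, and the names `tasks`, `executing_strict`, and `executing_tolerant` are hypothetical. Why the key goes missing (for example, the task was released when the computation was stopped while the worker was still running it) is an assumption, and the tolerant variant is only one possible way to guard the lookup, not necessarily the fix adopted in distributed.

```python
# Hypothetical stand-ins for scheduler state; these are NOT the real
# distributed classes, just plain dicts with the same lookup shape.
tasks = {"a": "TaskState(a)", "b": "TaskState(b)"}

# Durations reported by a worker heartbeat. Assumption: key "c" was already
# forgotten by the scheduler but is still executing on the worker.
executing = {"a": 0.5, "c": 1.2}


def executing_strict(tasks, executing):
    # Same shape as the failing comprehension: unknown keys raise KeyError.
    return {tasks[key]: duration for key, duration in executing.items()}


def executing_tolerant(tasks, executing):
    # Skip keys the scheduler no longer tracks instead of raising.
    return {
        tasks[key]: duration
        for key, duration in executing.items()
        if key in tasks
    }


try:
    executing_strict(tasks, executing)
except KeyError as exc:
    print("KeyError:", exc)

print(executing_tolerant(tasks, executing))
```

Filtering on `key in tasks` silently drops the stale duration, which seems acceptable for a heartbeat that will be superseded by the next one; whether that is the right behavior for the scheduler is a design question this sketch does not settle.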

The computation I was running is below. I haven't minimized it, but I suspect this won't be hard to reproduce.

from dask.distributed import Client
client = Client(n_workers=20)

import dask
import dask.array as da

# start one
x = da.random.random((10000000, 100), chunks=(10000, None))
u, s, v = da.linalg.svd(x)
u, s, v = dask.persist(u, s, v)

import time
time.sleep(3)  # This is just a guess, untested

# then decide to change it a bit and rerun
x = da.random.random((2000000, 100), chunks=(10000, None))
u, s, v = da.linalg.svd(x)
u, s, v = dask.persist(u, s, v)

cc @gforsyth @fjetter
