Description
dask/distributed#6614 causes a significant performance improvement to `test_vorticity`:

And a significant performance regression to `test_dataframe_align`:
I think the key change is fixing dask/distributed#6597.
`test_vorticity` comes from dask/distributed#6571. This workload is the one where we discovered the co-assignment bug that's now fixed. So it's encouraging (and not surprising) that improving co-assignment significantly reduces transfers and improves performance.
`test_dataframe_align` is a bit surprising. You'd think that assigning matching partitions of the two input dataframes to the same worker would reduce downstream transfers, which it indeed does.
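For context, the shape of the workload is: align and subtract two timeseries dataframes with mismatched partitioning, then reduce. Something along these lines (a minimal sketch; the sizes, column count, and partitioning here are illustrative, not the benchmark's actual parameters):

```python
import dask
import dask.datasets

# All-float columns so the element-wise subtraction below is well-defined.
dtypes = {f"x{i}": float for i in range(10)}

# Two timeseries with different partitioning, so aligning them forces a
# repartition (the `repartition-split` / `repartition-merge` tasks).
df = dask.datasets.timeseries(
    start="2021-01-01", end="2021-01-31", freq="600ms", partition_freq="12h", dtypes=dtypes
)
df2 = dask.datasets.timeseries(
    start="2021-01-01", end="2021-01-31", freq="600ms", partition_freq="24h", dtypes=dtypes
)

# Aligned element-wise subtraction (the `sub` tasks) followed by a mean,
# which is what produces the `dataframe-count` / `dataframe-sum` reductions.
result = (df2 - df).mean()
result.compute()
```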
Worth noting: in my original benchmarks, `test_dataframe_align` was probably the most affected by root task overproduction out of everything I tested:
I ran it manually a few times both before and after the queuing PR. I also tried turning work stealing off, but it didn't make a difference.
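(Turning stealing off was just the standard scheduler config flag, roughly like this; a minimal sketch with an illustrative `LocalCluster` rather than the actual benchmark cluster:)

```python
import dask
from distributed import Client, LocalCluster

# Work stealing is a scheduler-level setting, so it needs to be in place
# before the scheduler starts.
dask.config.set({"distributed.scheduler.work-stealing": False})

cluster = LocalCluster(n_workers=4)  # worker count is illustrative
client = Client(cluster)
```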
Before
After
If you stare at these GIFs for a while, you can notice that:
- In "after", progress bars 3, 4, and 5 take longer to start, then accelerate faster. In "before", it feels like there's more of a linear lag between the top two bars and the next three.
- The memory usage peaks higher in "after" (we already knew this from the benchmark).
Here's a comparison between the two dashboards at the same point through the graph (785 `make-timeseries` tasks complete):
This is also a critical point, because it's when we start spilling to disk in the "after" case. The "before" case never spills to disk.
You can see that:
- There were no transfers in the "after" case. Before, we had lots of transfers. So improved co-assignment is working correctly.
- There are way more `repartition-merge` tasks in memory (top progress bar) in the "after" case. Nearly every executed task is in memory. Compare that to the before case, where half have already been released from memory.
- ...because there are way fewer `sub`, `dataframe-count`, and `dataframe-sum` executed. These are the data-consuming tasks.
- A bunch of `sub`s have just been scheduled. A moment before, none of those were in processing. So sadly, they arrive just a moment too late to prevent the workers from spilling to disk.
The simplest hypothesis is that the `repartition-merge`s are completing way faster now, since they don't have to transfer data. Maybe that increase in speed gives them the chance to run further ahead of the scheduler before it can submit the `sub` tasks? This pushes the workers over the spill threshold, so then everything slows down.
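(To be concrete about "spill threshold": I mean the standard worker memory fractions. A quick sketch, with what I believe are the default values:)

```python
import dask

# Worker memory thresholds, as fractions of the worker's memory limit.
# The numbers below are the usual defaults, quoted from memory.
dask.config.set(
    {
        "distributed.worker.memory.target": 0.60,  # start spilling LRU data to disk
        "distributed.worker.memory.spill": 0.70,   # spill based on process memory usage
        "distributed.worker.memory.pause": 0.80,   # pause executing new tasks
    }
)
```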
I know I tend to blame things on root task overproduction, so I'm trying hard to come up with some other explanation here. But it does feel a bit like, because there's no data transfer holding the tasks back anymore, they are able to complete faster than the scheduler is able to schedule data-consuming tasks.
What's confusing is just that `repartition-merge` isn't a root task, and we see relatively few `make-timeseries` or `repartition-split`s hanging around in memory. So why is it that scheduling `sub`s lags behind more?