Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When sorting update commands add Deleted + Inserted number of edges instead of Deleted * Inserted #26082

Merged
merged 1 commit into from
Sep 20, 2021

Conversation

AndriySvyryd
Copy link
Member

@AndriySvyryd AndriySvyryd commented Sep 17, 2021

Description

#25952 decreased throughput in mixed update scenario by about 20-40%. These changes mostly remove this degradation:

  • When sorting update commands add Deleted + Inserted number of edges instead of Deleted * Inserted
  • Don't allocate a collection for edges if there's only one edge between given two vertices
  • Reuse the Multigraph instance
  • Change the benchmark to reuse the context instance to simulate pooling behavior
Before #25952
Method Async Batching Mean Error StdDev Op/s Gen 0 Gen 1 Gen 2 Allocated
UpdatePipeline False False 122.4 ms 1.78 ms 1.49 ms 8.169 1000.0000 - - 11.31 MB
UpdatePipeline False True 227.3 ms 4.48 ms 5.16 ms 4.400 2000.0000 - - 15.33 MB
UpdatePipeline True False 129.5 ms 1.35 ms 1.20 ms 7.725 1000.0000 - - 11.53 MB
UpdatePipeline True True 243.4 ms 1.13 ms 0.94 ms 4.109 2000.0000 - - 17.51 MB
After #25952
Method Async Batching Mean Error StdDev Op/s Gen 0 Gen 1 Gen 2 Allocated
UpdatePipeline False False 201.0 ms 3.80 ms 5.08 ms 4.975 5000.0000 3000.0000 1000.0000 32.94 MB
UpdatePipeline False True 279.2 ms 4.44 ms 3.93 ms 3.581 5000.0000 3000.0000 1000.0000 36.96 MB
UpdatePipeline True False 196.6 ms 3.72 ms 4.43 ms 5.087 5000.0000 3000.0000 1000.0000 33.15 MB
UpdatePipeline True True 370.5 ms 7.09 ms 12.79 ms 2.699 6000.0000 4000.0000 1000.0000 39.16 MB
After this fix
Method Async Batching Mean Error StdDev Op/s Gen 0 Gen 1 Gen 2 Allocated
UpdatePipeline False False 125.8 ms 2.51 ms 6.44 ms 7.950 1000.0000 - - 11.08 MB
UpdatePipeline False True 226.3 ms 3.06 ms 2.71 ms 4.420 2000.0000 1000.0000 - 15.11 MB
UpdatePipeline True False 126.8 ms 2.42 ms 2.69 ms 7.883 1000.0000 - - 11.29 MB
UpdatePipeline True True 277.9 ms 5.54 ms 5.93 ms 3.599 2000.0000 1000.0000 - 17.29 MB

Customer Impact

SaveChanges with mixed deletes and inserts is significantly slower.

How found

Automated benchmarks

Test coverage

Already covered

Regression?

Yes, from 6.0-rc1

Risk

Low, no functional changes.

Copy link
Member

@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good.

But can we do a run with context pooling off (as before) to separate perf gains from actual the actual product changes below and perf gains simply from turning on pooling?

{
return edgeList;
return edges is IEnumerable<TEdge> edgeList ? edgeList : (new TEdge[] { (TEdge)edges! });
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it would allocate more if GetEdges is called multiple times for the same two vertices - is this intentional (e.g. because we never do that?).

We can also continue returnining a List<TEdge> to keep the typing simpler - the perf difference between a bare array and a List<T> is very unlikely to matter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it would allocate more if GetEdges is called multiple times for the same two vertices - is this intentional (e.g. because we never do that?).

Yes, it's only called during cycle breaking

We can also continue returnining a List to keep the typing simpler - the perf difference between a bare array and a List is very unlikely to matter.

Can you make it a suggestion? I don't understand how it's simpler

@AndriySvyryd
Copy link
Member Author

Turning pooling off doesn't have a significant effect on the results

Method Async Batching Mean Error StdDev Op/s Gen 0 Gen 1 Gen 2 Allocated
UpdatePipeline False False 135.2 ms 2.58 ms 3.07 ms 7.399 1000.0000 - - 11.19 MB
UpdatePipeline False True 205.2 ms 4.07 ms 3.61 ms 4.873 2000.0000 1000.0000 - 15.22 MB
UpdatePipeline True False 129.7 ms 2.56 ms 3.50 ms 7.713 1000.0000 - - 11.4 MB
UpdatePipeline True True 243.0 ms 3.73 ms 3.49 ms 4.116 2000.0000 1000.0000 - 17.4 MB

…nstead of Deleted * Inserted

Don't allocate a collection for edges if there's only one edge between given two vertices
Reuse the Multigraph instance
@roji
Copy link
Member

roji commented Sep 17, 2021

Great!

@AndriySvyryd
Copy link
Member Author

@Pilchie for approval

@Pilchie
Copy link
Member

Pilchie commented Sep 20, 2021

Approved for EFCore 6.

@AndriySvyryd AndriySvyryd changed the title [6.0] Fix update pipeline perf regression When sorting update commands add Deleted + Inserted number of edges instead of Deleted * Inserted Sep 20, 2021
@AndriySvyryd AndriySvyryd merged commit c6a9f1c into release/6.0 Sep 20, 2021
@AndriySvyryd AndriySvyryd deleted the FasterSave branch September 20, 2021 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants