Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve grouping performance via better vectorization in accumulate functions #6954

Closed
wants to merge 4 commits into from

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Jul 13, 2023

Draft as I am running performance tests and will clean up the code a bit if possible.

Which issue does this PR close?

Part of #6889

This is a follow on to #6904 where some last minute changes (to track null state correctly) decreased performance slightly -- I am trying to get some back

Rationale for this change

I am trying to help the compiler to vectorize the inner loops of the accumulators. Let's do this by doing the minimal work as possible

What changes are included in this PR?

Update null state with a separate loop

Are these changes tested?

covered by existing tests

Are there any user-facing changes?

@github-actions github-actions bot added the physical-expr Physical Expressions label Jul 13, 2023
value_fn(group_index, new_value);
}
// update seen values in separate loop
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the idea is to use a separate loop

Note we can skip this update entirely once all the groups have been seen -- I will explore if I can figure out how to do that next.

value_fn(group_index, new_value);
}
// update seen values in separate loop
for &group_index in group_indices.iter() {
seen_values.set_bit(group_index, true);
Copy link
Contributor

@Dandandan Dandandan Jul 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we assume seen_values is true for every group_index for "no nulls, no filter" until we observed a first null-value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is an excellent point. I will try and take that into account

@alamb
Copy link
Contributor Author

alamb commented Jul 13, 2023

af3f62b shows a little benefit bit maybe in the noise range

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ main_base ┃ alamb_faster_grouping ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  560.29ms │              533.10ms │     no change │
│ QQuery 2     │  174.72ms │              162.55ms │ +1.07x faster │
│ QQuery 3     │  167.44ms │              161.30ms │     no change │
│ QQuery 4     │  119.56ms │              117.52ms │     no change │
│ QQuery 5     │  394.33ms │              400.22ms │     no change │
│ QQuery 6     │   45.54ms │               40.08ms │ +1.14x faster │
│ QQuery 7     │  849.06ms │              837.12ms │     no change │
│ QQuery 8     │  258.03ms │              247.16ms │     no change │
│ QQuery 9     │  595.82ms │              566.07ms │     no change │
│ QQuery 10    │  341.31ms │              337.04ms │     no change │
│ QQuery 11    │  170.28ms │              159.25ms │ +1.07x faster │
│ QQuery 12    │  175.19ms │              173.80ms │     no change │
│ QQuery 13    │  331.46ms │              315.21ms │     no change │
│ QQuery 14    │   48.67ms │               49.25ms │     no change │
│ QQuery 15    │   59.04ms │               50.88ms │ +1.16x faster │
│ QQuery 16    │  164.45ms │              164.20ms │     no change │
│ QQuery 17    │  966.84ms │              873.41ms │ +1.11x faster │
│ QQuery 18    │ 1676.43ms │             1637.90ms │     no change │
│ QQuery 19    │  166.95ms │              167.24ms │     no change │
│ QQuery 20    │  313.92ms │              313.73ms │     no change │
│ QQuery 21    │ 1062.69ms │             1060.65ms │     no change │
│ QQuery 22    │   88.17ms │               93.33ms │  1.06x slower │
└──────────────┴───────────┴───────────────────────┴───────────────┘

@alamb
Copy link
Contributor Author

alamb commented Jul 13, 2023

I am now benchmarking the special case of "don't track nulls they can't possibly be present

@alamb alamb changed the title Allow better vectorization in accumulate functions Improve grouping performance via better vectorization in accumulate functions Jul 14, 2023
@alamb
Copy link
Contributor Author

alamb commented Jul 14, 2023

Special casing the nulls seems to have helped a little, though the fact that some queries now report slower is surprising to me

┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ main_base ┃ alamb_faster_grouping ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │  574.53ms │              522.95ms │ +1.10x faster │
│ QQuery 2     │  168.33ms │              152.40ms │ +1.10x faster │
│ QQuery 3     │  173.45ms │              182.54ms │  1.05x slower │
│ QQuery 4     │  116.60ms │              116.16ms │     no change │
│ QQuery 5     │  402.46ms │              405.81ms │     no change │
│ QQuery 6     │   45.04ms │               41.94ms │ +1.07x faster │
│ QQuery 7     │  922.93ms │              916.25ms │     no change │
│ QQuery 8     │  255.73ms │              240.93ms │ +1.06x faster │
│ QQuery 9     │  577.15ms │              554.53ms │     no change │
│ QQuery 10    │  318.40ms │              330.10ms │     no change │
│ QQuery 11    │  167.56ms │              172.50ms │     no change │
│ QQuery 12    │  174.26ms │              177.85ms │     no change │
│ QQuery 13    │  324.26ms │              320.80ms │     no change │
│ QQuery 14    │   48.11ms │               48.24ms │     no change │
│ QQuery 15    │   54.18ms │               57.05ms │  1.05x slower │
│ QQuery 16    │  164.83ms │              168.32ms │     no change │
│ QQuery 17    │  897.29ms │              877.01ms │     no change │
│ QQuery 18    │ 1672.32ms │             1675.61ms │     no change │
│ QQuery 19    │  179.13ms │              166.64ms │ +1.07x faster │
│ QQuery 20    │  324.28ms │              324.27ms │     no change │
│ QQuery 21    │ 1147.35ms │             1177.47ms │     no change │
│ QQuery 22    │   93.86ms │               91.11ms │     no change │

@Dandandan
Copy link
Contributor

Noise maybe? Your previous run seems to have some noisy results as well.
I only get relatively relatively stable perf results by closing all browsers / IDEs etc.

@alamb
Copy link
Contributor Author

alamb commented Jul 14, 2023

Noise maybe? Your previous run seems to have some noisy results as well.
I only get relatively relatively stable perf results by closing all browsers / IDEs etc.

Yeah, I am trying to control for noise by running these experiments on a GCP machine (not my laptop) 🤔

@alamb
Copy link
Contributor Author

alamb commented Jul 24, 2023

Tracking this basic idea in #7066 , I don't plan ti pursue this PR anymore for now

@alamb alamb closed this Jul 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants