Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[carnot] Implement execution of partial aggregates. #1440

Open
4 tasks done
JamesMBartlett opened this issue Jun 6, 2023 · 2 comments · Fixed by #1676
Open
4 tasks done

[carnot] Implement execution of partial aggregates. #1440

JamesMBartlett opened this issue Jun 6, 2023 · 2 comments · Fixed by #1676
Assignees

Comments

@JamesMBartlett
Copy link
Member

JamesMBartlett commented Jun 6, 2023

Describe the Task
We already have support for planning queries with partial aggregates (Partial aggregates means some of the aggregation is done on the agent nodes before sending the partially aggregated results to Kelvin for final aggregation). However, the query execution engine doesn't yet support executing partial aggregate nodes. This issue tracks the implementation on the execution side.

Subtasks

  • Add UDADefinition accessors for Serialize/Deserialize methods so that the execution side can access them.
  • Fix the compiler's output schema for partial aggregates. Currently, the compiler outputs a single column for all partial aggregates, but its much more convenient from the execution's point of view for there to be one column for each UDA executed.
  • Implement the deserialization, merging, and serialization in the agg node.
  • Finally, enable partial aggregates on the planner side.
@JamesMBartlett JamesMBartlett changed the title Implement execution of partial aggregates. [carnot] Implement execution of partial aggregates. Jun 6, 2023
@JamesMBartlett JamesMBartlett self-assigned this Jun 6, 2023
JamesMBartlett added a commit that referenced this issue Jun 7, 2023
Summary: In preparation for implementing partial aggregates on the exec
side, this PR adds Serialize/Deserialize methods to the UDADefinition.
This will allow the exec side to execute udas' Serialize and Deserialize
methods in a type erased way, just as we do for Update, Merge, Finalize,
etc.

Relevant Issues: #1440

Type of change: /kind cleanup

Test Plan: Added a test for the SerializeArrow and Deserialize methods.
Also tested as part of broader partial aggregate implmentation.

Signed-off-by: James Bartlett <jamesbartlett@pixielabs.ai>
JamesMBartlett added a commit that referenced this issue Jun 7, 2023
Summary: Prior to this change, the compiler would output a single column
called `serialized_expressions` for any partial agg. This isn't
particularly convenient for the execution side. Instead, the compiler
will now output a column per UDA involved in the agg node. For example,
if you have an agg node that has a `min` and `max` output the partial
agg will now output two columns (plus any group columns) called
`serialized_min` and `serialized_max`, where before it would output an
unspecified `serialized_expressions` column.

Relevant Issues: #1440

Type of change: /kind cleanup

Test Plan: Modified existing tests to test the new behaviour. Also
tested as part of broader partial aggregate changes.

Signed-off-by: James Bartlett <jamesbartlett@pixielabs.ai>
vihangm pushed a commit that referenced this issue Jun 13, 2023
Summary: Implements support for executing partial aggregates in the
AggNode. The planner still has partial aggs disabled. A separate PR will
flip the switch on the planner side.

Relevant Issues: #1440

Type of change: /kind feature

Test Plan: Added tests in `agg_node_test` that exercise the partial
aggregate code path. Also, tested on a cluster with partial aggregates
enabled on the planner side.

Signed-off-by: James Bartlett <jamesbartlett@pixielabs.ai>
JamesMBartlett added a commit that referenced this issue Aug 16, 2023
Summary: Flips the switch to enable partial aggregates. This should
improve query performance.

Relevant Issues: Fixes #1440

Type of change: /kind feature

Test Plan: Tested by skaffolding to a cluster. Saw that `px/cluster` and
other scripts return reasonable results. Compared the node CPU % from
`px/cluster` (calculated through an aggregate) to the actual usage on
the node. Also, used `px:set explain=true` `px:set analyze=true` to
check that the query plan was using partial aggregates.

Changelog Message:
```release-note
Queries now support partial aggregates which should improve query performance significantly in some cases.
```

Signed-off-by: James Bartlett <jamesbartlett@pixielabs.ai>
@ddelnano ddelnano reopened this Aug 30, 2023
@ddelnano
Copy link
Member

Reopening since this was reverted in #1692.

@ddelnano
Copy link
Member

ddelnano commented Oct 2, 2023

The partial aggregate change was re-enabled in #1702, however, when testing the next release I noticed that the flame graph script was still broken. #1721 will be reverting this change again.

vihangm pushed a commit that referenced this issue Oct 3, 2023
Summary: Revert partial aggregate opt in

This change reverts the partial aggregate opt in from
2645963. During the testing of
0.14.7-pre-main.0, I noticed that the flame graphs generated were
lacking the typical towers seen.

<img width="1728" alt="Java flame graph on pre-release"
src="https://github.com/pixie-io/pixie/assets/5855593/5e1f15f0-97ce-4263-a418-9986f675fa55">
<img width="1679" alt="flame graph on pre-release"
src="https://github.com/pixie-io/pixie/assets/5855593/f2b7d988-471c-48e0-bed1-339896574940">

Relevant Issues: #1440

Type of change: /kind bug

Test Plan: `skaffold`ed the changes and verified that the flame graph
looks correct now (as seen below)
<img width="1718" alt="flame graph with partial agg revert"
src="https://github.com/pixie-io/pixie/assets/5855593/f6c947fe-1930-4bc5-bdb9-89f33cf0e19a">

Signed-off-by: Dom Del Nano <ddelnano@pixielabs.ai>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants