[Feature] Implement pre-aggregation on data nodes #13291

@hanahmily


Search before asking

  • I have searched the issues and found no similar feature request.

Description

Problem

Currently, raw data points from multiple replicas are transported to the liaison node, which then deduplicates and aggregates them. This creates a performance bottleneck: all raw data must cross the network before any processing occurs, which increases latency and network overhead.

Proposed Solution

Implement a pre-aggregation mechanism on data nodes in which all replicas perform an initial aggregation before sending results to the liaison node. This reduces the amount of data transferred over the network and should improve overall query performance.

Implementation Requirements

All Replica Selection:

  • Ensure the same replica is consistently chosen as the default result
  • Handle replica availability and failover scenarios gracefully
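
As a rough illustration of the two points above, here is a minimal sketch of deterministic replica selection with failover. It uses rendezvous (highest-random-weight) hashing, which is only one possible technique and is not something this issue prescribes. The names `rank_replicas`, `select_replica`, the shard ID, and the node names are all hypothetical and do not come from BanyanDB.

```python
import hashlib
from typing import Iterable, List, Optional, Set


def _score(shard_id: str, node: str) -> int:
    # Use a stable hash (Python's built-in hash() is salted per process)
    # so that every caller computes the same ranking.
    digest = hashlib.sha256(f"{shard_id}/{node}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def rank_replicas(shard_id: str, replicas: Iterable[str]) -> List[str]:
    """Order replicas by preference for a shard (rendezvous hashing)."""
    # The node name is a tie-breaker so the order never depends on input order.
    return sorted(replicas, key=lambda n: (_score(shard_id, n), n), reverse=True)


def select_replica(
    shard_id: str, replicas: Iterable[str], healthy: Set[str]
) -> Optional[str]:
    """Return the most preferred healthy replica, or None if all are down."""
    for node in rank_replicas(shard_id, replicas):
        if node in healthy:
            return node
    return None
```

With this approach the same replica is picked for a shard as long as it stays healthy, and a failure only moves the choice to the next replica in that shard's ranking. How health is detected is left out here and would depend on the cluster's existing node-status mechanism.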

Pre-aggregation on Data Nodes:

  • Implement aggregation logic on the selected primary replica
  • Support common aggregation operations (sum, count, mean, min, max, etc.)
  • Ensure partial aggregation results can be properly combined at the liaison node
  • Maintain compatibility with existing deduplication mechanisms
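
To make the "partial results can be combined" requirement concrete, here is a minimal sketch of a mergeable aggregation state. The `Partial` type and the `pre_aggregate` and `combine` functions are hypothetical names for illustration, not existing BanyanDB APIs. The sketch assumes each data point is counted exactly once, that is, deduplication has already happened on the data node and the liaison merges partials from only one replica per shard.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Partial:
    """Mergeable aggregation state for one group (hypothetical shape)."""
    count: int = 0
    total: float = 0.0
    minimum: Optional[float] = None
    maximum: Optional[float] = None

    def merge(self, other: "Partial") -> "Partial":
        # Merging is associative and commutative, so the liaison can
        # combine partials from data nodes in any order.
        mins = [m for m in (self.minimum, other.minimum) if m is not None]
        maxs = [m for m in (self.maximum, other.maximum) if m is not None]
        return Partial(
            count=self.count + other.count,
            total=self.total + other.total,
            minimum=min(mins) if mins else None,
            maximum=max(maxs) if maxs else None,
        )

    @property
    def mean(self) -> Optional[float]:
        # Mean is derived from sum and count at the very end.
        return self.total / self.count if self.count else None


def pre_aggregate(values: Iterable[float]) -> Partial:
    """Runs on a data node: fold raw values into a single partial."""
    partial = Partial()
    for v in values:
        partial = partial.merge(Partial(1, v, v, v))
    return partial


def combine(partials: Iterable[Partial]) -> Partial:
    """Runs on the liaison node: merge partials from different shards."""
    result = Partial()
    for p in partials:
        result = result.merge(p)
    return result
```

The design point is that mean is not shipped as a value. If one node holds the points 1, 2, 3 and another holds only 10, averaging the per-node means gives 6, while the true mean is 4. Shipping sum and count and dividing at the liaison avoids that error.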

Use case

No response

Related issues

No response

Are you willing to submit a pull request to implement this on your own?

  • Yes I am willing to submit a pull request on my own!

Labels: database (BanyanDB - SkyWalking native database), enhancement (Enhancement on performance or codes)
