Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Macro for aggregating per-second rates - need to simplified SQL query which allows get rates with dimensions but show only one time series with sum/avg #386

Closed
oplehto opened this issue Nov 23, 2021 · 6 comments · Fixed by #541
Assignees
Labels
p1 Priority 1
Milestone

Comments

@oplehto
Copy link

oplehto commented Nov 23, 2021

One common pattern in our metrics is that we want to calculate the rate of a large number of counters and then aggregate those series, typically using a sum. A good example of this would be the total transmit bandwidth for all interfaces grouped by datacenter.

There may be hundreds of series here so we really need to push down the query to ClickHouse so doing workarounds with transforms would not really help. It is doable but

I've included below the InfluxDB and ClickHouse versions of the raw query. Having a macro would simplify the CH version considerably. I was thinking this could be a $perSecond* variant with 2 extra parameters the aggregation function (typically sum) and the columns to aggregate over.

InfluxDB:

SELECT sum(x) FROM (SELECT non_negative_derivative(max("PortXmitData_bits"), 1s) AS x FROM "netstats" WHERE ("datacenter" =~ /^$datacenter$/) AND env = 'prod' AND $timeFilter GROUP BY time($__interval),host,interface) GROUP BY time($__interval) fill(null)

ClickHouse:

SELECT
    t,
    datacenter,
    groupArray((datacenter, max_0_Rate_aggregated)) AS groupArr 
FROM
(
  SELECT 
    t,
    datacenter,
    sum(max_0_Rate) AS max_0_Rate_aggregated
    FROM (
    SELECT
        t,
        datacenter,
        perSecondColumns,
        if(runningDifference(max_0) < 0, nan, runningDifference(max_0) / runningDifference(t / 1000)) AS max_0_Rate
    FROM
(
        SELECT
            $timeSeries as t,
            concat(host, interface) AS perSecondColumns,
            max(PortXmitData_bits) AS max_0,
            datacenter
        FROM default.netstats

        WHERE
            $timeFilter
            AND datacenter in ($datacenter) AND env = 'prod'
        GROUP BY
            t,
            datacenter,
            perSecondColumns
        ORDER BY
            perSecondColumns,
            datacenter,
            t
)
)
GROUP BY datacenter, t
ORDER BY datacenter, t
)
GROUP BY datacenter, t
ORDER BY datacenter, t
@Slach
Copy link
Collaborator

Slach commented Nov 23, 2021

Why $perSecondColumns is not applicable for you?

@oplehto
Copy link
Author

oplehto commented Nov 23, 2021

Can I do this with $perSecondColumns? I may have missed something.

I have a counter that has the identifying columns datacenter, host and interface. So I want to do the following:

  1. Rate of each counter (so GROUP BY all 3 columns)
  2. Sum of the rates grouped by datacenter

The first part I can do with $perSecondColumn resulting in 1 graph per counter but how do I do the second level aggregation to get the sum of the counters?

@Slach
Copy link
Collaborator

Slach commented Nov 24, 2021

How many time series do you need?
Is only one series per datacenter or one per datacenter plus one per datacenter+interface?

@oplehto
Copy link
Author

oplehto commented Nov 24, 2021

In this particular case the time series would be per datacenter but I have queries that I'm planning to migrate to ClickHouse where we need the top-level GROUP BY based on multiple columns.

@Slach Slach added this to the 2.5.0 milestone Nov 29, 2021
@Slach Slach self-assigned this Nov 29, 2021
@Slach
Copy link
Collaborator

Slach commented Jun 1, 2022

maybe GROUP BY ... WITH ROLLUP .. will good solution for your use case?

https://clickhouse.com/docs/en/sql-reference/statements/select/group-by/#with-rollup-modifier

@Slach Slach modified the milestones: 2.5.0, 2.6.0 Jun 1, 2022
@Slach Slach modified the milestones: 2.6.0, 3.0.0 Jan 9, 2024
@Slach Slach changed the title Macro for aggregating per-second rates Macro for aggregating per-second rates - need to simplified SQL query which allows get rates with dimensions but show only one time series with sum/avg Feb 28, 2024
@Slach
Copy link
Collaborator

Slach commented Feb 28, 2024

main PAIN

we need to simplify SQL query in ClickHouse or create macros like $aggregatedRate
which allow get analog of prometheus rate/increase function with dimension
but show only one time series in the result with sum/avg/min/max results of internal rate

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
p1 Priority 1
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants