Nested queries / subquery support for Loki #14018

Open

Description

Is your feature request related to a problem? Please describe.
We have a user with a dashboard panel that should show the top 10 errors by log hash ID. The panel should show:

  • Loghash
  • The category of the loghash
  • A randomly sampled content message from the lines with each loghash (each loghash may have many different log content lines; we just want to display any one of them)
  • The rate (per second) at which each loghash occurs

We currently do this by running two queries. Query A gets the top 10 loghashes by their rate.

Query A:
topk(10, sum(rate({servicename="$servicename", environment="$environment"}[$__interval] |= " WARN " |~ "$components" | regexp ".*(?P<category>($components)).*(?P<loghash>\\[[[:xdigit:]]+:[[:xdigit:]]+\\]) - (?P<content>.*)")) by (loghash))

Example result:
loghash           value
[d184e388:01f0]   13.0
[72d3a246:0e1b]   8.65
[771a605c:0235]   3.38
...

The panel then also runs query B, which fetches all matching log lines and parses out the category, content, and loghash as labels.

Query B:
{servicename="$servicename", environment="$environment"} |= " WARN " |~ "$components" | regexp ".*(?P<category>($components)).*(?P<loghash>\\[[[:xdigit:]]+:[[:xdigit:]]+\\]) - (?P<content>.*)"

Example result:
category     content              loghash
store        saving cart to db    [d184e388:01f0]
web          user viewing x       [ecbcfc1a:0094]
web          user put x in cart   [ecbcfc1a:0094]

Then we use panel transformations to merge the results of these two queries using an OUTER join.
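
The merge step can be sketched outside Grafana as follows. This is a minimal Python illustration, not what the panel transformation actually executes: the function name `outer_join_top_hashes` and the row shapes are assumptions, and the sample values come from the example results above. As a simplification it performs a left outer join, keeping every query A row and attaching the first matching query B line, if any.

```python
def outer_join_top_hashes(top_rows, log_rows):
    """Left outer join of query A rows with one sample line per loghash from query B."""
    # Keep only the first line seen for each loghash; any one sample will do.
    sample_by_hash = {}
    for row in log_rows:
        sample_by_hash.setdefault(row["loghash"], row)

    joined = []
    for top in top_rows:
        sample = sample_by_hash.get(top["loghash"])
        joined.append({
            "loghash": top["loghash"],
            "rate": top["value"],
            # None marks a top loghash that query B did not return.
            "category": sample["category"] if sample else None,
            "content": sample["content"] if sample else None,
        })
    return joined


query_a = [
    {"loghash": "[d184e388:01f0]", "value": 13.0},
    {"loghash": "[72d3a246:0e1b]", "value": 8.65},
]
query_b = [
    {"category": "store", "content": "saving cart to db", "loghash": "[d184e388:01f0]"},
    {"category": "web", "content": "user viewing x", "loghash": "[ecbcfc1a:0094]"},
    {"category": "web", "content": "user put x in cart", "loghash": "[ecbcfc1a:0094]"},
]
rows = outer_join_top_hashes(query_a, query_b)
```

In this sample the second row ends up with no category or content because query B returned no line for `[72d3a246:0e1b]`, which is the same symptom the panel shows when query B is truncated.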

This works if the time range is short enough for query B to stay under the line limit of 5000 lines.
Once the user increases the query window to more than an hour, query B hits the limit and some rows start to lose their content. At around 12 hours no content is returned at all, because the 5000 returned lines do not include any of the top 10 loghashes.

To clarify: query B may return many lines for the same loghash, and we only care about one random line among them.

Describe the solution you'd like
To avoid the line limit for query B, I think we would want to restrict query B to fetch only lines that actually contain one of the top 10 loghashes.

One thought I had was for Loki to support some kind of subquery, where you could run an inner query and use its results in the outer query, but I don't think Loki has that feature.
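
To make the idea concrete, the inner/outer behaviour can be emulated client-side in two steps. The Python sketch below only illustrates the requested semantics and is not an existing Loki or Grafana feature: `extract_loghashes` and `narrow_query_b` are hypothetical helpers, the response shape assumes the vector result returned by Loki's instant query HTTP API, and the selector values are made up. It builds a line filter from the inner query's loghashes and uses a backtick-quoted LogQL string so the regex backslashes need no extra escaping.

```python
import re


def extract_loghashes(instant_response):
    """Collect the loghash label from each series of a vector result (query A)."""
    return [series["metric"]["loghash"] for series in instant_response["data"]["result"]]


def narrow_query_b(selector, pipeline, loghashes):
    """Build query B with an extra line filter that matches only the given loghashes."""
    if not loghashes:
        raise ValueError("no loghashes to filter on")
    # Escape the brackets so each loghash is matched literally by the regex filter.
    alternation = "|".join(re.escape(h) for h in loghashes)
    # The new filter goes first so non-matching lines are dropped before parsing.
    return f"{selector} |~ `{alternation}` {pipeline}"


# Assumed shape of the inner query (query A) response; values are illustrative.
response = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"loghash": "[d184e388:01f0]"}, "value": [1700000000, "13.0"]},
            {"metric": {"loghash": "[72d3a246:0e1b]"}, "value": [1700000000, "8.65"]},
        ],
    },
}

selector = '{servicename="checkout", environment="prod"}'  # hypothetical label values
pipeline = '|= " WARN " |~ "$components"'
narrowed = narrow_query_b(selector, pipeline, extract_loghashes(response))
```

Even with the narrowed query the line limit still applies to the matching lines, so a very frequent loghash could in principle crowd out the others. A native subquery would ideally also allow limiting to one line per loghash.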

Another thought was to use a Loki recording rule to create a metric of the top 10 loghashes, but that could generate quite a bit of cardinality, and I would need to apply the recording rule on all our Loki stacks (of which we have many), then use a dashboard variable to get the loghash label values from the metric.

The other option I thought of was to put query A in a dashboard variable, but dashboard variables for Loki data sources only support a basic label selector filter and not line filters etc. Once we had the loghashes as a variable, we would use that variable to narrow query B down to fetch only the needed loghashes.

Describe alternatives you've considered
Already covered by the above section.

Additional context
I would prefer to avoid the recording rule route as it would require copying the recording rule out to many stacks. It would also add extra cost for the metrics.
The user wants to be able to query at least the last 24 hours in the dashboard, and based on what I have seen they would currently need a line limit of about 50,000 lines. Increasing the line limit would not be a good fix either, since the required limit depends on how many logs were generated in the selected time frame, which can vary with usage.

Metadata

    Labels

    type/feature (Something new we should do)
