
LID: introduce Coprocessor #8616

Closed
wants to merge 1 commit into from

Conversation

liguozhong
Contributor

What this PR does / why we need it:
[new feature] introduce a Loki Coprocessor for querier pre-query hooks

Which issue(s) this PR fixes:
Fixes #8568

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/upgrading/_index.md

@github-actions github-actions bot added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Feb 24, 2023
Contributor

@jeschkies jeschkies left a comment

Thanks for your contribution. I like the idea overall but have some open questions. Also, this use case is very narrow. I wonder if there's another solution.


So we hope to introduce some auxiliary capabilities to eliminate this invalid "7d-10m" search.

We have seen that in the database field, such features have been implemented very maturely.
Contributor

Where in the database? Which database do you mean?

traceID Coprocessor 1, simple text analysis:
if the traceID comes from X-Ray or OpenTelemetry ("Change default trace-id format to be
similar to AWS X-Ray (use timestamp) #1947"), this type of traceID carries a timestamp,
and the Coprocessor can use it to derive the narrow window in which the trace could have executed.
Contributor

Ah. So the coprocessor parses the timestamp. Did you try doing it in LogQL?

but because users do not know a traceID's start and end time, they usually search 7 days of logs.
In fact, all but roughly 10 minutes of that 7-day range is an invalid search (the "7d-10m" problem).

## Goals
Contributor

@jeschkies jeschkies Feb 24, 2023

If I understood the HBase use case, they basically created a plug-in system with some hooks. This seems to be the goal here. How would we implement it? How are we shipping the data? We could use Hashicorp's go-plugin, but there's probably a serialization overhead. As I understood the HBase design, the coprocessor is started on the host where the data is and thus can read the data directly. Not sure we can do that here.

Contributor Author

If I understood the HBase use case they basically created a plug-in system with some hooks.

yes.

How would we implement it? How are we shipping the data? We could use Hashicorp's go-plugin but there's probably a serialization overhead

My suggestion is that we should not restrict the language. Through HTTP+protobuf, more developers can work together with Loki.
Prometheus remote read and the Kubernetes HPA implement their plugin mechanisms through HTTP+protobuf, and this has been very successful.
HBase's implementation is limited to JVM languages, so a coprocessor can only be written in languages such as Java or Scala. I don't think that's a good idea.

Hashicorp's go-plugin

Will this limit the development language?

Contributor

Will this limit the development language?

No, they also use gRPC but make it a little simpler for plugins in Go.

Contributor Author

@liguozhong liguozhong Feb 28, 2023

It would be great if there were no restriction on the implementation language.
Can we follow OpenTelemetry's Collector, which supports both HTTP and gRPC protocols?
The SREs I know don't actually know much about gRPC, but they can easily build an HTTP server.

ex: 😱
protoc -I ./vendor/github.com/gogo/protobuf:./vendor:./pkg/logqlmodel/stats:./pkg/logproto --gogoslick_out=plugins=grpc,Mgoogle/protobuf/any.proto=github.com/gogo/protobuf/types,:pkg/logproto/ pkg/logproto/logproto.proto -I ./

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the plug-in is restricted to Go. One could use gRPC.

Contributor Author

great

We will try to implement two types of Coprocessors for this scenario.

traceID Coprocessor 1, simple text analysis:
if the traceID comes from X-Ray or OpenTelemetry ("Change default trace-id format to be
Contributor

So this works only if we're using the X-Ray format?

Contributor Author

Yes, our team's exploration is only for this purpose now.

Member

@owen-d owen-d left a comment

I think this is a pretty interesting idea, thanks for opening the issue. We've also been thinking about the "find a traceID over a large time range" use case, which is a common ask for people migrating to Loki from other, more heavily index based projects. I've been considering ways to solve it within the project, but this is another very interesting approach. To be honest, I'm not sure how I feel yet. I see our current options as:

  1. Do nothing.
  2. Implement key-value lookups within Loki.
  • Benefits: Less complex for users who want key-value lookups in Loki.
  • Costs: More complexity (code & configuration) in the base system. More surface area in the project that not all users may use. Not sure how to do it (yet).
  3. Implement Coprocessor support in Loki.
  • Benefits: Loki doesn't need to solve each use case explicitly. Other use cases can be added. Loki can be extended in arbitrary ways.
  • Costs: External use cases require a lot more investment, both operationally (running plugins) and in development (building plugins). Loki now has arbitrary dependencies. Reliability concerns with external plugins, etc.
  4. Something else?

Sorry I don't have an immediate answer, I need to think about this more.

As for the method signature you proposed, what about something like:

PreQuery(params logql.Params) (mutatedParams logql.Params, shouldQuery bool)

This would allow loki to choose how to change the query that's executed (for instance by reducing the time range to just the 10m that contains the trace) as well as choose to halt the query.

@liguozhong
Contributor Author

liguozhong commented Mar 3, 2023

Thanks, a very detailed answer. If Loki wants to directly solve the traceID search problem in the future,

I suggest not introducing a similar mechanism (Coprocessor or plugin) for the next two years. That would disperse the community's energy.

Otherwise there may be duplicated features, and it will be very difficult to choose which of the two to delete.

@liguozhong
Contributor Author

I have looked at Tempo's traceID-related code before: it builds a bloom filter for each block at write time, and when querying a traceID it uses the bloom filter to quickly judge whether a block may contain that traceID.
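The per-block idea can be sketched with a hand-rolled Bloom filter (an illustration of the technique, not Tempo's actual implementation):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a minimal Bloom filter using double hashing. Per-block filters
// are built at write time; at query time a negative answer lets the querier
// skip the block entirely, while a false positive only costs an extra read.
type bloom struct {
	bits []uint64
	m    uint32 // number of bits
	k    uint32 // number of probe positions per key
}

func newBloom(m, k uint32) *bloom {
	return &bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// hashes derives two 32-bit hashes from one FNV-1a pass; probe i uses h1+i*h2.
func (b *bloom) hashes(s string) (h1, h2 uint32) {
	h := fnv.New64a()
	h.Write([]byte(s))
	sum := h.Sum64()
	return uint32(sum), uint32(sum >> 32)
}

func (b *bloom) Add(s string) {
	h1, h2 := b.hashes(s)
	for i := uint32(0); i < b.k; i++ {
		idx := (h1 + i*h2) % b.m
		b.bits[idx/64] |= 1 << (idx % 64)
	}
}

func (b *bloom) MayContain(s string) bool {
	h1, h2 := b.hashes(s)
	for i := uint32(0); i < b.k; i++ {
		idx := (h1 + i*h2) % b.m
		if b.bits[idx/64]&(1<<(idx%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	perBlock := newBloom(1<<16, 4)
	perBlock.Add("1-5759e988-bd862e3fe1be46a994272793") // built at ingest time
	// Added IDs are always reported present (no false negatives).
	fmt.Println(perBlock.MayContain("1-5759e988-bd862e3fe1be46a994272793"))
}
```

The key property for this use case is the absence of false negatives: a block the filter rejects is guaranteed not to hold the traceID, so it never has to be opened.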

@owen-d
Member

owen-d commented Mar 3, 2023

I have looked at Tempo's traceID-related code before: it builds a bloom filter for each block at write time, and when querying a traceID it uses the bloom filter to quickly judge whether a block may contain that traceID.

Ha, I've been thinking about this exact same thing :)

@liguozhong liguozhong closed this Mar 13, 2023
@slim-bean
Collaborator

@liguozhong thank you for this idea and discussion!

There is something here, I don't know what it will look like though.

Solving for the X-Ray trace ID like this is very clever.

We have not given up on this idea yet.

@sandstrom

sandstrom commented Apr 28, 2023

@slim-bean Any estimate on a time frame in which you are planning to tackle this?

We have this problem with our Loki setup, where basically 90% of queries are for a request-id (unique id shared by some ~10-100 lines that all were part of the same request) or similar.

It's a pain point, and we sometimes regret that we didn't stick with ELK, though Loki also has a lot of nice properties that we really like.

Wild thought

Maybe bloom filters could be used? They have false positives, but no false negatives. A request-id stored in a bloom filter would at worst cause Loki to grab an extra chunk or two that it didn't need to look in. Maybe not so bad.

@valyala

valyala commented Jul 13, 2023

FYI, VictoriaLogs supports high-cardinality labels - they just don't go into stream labels, so they do not lead to high cardinality issues such as high memory usage, OOM or significant slowdown. It supports fast search over such labels with label_name:value query syntax - see these docs for details.

It would be great if Loki supported high-cardinality labels in a similar fashion. See more details on the underlying storage architecture of VictoriaLogs here.

@sandstrom

@slim-bean @owen-d Congrats on the 2.9 release! 🎉

Just wanted to check in on this issue and the related High Cardinality labels and see if you have any plans in that area?

We migrated off Elasticsearch a while ago. Overall happy, but even with our modest data, finding these 'needle in the haystack' items (request IDs and trace IDs) is an area where Loki could improve.

@liguozhong
Contributor Author

90% of our logs are traceID and requestID logs. We solved the problem by changing requestID to traceID.
