
LID: introduce Coprocessor #8616

Closed
wants to merge 1 commit into from

Conversation

liguozhong
Contributor

What this PR does / why we need it:
[new feature] introduce a Loki Coprocessor for querier pre-query hooks

Which issue(s) this PR fixes:
Fixes #8568

Special notes for your reviewer:

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/upgrading/_index.md

@github-actions github-actions bot added the type/docs Issues related to technical documentation; the Docs Squad uses this label across many repositories label Feb 24, 2023
Contributor

@jeschkies jeschkies left a comment

Thanks for your contribution. I like the idea overall but have some open questions. Also, this use case is very narrow. I wonder if there's another solution.


So we hope to introduce some auxiliary capabilities to eliminate this invalid "7d-10m" search.

We have seen that in the database field, such features have been implemented very maturely.
Contributor

Where in the database? Which database do you mean?

traceID Coprocessor 1, simple text analysis:
if the traceID comes from X-Ray or OpenTelemetry ("Change default trace-id format to be
similar to AWS X-Ray (use timestamp) #1947"), this type of traceID carries a timestamp,
and the Coprocessor can use it to derive the narrow window in which the trace could have executed.
Contributor

Ah. So the coprocessor parses the timestamp. Did you try doing it in LogQL?

but because users do not know a traceID's start and end time, they usually search 7 days of logs.
In fact, all but roughly 10 minutes of that 7-day range is an invalid search (the "7d-10m" problem).

## Goals
Contributor

@jeschkies jeschkies Feb 24, 2023

If I understood the HBase use case, they basically created a plug-in system with some hooks. This seems to be the goal here. How would we implement it? How are we shipping the data? We could use Hashicorp's go-plugin, but there's probably a serialization overhead. As I understood the HBase design, the coprocessor is started on the host where the data is and thus can read the data directly. Not sure we can do that here.

Contributor Author

If I understood the HBase use case they basically created a plug-in system with some hooks.

yes.

How would we implement it? How are we shipping the data? We could use Hashicorp's go-plugin but there's probably a serialization overhead

My suggestion is that we should not restrict the language. Through HTTP+protobuf, more developers can work together with Loki.
Prometheus remote read and the Kubernetes HPA implement their plugin mechanisms through HTTP+protobuf, and this has been very successful.
HBase's implementation is limited to JVM languages, so a coprocessor can only be written in languages such as Java or Scala. I don't think that's a good idea.

Hashicorp's go-plugin

Will this limit the development language?

Contributor

Will this limit the development language?

No, they also use gRPC but make it a little simpler for plugins in Go.

Contributor Author

@liguozhong liguozhong Feb 28, 2023

It would be great if there were no restriction on the implementation language.
Can we follow OpenTelemetry's Collector, which supports both HTTP and gRPC protocols?
The SREs I know don't actually know much about gRPC, but they can easily build an HTTP server.

ex: 😱
protoc -I ./vendor/github.com/gogo/protobuf:./vendor:./pkg/logqlmodel/stats:./pkg/logproto --gogoslick_out=plugins=grpc,Mgoogle/protobuf/any.proto=github.com/gogo/protobuf/types,:pkg/logproto/ pkg/logproto/logproto.proto -I ./

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the plug-in is restricted to Go. One could use gRPC.

Contributor Author

great

We will try to implement two types of Coprocessors for this scenario.

traceID Coprocessor 1, simple text analysis:
if the traceID comes from X-Ray or OpenTelemetry ("Change default trace-id format to be
Contributor

So this works only if we're using the X-Ray format?

Contributor Author

Yes, our team's exploration is only for this purpose now.

Member

@owen-d owen-d left a comment

I think this is a pretty interesting idea, thanks for opening the issue. We've also been thinking about the "find a traceID over a large time range" use case, which is a common ask for people migrating to Loki from other, more heavily index based projects. I've been considering ways to solve it within the project, but this is another very interesting approach. To be honest, I'm not sure how I feel yet. I see our current options as:

  1. Do nothing.
  2. Implement key-value lookups within Loki.
  • Benefits: Less complex for users who want key-value lookups in Loki.
  • Costs: More complexity (code & configuration) in the base system. More surface area in the project that not all users may use. Not sure how to do it (yet).
  3. Implement Coprocessor support in Loki.
  • Benefits: Loki doesn't need to solve each use case explicitly. Other use cases can be added. Loki can be extended in arbitrary ways.
  • Costs: External use cases require a lot more investment, both operationally (running plugins) and in development (building plugins). Loki now has arbitrary dependencies. Reliability concerns with external plugins, etc.
  4. Something else?

Sorry I don't have an immediate answer, I need to think about this more.

As for the method signature you proposed, what about something like:

PreQuery(params logql.Params) (mutatedParams logql.Params, shouldQuery bool)

This would allow loki to choose how to change the query that's executed (for instance by reducing the time range to just the 10m that contains the trace) as well as choose to halt the query.

@liguozhong
Contributor Author

liguozhong commented Mar 3, 2023

Thanks, a very detailed answer. If Loki wants to directly solve the traceID search problem in the future,

I suggest not introducing a similar mechanism (Coprocessor or plugin) for the next two years. That would disperse the community's energy.

Otherwise there may be duplicated features, and it will be very difficult to choose which of the two to delete.

@liguozhong
Contributor Author

I have looked at Tempo's traceID-related code before: it builds a bloom filter for each block at write time, and when querying a traceID it uses the bloom filter to quickly judge whether a block may contain that traceID.
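The per-block idea can be sketched with a hand-rolled Bloom filter (an illustration of the technique, not Tempo's actual implementation):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a minimal Bloom filter using double hashing. Per-block filters
// are built at write time; at query time a negative answer lets the querier
// skip the block entirely, while a false positive only costs an extra read.
type bloom struct {
	bits []uint64
	m    uint32 // number of bits
	k    uint32 // number of probe positions per key
}

func newBloom(m, k uint32) *bloom {
	return &bloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// hashes derives two 32-bit hashes from one FNV-1a pass; probe i uses h1+i*h2.
func (b *bloom) hashes(s string) (h1, h2 uint32) {
	h := fnv.New64a()
	h.Write([]byte(s))
	sum := h.Sum64()
	return uint32(sum), uint32(sum >> 32)
}

func (b *bloom) Add(s string) {
	h1, h2 := b.hashes(s)
	for i := uint32(0); i < b.k; i++ {
		idx := (h1 + i*h2) % b.m
		b.bits[idx/64] |= 1 << (idx % 64)
	}
}

func (b *bloom) MayContain(s string) bool {
	h1, h2 := b.hashes(s)
	for i := uint32(0); i < b.k; i++ {
		idx := (h1 + i*h2) % b.m
		if b.bits[idx/64]&(1<<(idx%64)) == 0 {
			return false
		}
	}
	return true
}

func main() {
	perBlock := newBloom(1<<16, 4)
	perBlock.Add("1-5759e988-bd862e3fe1be46a994272793") // built at ingest time
	// Added IDs are always reported present (no false negatives).
	fmt.Println(perBlock.MayContain("1-5759e988-bd862e3fe1be46a994272793"))
}
```

The key property for this use case is the absence of false negatives: a block the filter rejects is guaranteed not to hold the traceID, so it never has to be opened.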

@owen-d
Member

owen-d commented Mar 3, 2023

I have looked at Tempo's traceID-related code before: it builds a bloom filter for each block at write time, and when querying a traceID it uses the bloom filter to quickly judge whether a block may contain that traceID.

Ha, I've been thinking about this exact same thing :)

@liguozhong liguozhong closed this Mar 13, 2023
@slim-bean
Collaborator

@liguozhong thank you for this idea and discussion!

There is something here, I don't know what it will look like though.

Solving for the X-Ray trace ID like this is very clever.

We have not given up on this idea yet.

@sandstrom

sandstrom commented Apr 28, 2023

@slim-bean Any estimate on a time frame in which you are planning to tackle this?

We have this problem with our Loki setup, where basically 90% of queries are for a request-id (unique id shared by some ~10-100 lines that all were part of the same request) or similar.

It's a pain point, and we sometimes regret that we didn't stick with ELK, though Loki also has a lot of nice properties that we really like.

Wild thought

Maybe bloom filters could be used? They have false positives, but no false negatives. A request-id stored in a bloom filter would at worst cause Loki to grab an extra chunk or two that it didn't need to look in. Maybe not so bad.

@valyala

valyala commented Jul 13, 2023

FYI, VictoriaLogs supports high-cardinality labels - they just don't go into stream labels, so they do not lead to high cardinality issues such as high memory usage, OOM or significant slowdown. It supports fast search over such labels with label_name:value query syntax - see these docs for details.

It would be great if Loki supported high-cardinality labels in a similar fashion. See more details on the underlying storage architecture of VictoriaLogs here.

@sandstrom

@slim-bean @owen-d Congrats on the 2.9 release! 🎉

Just wanted to check in on this issue and the related High Cardinality labels and see if you have any plans in that area?

We migrated off Elasticsearch a while ago. Overall happy, but even with our modest data, finding these 'needle in the haystack' items (request IDs and trace IDs) is an area where Loki could improve.

@liguozhong
Contributor Author

90% of our logs are traceID and requestID logs. We solved the problem by changing requestID to traceID.
