LID: introduce Coprocessor #8616
Conversation
Thanks for your contribution. I like the idea overall but have some open questions. Also, this use case is very narrow; I wonder if there's another solution.
So we hope to introduce some auxiliary abilities to solve this "7d-10m" invalid search.
We have checked that in the database field, such features have been implemented very maturely.
Where in the database? Which database do you mean?
traceID Coprocessor 1, simple text analysis:
if the traceID is a traceID from X-Ray or OpenTelemetry ("Change default trace-id format to be similar to AWS X-Ray (use timestamp) #1947"), this type of traceID carries a timestamp, and the Coprocessor can use that timestamp to narrow the time range the query has to cover
Ah. So the coprocessor parses the timestamp. Did you try doing it in LogQL?
but because users do not know a traceID's start time and end time, they usually search over 7 days of logs.
In fact, having a time range of "7d-10m" is an invalid search.
## Goals
If I understood the HBase use case, they basically created a plug-in system with some hooks. This seems to be the goal here. How would we implement it? How are we shipping the data? We could use Hashicorp's go-plugin, but there's probably a serialization overhead. As I understood the HBase design, the coprocessor is started on the host where the data is and thus can read the data directly. Not sure we can do that here.
> If I understood the HBase use case they basically created a plug-in system with some hooks.

Yes.

> How would we implement it? How are we shipping the data? We could use Hashicorp's go-plugin but there's probably a serialization overhead

My suggestion is that we should not restrict the language. Through HTTP+protobuf, more developers can work together with Loki.
Prometheus remote read and the Kubernetes HPA implement their plugin mechanisms through HTTP+protobuf, and this approach has been very successful.
HBase's implementation is limited to JVM languages, so a coprocessor can only be written in languages such as Java or Scala. I don't think that's a good idea.

> Hashicorp's go-plugin

Will this limit the development language?
> Will this limit the development language?
No, they also use gRPC but make it a little simpler for plugins in Go.
It would be great if there is no restriction on the implementation language.
Can we refer to OpenTelemetry's Collector, which supports both HTTP and gRPC protocols?
The SREs I know actually don't know much about gRPC, but they can easily build an HTTP server.
For example: 😱
protoc -I ./vendor/github.com/gogo/protobuf:./vendor:./pkg/logqlmodel/stats:./pkg/logproto --gogoslick_out=plugins=grpc,Mgoogle/protobuf/any.proto=github.com/gogo/protobuf/types,:pkg/logproto/ pkg/logproto/logproto.proto -I ./
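To make the "an SRE can easily build an HTTP server" point concrete, here is a minimal sketch of what a pre-query hook endpoint could look like. Everything in it is an assumption for illustration: the /prequery path, the payload shape, and the use of JSON instead of protobuf (chosen purely to keep the example short). The idea would be that Loki POSTs the query parameters before executing a query and uses whatever comes back.

```go
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// queryParams is a hypothetical wire format for the query parameters
// Loki would ship to the coprocessor.
type queryParams struct {
	Query string `json:"query"`
	Start int64  `json:"startNanos"`
	End   int64  `json:"endNanos"`
}

type preQueryResponse struct {
	Params      queryParams `json:"params"`
	ShouldQuery bool        `json:"shouldQuery"`
}

func main() {
	http.HandleFunc("/prequery", func(w http.ResponseWriter, r *http.Request) {
		var p queryParams
		if err := json.NewDecoder(r.Body).Decode(&p); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		// A real coprocessor would inspect p.Query here and, for example,
		// tighten the time range. This sketch echoes the params unchanged.
		_ = json.NewEncoder(w).Encode(preQueryResponse{Params: p, ShouldQuery: true})
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```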
I don't think the plug-in is restricted to Go. One could use gRPC.
great
We try to implement two types of Coprocessors in this scenario.

traceID Coprocessor 1, simple text analysis:
if the traceID is a traceID from X-Ray or OpenTelemetry ("Change default trace-id format to be
So this works only if we're using the X-Ray format?
Yes, our team's exploration has only covered this use case so far.
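For context on why this works: an X-Ray style trace ID such as 1-58406520-a006649127e371903a2de979 encodes its start time as eight hex digits of epoch seconds in the second segment. A minimal Go sketch of extracting it (a hypothetical helper, not code from this PR):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// parseXRayTraceTime decodes the start time embedded in an X-Ray style
// trace ID: "version-epochSecondsHex-randomBits".
func parseXRayTraceTime(traceID string) (time.Time, error) {
	parts := strings.Split(traceID, "-")
	if len(parts) != 3 || len(parts[1]) != 8 {
		return time.Time{}, fmt.Errorf("not an X-Ray style trace ID: %q", traceID)
	}
	secs, err := strconv.ParseInt(parts[1], 16, 64)
	if err != nil {
		return time.Time{}, err
	}
	return time.Unix(secs, 0).UTC(), nil
}

func main() {
	ts, err := parseXRayTraceTime("1-58406520-a006649127e371903a2de979")
	if err != nil {
		panic(err)
	}
	fmt.Println(ts) // a moment on 2016-12-01 UTC
}
```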
I think this is a pretty interesting idea, thanks for opening the issue. We've also been thinking about the "find a traceID over a large time range" use case, which is a common ask for people migrating to Loki from other, more heavily index-based projects. I've been considering ways to solve it within the project, but this is another very interesting approach. To be honest, I'm not sure how I feel yet. I see our current options as:
- Do nothing
- Implement key-value lookups within Loki
- Benefits: Less complex for users who want key-value lookups in Loki.
- Costs: More complexity (code & configuration) in the base system. More surface area in the project that not all users may use. Not sure how to do it (yet).
- Implement Coprocessor support in Loki.
- Benefits: Loki doesn't need to solve each use case explicitly. Other use cases can be added. Loki can be extended in arbitrary ways.
- Costs: External use cases require a lot more investment, both operationally (running plugins) and development (building plugins). Loki now has arbitrary dependencies. Reliability concerns with external plugins, etc.
- Something else?
Sorry I don't have an immediate answer, I need to think about this more.
As for the method signature you proposed, what about something like:
PreQuery(params logql.Params) (mutatedParams logql.Params, shouldQuery bool)
This would allow Loki to choose how to change the query that's executed (for instance by reducing the time range to just the 10m that contains the trace) as well as choose to halt the query.
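A rough Go sketch of how that hook could look; Params here is a hypothetical stand-in for Loki's real logql.Params, and extractTraceTime is the kind of helper sketched earlier:

```go
package coprocessor

import "time"

// Params is a hypothetical stand-in for the subset of logql.Params
// a pre-query hook would need.
type Params struct {
	Query string
	Start time.Time
	End   time.Time
}

// PreQueryHook is the shape suggested above: it may mutate the params
// and may veto the query entirely.
type PreQueryHook interface {
	PreQuery(params Params) (mutatedParams Params, shouldQuery bool)
}

// traceIDCoprocessor narrows a wide search (e.g. 7d) to a small window
// around the timestamp embedded in an X-Ray style trace ID.
type traceIDCoprocessor struct{}

var _ PreQueryHook = traceIDCoprocessor{}

func (traceIDCoprocessor) PreQuery(p Params) (Params, bool) {
	ts, ok := extractTraceTime(p.Query)
	if !ok {
		return p, true // not a trace-ID query; run it unchanged
	}
	p.Start = ts.Add(-time.Minute)
	p.End = ts.Add(10 * time.Minute)
	return p, true
}

// extractTraceTime would pull an X-Ray style trace ID out of the query
// string and decode its embedded timestamp, as in the earlier sketch.
// Stubbed out here to keep the example short.
func extractTraceTime(query string) (time.Time, bool) {
	return time.Time{}, false
}
```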
Thanks, a very detailed answer. If Loki wants to solve the traceID search problem directly in the future, I suggest not introducing a similar mechanism (Coprocessor or plugin) in the next two years. That would disperse the community's energy; we might end up with duplicate features, and having to choose which of the two to delete would be very difficult.
I have seen Tempo's traceID-related code before: it builds a bloom filter for each block at write time, and when querying a traceID it uses the bloom filter to quickly judge whether the traceID is in the block.
Ha, I've been thinking about this exact same thing :)
@liguozhong thank you for this idea and discussion! There is something here, I don't know what it will look like though. Solving for the X-Ray trace ID like this is very clever. We have not given up on this idea yet.
@slim-bean Any estimate on a time frame in which you are planning to tackle this? We have this problem with our Loki setup, where basically 90% of queries are for a request-id (a unique id shared by some ~10-100 lines that were all part of the same request) or similar. It's a pain point and we're sometimes regretting that we didn't stick with ELK, though Loki also has a lot of nice properties that we really like.
Wild thought: maybe bloom filters could be used? They have false positives, but not false negatives. A request-id stored in a bloom filter would at worst cause Loki to grab an extra chunk or two that it didn't have to look in. Maybe not so bad.
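A toy illustration of that per-block bloom-filter idea, using the bits-and-blooms Go library (an assumption for the example; neither Loki nor this PR uses it, and this is not Tempo's actual code):

```go
package main

import (
	"fmt"

	"github.com/bits-and-blooms/bloom/v3"
)

func main() {
	// Size the filter for ~100k IDs per block at a 1% false-positive rate.
	filter := bloom.NewWithEstimates(100_000, 0.01)

	// Write path: record every trace/request ID stored in the block.
	filter.Add([]byte("1-58406520-a006649127e371903a2de979"))

	// Query path: only open the block if the filter might contain the ID.
	if filter.Test([]byte("1-58406520-a006649127e371903a2de979")) {
		fmt.Println("block may contain the ID; scan it")
	}
	if !filter.Test([]byte("1-deadbeef-000000000000000000000000")) {
		fmt.Println("block definitely lacks the ID; skip it")
	}
}
```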
FYI, VictoriaLogs supports high-cardinality labels - they just don't go into stream labels, so they do not lead to high-cardinality issues such as high memory usage, OOMs, or significant slowdown. It supports fast search over such labels. It would be great if Loki supported high-cardinality labels in a similar fashion too. See more details on the underlying storage architecture of VictoriaLogs here.
@slim-bean @owen-d Congrats on the 2.9 release! 🎉 Just wanted to check in on this issue and the related high-cardinality labels one, and see if you have any plans in that area? We migrated off Elasticsearch a while ago. Overall happy, but even with our modest data, finding these 'needle in the haystack' type of things (request ids and trace ids) is an area where Loki could improve.
90% of our logs are traceID and requestID logs. We solved the problem by changing requestID to traceID.
What this PR does / why we need it:
[new feature] introduce Loki Coprocessor querier pre-query
Which issue(s) this PR fixes:
Fixes #8568
Special notes for your reviewer:
Checklist
CONTRIBUTING.md
guide (required)CHANGELOG.md
updateddocs/sources/upgrading/_index.md