-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
logcli: Add parallel flags #7543
Conversation
./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. + ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0.1%
+ iter 0%
+ storage 0%
+ chunkenc 0%
- logql -0.4%
+ loki 0% |
@SN9NV I'm not sure we can accept this as is. The log lines will be returned out of order which will be very confusing for users. Is there a way you can add ordering? Also this needs rebasing. |
Hi, yeah, I know it would be confusing, that is why it specifically is mentioned in the help text that using the flag will result in the logs being out of order. |
./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. + ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0% |
One of the ways I thought we could add ordering is if we created an output file for each part. Then the user can order the logs using the filename. But I guess that would be some work. Adding new flags and stuff. |
./tools/diff_coverage.sh ../loki-target-branch/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell. + ingester 0%
+ distributor 0%
+ querier 0%
+ querier/queryrange 0%
+ iter 0%
+ storage 0%
+ chunkenc 0%
+ logql 0%
+ loki 0% |
I have pushed a new bit of functionality which allows the user to create output files, one for each partition range, then the user can order file files as they please. I think we can leave that up to them. Not so sure on the naming. I couldn't think of another tool which does this to get some naming inspiration. The reason for the prefix is so that you can place the files in a specific directory, or "label" a specific download set so many in the same directory don't collide. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the ist but I'd wie the results into one file. As I wouldn't expect all results to arrive at once you could use a channel to the writer that synchronizes the writes. Maybe a simple mutex will do.
pkg/logcli/query/query.go
Outdated
} | ||
|
||
start = end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checkout the comment in
loki/pkg/logcli/query/query.go
Line 142 in cf23194
// Batching works by taking the timestamp of the last query and using it in the next query, |
I'm not quite sure what you are saying. Once you start the workers in parallel, sorting the results into one file would mean that you would need all the memory to hold all the messages until the worker with the lowest value is done. This is why I have saved one file per section. The previous comment said we should try to keep the ordering of the log messages. And you comment seems to contradict that. |
You don't have to keep them in memory. You could insert them into a file. That's a little bit more complicated though. It's like insertion sort with a file. |
@SN9NV I've done a quick research. What I mean is called External Sorting. However, we probably do not need it this complicated. I know this is some extra work but what do you think about naming the part files with timestamps and then merge them afterwards? It would be an append only operation. The difficult part is to figure out if we need deduplication. |
@SN9NV I'm sorry that you closed the PR. I liked the idea. |
I moved the code to another branch. I guess that's why it was closed. I'll open another PR from the new branch. |
What this PR does / why we need it:
Requesting a large range, a day, for example, takes a very long time to download. You can download in parallel by starting many processes, but this just made my job so much easier.
Special notes for your reviewer:
TBH, I am not good at testing, so IDK how such a thing can be tested.
Checklist
CONTRIBUTING.md
guideCHANGELOG.md
updateddocs/sources/upgrading/_index.md