-
Notifications
You must be signed in to change notification settings - Fork 3.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
logcli: Add parallel flags #8518
Conversation
226016e
to
4232d05
Compare
Hi, I have updated the code and made a new PR. Basically what is new here is that there's two new flags:
What this does is it will read the parts in order and write the output to stdout. And it will do this in order. So it will act similar to the non-parallel usage, and then after each file has been output, it will delete the file.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Docs squad] Lots of missing punctuation and some rewording suggestions.
docs/sources/tools/logcli.md
Outdated
Notice that when using --from and --to then ensure to use RFC3339Nano time | ||
format, but without timezone at the end. The local timezone will be added | ||
automatically or if using --timezone flag. | ||
Notice that when using --from and --to then ensure to use RFC3339Nano time format, but without timezone at the end. The local timezone will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notice that when using --from and --to then ensure to use RFC3339Nano time format, but without timezone at the end. The local timezone will | |
Note that when using --from and --to you must use the RFC3339Nano time format, but without timezone at the end. The local timezone will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JStickler Hi, what you are commenting on here is generated, it's not changes that I made. It is output of the help text of the application. A few are from new stuff that I added and I will change that, do you want me to change all of these things? Maybe it would be better to make a PR yourself to change it? I feel it's a bit out of the scope of my contribution here. IDK what do you think?
docs/sources/tools/logcli.md
Outdated
format, but without timezone at the end. The local timezone will be added | ||
automatically or if using --timezone flag. | ||
Notice that when using --from and --to then ensure to use RFC3339Nano time format, but without timezone at the end. The local timezone will | ||
be added automatically or if using --timezone flag. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
be added automatically or if using --timezone flag. | |
be added automatically or when using the --timezone flag. |
docs/sources/tools/logcli.md
Outdated
--merge-parts | ||
--keep-parts | ||
|
||
Refer to the help of these specific flags to understand what each of them do. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refer to the help of these specific flags to understand what each of them do. | |
Refer to the help for each flag for details about what each of them do. |
docs/sources/tools/logcli.md
Outdated
--merge-parts | ||
'my-query' | ||
|
||
This will start 10 workers, and they will each start downloading 15 minute slices of the specified time range. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will start 10 workers, and they will each start downloading 15 minute slices of the specified time range. | |
This example will start 10 workers, and they will each start downloading 15 minute slices of the specified time range. |
docs/sources/tools/logcli.md
Outdated
|
||
If you do not specify the --merge-parts flag, the part files will be downloaded, and logcli will exit, and you can process the files as you | ||
wish. With the flag specified, the part files will be read in order, and the output printed to the terminal. The lines will be printed as | ||
soon as the next part is complete, you don't have to wait for all the parts to download before getting output. --merge-parts will remove |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
soon as the next part is complete, you don't have to wait for all the parts to download before getting output. --merge-parts will remove | |
soon as the next part is complete, you don't have to wait for all the parts to download before getting output. the --merge-parts flag will remove |
docs/sources/tools/logcli.md
Outdated
Exclude labels given the provided key during output. | ||
--include-label=INCLUDE-LABEL ... | ||
Include labels given the provided key during output. | ||
--labels-length=0 Set a fixed padding to labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--labels-length=0 Set a fixed padding to labels | |
--labels-length=0 Set a fixed padding to labels. |
docs/sources/tools/logcli.md
Outdated
--store-config="" Execute the current query using a configured storage from a given Loki configuration file. | ||
--remote-schema Execute the current query using a remote schema retrieved using the configured storage in the given Loki | ||
configuration file. | ||
--colored-output Show output with colored labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--colored-output Show output with colored labels | |
--colored-output Show output with colored labels. |
docs/sources/tools/logcli.md
Outdated
--remote-schema Execute the current query using a remote schema retrieved using the configured storage in the given Loki | ||
configuration file. | ||
--colored-output Show output with colored labels | ||
-t, --tail Tail the logs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-t, --tail Tail the logs | |
-t, --tail Tail the logs. |
docs/sources/tools/logcli.md
Outdated
configuration file. | ||
--colored-output Show output with colored labels | ||
-t, --tail Tail the logs | ||
-f, --follow Alias for --tail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-f, --follow Alias for --tail | |
-f, --follow Alias for --tail. |
docs/sources/tools/logcli.md
Outdated
--colored-output Show output with colored labels | ||
-t, --tail Tail the logs | ||
-f, --follow Alias for --tail | ||
--delay-for=0 Delay in tailing by number of seconds to accumulate logs for re-ordering |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--delay-for=0 Delay in tailing by number of seconds to accumulate logs for re-ordering | |
--delay-for=0 Delay in tailing by number of seconds to accumulate logs for re-ordering. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jeschkies I have made a few comments on some of your comments and will hopefully work on some of your suggestions tomorrow.
docs/sources/tools/logcli.md
Outdated
Notice that when using --from and --to then ensure to use RFC3339Nano time | ||
format, but without timezone at the end. The local timezone will be added | ||
automatically or if using --timezone flag. | ||
Notice that when using --from and --to then ensure to use RFC3339Nano time format, but without timezone at the end. The local timezone will |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@JStickler Hi, what you are commenting on here is generated, it's not changes that I made. It is output of the help text of the application. A few are from new stuff that I added and I will change that, do you want me to change all of these things? Maybe it would be better to make a PR yourself to change it? I feel it's a bit out of the scope of my contribution here. IDK what do you think?
|
4232d05
to
0168668
Compare
@JStickler I have added some more docs, can you please check the language again. I believe I have solved all your previous comments, but it's possible I missed one because there were so many. @jeschkies I have pushed a few commits, I believe everything should be good now. I reworked the part file to use some better names, I didn't like |
@SN9NV thanks for your effort. We are close and just need to make a decision for #8518 (comment). |
05bb4e0
to
f6e7129
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome work. Please address my last comment about documenting the limit and we are good to go from my side.
I'll follow up with a "proper" unlimited option as this will require more discussion.
f6e7129
to
eb78d41
Compare
Cool, I think everything should be done. |
eb78d41
to
ba29563
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉
Thanks for all your reviews and comments, I hope this will be useful to people. |
@SN9NV I've just ran it on my machine and found the parallel execution to be slower than the normal execution. I'll take a closer look tomorrow. EDIT: Ok, I think there might be some lock contention or deadlock as my CPU is pretty much idle. |
That is strange. For me it was definitely faster, a download which took many hours, like over night, took just tens of minutes with the parallelism. And I could see the time it took to download a single part is the time it took to download the same period sequentially. I did have autoscaling in Kubernetes. Maybe it is a problem if you just have one instance. IDK. It would add one or two more pods, but for sure, it was not scaling up to the same degree as the speedup if that makes sense. |
I'm running it against Grafana Clou so I might be rate limited. However, even with just two workers there's an issue. I can see the parts being done. Hm. |
Hmm, I checked the last few rebase/commits and it seemed like it was still working as expected. I was starting 4 workers with 5-minute jobs in my tests, but I also downloaded about 25GB of logs on Tuesday and it worked well with 24 workers, 30-minute jobs, so I think it was working for me at least with my setup. I can test master again tomorrow. |
|
||
func (j *parallelJob) run(c client.Client, out output.LogOutput, statistics bool) { | ||
j.q.DoQuery(c, out, statistics) | ||
j.done <- struct{}{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@SN9NV this is the deadlock. An unbuffered channel will write into the target when read. That means it blocks here until the channel is read. However, we don't want to wait for that. See my fix #8553.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just came here to report the same thing, but you beat me to it. 😂
I was testing with very large number of threads, so this was kind of less noticeable. Smaller numbers makes the problem much more visible. So yeah, good catch.
@SN9NV @jeschkies this feature seems to only be implemented for range queries in logcli, and as a consequence has broken instant queries: $ go run ./cmd/logcli/main.go --addr="http://localhost:3100" --org-id="docker" instant-query 'sum by (method) (count_over_time({job="generated-logs"} | json[1m]))'
main: error: parallel-max-workers must be greater than 0, try --help
exit status 1 Is there any reason why this should not apply to instant queries? |
@dannykopping thanks for the report. That indeed is not ideal that it's showing an error message for an unrelated command. Why it was not created for instant queries, I only understand range query, but if it's useful, then we can have a look at adding the feature for this command as well. I will have a fix soon. I think setting to 1 will be a better idea instead of an error stopping execution. |
@SN9NV thanks for the quick response. As far as I understand, this feature mainly deals with downloading large volumes of logs returned by a query. Thinking about it more, if that is the case, we don't need to handle instant queries I guess because it's very rare that one might have a large volume of logs for the same instant. Apart from the instant query question above: I was also a little disappointed to see that this was merged without tests which validate that the responses returned are identical with and without parallelization, which would be a reasonable expectation for a user to have; parallelization should not change the output IMHO since it's just a performance optimization. |
…tion (#8641) **What this PR does / why we need it**: Fixes regression reported here: #8518 (comment) @dannykopping can you please verify that it fixes the issue for you? **Checklist** - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**) - [ ] Documentation added - [ ] Tests updated - [ ] `CHANGELOG.md` updated - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md`
@dannykopping can you please provide a query which is not returning correct results? I tested quite a few queries for logs which I have had to download in the past and the parallel and standard methods produced exactly the same files in the output. I didn't test any aggregation queries because this is not something I've done before, just simple filtering/mapping functions. I guess I can see that if there's an aggregation which spans many of the |
Just run any query that has an aggregation, like |
What this PR does / why we need it:
Requesting a large range, a day, for example, takes a very long time to download. You can download in parallel by starting many processes, but this just made my job so much easier.
Continuation of #7543
Checklist
CONTRIBUTING.md
guide (required)CHANGELOG.md
updateddocs/sources/upgrading/_index.md