logcli: Add parallel flags #8518

angaz · 2023-02-13T14:55:58Z

What this PR does / why we need it:
Requesting a large range (a day, for example) takes a very long time to download. You can download in parallel by starting many processes, but this just made my job so much easier.

Continuation of #7543

Checklist

Reviewed the CONTRIBUTING.md guide (required)
Documentation added
Tests updated
CHANGELOG.md updated
Changes that require user attention or interaction to upgrade are documented in docs/sources/upgrading/_index.md

angaz · 2023-02-13T15:16:41Z

@jeschkies

Hi, I have updated the code and made a new PR.

Basically, what is new here is that there are two new flags:

merge-parts
keep-parts

merge-parts reads the part files in order and writes their contents to stdout, so it behaves similarly to the non-parallel usage. After each part file has been written out, it is deleted.

keep-parts overrides this behaviour and keeps the part files instead of deleting them, in case the user wants them.

cmd/logcli/main.go

pkg/logcli/query/part_file.go

pkg/logcli/query/query.go

JStickler

[Docs squad] Lots of missing punctuation and some rewording suggestions.

JStickler · 2023-02-14T19:14:51Z

docs/sources/tools/logcli.md

-Notice that when using --from and --to then ensure to use RFC3339Nano time
-format, but without timezone at the end. The local timezone will be added
-automatically or if using --timezone flag.
+Notice that when using --from and --to then ensure to use RFC3339Nano time format, but without timezone at the end. The local timezone will


Suggested change

Notice that when using --from and --to then ensure to use RFC3339Nano time format, but without timezone at the end. The local timezone will

Note that when using --from and --to you must use the RFC3339Nano time format, but without timezone at the end. The local timezone will

JStickler · 2023-02-14T19:15:16Z

docs/sources/tools/logcli.md

-format, but without timezone at the end. The local timezone will be added
-automatically or if using --timezone flag.
+Notice that when using --from and --to then ensure to use RFC3339Nano time format, but without timezone at the end. The local timezone will
+be added automatically or if using --timezone flag.


Suggested change

be added automatically or if using --timezone flag.

be added automatically or when using the --timezone flag.

JStickler · 2023-02-14T19:18:50Z

docs/sources/tools/logcli.md

+  --merge-parts
+  --keep-parts
+
+Refer to the help of these specific flags to understand what each of them do.


Suggested change

Refer to the help of these specific flags to understand what each of them do.

Refer to the help for each flag for details about what each of them do.

JStickler · 2023-02-14T19:19:18Z

docs/sources/tools/logcli.md

+     --merge-parts
+     'my-query'
+
+This will start 10 workers, and they will each start downloading 15 minute slices of the specified time range.


Suggested change

This will start 10 workers, and they will each start downloading 15 minute slices of the specified time range.

This example will start 10 workers, and they will each start downloading 15 minute slices of the specified time range.

JStickler · 2023-02-14T19:20:26Z

docs/sources/tools/logcli.md

+
+If you do not specify the --merge-parts flag, the part files will be downloaded, and logcli will exit, and you can process the files as you
+wish. With the flag specified, the part files will be read in order, and the output printed to the terminal. The lines will be printed as
+soon as the next part is complete, you don't have to wait for all the parts to download before getting output. --merge-parts will remove


Suggested change

soon as the next part is complete, you don't have to wait for all the parts to download before getting output. --merge-parts will remove

soon as the next part is complete, you don't have to wait for all the parts to download before getting output. The --merge-parts flag will remove

JStickler · 2023-02-14T19:31:33Z

docs/sources/tools/logcli.md

+                                 Exclude labels given the provided key during output.
+      --include-label=INCLUDE-LABEL ...
+                                 Include labels given the provided key during output.
+      --labels-length=0          Set a fixed padding to labels


Suggested change

--labels-length=0 Set a fixed padding to labels

--labels-length=0 Set a fixed padding to labels.

JStickler · 2023-02-14T19:31:45Z

docs/sources/tools/logcli.md

+      --store-config=""          Execute the current query using a configured storage from a given Loki configuration file.
+      --remote-schema            Execute the current query using a remote schema retrieved using the configured storage in the given Loki
+                                 configuration file.
+      --colored-output           Show output with colored labels


Suggested change

--colored-output Show output with colored labels

--colored-output Show output with colored labels.

JStickler · 2023-02-14T19:31:52Z

docs/sources/tools/logcli.md

+      --remote-schema            Execute the current query using a remote schema retrieved using the configured storage in the given Loki
+                                 configuration file.
+      --colored-output           Show output with colored labels
+  -t, --tail                     Tail the logs


Suggested change

-t, --tail Tail the logs

-t, --tail Tail the logs.

JStickler · 2023-02-14T19:31:59Z

docs/sources/tools/logcli.md

+                                 configuration file.
+      --colored-output           Show output with colored labels
+  -t, --tail                     Tail the logs
+  -f, --follow                   Alias for --tail


Suggested change

-f, --follow Alias for --tail

-f, --follow Alias for --tail.

JStickler · 2023-02-14T19:32:05Z

docs/sources/tools/logcli.md

+      --colored-output           Show output with colored labels
+  -t, --tail                     Tail the logs
+  -f, --follow                   Alias for --tail
+      --delay-for=0              Delay in tailing by number of seconds to accumulate logs for re-ordering


Suggested change

--delay-for=0 Delay in tailing by number of seconds to accumulate logs for re-ordering

--delay-for=0 Delay in tailing by number of seconds to accumulate logs for re-ordering.

angaz

@jeschkies I have made a few comments on some of your comments and will hopefully work on some of your suggestions tomorrow.

pkg/logcli/query/part_file.go

pkg/logcli/query/query.go

angaz · 2023-02-14T20:02:33Z

@JStickler Hi, what you are commenting on here is generated, it's not changes that I made. It is output of the help text of the application. A few are from new stuff that I added and I will change that, do you want me to change all of these things? Maybe it would be better to make a PR yourself to change it? I feel it's a bit out of the scope of my contribution here. IDK what do you think?

JStickler · 2023-02-14T22:45:28Z

what you are commenting on here is generated, it's not changes that I made. ... I feel it's a bit out of the scope of my contribution here.
@SN9NV I'll leave that up to you. I can make a PR later if that's easier for you.

angaz · 2023-02-15T12:39:39Z

@JStickler I have added some more docs, can you please check the language again. I believe I have solved all your previous comments, but it's possible I missed one because there were so many. cmd/logcli/main.go would be the right file to check for the documentation comments. Thank you.

@jeschkies I have pushed a few commits, I believe everything should be good now.

I reworked the part file to use better names. I didn't like Complete; it didn't really feel like a command to me. Finalize feels much better.

jeschkies · 2023-02-15T13:02:59Z

@SN9NV thanks for your effort. We are close and just need to make a decision for #8518 (comment).

jeschkies

Awesome work. Please address my last comment about documenting the limit and we are good to go from my side.

I'll follow up with a "proper" unlimited option as this will require more discussion.

pkg/logcli/query/query.go

cmd/logcli/main.go

angaz · 2023-02-16T11:15:15Z

Cool, I think everything should be done.

jeschkies

🎉

angaz · 2023-02-16T14:16:21Z

Thanks for all your reviews and comments, I hope this will be useful to people.

jeschkies · 2023-02-16T17:35:10Z

@SN9NV I've just run it on my machine and found the parallel execution to be slower than the normal execution. I'll take a closer look tomorrow.

EDIT: Ok, I think there might be some lock contention or deadlock as my CPU is pretty much idle.

angaz · 2023-02-16T17:44:36Z

That is strange. For me it was definitely faster: a download that took many hours, like overnight, took just tens of minutes with parallelism. I could also see that downloading a single part took as long as downloading the same period sequentially. I did have autoscaling in Kubernetes, so maybe it is a problem if you just have one instance, IDK. It would add one or two more pods, but it was certainly not scaling up to the same degree as the speedup, if that makes sense.

jeschkies · 2023-02-16T17:46:01Z

I'm running it against Grafana Cloud, so I might be rate limited. However, even with just two workers there's an issue. I can see the parts being done. Hm.

angaz · 2023-02-16T17:49:54Z

Hmm, I checked the last few rebases/commits and it seemed like it was still working as expected. I was starting 4 workers with 5-minute jobs in my tests, but I also downloaded about 25GB of logs on Tuesday and it worked well with 24 workers and 30-minute jobs, so I think it was working, at least with my setup. I can test master again tomorrow.

jeschkies · 2023-02-17T11:27:29Z

pkg/logcli/query/query.go

+
+func (j *parallelJob) run(c client.Client, out output.LogOutput, statistics bool) {
+	j.q.DoQuery(c, out, statistics)
+	j.done <- struct{}{}


@SN9NV this is the deadlock. A send on an unbuffered channel blocks until another goroutine receives from that channel, so run blocks here until j.done is read. However, we don't want to wait for that. See my fix #8553.

I just came here to report the same thing, but you beat me to it. 😂

I was testing with a very large number of threads, so this was less noticeable. Smaller numbers make the problem much more visible. So yeah, good catch.
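The channel semantics behind this deadlock can be checked directly: a send on an unbuffered channel only completes when a receiver is ready, while a send on a channel with free buffer space completes immediately. The sketch probes this with a non-blocking `select`; `trySend` is a hypothetical helper, and this is not a claim about how #8553 fixes the bug.

```go
package main

import "fmt"

// trySend attempts a send without blocking and reports whether it succeeded.
// A false result means a plain `ch <- v` would have blocked at this point.
func trySend(ch chan struct{}) bool {
	select {
	case ch <- struct{}{}:
		return true
	default:
		return false
	}
}

func main() {
	unbuffered := make(chan struct{})
	buffered := make(chan struct{}, 1)

	fmt.Println(trySend(unbuffered)) // false: no receiver is waiting
	fmt.Println(trySend(buffered))   // true: the value fits in the buffer
	fmt.Println(trySend(buffered))   // false: the buffer is now full
}
```

Applied to the diff above, a worker that finishes a job blocks on `j.done <- struct{}{}` until something reads that job's channel, and cannot start another job in the meantime.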

dannykopping · 2023-02-27T09:44:32Z

@SN9NV @jeschkies this feature seems to only be implemented for range queries in logcli, and as a consequence has broken instant queries:

$ go run ./cmd/logcli/main.go --addr="http://localhost:3100" --org-id="docker" instant-query 'sum by (method) (count_over_time({job="generated-logs"} | json[1m]))'
main: error: parallel-max-workers must be greater than 0, try --help
exit status 1

Is there any reason why this should not apply to instant queries?

angaz · 2023-02-27T10:05:15Z

@dannykopping thanks for the report.

That is indeed not ideal: it shows an error message for an unrelated command.

As for why it was not created for instant queries: I only understand range queries, but if it's useful, we can look at adding the feature for this command as well.

I will have a fix soon. I think defaulting to 1 would be better than an error that stops execution.
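A minimal sketch of the "default instead of error" validation proposed here, assuming a hypothetical `normalizeMaxWorkers` function; the real flag handling in cmd/logcli/main.go may differ. Falling back silently matters because commands that never set the flag (such as instant-query) would otherwise see the zero value.

```go
package main

import "fmt"

// normalizeMaxWorkers returns a usable worker count. Values below 1 fall back
// to 1 (sequential execution) instead of aborting the command.
func normalizeMaxWorkers(n int) int {
	if n < 1 {
		return 1
	}
	return n
}

func main() {
	fmt.Println(normalizeMaxWorkers(0)) // 1
	fmt.Println(normalizeMaxWorkers(8)) // 8
}
```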

dannykopping · 2023-02-27T10:31:05Z

@SN9NV thanks for the quick response.

As far as I understand, this feature mainly deals with downloading large volumes of logs returned by a query. Thinking about it more, if that is the case, I guess we don't need to handle instant queries, because it's very rare to have a large volume of logs for the same instant.

Apart from the instant query question above:
The functionality as implemented seems to break down when range aggregations are performed, with the responses not being returned in a coherent manner. The results are ostensibly not recombined in a way that mimics the existing behaviour.

I was also a little disappointed to see that this was merged without tests which validate that the responses returned are identical with and without parallelization, which would be a reasonable expectation for a user to have; parallelization should not change the output IMHO since it's just a performance optimization.

@dannykopping

…tion (#8641) **What this PR does / why we need it**: Fixes regression reported here: #8518 (comment) @dannykopping can you please verify that it fixes the issue for you? **Checklist** - [x] Reviewed the [`CONTRIBUTING.md`](https://github.com/grafana/loki/blob/main/CONTRIBUTING.md) guide (**required**) - [ ] Documentation added - [ ] Tests updated - [ ] `CHANGELOG.md` updated - [ ] Changes that require user attention or interaction to upgrade are documented in `docs/sources/upgrading/_index.md`

angaz · 2023-02-27T11:16:50Z

@dannykopping can you please provide a query which is not returning correct results? I tested quite a few queries for logs which I have had to download in the past and the parallel and standard methods produced exactly the same files in the output.

I didn't test any aggregation queries because this is not something I've done before, just simple filtering/mapping functions. I can see that if an aggregation spans multiple parallel-duration slices, it would not be able to return the correct result. I'm not sure such a case would be possible to compensate for.

dannykopping · 2023-02-28T08:22:19Z

@dannykopping can you please provide a query which is not returning correct results? I tested quite a few queries for logs which I have had to download in the past and the parallel and standard methods produced exactly the same files in the output.

Just run any query that has an aggregation, like count_over_time. When a query produces metrics this feature should be disabled IMHO.
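One way to act on this suggestion is to fall back to a single sequential run whenever the query is a metric query. In the sketch the classification is passed in as a boolean, because deciding it properly requires parsing the LogQL expression, which is out of scope here; `effectiveWorkers` is a hypothetical helper, and the requested count is assumed to be validated elsewhere.

```go
package main

import "fmt"

// effectiveWorkers decides how many workers to use. Metric queries (for
// example count_over_time) are forced to one sequential run, because
// splitting them by time and concatenating the results is not guaranteed to
// reproduce the unsplit output.
func effectiveWorkers(requested int, isMetricQuery bool) int {
	if isMetricQuery {
		return 1
	}
	return requested
}

func main() {
	fmt.Println(effectiveWorkers(10, false)) // 10
	fmt.Println(effectiveWorkers(10, true))  // 1
}
```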

pull-request-size bot added the size/XL label Feb 13, 2023

angaz force-pushed the logcli_parallel branch from 226016e to 4232d05 February 13, 2023 15:09

github-actions bot added the type/docs label Feb 13, 2023

angaz marked this pull request as ready for review February 13, 2023 15:11

angaz requested review from JStickler and a team as code owners February 13, 2023 15:11

jeschkies reviewed Feb 14, 2023

JStickler reviewed Feb 14, 2023

angaz commented Feb 14, 2023

angaz force-pushed the logcli_parallel branch from 4232d05 to 0168668 February 15, 2023 11:54

angaz force-pushed the logcli_parallel branch from 05bb4e0 to f6e7129 February 15, 2023 13:47

jeschkies requested changes Feb 15, 2023

pkg/logcli/query/query.go (resolved)

cmd/logcli/main.go (outdated, resolved)

cmd/logcli/main.go (outdated, resolved)

jeschkies mentioned this pull request Feb 15, 2023

Batch export of stream over timerange #6840

Open

jeschkies reviewed Feb 16, 2023

cmd/logcli/main.go (outdated, resolved)

angaz added 3 commits February 16, 2023 11:49

[logcli] Add parallelism

2636ee2

update changelog

07083ff

Add part file

847653b

angaz force-pushed the logcli_parallel branch from f6e7129 to eb78d41 February 16, 2023 11:14

angaz added 7 commits February 16, 2023 12:35

Add merge, and keep-parts flags

12a2bb0

Remove 'file' from part options

70c3428

Update docs

72156cf

change part-prefix to part-path-prefix

13bd88e

Add validation for parallel flags

d2bdc0e

Rework part file

4428a9d

Move conditional part file logic to caller

975d22b

angaz added 5 commits February 16, 2023 12:36

rename ceiling division function

5e26040

close file

edcb89d

explain optimization for future generations

aaf26d5

ignore limit with parallel workers

ca7db25

Update docs

ba29563

angaz force-pushed the logcli_parallel branch from eb78d41 to ba29563 February 16, 2023 11:36

jeschkies approved these changes Feb 16, 2023

jeschkies merged commit 42522d7 into grafana:main Feb 16, 2023

angaz deleted the logcli_parallel branch February 16, 2023 14:15

jeschkies mentioned this pull request Feb 17, 2023

Fix logcli parallel download deadlock. #8553

Merged

jeschkies reviewed Feb 17, 2023

angaz mentioned this pull request Feb 27, 2023

[logcli] set default instead of error for parallel-max-workers validation #8641

Merged
