Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reduce cost of pipeline stages by pooling maps we create for labels #11484

Merged
merged 4 commits into from
Dec 16, 2023

Conversation

cstyan
Copy link
Contributor

@cstyan cstyan commented Dec 14, 2023

If we look at profiling of ingesters in some of our production deployments during which we have high ingester load from querying we see that we spend roughly 15-20% of our CPU time on GC related work.

Looking at different runtime memory related calls, it looks like ~50% of all CPU time during Pipeline stage processing is the creation of the map for labels that is then passed to text template parsing. This change will make the pipeline processing itself ~15% faster but hopefully will also reduce (very slightly) the impact of GC in heavy query load situations.

goos: linux
goarch: amd64
pkg: github.com/grafana/loki/pkg/logql/log
cpu: AMD Ryzen 9 5950X 16-Core Processor
                                    │     base     │                mapPool                │
                                    │    sec/op    │    sec/op     vs base                 │
_Pipeline/pipeline_bytes-32           7.749µ ± ∞ ¹   6.464µ ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/pipeline_string-32          7.691µ ± ∞ ¹   6.549µ ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/line_extractor_bytes-32     7.922µ ± ∞ ¹   6.882µ ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/line_extractor_string-32    7.900µ ± ∞ ¹   6.878µ ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/label_extractor_bytes-32    7.952µ ± ∞ ¹   6.976µ ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/label_extractor_string-32   7.940µ ± ∞ ¹   7.102µ ± ∞ ¹        ~ (p=1.000 n=1) ²
geomean                               7.858µ         6.805µ        -13.41%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                    │     base      │                mapPool                 │
                                    │     B/op      │     B/op       vs base                 │
_Pipeline/pipeline_bytes-32           4.963Ki ± ∞ ¹   1.413Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/pipeline_string-32          5.025Ki ± ∞ ¹   1.476Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/line_extractor_bytes-32     5.030Ki ± ∞ ¹   1.478Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/line_extractor_string-32    5.028Ki ± ∞ ¹   1.478Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/label_extractor_bytes-32    5.027Ki ± ∞ ¹   1.478Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/label_extractor_string-32   5.027Ki ± ∞ ¹   1.478Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
geomean                               5.017Ki         1.466Ki        -70.77%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                    │    base     │               mapPool               │
                                    │  allocs/op  │  allocs/op   vs base                │
_Pipeline/pipeline_bytes-32           37.00 ± ∞ ¹   35.00 ± ∞ ¹       ~ (p=1.000 n=1) ²
_Pipeline/pipeline_string-32          38.00 ± ∞ ¹   36.00 ± ∞ ¹       ~ (p=1.000 n=1) ²
_Pipeline/line_extractor_bytes-32     37.00 ± ∞ ¹   35.00 ± ∞ ¹       ~ (p=1.000 n=1) ²
_Pipeline/line_extractor_string-32    37.00 ± ∞ ¹   35.00 ± ∞ ¹       ~ (p=1.000 n=1) ²
_Pipeline/label_extractor_bytes-32    37.00 ± ∞ ¹   35.00 ± ∞ ¹       ~ (p=1.000 n=1) ²
_Pipeline/label_extractor_string-32   37.00 ± ∞ ¹   35.00 ± ∞ ¹       ~ (p=1.000 n=1) ²
geomean                               37.16         35.16        -5.38%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

Signed-off-by: Callum Styan <callumstyan@gmail.com>
@cstyan cstyan requested a review from a team as a code owner December 14, 2023 01:22
Copy link
Contributor

github-actions bot commented Dec 14, 2023

Trivy scan found the following vulnerabilities:

  • HIGH, Target: docker.io/grafana/loki:main-be71a80 (alpine 3.18.4), Type: alpine openssl: Incorrect cipher key and IV length processing in libcrypto3 v3.1.3-r0. Fixed in v3.1.4-r0
  • HIGH, Target: docker.io/grafana/loki:main-be71a80 (alpine 3.18.4), Type: alpine openssl: Incorrect cipher key and IV length processing in libssl3 v3.1.3-r0. Fixed in v3.1.4-r0
    \nTo see more details on these vulnerabilities, and how/where to fix them, please run docker build -t grafana/loki:main-be71a80 -f cmd/loki/Dockerfile .
    trivy i grafana/loki:main-be71a80 on your branch. If these were not introduced by your PR, please considering fixing them in via a subsequent PR. Thanks!

Signed-off-by: Callum Styan <callumstyan@gmail.com>
@pull-request-size pull-request-size bot added size/M and removed size/S labels Dec 14, 2023
@cstyan
Copy link
Contributor Author

cstyan commented Dec 14, 2023

Benchmark for latest commit. Additional pooling of the map at the beginning of LabelsFormatter.Process which needs the introduction of a LabelsBuilder.IntoMap function.

goos: linux
goarch: amd64
pkg: github.com/grafana/loki/pkg/logql/log
cpu: AMD Ryzen 9 5950X 16-Core Processor
                                    │     base     │                mapPool                │
                                    │    sec/op    │    sec/op     vs base                 │
_Pipeline/pipeline_bytes-32           7.749µ ± ∞ ¹   6.356µ ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/pipeline_string-32          7.691µ ± ∞ ¹   6.377µ ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/line_extractor_bytes-32     7.922µ ± ∞ ¹   6.676µ ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/line_extractor_string-32    7.900µ ± ∞ ¹   6.797µ ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/label_extractor_bytes-32    7.952µ ± ∞ ¹   6.954µ ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/label_extractor_string-32   7.940µ ± ∞ ¹   6.749µ ± ∞ ¹        ~ (p=1.000 n=1) ²
geomean                               7.858µ         6.648µ        -15.40%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                    │     base      │                mapPool                 │
                                    │     B/op      │     B/op       vs base                 │
_Pipeline/pipeline_bytes-32           4.963Ki ± ∞ ¹   1.389Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/pipeline_string-32          5.025Ki ± ∞ ¹   1.451Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/line_extractor_bytes-32     5.030Ki ± ∞ ¹   1.454Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/line_extractor_string-32    5.028Ki ± ∞ ¹   1.454Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/label_extractor_bytes-32    5.027Ki ± ∞ ¹   1.454Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/label_extractor_string-32   5.027Ki ± ∞ ¹   1.454Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
geomean                               5.017Ki         1.443Ki        -71.25%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                    │    base     │               mapPool                │
                                    │  allocs/op  │  allocs/op   vs base                 │
_Pipeline/pipeline_bytes-32           37.00 ± ∞ ¹   33.00 ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/pipeline_string-32          38.00 ± ∞ ¹   34.00 ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/line_extractor_bytes-32     37.00 ± ∞ ¹   33.00 ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/line_extractor_string-32    37.00 ± ∞ ¹   33.00 ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/label_extractor_bytes-32    37.00 ± ∞ ¹   33.00 ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/label_extractor_string-32   37.00 ± ∞ ¹   33.00 ± ∞ ¹        ~ (p=1.000 n=1) ²
geomean                               37.16         33.16        -10.76%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

@@ -380,7 +385,12 @@ func (lf *LabelsFormatter) Process(ts int64, l []byte, lbs *LabelsBuilder) ([]by
lf.currentLine = l
lf.currentTs = ts

var data interface{}
var data = stringMapPool.Get().(map[string]string)
clear(data)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) Would it make a difference if the clear() is done in the defer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I am misunderstanding how some of this works under the hood, it seems like unfortunately the text template execution is using references to the data in the map, so if we clear at the exit of this function rather than when we first grab something from the pool the tests will fail.

I've modified things so that there's a wrapper struct for the pool that calls clear for us when we grab something from the pool, just to make things cleaner.

@dannykopping
Copy link
Contributor

Can you share a couple profiles showing this? (probably best to export the pprof files from pyroscope and attach here instead of linking, for the sake of the community)

@@ -437,6 +438,32 @@ func (b *LabelsBuilder) UnsortedLabels(buf labels.Labels, categories ...LabelCat
return buf
}

var stringMapPool = sync.Pool{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to wrap this in a specific type so we don't have to typcast when we decide to use it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

funnily enough, I did exactly this already but forgot to push it

@MasslessParticle
Copy link
Contributor

This seems like a cool idea, in general. I've left a comment with a potential QOL improvement for someone using it.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
@cstyan
Copy link
Contributor Author

cstyan commented Dec 15, 2023

Can you share a couple profiles showing this? (probably best to export the pprof files from pyroscope and attach here instead of linking, for the sake of the community)

I actually don't know how to export profiles from pyroscope still, but here's a screenshot.

2023-12-15-142509_1201x300_scrot

Signed-off-by: Callum Styan <callumstyan@gmail.com>
@pull-request-size pull-request-size bot added size/L and removed size/M labels Dec 15, 2023
@cstyan cstyan merged commit e93243f into main Dec 16, 2023
9 checks passed
@cstyan cstyan deleted the pipeline-map-pool branch December 16, 2023 02:20
cstyan added a commit that referenced this pull request Dec 19, 2023
remove mistakenly committed files (benchmarking related) from #11484

Signed-off-by: Callum Styan <callumstyan@gmail.com>
cstyan added a commit that referenced this pull request Jan 4, 2024
Signed-off-by: Callum Styan <callumstyan@gmail.com>
cstyan added a commit that referenced this pull request Jan 4, 2024
Signed-off-by: Callum Styan <callumstyan@gmail.com>
cstyan added a commit that referenced this pull request Jan 4, 2024
reverts pooling that causes a panic for concurrent map writes

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
rhnasc pushed a commit to inloco/loki that referenced this pull request Apr 12, 2024
…rafana#11484)

If we look at profiling of ingesters in some of our production
deployments during which we have high ingester load from querying we see
that we spend roughly 15-20% of our CPU time on GC related work.

Looking at different `runtime` memory related calls, it looks like ~50%
of all CPU time during Pipeline stage processing is the creation of the
map for labels that is then passed to text template parsing. This change
will make the pipeline processing itself ~15% faster but hopefully will
also reduce (very slightly) the impact of GC in heavy query load
situations.

```
goos: linux
goarch: amd64
pkg: github.com/grafana/loki/pkg/logql/log
cpu: AMD Ryzen 9 5950X 16-Core Processor
                                    │     base     │                mapPool                │
                                    │    sec/op    │    sec/op     vs base                 │
_Pipeline/pipeline_bytes-32           7.749µ ± ∞ ¹   6.464µ ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/pipeline_string-32          7.691µ ± ∞ ¹   6.549µ ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/line_extractor_bytes-32     7.922µ ± ∞ ¹   6.882µ ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/line_extractor_string-32    7.900µ ± ∞ ¹   6.878µ ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/label_extractor_bytes-32    7.952µ ± ∞ ¹   6.976µ ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/label_extractor_string-32   7.940µ ± ∞ ¹   7.102µ ± ∞ ¹        ~ (p=1.000 n=1) ²
geomean                               7.858µ         6.805µ        -13.41%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                    │     base      │                mapPool                 │
                                    │     B/op      │     B/op       vs base                 │
_Pipeline/pipeline_bytes-32           4.963Ki ± ∞ ¹   1.413Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/pipeline_string-32          5.025Ki ± ∞ ¹   1.476Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/line_extractor_bytes-32     5.030Ki ± ∞ ¹   1.478Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/line_extractor_string-32    5.028Ki ± ∞ ¹   1.478Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/label_extractor_bytes-32    5.027Ki ± ∞ ¹   1.478Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
_Pipeline/label_extractor_string-32   5.027Ki ± ∞ ¹   1.478Ki ± ∞ ¹        ~ (p=1.000 n=1) ²
geomean                               5.017Ki         1.466Ki        -70.77%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05

                                    │    base     │               mapPool               │
                                    │  allocs/op  │  allocs/op   vs base                │
_Pipeline/pipeline_bytes-32           37.00 ± ∞ ¹   35.00 ± ∞ ¹       ~ (p=1.000 n=1) ²
_Pipeline/pipeline_string-32          38.00 ± ∞ ¹   36.00 ± ∞ ¹       ~ (p=1.000 n=1) ²
_Pipeline/line_extractor_bytes-32     37.00 ± ∞ ¹   35.00 ± ∞ ¹       ~ (p=1.000 n=1) ²
_Pipeline/line_extractor_string-32    37.00 ± ∞ ¹   35.00 ± ∞ ¹       ~ (p=1.000 n=1) ²
_Pipeline/label_extractor_bytes-32    37.00 ± ∞ ¹   35.00 ± ∞ ¹       ~ (p=1.000 n=1) ²
_Pipeline/label_extractor_string-32   37.00 ± ∞ ¹   35.00 ± ∞ ¹       ~ (p=1.000 n=1) ²
geomean                               37.16         35.16        -5.38%
¹ need >= 6 samples for confidence interval at level 0.95
² need >= 4 samples to detect a difference at alpha level 0.05
```

---------

Signed-off-by: Callum Styan <callumstyan@gmail.com>
rhnasc pushed a commit to inloco/loki that referenced this pull request Apr 12, 2024
remove mistakenly committed files (benchmarking related) from grafana#11484

Signed-off-by: Callum Styan <callumstyan@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants