Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[prometheusremotewriteexporter] reduce allocations when serializing protobufs #35185

Conversation

edma2
Copy link
Contributor

@edma2 edma2 commented Sep 14, 2024

Description:

There are two allocations that happen on every push

  1. Serializing the remote write protobuf to a byte array.
  2. Compressing the byte array with snappy.

Since these buffers can be quite large, we can save some allocations with a sync.Pool.

Link to tracking Issue:

Testing:

Tests still pass. We have been running this successfully in production for a few months now.

                           │ /tmp/old.txt │            /tmp/new.txt            │
                           │    sec/op    │   sec/op     vs base               │
Execute/numSample=100-14      43.11µ ± 2%   43.09µ ± 2%       ~ (p=0.853 n=10)
Execute/numSample=1000-14     105.4µ ± 1%   102.2µ ± 1%  -3.04% (p=0.000 n=10)
Execute/numSample=10000-14    685.5µ ± 1%   663.6µ ± 5%  -3.19% (p=0.023 n=10)
geomean                       146.0µ        143.0µ       -2.10%

                           │ /tmp/old.txt  │             /tmp/new.txt             │
                           │     B/op      │     B/op      vs base                │
Execute/numSample=100-14     14.809Ki ± 0%   6.091Ki ± 0%  -58.87% (p=0.000 n=10)
Execute/numSample=1000-14     94.18Ki ± 0%   22.16Ki ± 0%  -76.47% (p=0.000 n=10)
Execute/numSample=10000-14   726.23Ki ± 0%   39.83Ki ± 0%  -94.52% (p=0.000 n=10)
geomean                       100.4Ki        17.52Ki       -82.56%

                           │ /tmp/old.txt │           /tmp/new.txt            │
                           │  allocs/op   │ allocs/op   vs base               │
Execute/numSample=100-14       81.00 ± 0%   79.00 ± 0%  -2.47% (p=0.000 n=10)
Execute/numSample=1000-14      85.00 ± 0%   83.00 ± 0%  -2.35% (p=0.000 n=10)
Execute/numSample=10000-14     85.00 ± 0%   83.00 ± 0%  -2.35% (p=0.000 n=10)
geomean                        83.65        81.64       -2.39%

Documentation:

@dashpole
Copy link
Contributor

Looks great! Are you able to add a benchmark to demonstrate the improvement and make sure it doesn't regress in the future?

cc @jmichalek132 @ArthurSens feel free to review as well

Copy link
Contributor

@jmichalek132 jmichalek132 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, but could you please look at the failing compliance tests? So turns out that should be fixed now #35119 so rebasing should hopefully make them pass.

By any chance is there any visible improvement you can see with this change from the metrics of the collector, if so could you post them here?

@edma2 edma2 requested a review from a team as a code owner October 1, 2024 22:29
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Oct 16, 2024
@dashpole dashpole removed the Stale label Oct 16, 2024
@edma2
Copy link
Contributor Author

edma2 commented Oct 18, 2024

sorry for the staleness, I'd still like to get this PR merged (was busy with work). I will provide a benchmark to get this PR moving along

Copy link
Contributor

github-actions bot commented Nov 2, 2024

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Nov 2, 2024
@ArthurSens
Copy link
Member

I'm taking a look at this today, I took the liberty to write a benchmark for execute:

func BenchmarkExecute(b *testing.B) {
	for _, numSample := range []int{100, 1000, 10000} {
		b.Run(fmt.Sprintf("numSample=%d", numSample), func(b *testing.B) {
			benchmarkExecute(b, numSample)
		})
	}
}

func benchmarkExecute(b *testing.B, numSample int) {
	mockServer := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		w.WriteHeader(http.StatusOK)
	}))
	defer mockServer.Close()
	endpointURL, err := url.Parse(mockServer.URL)
	require.NoError(b, err)

	// Create the prwExporter
	exporter := &prwExporter{
		endpointURL: endpointURL,
		client:      http.DefaultClient,
	}

	generateSamples := func(n int) []prompb.Sample {
		samples := make([]prompb.Sample, 0, n)
		for i := 0; i < n; i++ {
			samples = append(samples, prompb.Sample{
				Timestamp: int64(i),
				Value:     float64(i),
			})
		}
		return samples
	}

	reqs := make([]*prompb.WriteRequest, 0, b.N)
	const labelValue = "abcdefg'hijlmn234!@#$%^&*()_+~`\"{}[],./<>?hello0123hiOlá你好Dzieńdobry9Zd8ra765v4stvuyte"
	for n := 0; n < b.N; n++ {
		num := strings.Repeat(strconv.Itoa(n), 16)
		req := &prompb.WriteRequest{
			Timeseries: []prompb.TimeSeries{{
				Samples: generateSamples(numSample),
				Labels: []prompb.Label{
					{Name: "__name__", Value: "test_metric"},
					{Name: "test_label_name_" + num, Value: labelValue + num},
				}}},
		}
		reqs = append(reqs, req)
	}

	ctx := context.Background()
	b.ReportAllocs()
	b.ResetTimer()
	for _, req := range reqs {
		err := exporter.execute(ctx, req)
		require.NoError(b, err)
	}
}

Once you have the time, could you include it in your PR? Maybe the input could be improved as well, I'm only generating samples and labels, but we could include metadata and histograms as well for completeness.

The results are looking good :)

$ benchstat base=main.txt new=pr-35185.txt
goos: darwin
goarch: arm64
pkg: github.com/open-telemetry/opentelemetry-collector-contrib/exporter/prometheusremotewriteexporter
cpu: Apple M2 Pro
                          │    base     │                new                 │
                          │   sec/op    │   sec/op     vs base               │
Execute/numSample=100-2     50.96µ ± 6%   50.40µ ± 8%       ~ (p=0.105 n=10)
Execute/numSample=1000-2    79.25µ ± 1%   77.20µ ± 1%  -2.59% (p=0.000 n=10)
Execute/numSample=10000-2   332.9µ ± 3%   309.8µ ± 1%  -6.95% (p=0.000 n=10)
geomean                     110.4µ        106.4µ       -3.58%

                          │     base      │                 new                  │
                          │     B/op      │     B/op      vs base                │
Execute/numSample=100-2      9.824Ki ± 0%   6.077Ki ± 0%  -38.15% (p=0.000 n=10)
Execute/numSample=1000-2     42.85Ki ± 0%   10.85Ki ± 0%  -74.68% (p=0.000 n=10)
Execute/numSample=10000-2   342.11Ki ± 0%   38.19Ki ± 0%  -88.84% (p=0.000 n=10)
geomean                      52.42Ki        13.61Ki       -74.04%

                          │    base    │                new                │
                          │ allocs/op  │ allocs/op   vs base               │
Execute/numSample=100-2     81.00 ± 0%   79.00 ± 0%  -2.47% (p=0.000 n=10)
Execute/numSample=1000-2    85.00 ± 0%   83.00 ± 0%  -2.35% (p=0.000 n=10)
Execute/numSample=10000-2   85.00 ± 0%   83.00 ± 0%  -2.35% (p=0.000 n=10)
geomean                     83.65        81.64       -2.39%

Copy link
Member

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About the code so far, LGTM! Let's just add the benchmark so we can re-use it in future optimization works :)

Personally, sync.Pool is a new thing for me. I had to read[1][2] a ton to properly review the PR here, and I learned a lot, so thanks for raising the PR!

@github-actions github-actions bot removed the Stale label Nov 3, 2024
@edma2
Copy link
Contributor Author

edma2 commented Nov 5, 2024

@ArthurSens thank you for the benchmark! I'll get on it this week. sync.Pool is relatively new to me too, and seemed like a good fit for this code.

@edma2 edma2 force-pushed the prometheusremotewriteexporter-optimize-serialization branch from 38ca486 to f2b6aac Compare November 5, 2024 04:36
@edma2 edma2 force-pushed the prometheusremotewriteexporter-optimize-serialization branch from f2b6aac to f8f1d41 Compare November 5, 2024 04:38
@edma2
Copy link
Contributor Author

edma2 commented Nov 5, 2024

@ArthurSens added the benchmark, PTAL!

Copy link
Member

@ArthurSens ArthurSens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

We still need an approver from a codeowner though, maybe @dashpole?

Copy link
Contributor

@dashpole dashpole left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@dashpole dashpole added enhancement New feature or request ready to merge Code review completed; ready to merge by maintainers labels Nov 12, 2024
@bogdandrutu bogdandrutu merged commit 7281556 into open-telemetry:main Nov 13, 2024
158 checks passed
@github-actions github-actions bot added this to the next release milestone Nov 13, 2024
sbylica-splunk pushed a commit to sbylica-splunk/opentelemetry-collector-contrib that referenced this pull request Dec 17, 2024
…rotobufs (open-telemetry#35185)

**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->

There are two allocations that happen on every push
1. Serializing the remote write protobuf to a byte array.
2. Compressing the byte array with snappy.

Since these buffers can be quite large, we can save some allocations
with a `sync.Pool`.

**Link to tracking Issue:** <Issue number if applicable>

**Testing:**

Tests still pass. We have been running this successfully in production
for a few months now.

```
                           │ /tmp/old.txt │            /tmp/new.txt            │
                           │    sec/op    │   sec/op     vs base               │
Execute/numSample=100-14      43.11µ ± 2%   43.09µ ± 2%       ~ (p=0.853 n=10)
Execute/numSample=1000-14     105.4µ ± 1%   102.2µ ± 1%  -3.04% (p=0.000 n=10)
Execute/numSample=10000-14    685.5µ ± 1%   663.6µ ± 5%  -3.19% (p=0.023 n=10)
geomean                       146.0µ        143.0µ       -2.10%

                           │ /tmp/old.txt  │             /tmp/new.txt             │
                           │     B/op      │     B/op      vs base                │
Execute/numSample=100-14     14.809Ki ± 0%   6.091Ki ± 0%  -58.87% (p=0.000 n=10)
Execute/numSample=1000-14     94.18Ki ± 0%   22.16Ki ± 0%  -76.47% (p=0.000 n=10)
Execute/numSample=10000-14   726.23Ki ± 0%   39.83Ki ± 0%  -94.52% (p=0.000 n=10)
geomean                       100.4Ki        17.52Ki       -82.56%

                           │ /tmp/old.txt │           /tmp/new.txt            │
                           │  allocs/op   │ allocs/op   vs base               │
Execute/numSample=100-14       81.00 ± 0%   79.00 ± 0%  -2.47% (p=0.000 n=10)
Execute/numSample=1000-14      85.00 ± 0%   83.00 ± 0%  -2.35% (p=0.000 n=10)
Execute/numSample=10000-14     85.00 ± 0%   83.00 ± 0%  -2.35% (p=0.000 n=10)
geomean                        83.65        81.64       -2.39%
```

**Documentation:** <Describe the documentation added.>

---------

Co-authored-by: David Ashpole <dashpole@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request exporter/prometheusremotewrite ready to merge Code review completed; ready to merge by maintainers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants