Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add an API to send multiple samples at once #235

Merged

Conversation

blemale
Copy link
Contributor

@blemale blemale commented Jan 26, 2024

recordDistributionValues is similar to recordDistributionValue but it lets the client sends multiple samples in one message using dogstatsd 1.1 protocol.

Because this is a shift compared to how other methods are behaving, these methods are provided only in DirectStatsDClient which provides direct access to some low level dogstatsd protocol features.

This is recommended in high performance cases were the overhead of the statsd library might be significant and the sampling is already done by the client.

Port of DataDog/datadog-go#296.

recordDistributionValues is similar to recordDistributionValue but it lets the
client sends multiple samples in one message using dogstatsd 1.1 protocol.

Because this is a shift compared to how other methods are behaving, these
methods are provided only in DirectStatsDClient which provides direct access to
some low level dogstatsd protocol features.

This is recommended in high performance cases were the overhead of the statsd
library might be significant and the sampling is already done by the client.

Port of DataDog/datadog-go#296.
@blemale blemale requested a review from a team as a code owner January 26, 2024 17:02

@Override
protected void writeValuesTo(StringBuilder builder) {
for (long value : values) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if there are more values than we have space in the payload for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iksaif how are you handling that on the go side? As far as I have seen there is no specific handling.
So in this case, I suppose we will throw an exception, but I'm not sure to see how it differs from someone sending a lot a tags for instance, are we bounding that? And it is explicitly stated in the javadoc that it is the responsibility of the caller to bound the number of values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After discussing with @iksaif, I will port something similar to https://github.com/DataDog/datadog-go/blob/master/statsd/buffer.go#L75.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vickenty I have addressed your concern in 55ebdc5, could you take a look at it?

@blemale blemale force-pushed the bastien.lemale/feat_send_multiple_samples branch 2 times, most recently from e5d517f to 15f60b4 Compare February 7, 2024 15:47
When we send multiple samples at once, handle gracefully the case where the
output payload is bigger than the max packet size by splitting the samples into
several payloads.
@blemale blemale force-pushed the bastien.lemale/feat_send_multiple_samples branch from 15f60b4 to 55ebdc5 Compare February 7, 2024 15:56
int previousLength = builder.length();
writeHeadMetadata(builder);
writeTailMetadata(builder, containerID);
metadataSize = builder.length() - previousLength;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This counts characters, while the capacity passed into writeTo is in bytes, which can cause payload drops if the direct client is used with non-ascii data. Would it be possible to account for encoded length here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I have fixed that in 37f858d

Copy link
Contributor Author

@blemale blemale Feb 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have again pushed the commit to use the target encoding:

final int previousEncodedLength = builder.toString().getBytes(StandardCharsets.UTF_8).length;

https://github.com/DataDog/java-dogstatsd-client/pull/235/files#diff-363801336ae6e5f0955834149b9e550375970e5c0dcde6b72ce36c69abed7bc6R67

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is allocating, if it is a blocker, I can intern this code from guava: https://github.com/google/guava/blob/master/guava/src/com/google/common/base/Utf8.java#L49

@blemale blemale force-pushed the bastien.lemale/feat_send_multiple_samples branch 3 times, most recently from 3fbd013 to ec54735 Compare February 12, 2024 15:28
@blemale
Copy link
Contributor Author

blemale commented Feb 12, 2024

@vickenty does the PR looks good to you now and can be merged?
The 2 red job in the CI seem to be unrelated with this PR:

Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 32.133 sec <<< FAILURE! - in com.timgroup.statsd.NonBlockingStatsDClientPerfTest
perfTest(com.timgroup.statsd.NonBlockingStatsDClientPerfTest)  Time elapsed: 30.007 sec  <<< ERROR!
org.junit.runners.model.TestTimedOutException: test timed out after 30000 milliseconds
        at com.timgroup.statsd.NonBlockingStatsDClientPerfTest.perfTest(NonBlockingStatsDClientPerfTest.java:86)
Chocolatey v2.2.2
Installing the following packages:
maven
By installing, you accept licenses for the packages.
Progress: Downloading maven 3.9.6... 100%

maven v3.9.6 [Approved]
maven package files install completed. Performing other installation steps.
The package maven wants to run 'chocolateyinstall.ps1'.
Note: If you don't run this script, the installation will fail.
Note: To confirm automatically next time, use '-y' or consider:
choco feature enable -n allowGlobalConfirmation
Do you want to run the script?([Y]es/[A]ll - yes to all/[N]o/[P]rint): 
Timeout or your choice of '' is not a valid selection.
You must select an answer
Do you want to run the script?([Y]es/[A]ll - yes to all/[N]o/[P]rint): 
Timeout or your choice of '' is not a valid selection.
You must select an answer
Do you want to run the script?([Y]es/[A]ll - yes to all/[N]o/[P]rint): 
Timeout or your choice of '' is not a valid selection.
You must select an answer
Do you want to run the script?([Y]es/[A]ll - yes to all/[N]o/[P]rint): 
Timeout or your choice of '' is not a valid selection.
You must select an answer
Do you want to run the script?([Y]es/[A]ll - yes to all/[N]o/[P]rint): 
Timeout or your choice of '' is not a valid selection.
You must select an answer
Do you want to run the script?([Y]es/[A]ll - yes to all/[N]o/[P]rint): 
Timeout or your choice of '' is not a valid selection.
You must select an answer
Do you want to run the script?([Y]es/[A]ll - yes to all/[N]o/[P]rint): 
Timeout or your choice of '' is not a valid selection.
You must select an answer
Do you want to run the script?([Y]es/[A]ll - yes to all/[N]o/[P]rint): 
Timeout or your choice of '' is not a valid selection.
You must select an answer
Do you want to run the script?([Y]es/[A]ll - yes to all/[N]o/[P]rint): 
Timeout or your choice of '' is not a valid selection.
You must select an answer
Do you want to run the script?([Y]es/[A]ll - yes to all/[N]o/[P]rint): 
Timeout or your choice of '' is not a valid selection.
You must select an answer
Do you want to run the script?([Y]es/[A]ll - yes to all/[N]o/[P]rint): 
Timeout or your choice of '' is not a valid selection.
You must select an answer
maven not installed. An error occurred during installation:
 Too many bad attempts. Stopping before application crash.

Chocolatey installed 0/0 packages. 
 See the log for details (C:\ProgramData\chocolatey\logs\chocolatey.log).
The process cannot access the file 'C:\ProgramData\chocolatey\lib\maven\.chocolateyPending' because it is being used by another process.

@blemale blemale force-pushed the bastien.lemale/feat_send_multiple_samples branch from 2e4d9c3 to 0f0ee89 Compare February 12, 2024 15:59
To compute the available capacity in multi valued messages use the encoded
length of the metadata instead of the length of the string builder as a
character can be encoded on several bytes if not ascii.
@blemale blemale force-pushed the bastien.lemale/feat_send_multiple_samples branch from 0f0ee89 to 82eea75 Compare February 12, 2024 16:19
writeValueTo(builder, i);
if (builder.length() > maxLength) {
builder.setLength(previousLength);
offset += i;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could be missing something, but this looks like it will skip values for chunks after second one: let's say we start with offset=3 and write 4 values at positions 3, 4, 5 and 6 before hitting overflow at i=7. This sets offset to 12 (3+7), loosing samples 7-11.

Copy link
Contributor Author

@blemale blemale Feb 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I have updated the test and the implementation to fix that in d4f3cf8 as i is tracking the absolute index in the array not the relative one compare to the offset.

When we need to split multi valued messages in more than two output messages to
enforce the max packet size there is an issue in the way offset in the values
array is tracked.
@vickenty vickenty merged commit 645501e into DataDog:master Feb 13, 2024
11 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants