GH-37413: [Java] ZSTD compression issue #38396
Conversation
@lidavidm Does this minor change need a test case? Though I am not exactly sure how to write one.
I assume you can come up with some data that is incompressible, which would overflow the buffer in the old code. That should cause an exception when writerIndex is set, or else it would have caused a compress-decompress roundtrip to fail. @shujiewu did you observe an exception in your code, or just silently corrupted data?
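A roundtrip test along those lines might look like the sketch below. It assumes the `CompressionCodec` and `ZstdCompressionCodec` classes from the arrow-compression module and glosses over buffer ownership details (the codec may close its inputs), so it is illustrative rather than the actual test proposed in this PR:

```java
import java.util.Arrays;
import java.util.Random;

import org.apache.arrow.compression.ZstdCompressionCodec;
import org.apache.arrow.memory.ArrowBuf;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.compression.CompressionCodec;

public class ZstdRoundTripSketch {
  public static void main(String[] args) {
    BufferAllocator allocator = new RootAllocator();

    // Uniformly random bytes are effectively incompressible.
    byte[] raw = new byte[1 << 20];
    new Random(42).nextBytes(raw);

    ArrowBuf uncompressed = allocator.buffer(raw.length);
    uncompressed.setBytes(0, raw);
    uncompressed.writerIndex(raw.length);

    CompressionCodec codec = new ZstdCompressionCodec();
    // The codec is assumed to take ownership of (and eventually close) its inputs.
    ArrowBuf compressed = codec.compress(allocator, uncompressed);
    ArrowBuf decompressed = codec.decompress(allocator, compressed);

    byte[] out = new byte[raw.length];
    decompressed.getBytes(0, out);
    if (!Arrays.equals(raw, out)) {
      throw new AssertionError("ZSTD round trip corrupted the data");
    }
    // Cleanup of buffers and the allocator elided for brevity.
  }
}
```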
@lidavidm There was no error reported when writing compressed data, but an error occurred while decompressing the data. The error message is "ZSTD compression failed: Destination buffer is too small". This may not be easy to reproduce and only occurs on specific data.
@lidavidm I will evaluate this. Thank you.
I don't understand how this can occur, since we're passing a buffer size that's larger than the value returned by |
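For reference, and assuming the elided call above is a worst-case bound such as zstd-jni's `Zstd.compressBound`, the following sketch shows the guarantee being relied on: even for incompressible input, the compressed output fits within the bound, so a destination sized from it should never be too small.

```java
import java.util.Random;

import com.github.luben.zstd.Zstd;

public class CompressBoundSketch {
  public static void main(String[] args) {
    // Worst case for the compressor: incompressible random input.
    byte[] src = new byte[1 << 20];
    new Random(0).nextBytes(src);

    long bound = Zstd.compressBound(src.length);
    byte[] dst = new byte[(int) bound];

    long written = Zstd.compress(dst, src, /* level */ 3);
    if (Zstd.isError(written)) {
      throw new IllegalStateException(Zstd.getErrorName(written));
    }
    // The output is slightly larger than the input here, but still within the bound.
    System.out.printf("src=%d, compressed=%d, bound=%d%n", src.length, written, bound);
  }
}
```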
For the record, the change in this PR is correct, but it's unclear to me why it would fix the mentioned compression error.
@lidavidm I did a few tests, generating randomized data as follows:

```java
private static void writeRandomIntData(IntVector vector, int valueCount) {
  Random random = new Random();
  for (int i = 0; i < valueCount; i++) {
    vector.setSafe(i, random.nextInt());
  }
  vector.setValueCount(valueCount);
}
```

I used a function like this, assuming randomized data would be hard to compress, and I didn't get the expected exception. A very small dataset should also be hard to compress (less history, especially with Zstd), and that didn't trigger it either. I might be wrong in my conclusion, or something could be erroneous in my test data; I'd appreciate your feedback here.
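One way to sanity-check the "hard to compress" assumption is to compress the same shape of data at the byte level and look at the ratio. A sketch using zstd-jni directly (the `Zstd.compress(byte[], int)` convenience overload), rather than the Arrow codec path:

```java
import java.nio.ByteBuffer;
import java.util.Random;

import com.github.luben.zstd.Zstd;

public class RandomIntCompressibilityCheck {
  public static void main(String[] args) {
    int valueCount = 1 << 20;
    ByteBuffer buf = ByteBuffer.allocate(valueCount * Integer.BYTES);
    Random random = new Random();
    for (int i = 0; i < valueCount; i++) {
      buf.putInt(random.nextInt()); // same shape of data as writeRandomIntData
    }

    byte[] raw = buf.array();
    byte[] compressed = Zstd.compress(raw, 3); // one-shot ZSTD, level 3
    System.out.printf("raw=%d bytes, compressed=%d bytes, ratio=%.3f%n",
        raw.length, compressed.length, (double) compressed.length / raw.length);
  }
}
```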
Hmm, true. @shujiewu do you have a stack trace?
Sorry, that was a mistake; the error message is actually "ZSTD decompression failed: Destination buffer is too small". We used the Java library to compress the data and then the C++ library to decompress it.
Most cases work well; only specific data throws an exception when decompressed. The code location is
Ah, thanks for the clarification! @vibhatha you should print out the buffer lengths in the problematic function and confirm that the overflow is actually happening.
@lidavidm would it be worthwhile to test with the mentioned
I think decompressing in Java might throw an exception as well. I have some data that may trigger the exception, but it is sensitive.
If you're not able to share the data with us, then I would suggest that you try diagnosing it yourself. Start by printing out the various sizes on the compression side (in Java), then print those values when decompressing (in C++). If you notice any discrepancy, that might lead you to the underlying cause.
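On the Java side, the instrumentation could be as simple as logging every size around the compression call so the numbers can be lined up against what the C++ reader reports. A minimal sketch at the zstd-jni level (the helper name `compressWithLogging` is hypothetical, not part of Arrow):

```java
import java.util.Arrays;
import java.util.Random;

import com.github.luben.zstd.Zstd;

public class CompressionSizeLogger {
  /** Compresses src and logs every size that matters for the comparison. */
  static byte[] compressWithLogging(byte[] src, int level) {
    long bound = Zstd.compressBound(src.length);
    byte[] dst = new byte[(int) bound];

    long written = Zstd.compress(dst, src, level);
    if (Zstd.isError(written)) {
      throw new IllegalStateException("ZSTD compression failed: " + Zstd.getErrorName(written));
    }
    System.out.printf("srcSize=%d, compressBound=%d, dstCapacity=%d, compressedSize=%d%n",
        src.length, bound, dst.length, written);
    return Arrays.copyOf(dst, (int) written);
  }

  public static void main(String[] args) {
    byte[] data = new byte[1 << 16];
    new Random(1).nextBytes(data);
    compressWithLogging(data, 3);
  }
}
```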
Closing this PR since the required data is not available to reproduce the error.
Rationale for this change
This PR fixes apache/arrow-java#169
What changes are included in this PR?
A minor change to a function parameter that correctly accounts for the available space in the destination buffer.
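As a purely hypothetical illustration of this kind of sizing concern (not the actual diff): Arrow's IPC buffer compression prepends the uncompressed length as an 8-byte prefix, so the space available to the compressor is the buffer capacity minus that prefix. The sketch below also assumes zstd-jni's offset-taking `Zstd.compressByteArray` variant.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Random;

import com.github.luben.zstd.Zstd;

public class PrefixedBufferSketch {
  // Arrow IPC buffer compression stores the uncompressed length as an int64 prefix.
  static final int PREFIX_SIZE = 8;

  public static void main(String[] args) {
    byte[] raw = new byte[1 << 16];
    new Random(7).nextBytes(raw);

    // Destination layout: [8-byte uncompressed length][compressed body].
    byte[] framed = new byte[PREFIX_SIZE + (int) Zstd.compressBound(raw.length)];
    ByteBuffer.wrap(framed).order(ByteOrder.LITTLE_ENDIAN).putLong(raw.length);

    // The destination size passed to the compressor must be the remaining
    // capacity after the prefix, not the full capacity of the framed buffer.
    // (compressByteArray with this offset/size signature is an assumption.)
    long written = Zstd.compressByteArray(
        framed, PREFIX_SIZE, framed.length - PREFIX_SIZE,
        raw, 0, raw.length, /* level */ 3);
    if (Zstd.isError(written)) {
      throw new IllegalStateException(Zstd.getErrorName(written));
    }
  }
}
```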
Are these changes tested?
No
Are there any user-facing changes?
No