Skip to content

GH-759: Get length of byte[] in TryCopyLastError #760

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

hnwyllmm
Copy link
Contributor

@hnwyllmm hnwyllmm commented May 20, 2025

What's Changed

We should get the length of byte[] by GetArrayLength, not strlen which may cause invalid memory access.

Closes #759.

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to add a test?

@hnwyllmm
Copy link
Contributor Author

hnwyllmm commented May 21, 2025

The error message in CI doesn't seem to be related to my PR:

/opt/rh/devtoolset-10/root/usr/libexec/gcc/x86_64-redhat-linux/10/ld: cannot find -lutf8proc
  collect2: error: ld returned 1 exit status

linkage: https://github.com/apache/arrow-java/actions/runs/15140171347/job/42594091462?pr=760

Do you have some suggestions?

@hnwyllmm
Copy link
Contributor Author

Is it possible to add a test?

I'll try to add a test case in integration_tests.py.

@kou
Copy link
Member

kou commented May 21, 2025

The error message in CI doesn't seem to be related to my PR:

/opt/rh/devtoolset-10/root/usr/libexec/gcc/x86_64-redhat-linux/10/ld: cannot find -lutf8proc
  collect2: error: ld returned 1 exit status

linkage: https://github.com/apache/arrow-java/actions/runs/15140171347/job/42594091462?pr=760

Do you have some suggestions?

Could you open a separated issue for it?
Let's fix it as a separated task.

@lidavidm
Copy link
Member

lidavidm commented May 21, 2025

Is it possible to add a test?

I'll try to add a test case in integration_tests.py.

Do you need an integration test? IMO, you should be able to export a stream, then immediately import it, all within Java.

@hnwyllmm
Copy link
Contributor Author

The error message in CI doesn't seem to be related to my PR:

/opt/rh/devtoolset-10/root/usr/libexec/gcc/x86_64-redhat-linux/10/ld: cannot find -lutf8proc
  collect2: error: ld returned 1 exit status

linkage: https://github.com/apache/arrow-java/actions/runs/15140171347/job/42594091462?pr=760
Do you have some suggestions?

Could you open a separated issue for it? Let's fix it as a separated task.

issue: #767

@kou
Copy link
Member

kou commented May 24, 2025

Thanks. (We've fixed this problem in apache/arrow. Sorry for not sharing it here...)

@hnwyllmm
Copy link
Contributor Author

Is it possible to add a test?

I'll try to add a test case in integration_tests.py.

Do you need an integration test? IMO, you should be able to export a stream, then immediately import it, all within Java.

I add a unittest but I think it can't make sure TryCopyLastError works correctly, so I suggest it should be removed.
We can read memory after the error string address in most cases. We usually check such problems by sanity tools.

@lidavidm lidavidm changed the title fix get length of byte[] in TryCopyLastError GH-759: Get length of byte[] in TryCopyLastError May 25, 2025

This comment has been minimized.

@lidavidm
Copy link
Member

Can't the test validate that the exception has the right error message after making it through the FFI boundary?

@lidavidm lidavidm added the bug-fix PRs that fix a big. label May 25, 2025
@github-actions github-actions bot added this to the 18.4.0 milestone May 25, 2025
@hnwyllmm
Copy link
Contributor Author

Can't the test validate that the exception has the right error message after making it through the FFI boundary?

PTAL.
I use a real exception message in the ExceptionTest.

root.setRowCount(4);
batches.add(unloader.getRecordBatch());

final String exceptionMessage = "java.lang.RuntimeException: Error occurred while getting next schema root.\n\tat org.apache.arrow.adapter.jdbc.ArrowVectorIterator.next(ArrowVectorIterator.java:205)\n\tat com.oceanbase.external.jdbc.JdbcScanner.loadNextBatch(JdbcScanner.java:73)\n\tat org.apache.arrow.c.ArrayStreamExporter$ExportedArrayStreamPrivateData.getNext(ArrayStreamExporter.java:72)\nCaused by: java.lang.RuntimeException: Error occurred while consuming data.\n\tat org.apache.arrow.adapter.jdbc.ArrowVectorIterator.consumeData(ArrowVectorIterator.java:127)\n\tat org.apache.arrow.adapter.jdbc.ArrowVectorIterator.load(ArrowVectorIterator.java:178)\n\tat org.apache.arrow.adapter.jdbc.ArrowVectorIterator.next(ArrowVectorIterator.java:198)\n\t... 2 more\nCaused by: java.lang.OutOfMemoryError: Java heap space\n";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Um, this is a little excessive. Can we just use a short canary string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

}
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to assert that we did indeed get an exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DONE

Comment on lines 97 to 99
assertThat(eMessage.length()).isGreaterThan(expectExceptionMessage.length() + 1);
assertThat(eMessage.substring(eMessage.length() - expectExceptionMessage.length() - 1, eMessage.length() - 1))
.isEqualTo(expectExceptionMessage);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use contains...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we must make sure the string in the exception is the same with we throws.
"abc\x037" contains "abc", but it's not correct and we want to catch the bug.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

...then you just want endsWith?

Copy link
Contributor Author

@hnwyllmm hnwyllmm Jun 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use endsWith to simplified the code.

Comment on lines 90 to 102
for (Object batch : batches) {
try {
reader.loadNextBatch();
} catch (Exception e) {
assertThat(exceptionThrowed).isEqualTo(null);
final String eMessage = e.getMessage();
// 1 for '}', ref to CDataJniException
assertThat(eMessage.length()).isGreaterThan(expectExceptionMessage.length() + 1);
assertThat(eMessage.substring(eMessage.length() - expectExceptionMessage.length() - 1, eMessage.length() - 1))
.isEqualTo(expectExceptionMessage);
exceptionThrowed = e;
continue;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use assertThrows.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DONE.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug-fix PRs that fix a big.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

coredump in TryCopyLastError
3 participants