Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Properly handle Java exceptions without error messages; fix loading of comet native library from java.library.path #982

Merged
merged 1 commit into from
Oct 2, 2024

Conversation

Kontinuation
Copy link
Member

Which issue does this PR close?

This is a minor, non-user-aware fix so I've not created a GtHub issue. A detailed explanation is provided in the next section.

Rationale for this change

I had this small patch in my local workspace for a while, I think it is useful for other comet developers as well. The changes are as follows:

  1. Fix the library name when not loading from the JAR bundle. The documentation of System.loadLibrary states that "The libname argument must not contain any platform specific prefix, file extension or path", so the library name should be comet instead of libcomet.so or libcomet.dylib. This fix makes it easier to load the newly built comet library in the native/target directory when running Java tests.
  2. Handle exceptions without error messages properly in native code. When there's an NPE in Scala code, the NPE exception raised by JVM may not have an error message. The native code will panic with the following error message:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (192.168.200.154 executor driver): org.apache.comet.CometNativeException: General execution error with reason: Null pointer in get_object_class.
	at org.apache.comet.Native.executePlan(Native Method)
	at org.apache.comet.CometExecIterator.$anonfun$getNextBatch$1(CometExecIterator.scala:107)
	at org.apache.comet.CometExecIterator.$anonfun$getNextBatch$1$adapted(CometExecIterator.scala:106)
	at org.apache.comet.vector.NativeUtil.getNextBatch(NativeUtil.scala:157)
	at org.apache.comet.CometExecIterator.getNextBatch(CometExecIterator.scala:106)
	at org.apache.comet.CometExecIterator.hasNext(CometExecIterator.scala:118)
	at org.apache.comet.CometExecIterator.next(CometExecIterator.scala:135)

It does not point to the source of the NPE, which makes troubleshooting difficult. This patch checks if the error message retrieved by the .getMessage JNI call is null and handles it specially. The error message becomes:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (192.168.200.154 executor driver): java.lang.NullPointerException
	at org.apache.comet.CometNativeSuite$$anon$1.next(CometNativeSuite.scala:37)
	at org.apache.comet.CometNativeSuite$$anon$1.next(CometNativeSuite.scala:35)
	at org.apache.comet.CometBatchIterator.next(CometBatchIterator.java:56)
	at org.apache.comet.Native.executePlan(Native Method)
	at org.apache.comet.CometExecIterator.$anonfun$getNextBatch$1(CometExecIterator.scala:107)
	at org.apache.comet.CometExecIterator.$anonfun$getNextBatch$1$adapted(CometExecIterator.scala:106)
	at org.apache.comet.vector.NativeUtil.getNextBatch(NativeUtil.scala:157)
	at org.apache.comet.CometExecIterator.getNextBatch(CometExecIterator.scala:106)
	at org.apache.comet.CometExecIterator.hasNext(CometExecIterator.scala:118)
	at org.apache.comet.CometExecIterator.next(CometExecIterator.scala:135)

What changes are included in this PR?

This PR includes fixes for the minor problems mentioned above.

How are these changes tested?

Add a unit test for problem 2.

@Kontinuation Kontinuation changed the title Properly handle Java exceptions without error messages; fix loading of comet native library from java.library.path fix: Properly handle Java exceptions without error messages; fix loading of comet native library from java.library.path Sep 30, 2024
@Kontinuation Kontinuation marked this pull request as ready for review September 30, 2024 11:26
@viirya
Copy link
Member

viirya commented Sep 30, 2024

Thanks @Kontinuation

@andygrove andygrove merged commit afd28b9 into apache:main Oct 2, 2024
75 of 76 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants