Skip to content

Conversation

@simonrozsival
Copy link
Member

Fixes #10314

In #10198 I added code which attaches current thread to JNI (https://github.com/dotnet/android/pull/10198/files#diff-fc0414b3741163879db7993fd1fb42fa76e5305bb54c0e9478e077b3094e7aa7R42-R46). If this code is called from a thread pool thread, the thread won't be detached from JNI when exitting.

This PR adds the recommended steps to automatically detach the thread from JNI when exitting: https://developer.android.com/training/articles/perf-jni#threads

/cc @grendello @jonathanpeppers

Copy link
Member

@jonathanpeppers jonathanpeppers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to add a test for this?

Or did MAUI's tests crash on exit? So, we'd need to somehow run an extra test suite?

@jonpryor
Copy link
Contributor

What's odd is that I think we have a test for this? Maybe?

From the abort message in #10314:

7-21 15:25:22.624  1610  1610 F DEBUG   : Abort message: 'art/runtime/thread.cc:1223] Native thread exited without calling DetachCurrentThread: Thread[15,tid=4355,Native,Thread*=0x7f3c09434600,peer=0x12c89580,"Thread-234"]'

which suggests that if we:

  1. Create a new Thread
  2. Do some JNI work on that thread, and
  3. Exit the thread

then the assertion should be triggered, no?

We have tests that do that, e.g.:

[Test]
public void ConversionsAndThreadsAndInstanceMappingsOhMy ()
{
IntPtr lrefJliArray = JNIEnv.NewObjectArray<int> (new[]{1});
IntPtr grefJliArray = JNIEnv.NewGlobalRef (lrefJliArray);
JNIEnv.DeleteLocalRef (lrefJliArray);
Java.Lang.Object[] jarray = (Java.Lang.Object[])
JNIEnv.GetArray (grefJliArray, JniHandleOwnership.DoNotTransfer, typeof(Java.Lang.Object));
Exception ignore_t1 = null;
Exception ignore_t2 = null;
var t1 = new Thread (() => {
int[] output_array1 = new int[1];
for (int i = 0; i < 2000; ++i) {
Console.WriteLine ("# t1 iter: {0}", i);
try {
JNIEnv.CopyObjectArray (grefJliArray, output_array1);
} catch (Exception e) {
ignore_t1 = e;
break;
}
}
});
var t2 = new Thread (() => {
for (int i = 0; i < 2000; ++i) {
Console.WriteLine ("# t2 iter: {0}", i);
try {
JNIEnv.GetArray<int>(jarray);
} catch (Exception e) {
ignore_t2 = e;
break;
}
}
});
t1.Start ();
t2.Start ();
t1.Join ();
t2.Join ();
for (int i = 0; i < jarray.Length; ++i) {
jarray [i].Dispose ();
jarray [i] = null;
}
JNIEnv.DeleteGlobalRef (grefJliArray);
Assert.IsNull (ignore_t1, string.Format ("No exception should be thrown [t1]! Got: {0}", ignore_t1));
Assert.IsNull (ignore_t2, string.Format ("No exception should be thrown [t2]! Got: {0}", ignore_t2));
}

so why didn't we hit this before?

Are thread pool threads different from "normal" new System.Threading.Thread()?

Do we need to do "different" kinds of JNI calls on the created thread to trigger the assertion? It likely wouldn't be a bad idea to have a test that does:

var t = new Thread(() => {
    var list = new Java.Util.ArrayList();
    list.Add(new Java.Lang.String("a");
    list.Add(new Java.Lang.Integer(42);
});
t.Start();
t.Join();

and do a bit more than straight JNIEnv calls on the created thread.

The assert message, as-is, does not imply to me that we need to worry about app exit. (And that's ignoring the fact that "app exit" is a very nebulous concept on Android in the first place!)

@grendello
Copy link
Contributor

@jonpryor I think the situation here is a bit different, at least from looking at the crash stack trace attached to #10314 (comment)

07-21 14:55:47.473  2121  2121 F DEBUG   :     #00 pc 0000000000539ff7  /system/lib64/libart.so (art::DumpCheckpoint::Run(art::Thread*)+439)
07-21 14:55:47.473  2121  2121 F DEBUG   :     #01 pc 000000000053cac6  /system/lib64/libart.so (art::ThreadList::RunCheckpoint(art::Closure*)+278)
07-21 14:55:47.473  2121  2121 F DEBUG   :     #02 pc 000000000053ddf5  /system/lib64/libart.so (art::ThreadList::Dump(std::__1::basic_ostream<char, std::__1::char_traits<char> >&)+357)
07-21 14:55:47.473  2121  2121 F DEBUG   :     #03 pc 00000000004ffea4  /system/lib64/libart.so (art::Runtime::Abort()+660)
07-21 14:55:47.473  2121  2121 F DEBUG   :     #04 pc 0000000000178d71  /system/lib64/libart.so (art::LogMessage::~LogMessage()+2865)
07-21 14:55:47.473  2121  2121 F DEBUG   :     #05 pc 00000000005276ff  /system/lib64/libart.so (art::Thread::ThreadExitCallback(void*)+223)
07-21 14:55:47.473  2121  2121 F DEBUG   :     #06 pc 000000000008590e  /system/lib64/libc.so (pthread_key_clean_all()+142)
07-21 14:55:47.473  2121  2121 F DEBUG   :     #07 pc 000000000008549b  /system/lib64/libc.so (pthread_exit+75)
07-21 14:55:47.473  2121  2121 F DEBUG   :     #08 pc 0000000000084ef6  /system/lib64/libc.so (__pthread_start(void*)+54)
07-21 14:55:47.473  2121  2121 F DEBUG   :     #09 pc 00000000000296eb  /system/lib64/libc.so (__start_thread+11)
07-21 14:55:47.473  2121  2121 F DEBUG   :     #10 pc 000000000001ce55  /system/lib64/libc.so (__bionic_clone+53)

This is 100% native + ART. @simonrozsival might be onto something with this PR (btw, why was it closed?) the thread mentioned in the abort message (PID 4651) is created to run a test:

07-21 14:55:47.368  4496  4651 D monodroid: [precompiled] p/invoke found
07-21 14:55:47.375  4496  4651 D monodroid: clr_external_assembly_probe ("xunit.assert.ni.dll"...)
07-21 14:55:47.375  4496  4651 D monodroid: clr_external_assembly_probe ("xunit.assert.dll"...)
07-21 14:55:47.383  4496  4651 I DOTNET  :      [PASS] Charge_State
07-21 14:55:47.393  4496  4651 I DOTNET  :      [PASS] Charge_Level
07-21 14:55:47.395  4496  4651 I DOTNET  :      [PASS] Charge_Power
07-21 14:55:47.396  4496  4651 I DOTNET  :      [PASS] Unsubscribe_BatteryInfoChanged_Does_Not_Crash
07-21 14:55:47.399  4496  4651 I DOTNET  :      [PASS] App_Is_Not_Lower_Power_mode
07-21 14:55:47.407  4496  4651 W art     : Native thread exiting without having called DetachCurrentThread (maybe it's going to use a pthread_key_create destructor?): Thread[17,tid=4651,Native,Thread*=0x7fc955cf4a00,peer=0x12c5a3a0,"Thread-212"]
07-21 14:55:47.408  4496  4651 F art     : art/runtime/thread.cc:1223] Native thread exited without calling DetachCurrentThread: Thread[17,tid=4651,Native,Thread*=0x7fc955cf4a00,peer=0x12c5a3a0,"Thread-212"]
07-21 14:55:47.408  4496  4646 F DOTNET  : Aborting process.

If the GC bridge kicks in while the test is running and on the same thread that crashes, this would explain the error. Our test assumes that calling a JNI method actually attaches the thread to JNI, which might not be the case.
In the MonoVM runtime we always attach every thread automatically, this is not necessarily the case with CoreCLR (or it might be a bug in CoreCLR - we don't control how/when it attaches threads)

@simonrozsival
Copy link
Member Author

btw, why was it closed?

I was not able to finish the work on the PR and fully test it, so I closed it for the time being. We can definitely revisit the PR.

If the GC bridge kicks in while the test is running and on the same thread that crashes, this would explain the error.

I don't know if this could be caused by the GC itself (GC has a background thread and the GC bridge has its own separate background thread, both of which are attached to JNI but both should live thoughout app's lifetime). The changes to the ensure_jnienv method affect other pieces of code, not just the GC bridge.

My hypothesis was that when this code is reached on a threadpool thread then the TP thread is attached to JNI and it is likely that this thread will be exitted at some point. This would cause the app to crash because we don't have the automatic detach machanism implemented. I wasn't able to reproduce the crash locally before I left on vacation, so I abandoned this attempt at that time.

@grendello
Copy link
Contributor

@simonrozsival does threadpool actually attach threads to JNI? If I remember correctly, both MonoVM and CoreCLR use the same managed threadpool implementation? If it is indeed true, then the code crashing on CoreCLR but working on MonoVM would actually be a runtime issue. On MonoVM we get called back whenever any managed thread is created/destroyed, so we can attach/detach accordingly. On CoreCLR we don't have this opportunity, so it is on the runtime to do the right thing.

I used GC bridge as an example of what might occur, it's hard to tell from the stack trace that this is really the case - as you say, it can be anything that creates the thread, attaches it, but then fails to detach. If it's a managed thread, then it will exit only after the workload running it it finishes (I hope that this is the case), so there's still the chance to do the right thing there, a chance that maybe is missed for some reason?

TBH, I would love if CoreCLR called us back the same way MonoVM does on any thread creation/distruction, so that we can control things like JNI/JVM thread attachment/detachment.

@simonrozsival
Copy link
Member Author

@simonrozsival does threadpool actually attach threads to JNI? If I remember correctly, both MonoVM and CoreCLR use the same managed threadpool implementation?

If at some point the ensure_jnienv method is called and the thread isn't attached to JNI yet, it will be attached to JNI. So, if we register the automatic cleanup at this point, it should work just fine and there is no need for us to attach threads to JNI unnecessarily.

@github-actions github-actions bot locked and limited conversation to collaborators Sep 4, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Running device tests for CoreCLR failing on maui

5 participants