Skip to content

Cherry-pick #7732 to d17-5 #7746

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 28, 2023
Merged

Conversation

jonpryor
Copy link
Contributor

@jonpryor jonpryor commented Jan 26, 2023

See #7732

@jonpryor jonpryor requested a review from grendello as a code owner January 26, 2023 20:02
Context: https://discord.com/channels/732297728826277939/732297837953679412/1067806732983746621
Context: https://discord.com/channels/732297728826277939/732297837953679412/1067822882274689024

In certain "bad app" scenarios, we would print a message and then
**exit**(3) the app.  The problem with using `exit()` in this manner
is that it *can* result in the app constantly relaunching, "spamming"
the UI.

Improve this by calling **abort**(3) instead.  This prevents Android
from constantly re-launching the app, and also prints useful abort
information such as registers and a native stack trace to `adb logcat`.

Unexpectedly, however, "simply" replacing app calls to `exit()` with
calls to `abort()` resulted in `libmonodroid.so` increasing in size
by ~16%.  For some reason, calling `abort()` caused the compiler to
inline much more code and in a weird manner (it repeated essentially
identical blocks of code from an inlined function, which were
previously folded into a single block when `exit()` was used).

We found that consolidating the `abort()` calls into
`Helpers::abort_application()` removed the size increase.
…otnet#7732)

Fixes: dotnet#7335

Context: d236af5

Commit d236af5 introduced embedded assembly compression, using LZ4,
which speeds up startup and reduces final package size.

Assemblies are compressed at build time and, at the same time, pre-
allocated buffers for the **decompressed** data are allocated in
`libxamarin-app.so`.  The buffers are then passed to the LZ4 APIs,
all threads using the same output buffer.  The assumption was that we
can do fine without locking as even if overlapped decompression
happens, the output data will be the same and so even if two threads
do the same thing at the same time, the data will be valid at all
times, so long as at least one thread completes the decompression.

This assumption proved to be **largely** true, but it appears that in
high concurrency cases it is possible that the data in the
decompression buffer differs.  This can result in app crashes:

	A/libc: Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 3092 (.NET ThreadPool), pid 2727 (myapp.name)
	A/DEBUG: pid: 2727, tid: 3092, name: .NET ThreadPool  >>> myapp.name <<<
	A/DEBUG:       #1 pc 0000000000029b1c  /data/app/myapp.name-B9t_3dF9i8mDxJEKodZw5w==/split_config.arm64_v8a.apk!libmono-android.release.so (offset 0x103d000) (xamarin::android::internal::MonodroidRuntime::mono_log_handler(char const*, char const*, char const*, int, void*)+144) (BuildId: 29c5a3805a0bedee1eede9b6668d7c676aa63371)
	A/DEBUG:       #2 pc 00000000002680bc  /data/app/myapp.name-B9t_3dF9i8mDxJEKodZw5w==/split_config.arm64_v8a.apk!libmonosgen-2.0.so (offset 0x109b000) (BuildId: 4a5dd4396e8816b7f69881838bd549285213d53b)
	A/DEBUG:       dotnet#3 pc 00000000002681e8  /data/app/myapp.name-B9t_3dF9i8mDxJEKodZw5w==/split_config.arm64_v8a.apk!libmonosgen-2.0.so (offset 0x109b000) (BuildId: 4a5dd4396e8816b7f69881838bd549285213d53b)
	A/DEBUG:       dotnet#4 pc 000000000008555c  /data/app/myapp.name-B9t_3dF9i8mDxJEKodZw5w==/split_config.arm64_v8a.apk!libmonosgen-2.0.so (offset 0x109b000) (mono_metadata_string_heap+188) (BuildId: 4a5dd4396e8816b7f69881838bd549285213d53b)
	…

My guess is that LZ4 either uses the output buffer as a scratchpad
area when decompressing or that it initializes/modifies the buffer
before writing actual data in it.  With overlapped decompression, it
may lead to one thread overwriting valid data previously written by
another thread, so that when the latter returns the buffer it thought
to have had valid data may contain certain bytes temporarily
overwritten by the decompression session in the other, still running,
thread.  It may happen that MonoVM reads the corrupted data just when
it is still invalid (before the still running decompression session
actually writes the valid data), a classic race condition.

To fix this, the decompression block is now protected with a startup-
aware mutex.  Mutex will be held only after the initial startup phase
is completed, so there should not be much loss of startup performance.
@jonpryor jonpryor merged commit a200af1 into dotnet:d17-5 Jan 28, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Jan 23, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants