Skip to content

Fix bug where ASan shadow region was overlapping static data #15806

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Dec 17, 2021
Merged

Conversation

sbc100
Copy link
Collaborator

@sbc100 sbc100 commented Dec 16, 2021

This bug was caused by miscalculating the adjustments made to TOTAL_MEMORY
and GLOBAL_BASE when using ASan. The bug was introduced in #14631 and
was causing ASan to write poison values over static data in some cases.

This bug was really only effecting builds without ALLOW_MEMORY_GROWTH
since when ALLOW_MEMORY_GROWTH is enabled that shadow region is very
large and unless users were use a lot of memory it was unlikely ASan
would write to the end of it (and clobber static data).

As well as fixing the calculations, this change also adds a runtime
assert to ensure these regions never overlap (at least not at startup).
This should catch regressions in these calculations.

In order to test this case I removed ALLOW_MEMORY_GROWTH form several
asan and lsan tests. It should not longer be necessary to set
ALLOW_MEMORY_GROWTH or adjust TOTAL_MEMORY when using sanitizers
these days. The asan suite in the core tests still uses
ALLOW_MEMORY_GROWTH. As a followup we might want to look into
removing that too.

Fixes: #15762

@sbc100 sbc100 requested review from tlively and kripken December 16, 2021 20:14
This bug was caused by miscalculating the adjustments make to TOTAL_MEMORY
and GLOBAL_BASE and using ASan.  The bug was introduced in #14631 and
was causing ASan to write poison values over static data in some cases.

This bug was really only effecting builds without ALLOW_MEMORY_GROWTH
since when ALLOW_MEMORY_GROWTH is enabled that shadow region is very
large and unless users were use a lot of memory it was unlikely ASan
would write to the end of it (and clobber static data).

As well as fixing the calculations, this change also adds a runtime
assert to ensure these regions never overlap (at least not at startup).
This should catch regressions in these calculations.

In order to test this case I removed ALLOW_MEMORY_GROWTH form several
asan and lsan tests.  It should not longer be necessary to set
`ALLOW_MEMORY_GROWTH` or adjust `TOTAL_MEMORY` when using sanitizers
these days.  The `asan` suite in the core tests still uses
`ALLOW_MEMORY_GROWTH`.  As a followup we might want to look into
removing that too.

Fixes: #15762
# to find the total memory and the shadow size based on the fact
# that the user memory is 7/8ths of the total memory.
# (i.e. user_mem == total_mem * 7 / 8
total_mem = user_mem * 8 / 7
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am slightly worried about rounding errors here, since I assume ASan rounds up, that is, isn't the size of shadow memory potentially slightly more than user memory / 7?

I would compute this is

shadow_mem = ..however asan rounds..
total_mem = user_mem + shadow_mem

Or does ASan actually compute it as a fraction of total memory, not user memory? In that case your code lgtm

Copy link
Collaborator Author

@sbc100 sbc100 Dec 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ASan literally does a shift to from a pointer to its shadow byte:

#define MEM_TO_SHADOW(mem) ((mem) >> SHADOW_SCALE)

The problem with trying to start by calculating the size of the shadow region is that you can't do that without also knowing the size of the shadow region.. since the bigger the shadow region the more shadow region you need because it shift the reset of the memory upwards as it grows.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assertion that I've added to the code I think means that if these calculation are wrong we would know right away.. at least if we ever under-estimate the size of the shadow region which is the dangerous direction.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait if SHADOW_SCALE=3 then total memory is 9/8 times user memory, isn't it, and not 8/7?

Copy link
Collaborator Author

@sbc100 sbc100 Dec 17, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because SHADOW_SCALE=3 the size of the shadow memory is 1/8th size of the total memory. But shadow memory is also part of the total memory and because shadow memory leaves at address zero that leave 7/8ths of the total memory left, for what we are calling user memory.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For what its worth the calculation of __global_base in emcc.py is puts its exactly at the address calculated for max_shadow_address in the native code.. which seems like fairly good evidence that the calculation here are correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. sgtm

@sbc100 sbc100 enabled auto-merge (squash) December 17, 2021 00:47
@sbc100 sbc100 disabled auto-merge December 17, 2021 01:07
@sbc100
Copy link
Collaborator Author

sbc100 commented Dec 17, 2021

Landing despite unrelated browser.test_emscripten_async_wget2_data failure. That looks like flakyness on the part of the remote host used by that test.

@sbc100 sbc100 merged commit 49dfe38 into main Dec 17, 2021
@sbc100 sbc100 deleted the asan_mem_size branch December 17, 2021 01:17
mmarczell-graphisoft pushed a commit to GRAPHISOFT/emscripten that referenced this pull request Jan 5, 2022
…ten-core#15806)

This bug was caused by miscalculating the adjustments made to TOTAL_MEMORY
and GLOBAL_BASE and using ASan.  The bug was introduced in emscripten-core#14631 and
causes ASan to write poison values over static data in some cases.

This bug was really only effecting builds without ALLOW_MEMORY_GROWTH
since when ALLOW_MEMORY_GROWTH is enabled that shadow region is very
large and unless users were use a lot of memory it was unlikely ASan
would write to the end of it (and clobber static data).

As well as fixing the calculations, this change also adds a runtime
assert to ensure these regions never overlap (at least not at startup).
This should catch regressions in these calculations.

In order to test this case I removed ALLOW_MEMORY_GROWTH form several
asan and lsan tests.  It should not longer be necessary to set
`ALLOW_MEMORY_GROWTH` or adjust `TOTAL_MEMORY` when using sanitizers
these days.  The `asan` suite in the core tests still uses
`ALLOW_MEMORY_GROWTH`.  As a followup we might want to look into
removing that too.

Fixes: emscripten-core#15762
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ASan error inside pthread_create and malloc
3 participants