[wasm][coreclr] Improve primary reverse thunk lookup#124730
[wasm][coreclr] Improve primary reverse thunk lookup#124730radekdoulik wants to merge 1 commit intodotnet:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request improves the WASM/CoreCLR reverse thunk lookup mechanism to prevent undetectable hash collisions that were occurring after recent changes to the runtime. The previous implementation used AssemblyFQName.Hash() ^ token as the lookup key, which could produce collisions when tokens changed across builds while the assembly name remained the same. The fix switches to using MVID.Hash() ^ token as the primary key and adds exact verification of both the MVID and token after a hash match to detect stale entries.
Changes:
- Replaced assembly fully-qualified name with MVID (Module Version ID) in the primary hash key computation, ensuring the key changes with every compilation
- Added exact verification of MVID string and token after hash lookups to detect collisions
- Added exact verification of fallback source string after fallback hash lookups
- Added diagnostic logging to aid in debugging reverse thunk lookup issues
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| src/tasks/WasmAppBuilder/coreclr/PInvokeCollector.cs | Captures MVID from each module for use in hash key generation |
| src/tasks/WasmAppBuilder/coreclr/PInvokeTableGenerator.cs | Generates MVID array and embeds Token, MVIDIndex, and FallbackSource in ReverseThunkMapValue for runtime verification |
| src/coreclr/vm/wasm/callhelpers.hpp | Adds Token, MVIDIndex, and FallbackSource fields to ReverseThunkMapValue structure |
| src/coreclr/vm/wasm/helpers.cpp | Implements MVID-based key creation and adds exact verification logic for both primary and fallback lookups, plus diagnostic logging |
| src/coreclr/vm/wasm/callhelpers-reverse.cpp | Generated file containing the MVID array and updated reverse thunk entries with new verification fields |
| var fsName = FixedSymbolName(cb, Log); | ||
|
|
||
| return $" {{ {cb.Token ^ HashString(cb.AssemblyFQName)}, {HashString(cb.Key)}, {{ &MD_{fsName}, (void*)&Call_{cb.EntrySymbol} }} }} /* alternate key source: {cb.Key} */"; | ||
| return $" {{ {cb.Token ^ HashString(cb.MVID)}, {HashString(cb.Key)}, {{ &MD_{fsName}, (void*)&Call_{cb.EntrySymbol}, {cb.Token}, {mvidIndexMap[cb.MVID]}, \"{cb.Key}\" }} }}"; |
There was a problem hiding this comment.
The cb.Key value is embedded in the generated C++ string literal without escaping. If the key contains special characters like quotes or backslashes (e.g., from compiler-generated method names or unusual namespaces), the generated C++ code will be invalid. The key should be escaped using EscapeLiteral(cb.Key) instead of directly embedding cb.Key in the string.
| return $" {{ {cb.Token ^ HashString(cb.MVID)}, {HashString(cb.Key)}, {{ &MD_{fsName}, (void*)&Call_{cb.EntrySymbol}, {cb.Token}, {mvidIndexMap[cb.MVID]}, \"{cb.Key}\" }} }}"; | |
| return $" {{ {cb.Token ^ HashString(cb.MVID)}, {HashString(cb.Key)}, {{ &MD_{fsName}, (void*)&Call_{cb.EntrySymbol}, {cb.Token}, {mvidIndexMap[cb.MVID]}, \"{EscapeLiteral(cb.Key)}\" }} }}"; |
|
Do we need to bother with matching tokens at all in the temporary call helpers solution? Corelib and other assemblies are changing all the time, so the checked in call helpers map is going to use the fallback path like 99% of the time. Once we teach crossgen how to generate this map. I expect that we are going to use the optimized native R2R hashtable to store these. We won't use the token matching algorithm used by call helpers currently. |
Not really, it was nice exercise with copilot to fix the broken primary keys. The other parts are more useful from this view point. The validity checks and logging. That would be useful to investigate the problem we had, it was painful to find out what broke. |
|
Tagging subscribers to 'arch-wasm': @lewing, @pavelsavara |
Can we delete the code that deals with MVIDs and tokens from this PR then? |
AaronRobinsonMSFT
left a comment
There was a problem hiding this comment.
I agree with Jan regarding the fallback path being the most common case. Let's see what we can remove here.
| const char* pszLookupName = pMD->GetMethodTable()->GetFullyQualifiedNameInfo(&pszLookupNamespace); | ||
| LOG((LF_STUBS, LL_INFO100000, "WASM lookupThunk pMD: %s.%s::%s\n", pszLookupNamespace ? pszLookupNamespace : "", pszLookupName, pMD->GetName())); | ||
| } | ||
| #endif |
There was a problem hiding this comment.
| #endif | |
| #endif // LOGGING |
The previous reverse thunk lookup keyed on AssemblyFQName.Hash() ^ token, which could produce undetectable collisions — tokens can change across builds while the display name stays the same, so a hash match doesn't guarantee the correct thunk. This already happened after #124303 landed and #123377 started failing.
This change switches the primary key to MVID.Hash() ^ token. The MVID changes with every compilation, so we can detect whether the tokens used to build the key are still valid. After a hash match, the lookup now verifies the MVID and token exactly. The fallback path similarly verifies the full fallback source string after its hash lookup.
Also added diagnostic logging.