@alexcrichton
Collaborator

This commit refines #586 to perform vector loads with inline assembly instead of a C-defined load. The goal is to avoid UB in LLVM, since C code cannot read before or after an allocation. While strlen is not inlined, as is currently the case, there is no reasonable way for a compiler to prove that a load was out-of-bounds, so this issue is unlikely to matter in practice, but it is nevertheless still UB. The eventual goal is to move these SIMD routines into header files so that multiple builds of libc itself are not needed. In that situation inlining is possible, and a compiler could much more easily see the UB, which could cause problems.

Inline assembly unfortunately doesn't work with vector output parameters on Clang 19 and Clang 20 due to an ICE. This was fixed in llvm/llvm-project#146574 for Clang 21, but it means that the SIMD routines are now excluded with Clang 19 and Clang 20 to avoid compilation errors there.
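
The version exclusion described above could be expressed as a preprocessor guard along these lines. This is only a sketch: `EXAMPLE_USE_SIMD_ASM` and `example_uses_simd_asm` are hypothetical names that do not come from the PR, while `__clang_major__` and `__wasm_simd128__` are macros that Clang predefines.

```c
#include <assert.h>

/* Hypothetical feature macro (not from the PR): 1 when the inline-assembly
 * vector load can be compiled, 0 when a scalar fallback must be used.
 * Clang 19 and Clang 20 are excluded because of the ICE mentioned above. */
#if defined(__wasm_simd128__) && defined(__clang_major__) && \
    __clang_major__ != 19 && __clang_major__ != 20
#define EXAMPLE_USE_SIMD_ASM 1
#else
#define EXAMPLE_USE_SIMD_ASM 0
#endif

/* Reports which implementation this build would select. */
static int example_uses_simd_asm(void) {
    assert(EXAMPLE_USE_SIMD_ASM == 0 || EXAMPLE_USE_SIMD_ASM == 1);
    return EXAMPLE_USE_SIMD_ASM;
}
```

On a non-wasm host, or with the two affected Clang versions, the function returns 0 and the scalar path would be used.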

@alexcrichton alexcrichton mentioned this pull request Jul 2, 2025
Collaborator

@abrown abrown left a comment

If we moved this inline assembly to a .s file we could avoid the #if __clang_major__ ... but you're thinking it would not be inlined here?

@alexcrichton
Collaborator Author

While possible, that would only be inlinable after LLD, through wasm-opt for example. Without wasm-opt I suspect it would lead to much worse performance, so I'd guess it's probably not viable for this specifically.

Member

@sunfishcode sunfishcode left a comment

The incompatibility with LLVM 20 is unfortunate, but I think this is the direction we want to go in, to eliminate dependence on UB.

@ncruces
Contributor

ncruces commented Jul 2, 2025

I honestly don't understand what's UB about the current code, nor how this inline assembly helps.

Why is dereferencing UB, rather than implementation defined?

Conversion of an integer to a pointer is implementation defined (not UB), but could produce an invalid pointer. Dereferencing an invalid pointer would be UB. Is that the rationale behind it?

If so, this specific solution may make strlen work, but I don't see how it can make (e.g.) memchr work: it needs to return a valid pointer, so if we assume we can't conjure valid pointers from integer arithmetic, we're toast.

It seems to me that, if anything, what we need is to launder the pointer, which could be done with __asm__ ("" : "+r"(v)); at the top of the loop: in goes a pointer the compiler might know is invalid, out comes the same pointer which the compiler can't possibly tell is invalid. This is compatible with clang 19 and 20.
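
For readers unfamiliar with the idiom, the laundering statement above might be used roughly like this. It is a sketch only: `launder_ptr` and `strlen_laundered` are hypothetical names, the loop is byte-at-a-time rather than SIMD, and it relies on GCC/Clang extended inline assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: passes a pointer through an empty asm statement.
 * "+r" marks p as both an input and an output held in a register, so the
 * optimizer must assume the value may have changed, even though no
 * instruction is emitted. */
static const char *launder_ptr(const char *p) {
    __asm__("" : "+r"(p));
    return p;
}

/* Byte-at-a-time strlen that launders its cursor at the top of the loop,
 * purely to show where the statement would be placed. */
static size_t strlen_laundered(const char *s) {
    assert(s != NULL);
    const char *v = s;
    for (;;) {
        v = launder_ptr(v);
        if (*v == '\0')
            return (size_t)(v - s);
        v++;
    }
}
```

The pointer value that comes out is bit-for-bit the one that went in; the only effect is on what the optimizer is allowed to assume about it.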

@ncruces ncruces mentioned this pull request Jul 2, 2025
@alexcrichton
Collaborator Author

I'll try my best to explain things, but I'll also point out that I'm by no means an expert in this field and could very well be wrong.

The simplest answer I can give is this RFC for LLVM which says:

However, I believe that currently, there is no well-defined way to actually achieve this on the LLVM IR level.

in reference to reads past the end of an object where the extra data is ignored.

In contrast an overly long answer I can give is likely rooted in the blog post series ending at https://www.ralfj.de/blog/2022/04/11/provenance-exposed.html (that's got a link to previous posts, and note it's also a few years old at this point).

My own understanding primarily comes from Rust, but I'm under the impression that the same rules/model/etc are equally applicable to C. In Rust, after a lot of discussion, it's been concluded that every pointer has provenance. AFAIK it's considered UB to read/write through a pointer outside of the provenance that it contains. (If you're not familiar with provenance and feel Ralf doesn't write enough about it, there's also Gankra's writing.)

My understanding is that, for strlen, there's a pointer passed in and this pointer has some provenance. Attempting to launder the pointer through a variety of means like ptr-to-int and then int-to-ptr casts or inline asm can be tools to thwart compiler optimizations today but there's also no fundamental change to the fact that it is indeed a pointer and it has some provenance. The provenance may not include the bytes before the string or after the string, it's something that strlen doesn't know. This all means that from a memory model perspective reads before an allocation or after are out-of-bounds provenance-wise and thus UB.

I'm unable to create example code that leads to a miscompile today. I tried various examples but none of them produced the UB that I was expecting. That means that compilers can probably practically be thwarted with various ptr-to-int or inline asm things. The downside of that though is that it's just a temporary solution until compilers theoretically catch up in the future.

With all that I can try to answer some specific questions you have too:

Why is dereferencing UB, rather than implementation defined?

To the best of my understanding, the dereference reads outside the provenance of the original pointer. Thus it's UB for at least some possible arguments to the strlen function.
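
To make "outside the provenance" concrete, here is a small arithmetic sketch. It assumes the SIMD loop aligns the pointer down to a 16-byte boundary and loads whole chunks, which is my assumption about the routine and is not stated in this thread; `CHUNK`, `bytes_before`, and `bytes_after` are hypothetical names. No out-of-bounds read is performed; the code only counts how many bytes such a load would touch beyond the object.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK 16u /* width of one 128-bit vector load, in bytes */

/* Bytes of the first aligned chunk that lie before address `start`. */
static size_t bytes_before(uintptr_t start) {
    return (size_t)(start % CHUNK);
}

/* Bytes of the last aligned chunk that lie past an object occupying
 * [start, start + size). `size` must be nonzero. */
static size_t bytes_after(uintptr_t start, size_t size) {
    assert(size > 0);
    uintptr_t last = start + size - 1;                /* final byte of object */
    uintptr_t chunk_end = (last / CHUNK + 1) * CHUNK; /* one past its chunk */
    return (size_t)(chunk_end - (start + size));
}
```

For a 6-byte string such as "hello" plus its terminator placed at address 0x1003, a single aligned chunk covers 3 bytes before the string and 7 bytes after it, none of which the pointer passed to strlen is known to have provenance for.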

Conversion of an integer to a pointer is implementation defined (not UB), but could produce an invalid pointer. Dereferencing an invalid pointer would be UB. Is that the rationale behind it?

Also to the best of my understanding, performing various operations on a pointer like converting it to an integer or throwing it through inline assembly does nothing about the fact that the pointer still has some provenance. It cannot be proven that there exists provenance for the bytes before the allocation or after, and thus it's still fundamentally always UB to read/write outside the provenance of the pointer.

If so, this specific solution may make strlen work, but I don't see how it can make (e.g.) memchr work: it needs to return a valid pointer, so if we assume we can't conjure valid pointers from integer arithmetic, we're toast.

For memchr one possible solution is to calculate the offset to the byte being searched for. The return value of the function is then the original pointer plus this offset. The result of the search is within the provenance of the original pointer and thus the pointer arithmetic to return a new pointer has appropriate provenance on it as well.
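
A minimal sketch of that offset-based approach, with a plain scalar loop standing in for the SIMD search. `find_offset` and `memchr_by_offset` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical search kernel: returns the index of the first byte equal
 * to c within the n bytes at s, or n if there is none. A SIMD version
 * could compute the same index by other means; only the integer result
 * leaves this function. */
static size_t find_offset(const unsigned char *s, unsigned char c, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (s[i] == c)
            return i;
    return n;
}

/* memchr built on the offset: the returned pointer is derived from the
 * caller's pointer by in-bounds arithmetic, never from an integer. */
static void *memchr_by_offset(const void *s, int c, size_t n) {
    assert(s != NULL || n == 0);
    const unsigned char *base = s;
    size_t off = find_offset(base, (unsigned char)c, n);
    if (off == n)
        return NULL;
    return (void *)(base + off);
}
```

Because the result is `base + off` with `off < n`, it stays inside the object the caller passed in, regardless of how the offset itself was found.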

It seems to me that, if anything, what we need is to launder the pointer, which could be done with __asm__ ("" : "+r"(v)); at the top of the loop: in goes a pointer the compiler might know is invalid, out comes the same pointer which the compiler can't possibly tell is invalid.

I know I sound like I'm repeating myself at this point, but while I don't doubt that this probably works today, I am under the belief that from a memory model perspective this has no effect. Ralf's writings are partly from the perspective of writing Miri, a UB-detecting interpreter for Rust. In that sense you could imagine writing a similar interpreter for C to detect UB. Assuming it could execute inline assembly (e.g. it also had a full-blown x64 interpreter or something like that), then if you actually executed this code the interpreter would realize that nothing happened to v. In the end v still has some provenance and it doesn't include the bytes before or after the pointer, and thus the eventual reads are UB.

I honestly don't understand what's UB about the current code, nor how this inline assembly helps.

To address the "how this inline assembly helps" part here -- the basic problem (again as I'm led to believe) is that performing a read as *v means that the C and LLVM memory models have to define what this operation does. This is where provenance and all that fun business comes into play. With inline assembly, however, all the compiler knows is that an address goes in and a value comes out. The C/LLVM memory models aren't in play at all (AFAIK at least), so the definition of the assembly itself is all that matters, and in this case reads are always allowed so long as the address is valid (aka wasm semantics).
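
As a rough illustration of "an address goes in and a value comes out", a 16-byte load through inline assembly might look like the sketch below. Treat it as an assumption-laden example and not as the code in this PR: `load_chunk` is a hypothetical name, the operand template (`local.get %1` / `local.set %0`) reflects my understanding of Clang's WebAssembly inline assembly, and the wasm branch is only compiled with SIMD enabled on Clang 21 or later. On any other target a plain `memcpy` fallback is used so the sketch still builds, and that fallback is an ordinary C-level read that must stay in bounds.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__wasm_simd128__) && defined(__clang_major__) && \
    __clang_major__ >= 21
#include <wasm_simd128.h>
#define EXAMPLE_ASM_LOAD 1
#else
#define EXAMPLE_ASM_LOAD 0
#endif

/* Hypothetical helper: copies the 16 bytes at addr into out. */
static void load_chunk(const void *addr, uint8_t out[16]) {
    assert(addr != NULL);
#if EXAMPLE_ASM_LOAD
    /* The compiler only sees an address operand going in and a vector
     * coming out; the load itself is defined by wasm semantics. */
    v128_t chunk;
    __asm__("local.get %1\n"
            "v128.load 0\n"
            "local.set %0\n"
            : "=r"(chunk)
            : "r"(addr)
            : "memory");
    memcpy(out, &chunk, sizeof chunk);
#else
    /* Fallback for other targets: a normal C-level copy. */
    memcpy(out, addr, 16);
#endif
}
```

The design point is that only the asm branch moves the read out of the C/LLVM memory model; the fallback exists purely so the example can be compiled and exercised elsewhere.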


That's probably a bit more than you were asking for, and I apologize that I cannot answer your questions with the ironclad certainty that would probably help put both your mind and my own to rest. I could very well be wrong in my understanding here and/or inaccurate in places. To the best of my knowledge a lot of this is an area of active research in various communities too.

@sunfishcode sunfishcode merged commit 88e4427 into WebAssembly:main Jul 3, 2025
15 checks passed
@ncruces
Contributor

ncruces commented Jul 3, 2025

That's probably a bit more than you were asking for

Not at all, this was exactly what I was looking for.

I guess where I might disagree is that using inline asm to launder the pointer should produce a pointer with no (or unknown/invalid) provenance, and that those pointers should be valid to dereference in C (for DMA, if for no other reason) in an "implementation defined" way, keeping in mind that the conversions are "intended to be consistent with the addressing structure of the execution environment."

It's a shame if LLVM feels otherwise and makes these UB. I understand Rust's frustration if it "can't" use LLVM in a sensible way, I have a much harder time accepting that LLVM can't be used to implement "implementation defined" stuff in C/C++ standards in a well-defined way.

That said, my interest is in moving things forward, because (as I predicted when someone on HN pressed me to contribute these upstream, instead of just letting someone else pick up that work) it's taking an order of magnitude more mental effort to submit a single function than it was to code the entire set, and decisions here have consequences for other functions as well.

ncruces added a commit to ncruces/wasi-libc that referenced this pull request Jul 3, 2025