Use inline assembly in strlen for vector loads #593
This commit is a refinement of WebAssembly#586 to use inline assembly to perform vector loads instead of using a C-defined load. This is done to avoid UB in LLVM where C cannot read either before or after an allocation. When `strlen` is not inlined, as it currently isn't, there's not really any reasonable path by which a compiler could prove that a load was out-of-bounds, so this issue is unlikely in practice, but it nevertheless is still UB. In the future the eventual goal is to move these SIMD routines into header files to avoid needing multiple builds of libc itself. In such a situation inlining is indeed possible, and a compiler would be capable of seeing the UB much more easily, which could cause problems.

Inline assembly unfortunately doesn't work with vector output parameters on Clang 19 and Clang 20 due to an ICE. This was fixed in llvm/llvm-project#146574 for Clang 21, but it means that the SIMD routines are now excluded with Clang 19 and Clang 20 to avoid compilation errors there.
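For readers who want to see the shape of this, here is a minimal sketch, assuming Clang's `wasm_simd128.h` intrinsics and Clang's WebAssembly inline-assembly operand syntax. `sketch_strlen` is a hypothetical name and the actual patch may differ in detail. The preprocessor condition skips Clang 19 and Clang 20 as described above, and targets without simd128 get a plain byte loop so the file still builds.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: gate on simd128 and skip Clang 19/20 (ICE with vector asm outputs). */
#if defined(__wasm_simd128__) && defined(__clang_major__) && \
    __clang_major__ != 19 && __clang_major__ != 20
#include <wasm_simd128.h>

size_t sketch_strlen(const char *s) {
    /* Round down to a 16-byte boundary; the first block may begin before s. */
    uintptr_t align = (uintptr_t)s % sizeof(v128_t);
    const v128_t *v = (const v128_t *)((uintptr_t)s - align);
    for (;;) {
        v128_t chunk;
        /* The 16-byte load is done in assembly, so C never dereferences v. */
        __asm__("local.get %1\n"
                "v128.load 0\n"
                "local.set %0\n"
                : "=r"(chunk)
                : "r"(v)
                : "memory");
        /* One bit per zero byte; clear the bits for bytes that precede s. */
        int mask = (int)wasm_i8x16_bitmask(
            wasm_i8x16_eq(chunk, wasm_i8x16_splat(0)));
        mask = mask >> align << align;
        if (mask) {
            /* Unsigned arithmetic: v may be below s in the first block. */
            return (size_t)((uintptr_t)v - (uintptr_t)s +
                            (uintptr_t)__builtin_ctz(mask));
        }
        align = 0; /* only the first block needs masking */
        v++;
    }
}
#else
/* Fallback for other targets and for Clang 19/20: scalar byte loop. */
size_t sketch_strlen(const char *s) {
    const char *p = s;
    while (*p) p++;
    return (size_t)(p - s);
}
#endif
```

Because the first block is rounded down to a 16-byte boundary, it can cover up to 15 bytes before `s`, and the block holding the terminator can extend past it. Those are the out-of-allocation reads that the description wants to move out of C and into the assembly.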
abrown
left a comment
If we moved this inline assembly to a .s file we could avoid the #if __clang_major__ ... but you're thinking it would not be inlined here?
|
While possible that would only be inlineable post-LLD through |
sunfishcode
left a comment
The incompatibility with LLVM 20 is unfortunate, but I think this is the direction we want to go in, to eliminate dependence on UB.
|
I honestly don't understand what's UB about the current code, nor how this inline assembly helps. Why is dereferencing UB, rather than implementation defined? Conversion of an integer to a pointer is implementation defined (not UB), but could produce an invalid pointer. Dereferencing an invalid pointer would be UB. Is that the rationale behind it? If so, this specific solution may make It seems to me that, if anything, what we need is to launder the pointer, which could be done with |
|
I'll try my best to explain things, but I'll also point out that I'm by no means an expert in this field and could very well be wrong. The simplest answer I can give is this RFC for LLVM which says:
in reference to reads past the end of an object where the extra data is ignored. In contrast an overly long answer I can give is likely rooted in the blog post series ending at https://www.ralfj.de/blog/2022/04/11/provenance-exposed.html (that's got a link to previous posts, and note it's also a few years old at this point). My own personal understanding primarily comes from Rust, but I'm under the impression that the same rules/model/etc are all just as equally applicable to C as well. In Rust, after a lot of discussion, it's been concluded that every pointer has provenance. AFAIK it's considered UB to read/write a pointer outside of the provenance that it contains. (If you're not familiar with provenance and feel Ralf doesn't write enough about it there's also Gankra's writing too). My understanding is that, for I'm unable to create example code that leads to a miscompile today. I tried various examples but none of them produced the UB that I was expecting. That means that compilers can probably practically be thwarted with various ptr-to-int or inline asm things. The downside of that though is that it's just a temporary solution until compilers theoretically catch up in the future. With all that I can try to answer some specific questions you have too:
To the best of my understanding, the dereference reads outside the provenance of the original pointer. Thus it's UB for at least some possible arguments to the
Also to the best of my understanding, performing various operations on a pointer like converting it to an integer or throwing it through inline assembly does nothing about the fact that the pointer still has some provenance. It cannot be proven that there exists provenance for the bytes before the allocation or after, and thus it's still fundamentally always UB to read/write outside the provenance of the pointer.
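To make the two operations mentioned here concrete, below is a small illustration, assuming a GCC- or Clang-compatible compiler. The helper names `sketch_round_trip` and `sketch_launder` are hypothetical, and the thread does not show the exact code either commenter had in mind. Both helpers return the same address they were given; what changes is only how much the optimizer can see about where the pointer came from.

```c
#include <stdint.h>

/* Hypothetical helper: pointer -> integer -> pointer round trip.
   The conversions are implementation-defined; the address is unchanged. */
static inline const char *sketch_round_trip(const char *p) {
    uintptr_t bits = (uintptr_t)p;
    return (const char *)bits;
}

/* Hypothetical helper: pass the pointer through an empty asm statement.
   The "+r" constraint tells the compiler the value may have been modified,
   so it can no longer reason about the pointer's origin. No instruction
   is emitted and the value is not actually changed. */
static inline const char *sketch_launder(const char *p) {
    __asm__ volatile("" : "+r"(p));
    return p;
}
```

Under the provenance view described in this comment, neither helper makes an out-of-bounds read defined; they only make it harder for today's compilers to notice.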
For
I know I sound like I'm repeating myself at this point but while I don't doubt that this probably works today I am under the belief that from a memory model perspective this has no effect. Ralf's writings are partly from the perspective of writing Miri, a UB-detecting interpreter for Rust. In that sense you could imagine writing a similar interpreter for C to detect UB. Assuming it could execute inline assembly (e.g. it also had a full-blown x64 interpreter or something like that) then if you actually executed this code the interpreter would realize that nothing happened to
To address the "how this inline assembly" part here -- the basic problem (again as I'm led to believe) is that performing a read as That's probably a bit more than you were asking for, and I apologize that I cannot answer your questions with the ironclad certainty that would probably help put both your mind and my own to rest. I could very well be wrong in my understandings here and/or inaccurate in places. To the best of my knowledge a lot of this is areas of active research in various communities too. |
Not at all, this was exactly what I was looking for. I guess where I might disagree is that using inline asm to launder the pointer should produce a pointer with no (or unknown/invalid) provenance, and that those pointers should be valid to dereference in C (for DMA, if for no other reason) in an "implementation defined" way, keeping in mind that the conversions are "intended to be consistent with the addressing structure of the execution environment." It's a shame if LLVM feels otherwise and makes these UB. I understand Rust's frustration if it "can't" use LLVM in a sensible way, but I have a much harder time accepting that LLVM can't be used to implement "implementation defined" stuff in C/C++ standards in a well-defined way. That said, my interest is in moving things forward, because (as I predicted when someone on HN pressed me to contribute these upstream, instead of just letting someone else pick up that work) it's taking an order of magnitude more mental effort to submit a single function than it was to code the entire set, and decisions here have consequences for other functions as well. |