-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bit-level undef/poison #366
Comments
Here's a small example for motivation
With clang 8.0 this compiles to
The load at Material in the LLVM literature suggests that some of the LLVM optimizations do not treat
However, my understanding is that other optimization passes do treat Based on this, I think if we want to support undefined values, we will need a separate bitmask representing whether each is defined, and these values will need to be stored/loaded from memory. We would also need to define the semantics of what it means for a bit to be undefined in all LLVM operations. LLVM also has a related concept called "poison". I think the one difference between the two is that My recommendation would be to have a very conservative model that has bit-level Related reading: [llvm-dev] RFC: Killing undef and spreading poison Taming Undefined Behavior in LLVM These both have value level rather than bitlevel poison, which doesn't help with the bitfields problem. Bitlevel poison also lets |
Yeah, as @joehendrix mentioned, the main obstacle to this is the fact that the LLVM devs haven't come to a coherent understanding of what poison and undef actually mean, so the correctness story is pretty tricky. I'm reluctant to do a bunch of work given one understanding of the semantics of these things, only to have to redo it later. However, it seems we're encountering real examples of programs we'd like to verify that we can't, so perhaps a fairly conservative expansion of the semantics is warranted. I think I agree with Joe that a simple bitmask for definedness would get us most of what we need. I suggest that bitwise operations would operate bitwise on definedness, and arithmetic operations would require all input bits to be defined, otherwise the result is totally undefined. Likewise, pointer operations require totally defined values. It would become UB to branch on an undefined bit. Select values on an undefined bit becomes undefined. We'd have to make case-by-case decisions about what to do with intrinsics. I think that probably keeps us from verifying any programs that LLVM considers UB, but still allows some basic patterns we're seeing in the wild. Of course, the devil is in the details, and it would be quite a bit harder to get good error reporting... |
I think bit- vs. value-level undef and poison is in the category of things that it would be reasonable to allow an advanced user to configure. We can be strict by default and point users to documentation that covers the nuances of these decisions and how to enable or disable various checks in error messages. |
Since we first opened this issue, LLVM has added the The express purpose of this instruction is to stop the propagation of undef and poison values. I think this instruction first appeared in LLVM 10 or 11. We should perhaps revisit this now, and see if consensus has arisen about these issues. |
LLVM 12 also introduces a dedicated |
We handle `freeze` instructions in `crucible-llvm` by simply passing the argument value unchanged. This isn't the intended semantics of `freeze` (see https://releases.llvm.org/12.0.0/docs/LangRef.html#id323), where `undef` or `poison` arguments should result in `freeze` returning an arbitrary value. Unfortunately, Crucible isn't quite equipped to accurately determine if an given value is `undef` or `poison` in all situations (see #366), so this is the best that we can do for now.
This bumps the `llvm-pretty` submodule to bring in the following PRs: * GaloisInc/llvm-pretty#86 (support LLVM 11 `IsDefault` fields in metadata) * GaloisInc/llvm-pretty#87 (Relax the Symbol requirement to `ConstBlockAddr`) * GaloisInc/llvm-pretty#88 (add LLVM12 poison value) This also bumps the `llvm-pretty-bc-parser` submodule to bring in the following PRs: * GaloisInc/llvm-pretty-bc-parser#171 (don't consume fast math flag entry) * GaloisInc/llvm-pretty-bc-parser#169 (make `getBitString` assumptions explicit) * GaloisInc/llvm-pretty-bc-parser#168 (support for LLVM 11 `IsDefault` fields in metadata) * GaloisInc/llvm-pretty-bc-parser#170 (remove `Aligned` from `SubWord`) * GaloisInc/llvm-pretty-bc-parser#172 (fix BLOCKADDRESS parsing) * GaloisInc/llvm-pretty-bc-parser#173 (parse LLVM12 poison value) * GaloisInc/llvm-pretty-bc-parser#178 (use Seq over list in PartialModule) Of these, the only PR that requires code changes in Crucible is GaloisInc/llvm-pretty#88 (add LLVM12 poison value), as this adds a new `ValPoison` constructor to `Value'`. Properly handling LLVM poison is a large task that I do not wish to tackle here (see #366 for discussion of how one might do this). For now, this PR does the bare minimum: * In `Lang.Crucible.LLVM.Translation.BlockInfo.useVal`, treat `ValPoison` as not referencing any identifiers. * The other two potential sites where `ValPoison` could be matched on are in `Lang.Crucible.LLVM.Translation.Constant.transConstant'` and `Lang.Crucible.LLVM.Translation.Expr.transValue`. Since we are not handling `ValPoison` for now, both of these functions will simply error if they encounter `ValPoison`. * I have added a section to `crucible-llvm`'s `limitations.md` document which describes the above limitations of poison handling.
This bumps the `llvm-pretty` submodule to bring in the following PRs: * GaloisInc/llvm-pretty#86 (support LLVM 11 `IsDefault` fields in metadata) * GaloisInc/llvm-pretty#87 (Relax the Symbol requirement to `ConstBlockAddr`) * GaloisInc/llvm-pretty#88 (add LLVM12 poison value) * GaloisInc/llvm-pretty#90 (Support structs with bitfields in `Text.LLVM.DebugUtils`) This also bumps the `llvm-pretty-bc-parser` submodule to bring in the following PRs: * GaloisInc/llvm-pretty-bc-parser#171 (don't consume fast math flag entry) * GaloisInc/llvm-pretty-bc-parser#169 (make `getBitString` assumptions explicit) * GaloisInc/llvm-pretty-bc-parser#168 (support for LLVM 11 `IsDefault` fields in metadata) * GaloisInc/llvm-pretty-bc-parser#170 (remove `Aligned` from `SubWord`) * GaloisInc/llvm-pretty-bc-parser#172 (fix BLOCKADDRESS parsing) * GaloisInc/llvm-pretty-bc-parser#173 (parse LLVM12 poison value) * GaloisInc/llvm-pretty-bc-parser#178 (use Seq over list in PartialModule) Of these, the only PR that requires code changes in Crucible is GaloisInc/llvm-pretty#88 (add LLVM12 poison value), as this adds a new `ValPoison` constructor to `Value'`. Properly handling LLVM poison is a large task that I do not wish to tackle here (see #366 for discussion of how one might do this). For now, this PR does the bare minimum: * In `Lang.Crucible.LLVM.Translation.BlockInfo.useVal`, treat `ValPoison` as not referencing any identifiers. * The other two potential sites where `ValPoison` could be matched on are in `Lang.Crucible.LLVM.Translation.Constant.transConstant'` and `Lang.Crucible.LLVM.Translation.Expr.transValue`. Since we are not handling `ValPoison` for now, both of these functions will simply error if they encounter `ValPoison`. * I have added a section to `crucible-llvm`'s `limitations.md` document which describes the above limitations of poison handling.
This bumps the `llvm-pretty` submodule to bring in the following PRs: * GaloisInc/llvm-pretty#86 (support LLVM 11 `IsDefault` fields in metadata) * GaloisInc/llvm-pretty#87 (Relax the Symbol requirement to `ConstBlockAddr`) * GaloisInc/llvm-pretty#88 (add LLVM12 poison value) * GaloisInc/llvm-pretty#90 (Support structs with bitfields in `Text.LLVM.DebugUtils`) This also bumps the `llvm-pretty-bc-parser` submodule to bring in the following PRs: * GaloisInc/llvm-pretty-bc-parser#171 (don't consume fast math flag entry) * GaloisInc/llvm-pretty-bc-parser#169 (make `getBitString` assumptions explicit) * GaloisInc/llvm-pretty-bc-parser#168 (support for LLVM 11 `IsDefault` fields in metadata) * GaloisInc/llvm-pretty-bc-parser#170 (remove `Aligned` from `SubWord`) * GaloisInc/llvm-pretty-bc-parser#172 (fix BLOCKADDRESS parsing) * GaloisInc/llvm-pretty-bc-parser#173 (parse LLVM12 poison value) * GaloisInc/llvm-pretty-bc-parser#178 (use Seq over list in PartialModule) Of these, the only PR that requires code changes in Crucible is GaloisInc/llvm-pretty#88 (add LLVM12 poison value), as this adds a new `ValPoison` constructor to `Value'`. Properly handling LLVM poison is a large task that I do not wish to tackle here (see #366 for discussion of how one might do this). For now, this PR does the bare minimum: * In `Lang.Crucible.LLVM.Translation.BlockInfo.useVal`, treat `ValPoison` as not referencing any identifiers. * The other two potential sites where `ValPoison` could be matched on are in `Lang.Crucible.LLVM.Translation.Constant.transConstant'` and `Lang.Crucible.LLVM.Translation.Expr.transValue`. Since we are not handling `ValPoison` for now, both of these functions will simply error if they encounter `ValPoison`. * I have added a section to `crucible-llvm`'s `limitations.md` document which describes the above limitations of poison handling.
More motivation for implementing this is supporting Rust programs which use enums. For example, in this program: // test.rs
#[no_mangle]
fn test() -> Option<u64> {
None
}
fn main() {
test();
} If you attempt to run this with the following
It will fail thusly:
The LLVM code for ; Function Attrs: nonlazybind uwtable
define dso_local { i64, i64 } @test() unnamed_addr #1 {
start:
%0 = alloca { i64, i64 }, align 8
store i64 0, ptr %0, align 8
%1 = getelementptr inbounds { i64, i64 }, ptr %0, i32 0, i32 0
%2 = load i64, ptr %1, align 8, !range !5, !noundef !4
%3 = getelementptr inbounds { i64, i64 }, ptr %0, i32 0, i32 1
%4 = load i64, ptr %3, align 8
%5 = insertvalue { i64, i64 } undef, i64 %2, 0
%6 = insertvalue { i64, i64 } %5, i64 %4, 1
ret { i64, i64 } %6
} At the LLVM level, the enum type In the LLVM code above, we see that it stack-allocates an The |
We might want to extend the LLVM memory model to allow tracking of undefinedness and/or poison values with bit-level granularity.
There are some perfectly reasonable C programs that we can't verify with Crucible/LLVM right now because the memory model is too strict regarding uninitialized memory. For example, when a C function zero-initializes a bit field of a struct, the compiled LLVM implements it as a
load
, anand
with a mask, and then astore
. Currently this program would be rejected by Crucible/LLVM because theload
reads uninitialized memory and fails.This would be a lot of work to implement. Currently we represent the result of LLVM memory loads using the
Partial
type, which includes a (symbolic) bitvector along with a single definedness predicate; after each load we immediately assert that the definedness predicate is true. We would need to change this to a definedness bitmask, and we would also need to defer any assertions of well-definedness to some later time. We would need to define how undef/poison is propagated by all the bitvector operations. We would also have to model stores of partially-defined bitvectors to memory.Another challenge is that there isn't an agreed-upon standard semantics for undefinedness/poison in LLVM, so we'd have to figure out a lot of details for ourselves. One risk would be that if we make our semantics too permissive, Crucible/LLVM might verify some programs that actually fail when they are compiled and run.
In summary, implementing this feature would bring both potential benefits and also risks. The goal of this ticket thread is to lay out all the risks and benefits that we can think of, so that we can make an informed decision about when/if we are ready to implement this feature.
The text was updated successfully, but these errors were encountered: