-
Notifications
You must be signed in to change notification settings - Fork 129
Replace CBMC output parser #1433
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replace CBMC output parser #1433
Conversation
kani-driver/src/session.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a risk using pipe? I know in some implementations, if the pipe isn't drained fast enough, the upstream process blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I don't think it's risk-free. But I haven't found any issues so far.
For example, I think the scenario you mention would correspond to running a non-terminating example through Kani. I've tried doing that locally and it's still going on after 45mins. Kani's memory % is below 0.1, and CBMC's memory % has slowly increased to 4.3% (it's doing automatic unwinding).
|
Thank you so much for this! Just so we have a fair idea of what the output looks like now, can you add a few screenshots to the PR description? |
Screenshots added! fn rev(mut x: [i32; 4]) -> [i32; 4] {
x.reverse();
x
}
#[cfg(kani)]
#[kani::proof]
fn example() {
let var = kani::any();
let reversed = rev(var);
let double_reversed = rev(reversed);
assert_eq!(var, double_reversed);
}The main differences are highlighted in the call-outs section (e.g., no color). Other than that, I've tried to keep it as faithful as possible. |
zhassan-aws
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First round of comments. Only half way through cbmc_output_parser.rs.
be24eca to
71df083
Compare
|
Added colored output. Will update screenshots in a moment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would using something like the StreamDeserializer be possible here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point.
I tried using StreamDeserializer for parsing from the buffer in one of my earliest attempts, but it wasn't successful. The problem is that it expects self-delineating values, but the output we process is a JSON array with delineated values. In other words, the output we get is of the form:
[
Program,
Message,
...,
Message,
Result,
Message,
ProverStatus
]
But for StreamDeserializer to work we would need an output of the form:
Program
Message
...
Message
Result
Message
ProverStatus
Note that the parsing logic is mostly concerned with this problem: It assumes it's already within a JSON array, ignoring its delimiters. Changing the CBMC output to have the second form described is a no-go, at least according to the discussions I had then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be cleaner to use write!(&mut fmt_str, ":{line}")? instead of these two lines. Similar in the rest of the function (and maybe other instances in the file.
28b7930 to
0527ff0
Compare
| write!(&mut fmt_str, " in function {function}")?; | ||
| } | ||
|
|
||
| write! {f, "{}", fmt_str} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we're first writing to fmt_str and then write fmt_str to f, can't we just write to f directly?
zhassan-aws
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a huge effort! Thanks @adpaco-aws!
Only have a few more minor comments.
| if input.starts_with('[') || input.starts_with(']') { | ||
| return Some(Action::ClearInput); | ||
| } | ||
| if input.starts_with(" }") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we handle whitespace separately, i.e. remove it before matching on the patterns?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spaces are important here: Because we're iterating over a JSON array, matching on this specific string guarantees that we'll always get an item when we attempt to process a parser item. Otherwise, we wouldn't have this guarantee, which would likely result in a slow-down of the parser.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But can we guarantee that CBMC's JSON format will not change in a way that would break the current matching (that uses a specific number of spaces)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we pin to a specific cbmc version in our releases, we'd at least get notification that it broke in all our tests.
But I do think this explanation of what's going on here should be added as a comment on this line of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But can we guarantee that CBMC's JSON format will not change in a way that would break the current matching (that uses a specific number of spaces)?
We don't guarantee that. This could be changed to make it more resilient, but in my opinion it's better to break here than in some other places.
Added a comment to explain this.
|
|
||
| /// Clears the input accumulated so far. | ||
| fn clear_input(&mut self) { | ||
| self.input_so_far = String::new(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| self.input_so_far = String::new(); | |
| self.input_so_far.clear(); |
Using clear instead of new would keep the same capacity, which should reduce allocations/deallocations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, thanks!
| } | ||
| let complete_string = &self.input_so_far[0..self.input_so_far.len()]; | ||
| let result_item: Result<ParserItem, _> = serde_json::from_str(complete_string); | ||
| assert!(result_item.is_ok()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assert shouldn't be necessary since the unwrap on the next line already includes a panic.
| // Both formatting and printing could be handled by objects which | ||
| // implement a trait `Printer`. | ||
| let formatted_item = format_item(&processed_item, output_format); | ||
| if formatted_item.is_some() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if let Some
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see.
| /// * Curly closing bracket ('}') preceded by two spaces will trigger the | ||
| /// `ProcessItem` action. | ||
| fn triggers_action(&self, input: String) -> Option<Action> { | ||
| if input.starts_with('[') || input.starts_with(']') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This may ignore any characters that appear after [ or ]. If CBMC's current output never includes characters, perhaps we should assert that the line only has [ or ]?
| if let Some(alt_descriptions) = description_alternatives { | ||
| for (desc_to_match, opt_desc_to_replace) in alt_descriptions { | ||
| if original.contains(desc_to_match) { | ||
| if opt_desc_to_replace.is_some() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if let Some
tedinski
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like your description of how you tested it. Seems like it should be pretty high-confidence?
I think there's some things that could be improved, but nothing I think should block this improvement. Do you want to merge before release tomorrow?
kani-driver/src/call_cbmc.rs
Outdated
| if self.args.output_format != OutputFormat::Old { | ||
| args.push("--json-ui".into()); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this impact --visualize?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out. It wasn't clear to me: --visualize still works, but I don't know if it's adding more files. So I reverted this change and added a comment.
| }; | ||
| use structopt::lazy_static::lazy_static; | ||
|
|
||
| lazy_static! { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not want this to block this PR, but IIRC lazy_static isn't really maintained anymore and I forget what crate is supposed to replace it. once or oncecell or something like that.
| result_str.push_str(&check_id); | ||
| result_str.push_str(&status_msg); | ||
| result_str.push_str(&description_msg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can use write! on a String, but I'd not worry about this now, do a follow-up refactoring!
|
Addressed all comments except "use |
Description of changes:
Replaces the
cbmc_json_parser.pyscript with the Rust modulecbmc_json_parser.rs.In contrast to the old parser, this one doesn't wait to read from a file. Instead, it pipes CBMC output into Kani, which allows us to perform parsing on the fly. The main advantage of doing this is that users will get feedback even if CBMC is stuck in automatic unwinding (i.e., this solves #493).
While the basic functionality is being thoroughly tested by our regression tests, there are some places where we aren't doing the exact same thing as in the old script. See the "Callouts" section for more details.
Here's the output for a non-terminating example:
For the test in

tests/expected/assert-eq/main.rs(regularoutput format):For the test in

tests/expected/assert-eq/main.rs(terseoutput format):Resolved issues:
Resolves #493
Resolves #1431
Call-outs:
diff_pathin order to print locations, which are relative to the current working directory. The old script had different branches based on the location of the involved files (i.e., it distinguished between files located below the current working directory, the user directory and others).Colored results are gone, but will come back soon: New parser doesn't produce colored output #1431dry-runregression because now we don't produce a CBMC output file.Testing:
How is this change tested? Existing regression. In addition, I used an ad-hoc script that
diffs the output between the old and new versions. No "Status" lines appeared there.Is this a refactor change? Mostly yes.
Checklist
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.