Skip to content

Conversation

@adpaco-aws
Copy link
Contributor

@adpaco-aws adpaco-aws commented Aug 1, 2022

Description of changes:

Replaces the cbmc_json_parser.py script with the Rust module cbmc_json_parser.rs.

In contrast to the old parser, this one doesn't wait to read from a file. Instead, it pipes CBMC output into Kani, which allows us to perform parsing on the fly. The main advantage of doing this is that users will get feedback even if CBMC is stuck in automatic unwinding (i.e., this solves #493).

While the basic functionality is being thoroughly tested by our regression tests, there are some places where we aren't doing the exact same thing as in the old script. See the "Callouts" section for more details.

Here's the output for a non-terminating example:

Screen Shot 2022-08-01 at 16 37 37

For the test in tests/expected/assert-eq/main.rs (regular output format):
Screen Shot 2022-08-03 at 14 57 28

For the test in tests/expected/assert-eq/main.rs (terse output format):
Screen Shot 2022-08-03 at 14 57 56

Resolved issues:

Resolves #493
Resolves #1431

Call-outs:

  • The new parser relays on diff_path in order to print locations, which are relative to the current working directory. The old script had different branches based on the location of the involved files (i.e., it distinguished between files located below the current working directory, the user directory and others).
  • Colored results are gone, but will come back soon: New parser doesn't produce colored output #1431
  • Messages produces after looking a result should be handled as regular messages. I've some idea on how to handle this, but will have to wait until then: Messages based on results should be printed as regular messages #1432
  • Some Python regexes used to match on Kani reachability IDs were too greedy in Rust, therefore I changed them to NOT match on square brackets, which gives the expected result.
  • Had to change a dry-run regression because now we don't produce a CBMC output file.
  • I'm sure there are places where we can use more idiomatic Rust. Please point those out!

Testing:

  • How is this change tested? Existing regression. In addition, I used an ad-hoc script that diffs the output between the old and new versions. No "Status" lines appeared there.

  • Is this a refactor change? Mostly yes.

Checklist

  • Each commit message has a non-empty body, explaining why the change was made
  • Methods or procedures are documented
  • Regression or unit tests are included, or existing tests cover the modified code
  • My PR is restricted to a single feature or bugfix

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

@adpaco-aws adpaco-aws requested a review from a team as a code owner August 1, 2022 19:41
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a risk using pipe? I know in some implementations, if the pipe isn't drained fast enough, the upstream process blocks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I don't think it's risk-free. But I haven't found any issues so far.

For example, I think the scenario you mention would correspond to running a non-terminating example through Kani. I've tried doing that locally and it's still going on after 45mins. Kani's memory % is below 0.1, and CBMC's memory % has slowly increased to 4.3% (it's doing automatic unwinding).

@jaisnan
Copy link
Contributor

jaisnan commented Aug 1, 2022

Thank you so much for this! Just so we have a fair idea of what the output looks like now, can you add a few screenshots to the PR description?

@adpaco-aws
Copy link
Contributor Author

Thank you so much for this! Just so we have a fair idea of what the output looks like now, can you add a few screenshots to the PR description?

Screenshots added!
They correspond to one test (with output format regular and terse) and the non-terminating example:

fn rev(mut x: [i32; 4]) -> [i32; 4] {
    x.reverse();
    x
}

#[cfg(kani)]
#[kani::proof]
fn example() {
    let var = kani::any();
    let reversed = rev(var);
    let double_reversed = rev(reversed);

    assert_eq!(var, double_reversed);
}

The main differences are highlighted in the call-outs section (e.g., no color). Other than that, I've tried to keep it as faithful as possible.

Copy link
Contributor

@zhassan-aws zhassan-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First round of comments. Only half way through cbmc_output_parser.rs.

@adpaco-aws adpaco-aws force-pushed the replace-cbmc-output-parser branch from be24eca to 71df083 Compare August 3, 2022 15:23
@adpaco-aws
Copy link
Contributor Author

Added colored output. Will update screenshots in a moment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would using something like the StreamDeserializer be possible here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

I tried using StreamDeserializer for parsing from the buffer in one of my earliest attempts, but it wasn't successful. The problem is that it expects self-delineating values, but the output we process is a JSON array with delineated values. In other words, the output we get is of the form:

[
    Program,
    Message,
    ...,
    Message,
    Result,
    Message,
    ProverStatus
]

But for StreamDeserializer to work we would need an output of the form:

Program
Message
...
Message
Result
Message
ProverStatus

Note that the parsing logic is mostly concerned with this problem: It assumes it's already within a JSON array, ignoring its delimiters. Changing the CBMC output to have the second form described is a no-go, at least according to the discussions I had then.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be cleaner to use write!(&mut fmt_str, ":{line}")? instead of these two lines. Similar in the rest of the function (and maybe other instances in the file.

@adpaco-aws adpaco-aws force-pushed the replace-cbmc-output-parser branch from 28b7930 to 0527ff0 Compare August 8, 2022 15:22
write!(&mut fmt_str, " in function {function}")?;
}

write! {f, "{}", fmt_str}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're first writing to fmt_str and then write fmt_str to f, can't we just write to f directly?

Copy link
Contributor

@zhassan-aws zhassan-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a huge effort! Thanks @adpaco-aws!

Only have a few more minor comments.

if input.starts_with('[') || input.starts_with(']') {
return Some(Action::ClearInput);
}
if input.starts_with(" }") {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we handle whitespace separately, i.e. remove it before matching on the patterns?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spaces are important here: Because we're iterating over a JSON array, matching on this specific string guarantees that we'll always get an item when we attempt to process a parser item. Otherwise, we wouldn't have this guarantee, which would likely result in a slow-down of the parser.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But can we guarantee that CBMC's JSON format will not change in a way that would break the current matching (that uses a specific number of spaces)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we pin to a specific cbmc version in our releases, we'd at least get notification that it broke in all our tests.

But I do think this explanation of what's going on here should be added as a comment on this line of code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But can we guarantee that CBMC's JSON format will not change in a way that would break the current matching (that uses a specific number of spaces)?

We don't guarantee that. This could be changed to make it more resilient, but in my opinion it's better to break here than in some other places.

Added a comment to explain this.


/// Clears the input accumulated so far.
fn clear_input(&mut self) {
self.input_so_far = String::new();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.input_so_far = String::new();
self.input_so_far.clear();

Using clear instead of new would keep the same capacity, which should reduce allocations/deallocations.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, thanks!

}
let complete_string = &self.input_so_far[0..self.input_so_far.len()];
let result_item: Result<ParserItem, _> = serde_json::from_str(complete_string);
assert!(result_item.is_ok());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert shouldn't be necessary since the unwrap on the next line already includes a panic.

// Both formatting and printing could be handled by objects which
// implement a trait `Printer`.
let formatted_item = format_item(&processed_item, output_format);
if formatted_item.is_some() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if let Some

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see.

/// * Curly closing bracket ('}') preceded by two spaces will trigger the
/// `ProcessItem` action.
fn triggers_action(&self, input: String) -> Option<Action> {
if input.starts_with('[') || input.starts_with(']') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may ignore any characters that appear after [ or ]. If CBMC's current output never includes characters, perhaps we should assert that the line only has [ or ]?

if let Some(alt_descriptions) = description_alternatives {
for (desc_to_match, opt_desc_to_replace) in alt_descriptions {
if original.contains(desc_to_match) {
if opt_desc_to_replace.is_some() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if let Some

Copy link
Contributor

@tedinski tedinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like your description of how you tested it. Seems like it should be pretty high-confidence?

I think there's some things that could be improved, but nothing I think should block this improvement. Do you want to merge before release tomorrow?

Comment on lines 112 to 114
if self.args.output_format != OutputFormat::Old {
args.push("--json-ui".into());
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this impact --visualize?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out. It wasn't clear to me: --visualize still works, but I don't know if it's adding more files. So I reverted this change and added a comment.

};
use structopt::lazy_static::lazy_static;

lazy_static! {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not want this to block this PR, but IIRC lazy_static isn't really maintained anymore and I forget what crate is supposed to replace it. once or oncecell or something like that.

Comment on lines +607 to +609
result_str.push_str(&check_id);
result_str.push_str(&status_msg);
result_str.push_str(&description_msg);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use write! on a String, but I'd not worry about this now, do a follow-up refactoring!

@adpaco-aws
Copy link
Contributor Author

Addressed all comments except "use write! on strings directly", which I'm leaving as a refactoring task in #1480

@tedinski tedinski merged commit bca694a into model-checking:main Aug 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

New parser doesn't produce colored output Kani produces no output on non-termination

6 participants