Fuzz reloading with a stale monitor in chanmon_consistency #3113

Merged: 2 commits merged into lightningdevkit:main from the 2024-04-async-monitor-fuzz branch on Jun 12, 2024

Conversation

TheBlueMatt (Collaborator)

Now that we are gearing up to support fully async monitor storage, we really need to fuzz monitor updates not completing before a reload, which we do here in the chanmon_consistency fuzzer.

While there are more parts to async monitor updating that we need to fuzz, this at least gets us started by having basic async restart cases handled. In the future, we should extend this to make sure some basic properties (e.g. claim/balance consistency) remain true through chanmon_consistency runs.
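
For readers unfamiliar with the setup, the sketch below is a hypothetical, heavily simplified model of the idea (the type and method names are illustrative, not the actual LDK fuzz harness or `Persist` trait): a test persister that, when put into async mode, queues update IDs instead of completing them immediately, so the harness can later complete them selectively or reload the node while some are still pending.

```rust
use std::sync::Mutex;

#[derive(Debug, PartialEq)]
enum UpdateStatus {
    Completed,
    InProgress,
}

#[derive(Default)]
struct TestPersister {
    // When true, updates are queued rather than completed immediately.
    defer_updates: bool,
    // IDs of updates handed to the persister but not yet completed.
    pending: Mutex<Vec<u64>>,
}

impl TestPersister {
    fn persist_update(&self, update_id: u64) -> UpdateStatus {
        if self.defer_updates {
            self.pending.lock().unwrap().push(update_id);
            UpdateStatus::InProgress
        } else {
            UpdateStatus::Completed
        }
    }

    // Complete one queued update, selected by a fuzzer-driven closure
    // (e.g. "remove the first pending entry").
    fn complete_one(&self, selector: &dyn Fn(&mut Vec<u64>) -> Option<u64>) -> Option<u64> {
        let mut pending = self.pending.lock().unwrap();
        selector(&mut *pending)
    }
}

fn main() {
    let persister = TestPersister { defer_updates: true, ..Default::default() };
    assert_eq!(persister.persist_update(1), UpdateStatus::InProgress);
    assert_eq!(persister.persist_update(2), UpdateStatus::InProgress);
    // Complete only the first pending update; update 2 stays pending, so a
    // reload at this point restarts from a "stale" monitor.
    let done = persister.complete_one(
        &|v: &mut Vec<u64>| if !v.is_empty() { Some(v.remove(0)) } else { None });
    assert_eq!(done, Some(1));
    assert_eq!(persister.pending.lock().unwrap().len(), 1);
}
```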

TheBlueMatt force-pushed the 2024-04-async-monitor-fuzz branch from fd0ecbd to 7446a79 on June 10, 2024 at 21:01

codecov-commenter commented Jun 10, 2024

Codecov Report

Attention: Patch coverage is 97.50000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 90.65%. Comparing base (1d0c6c6) to head (2a0c900).
Report is 15 commits behind head on main.

Files                                         | Patch %  | Lines
lightning/src/ln/chanmon_update_fail_tests.rs | 97.46%   | 2 Missing ⚠️


Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3113      +/-   ##
==========================================
+ Coverage   89.84%   90.65%   +0.80%     
==========================================
  Files         119      119              
  Lines       97811   102825    +5014     
  Branches    97811   102825    +5014     
==========================================
+ Hits        87883    93217    +5334     
+ Misses       7364     7108     -256     
+ Partials     2564     2500      -64     


TheBlueMatt force-pushed the 2024-04-async-monitor-fuzz branch 2 times, most recently from b7a57bc to 61aef9b on June 10, 2024 at 21:46
tnull (Contributor) left a comment

Generally LGTM, just a few questions/comments.

&|v: &mut Vec<_>| if !v.is_empty() { Some(v.remove(0)) } else { None }),
0xf1 =>
complete_monitor_update(&monitor_a, &chan_1_funding,
&|v: &mut Vec<_>| if v.len() > 1 { Some(v.remove(1)) } else { None }),

tnull (Contributor)

Not quite sure why we alternate between 0 and 1 here? Would we gain anything in terms of coverage by choosing randomly from 0..len?

TheBlueMatt (Collaborator, Author)

We can't choose randomly because fuzzer results need to be deterministic based on the input. We could read another byte of fuzz input and use that, but we really want the fuzz input to be as dense as possible, so using a single bit for it seems sufficient. I'm not sure we'd have any issues that can't be expressed through finishing the first, second, or last update, but I could see some issues arising where we need to finish something in the middle that isn't the first or last.

tnull (Contributor)

> We can't choose randomly because fuzzer results need to be deterministic based on the input. We could read another byte of fuzz input and use that

Right, I imagined the latter.

> but we really want the fuzz input to be as dense as possible, so using a single bit for it seems sufficient. I'm not sure we'd have any issues that can't be expressed through finishing the first, second, or last update, but I could see some issues arising where we need to finish something in the middle that isn't the first or last.

Yeah, that's why I thought covering the entire Vec might be worth it, but maybe it's not too important.
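
As a hedged illustration of the trade-off being discussed (standalone functions with made-up names, not the actual harness code), the first function below spends effectively one bit of the command byte to pick between the first and second pending update, while the second shows the alternative of consuming a whole extra byte of fuzz input to pick an arbitrary index; both are deterministic in the input bytes.

```rust
/// Deterministically complete one pending update based on the command byte.
fn complete_by_command(pending: &mut Vec<u64>, command: u8) -> Option<u64> {
    match command {
        // A single bit of the command distinguishes "first" from "second",
        // keeping the fuzz input dense.
        0xf0 => if !pending.is_empty() { Some(pending.remove(0)) } else { None },
        0xf1 => if pending.len() > 1 { Some(pending.remove(1)) } else { None },
        _ => None,
    }
}

/// The alternative: spend a whole extra byte of fuzz input to pick any index.
/// Still deterministic (same input bytes give the same choice), but less dense.
fn complete_by_extra_byte(pending: &mut Vec<u64>, extra_byte: u8) -> Option<u64> {
    if pending.is_empty() {
        None
    } else {
        let idx = extra_byte as usize % pending.len();
        Some(pending.remove(idx))
    }
}

fn main() {
    let mut pending = vec![10, 11, 12];
    assert_eq!(complete_by_command(&mut pending, 0xf1), Some(11));
    assert_eq!(complete_by_extra_byte(&mut pending, 7), Some(12)); // 7 % 2 == 1
}
```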

TheBlueMatt force-pushed the 2024-04-async-monitor-fuzz branch from 3a944d2 to c05e508 on June 11, 2024 at 13:57
TheBlueMatt force-pushed the 2024-04-async-monitor-fuzz branch from c05e508 to 9d0c1a1 on June 12, 2024 at 02:26
TheBlueMatt (Collaborator, Author)

Addressed feedback.

tnull (Contributor) commented Jun 12, 2024

> Addressed feedback.

Unfortunately the new test now reliably fails:

thread 'ln::chanmon_update_fail_tests::test_sync_async_persist_doesnt_hang' panicked at 'explicit panic', lightning/src/ln/chanmon_update_fail_tests.rs:3410:14

TheBlueMatt force-pushed the 2024-04-async-monitor-fuzz branch from 384d691 to 2a0c900 on June 12, 2024 at 14:11
TheBlueMatt (Collaborator, Author)

Ugh, sorry, pushed right before bed last night and didn't test enough :/

valentinewallace (Contributor)

LGTM, feel free to squash IMO.

Now that we are gearing up to support fully async monitor storage,
we really need to fuzz monitor updates not completing before a
reload, which we do here in the `chanmon_consistency` fuzzer.

While there are more parts to async monitor updating that we need
to fuzz, this at least gets us started by having basic async
restart cases handled. In the future, we should extend this to make
sure some basic properties (e.g. claim/balance consistency) remain
true through `chanmon_consistency` runs.

When we have `ChannelMonitorUpdate`s which are completing both
synchronously and asynchronously, we need to consider a channel as
unblocked based on the `ChannelManager` monitor update queue,
rather than by checking the `update_id`s.

Consider the case where a channel is updated, leading to a
`ChannelMonitorUpdate` which completes asynchronously. The update
completes, but prior to the `ChannelManager` receiving the
`MonitorEvent::Completed` it generates a further
`ChannelMonitorUpdate`. This second update completes synchronously.
As a result, when the `MonitorEvent` is processed, the event's
`monitor_update_id` refers to the first update, but there are no
updates queued and the channel should be unblocked.

Here we fix this by looking only at the `ChannelManager` update
queue, rather than the `update_id` of the `MonitorEvent`.

While we don't anticipate many users having both synchronous and
asynchronous persists in the same application, there isn't much
cost to supporting it, which we do here.

Found by the chanmon_consistency target.
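
The following is a hypothetical, stripped-down illustration of that race and of the shape of the fix (the struct and method names are invented for this sketch, not LDK's actual types): deciding blocked/unblocked by comparing the event's `update_id` against the latest issued id keeps the channel stuck, while checking whether any in-flight updates remain queued gives the right answer.

```rust
struct CompletedEvent {
    /// The update_id named by the completion event (here, the async update 1).
    monitor_update_id: u64,
}

struct ChannelState {
    /// Updates handed to the persister that have not yet been reported
    /// complete back to the channel manager.
    in_flight_updates: Vec<u64>,
}

impl ChannelState {
    /// Problematic approach: compare the event's id against the latest id issued.
    fn unblocked_by_update_id(&self, event: &CompletedEvent, latest_issued: u64) -> bool {
        event.monitor_update_id >= latest_issued
    }

    /// Fixed approach: the channel is unblocked once nothing is left queued,
    /// regardless of which update the event happens to name.
    fn unblocked_by_queue(&self) -> bool {
        self.in_flight_updates.is_empty()
    }
}

fn main() {
    // Update 1 was async (now complete) and update 2 completed synchronously,
    // so nothing is actually outstanding when the event for update 1 arrives.
    let chan = ChannelState { in_flight_updates: vec![] };
    let event = CompletedEvent { monitor_update_id: 1 };

    // The id comparison still thinks update 2 is pending and keeps the
    // channel blocked...
    assert!(!chan.unblocked_by_update_id(&event, 2));
    // ...while the queue check correctly unblocks it.
    assert!(chan.unblocked_by_queue());
}
```
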
TheBlueMatt force-pushed the 2024-04-async-monitor-fuzz branch from 2a0c900 to 920d96e on June 12, 2024 at 15:32
TheBlueMatt (Collaborator, Author)

Squashed without further changes.

tnull merged commit 5e3056e into lightningdevkit:main on Jun 12, 2024
14 of 16 checks passed