Skip to content
This repository has been archived by the owner on Jun 6, 2023. It is now read-only.

Implement off-chain Window PoSt verification #1327

Merged
merged 65 commits into from
Jan 15, 2021
Merged

Conversation

Stebalien
Copy link
Member

@Stebalien Stebalien commented Dec 10, 2020

Proposal: filecoin-project/FIPs#42

TODO

  • Get the tests/lints passing.
  • Test (or retain tests) for SubmitWindowPoSt method
    • Skipping all sectors should prevent submission
  • Test the DisputeWindowedPoSt method.
    • With a correct post.
    • With a bad post.
    • After compaction.
    • While the deadline is open.
    • Test with a sector terminated after a post is submitted but before the end of the deadline.
  • (?) Refactor RecordProvenSectors to simplify it.
  • Break most of DisputeWindowedPoSt into a deadline-state method.
  • Decide on the final penalties for falsifying proofs (@zixuanzh?)
  • Audit for and address remaining TODOs in the code.

Changes

In addition to the changes below,

Change: SubmitWindowedPoSt

Proofs that do not recover faulty power are not checked on-chain. Instead,

  • The proof is recorded in an AMT of "optimistic" proofs.
  • The proof is not verified.
  • The proof may be "disputed" by anyone for 1800 epochs after the challenge window ends.

As usual, skipped sectors are marked as faulty and "unproven" sectors are marked as "proven".

Notes:

  • If a proof recovers faulty power, it is checked on-chain immediately to prevent "recovering" faulty sectors with invalid proofs.
  • "Unproven" sectors (sectors newly added to a partition but never proven with a window PoSt) do not force an on-chain proof verification. This is intentional.

Change: Deadline End

As usual, partitions without proofs are marked as faulty, "new" (unproven) sectors are activated if proofs for them are present, etc.

Immediately after processing all proof related logic, and before terminating any sectors:

  • We record a snapshot of the partition AMT so proofs can be properly disputed later (PartitionsSnapshot)
  • We record a snapshot of the proofs AMT (ProofsSnapshot).

We then proceed to terminate sectors as usual.

Change: CompactPartitions

As CompactPartitions can move sectors between partitions, it makes it difficult mark these sectors as faulty when disputed. While the disputer could submit bitfields describing the new locations of the sectors in question, this dispute could be defeated by repeatedly re-compacting.

To simplify the prevent this attack and simplify the disputer's job, partition compaction is now prohibited for WPoStProofDisputePeriod (currently 2x finality or 15 deadlines) epochs after the deadline's last challenge window. This leaves a window of 960 epochs (8 hours) for partition compaction.

Proofs may not be disputed after this time has passed.

Questions

  • Is the Finality blackout correct? We can make it longer (but probably not shorter). 2x finality would be safer.
    • I've made it 2x finality which should be much safer.

New: DisputeWindowedPoSt

This patch introduces a new method: DisputeWindowedPoSt. This method selects a proof from the ProofsSnapshot AMT and verifies it.

  1. The epoch and deadline are checked to make sure that the window for disputing proofs is still open.
  2. The proof is verified against the partition state snapshot from the end of the last challenge window. This represents the state that was (supposedly) proven.
  3. If the proof validates, this method aborts with no change. Otherwise, we continue.
  4. The active sectors in the partition snapshot are declared as faulty. Terminated/moved sectors are left alone. The faulty termination epoch is set as if the sectors became faulty at the end of the previous challenge window for the deadline in question.
  5. The target miner looses power based on the sectors marked faulty in step 3. If a sector moves, the power will be retained.
  6. The disputed proof is removed from the proofs AMT so it can't be disputed twice.
  7. The target miner is fined a penalty based on the active power in the partitions that were incorrectly prove. This "active power" is taken from the partition snapshot.

Questions

  • Should we allow the miner to specify a sector/partition mapping to work around the case where a sector can be moved due to compaction? I'm concerned that this could lead to large, expensive disputes.
    • We're limiting disputes to the a dispute window.
  • What should the penalty be? (to be determined by FIP)
  • What should the reporter be paid, if anything?
  • Do we really need to snapshot the sectors AMT? I was concerned about compaction deleting sectors. However, we could also just make the window in which miners can compact a deadline disjoint from the window where proofs can be disputed. This might require extending the dispute window a bit (e.g., 2x finality) (done).
    • We're limiting disputes to a dispute window.

@Stebalien
Copy link
Member Author

It looks like this is causing us to kill (OOM?) the migration test. Not sure what that's about.

@Stebalien Stebalien changed the title Implement off-chain Window PoSt verificatoin Implement off-chain Window PoSt verification Dec 11, 2020
@Stebalien
Copy link
Member Author

Turns out "immutable" deadlines aren't quite so immutable because we still allow sector termination. This is moderately annoying to fix because we can terminate sectors from cron (e.g., when terminating too many). We can fix the issue for all practical purposes, but getting a fully correct fix could be annoying.

A simple fix is to snapshot the partitions/sectors along with the proofs, instead of at the end of the deadline. Honestly, this is the "better" approach in terms of correctness. Unfortunately, while it will only marginally increase gas costs (proofs need 2 extra CIDs) it'll increase the size of the state tree by quite a bit. On the other hand, I'm not sure how much that matters.

One of the significant upsides here is that the "WindowPoSt" objects are now completely self-contained.

Thoughts?

type WindowedPoSt struct {
    ProvenPartitions bitfield.BitField
    PartitionsSnapshot cid.Cid // new
    SectorsSnapshot cid.Cid // new
    Proofs []proofs.PoStProof
}

@Stebalien
Copy link
Member Author

Ah, wait. I'm confusing myself.

We do two things:

  1. Actually terminate sectors.
  2. Process early terminations.

1 is triggered from TerminateSectors and at deadline end. This is the part that actually marks sectors as dead.

2 can be triggered from cron, but just pays fines.

TL;DR: all we need to do is block manual calls to TerminateSectors during the "blackout" period. That's pretty reasonable.

@Stebalien
Copy link
Member Author

TL;DR: all we need to do is block manual calls to TerminateSectors during the "blackout" period. That's pretty reasonable.

Done. This also fixes the incentive mismatch where a logical miner will wait until missing data is first challenged to terminate a sector.

@Stebalien
Copy link
Member Author

Next step: migration.

This should be fairly straight-forward.

  1. If the PoSt is submitted before the upgrade, it'll be verified and the partitions will be marked proven.
  2. If the PoSt is submitted after the upgrade, it'll be stored.

The tricky part is that the "all proven sectors have proofs in the proofs AMT" invariant won't be true during this period. I was hoping we could use this as a global invariant but life just isn't that simple.

@Stebalien
Copy link
Member Author

Hm. Looks like the migration isn't fully implemented yet anyways. @anorth, should I pick that up? Or just move on for now.

@anorth
Copy link
Member

anorth commented Dec 13, 2020

Re compaction and finality, could we simplify to get these windows completely non-overlapping? E.g. challenge window of 18hrs, compaction window of 6 hrs? It would be more robust of course to allow complete overlapping, but it doesn't look like that can be done without being quite complex. So if we can't do that, I think it might be worth removing the possibility of collision between those two actions ("sectors that have moved partitions will not be marked faulty, the miner will still be fined"). Edit: I see you mentioned that idea later.

Re migration, correct that's a work in progress. It's probably better not to attempt it in this PR, but note for follow-up.

Copy link
Member

@anorth anorth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this is looking good.

I trust that all the TODOs will either be resolved here, or have an issue filed, or be rewritten as commentary about alternative possibilities, before merging.

actors/builtin/miner/deadline_state.go Outdated Show resolved Hide resolved
actors/builtin/miner/deadline_state.go Outdated Show resolved Hide resolved
actors/builtin/miner/deadline_state.go Outdated Show resolved Hide resolved
actors/builtin/miner/deadline_state.go Outdated Show resolved Hide resolved
actors/builtin/miner/deadlines.go Outdated Show resolved Hide resolved
actors/builtin/miner/miner_actor.go Outdated Show resolved Hide resolved
actors/builtin/miner/miner_actor.go Outdated Show resolved Hide resolved
actors/builtin/miner/miner_actor.go Outdated Show resolved Hide resolved
actors/builtin/miner/miner_actor.go Outdated Show resolved Hide resolved
gen/gen.go Outdated Show resolved Hide resolved
@Stebalien
Copy link
Member Author

Re compaction and finality, could we simplify to get these windows completely non-overlapping? E.g. challenge window of 18hrs, compaction window of 6 hrs? It would be more robust of course to allow complete overlapping, but it doesn't look like that can be done without being quite complex. So if we can't do that, I think it might be worth removing the possibility of collision between those two actions ("sectors that have moved partitions will not be marked faulty, the miner will still be fined"). Edit: I see you mentioned that idea later.

Yes, we can do this. And it's probably the simplest approach (and lets us not snapshot sectors).

Comment on lines 1120 to 1133
if deadline.LiveSectors == 0 {
// TODO: do we still need to clear the post submissions here? I
// think we're technically fine, but this is a strange
// edge-case.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think yes, they should be cleared.

It's a strange edge case that can be directly reached by a miner, simply by terminating a deadline's sectors. Unclear how this may interact with ChallengeWindowedPoSt, but I don't think we want the old snapshots to stick around!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I don't think this can happen anymore (after this patch). Given this patch:

  1. I can't "prove" no sectors. We reject those proofs.
  2. I can't terminate sectors during the challenge window (or the previous deadline).

Given this, I shouldn't need to touch this array, right?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be done for safety, regardless. If the snapshots aren't reset each EOD cron, they may be technically valid when submitted to ChallengeWindowedPoSt

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess I can just check without spending too much time. That, and log an error.

Currently, the payout is zero. However, the logic now exists and we
can discuss the ramifications.
// take reward out of the penalty as the miner
// could end up receiving a substantial
// portion of their fee back as a reward.
penaltyTarget := big.Add(penaltyBase, rewardTarget)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not how consensus faults work, but I believe it's the "more correct" approach.


requestUpdatePower(rt, powerDelta)

if !toReward.IsZero() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check isn't present in consensus faults reports and we may want to change that.

// If we fail, log and burn the reward to make sure the balances remain correct.
if !code.IsSuccess() {
rt.Log(rtt.ERROR, "failed to send reward")
toBurn = big.Add(toBurn, toReward)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This "burn it" isn't present in consensus fault reporting, but it's necessary to ensure that pledge values remain consistent.

Alternatively, we could just fail the message. Honestly, not failing the message is a bit of a footgun as lotus will think "the message succeeded" and will just send it to the network instead of kicking it back to the user to ensure the user gets paid.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather the report succeeded even if we can't pay the reward. Burning it sounds good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For something like this, I assume the submitter would just try again. Any reason we need it to succeed the first time?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would it succeed the second time? I don't know what might have gone wrong to cause this failure (low balance?) but it is likely to be permanent. I don't think we should allow inability to pay the reward to stop the fraud being reported at all.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm. I was thinking something like the actor that sent the message suddenly got deleted (not sure how this can happen...).

But this isn't going to fix the "low balance" issue. If the actor doesn't have enough funds for the reward, they'll fail on the next step (burning).

@Stebalien
Copy link
Member Author

In SubmitWindowedPoSt, is params.Proofs[0] checked for a maximum length? If not, it should be.

This is done.

@Stebalien
Copy link
Member Author

My idea is to replace the current DisputeWindowedPoStParams with the following:

Discussed out of band: We can do this, but it would require changing maximum bitfield sizes. They're currently sized to fit up to a partition's worth of sector numbers worst-case, not more. This is not a hard change, it's just a bit tricky to make sure we don't mess it up somewhere.

I've also filed a proposal for the next upgrade to make this impossible: #1352

TL;DR: I'm going to leave things as they are unless we have a compelling reason to do otherwise.

actors/builtin/miner/deadline_state.go Show resolved Hide resolved
actors/builtin/miner/deadline_state.go Show resolved Hide resolved
actors/builtin/miner/deadline_state.go Show resolved Hide resolved
// NOTE: This also includes power that was
// activated at the end of the last challenge
// window, and power from sectors that have since
// expired.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO these comments belong in the data structure, more discoverable to callers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where? I could put it on top of the ParitionsSnapshot field, but it seems kind of obvious that the partition snapshot includes power that's no longer active. I put it here because the fact that we're "disputing" no longer active power is interesting.

// If we fail, log and burn the reward to make sure the balances remain correct.
if !code.IsSuccess() {
rt.Log(rtt.ERROR, "failed to send reward")
toBurn = big.Add(toBurn, toReward)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather the report succeeded even if we can't pay the reward. Burning it sounds good.

actors/builtin/miner/policy.go Show resolved Hide resolved
actors/builtin/miner/miner_actor.go Outdated Show resolved Hide resolved
},
exitcode.Ok)

rt.ExpectAbortContainsMessage(exitcode.ErrForbidden, "can only dispute window posts during the dispute window", func() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: you should be able to call actor.disputeWindowPoSt inside rt.ExpectAbortContainsMessage, and I would expect that to encapsulate all the expectation set-up above that you had to replicate. Reset() the mock afterwards some some expectations won't have been reached.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, this is a bit of a hole in our tests that we should probably fix. I'm trying to test that the expectations are met, even on the failure case. Otherwise, we can get gas mismatches on failure cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair. I hope that some kind of golden file approach in the future will abstract this away. Your call in the meantime.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree that golden files will make this kind of thing unnecessary.

actors/builtin/miner/miner_test.go Outdated Show resolved Hide resolved
@Stebalien Stebalien force-pushed the refactor/wd-post branch 2 times, most recently from b371ebf to 8bb8458 Compare January 14, 2021 21:11
* The reward ensures that, at a minimum, the gas fees for disputing a
  proof is covered.
* The base penalty ensures that the dispute method cannot be used to
  "take an advance" on vesting rewards.
actors/builtin/miner/monies.go Outdated Show resolved Hide resolved
@Stebalien Stebalien merged commit be9804e into master Jan 15, 2021
bibibong pushed a commit to EpiK-Protocol/go-epik-actors that referenced this pull request Feb 20, 2021
This patch implements two changes:

1. New Window PoSts, except those that restore faulty sectors, are
   optimistically accepted, assumed correct, and recorded in the
   state-tree for one proving period.
2. Optimistically accepted window posts can be disputed until
   `WPoStProofDisputeWindow` epochs after the challenge window in which
   they were submitted closes. When a dispute successfully refutes an
   optimistically accepted Window PoSt, the miner is fined one IPF per
   active sector in the partition (at the moment when the proof was
   submitted) plus a flat fee of 20FIL, all incorrectly proved sectors
   are marked faulty, and the disputer (address that submitted the
   dispute) is rewarded a fixed `DipsuteReward`.
@ZenGround0 ZenGround0 mentioned this pull request Apr 8, 2021
28 tasks
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants