Implement off-chain Window PoSt verification #1327
Conversation
It looks like this is causing us to kill (OOM?) the migration test. Not sure what that's about.
Turns out "immutable" deadlines aren't quite so immutable, because we still allow sector termination. This is moderately annoying to fix because we can terminate sectors from cron (e.g., when terminating too many). We can fix the issue for all practical purposes, but getting a fully correct fix could be annoying.

A simple fix is to snapshot the partitions/sectors along with the proofs, instead of at the end of the deadline. Honestly, this is the "better" approach in terms of correctness. Unfortunately, while it will only marginally increase gas costs (proofs need 2 extra CIDs), it'll increase the size of the state tree by quite a bit. On the other hand, I'm not sure how much that matters.

One of the significant upsides here is that the "WindowPoSt" objects are now completely self-contained. Thoughts?

```go
type WindowedPoSt struct {
	ProvenPartitions   bitfield.BitField
	PartitionsSnapshot cid.Cid // new
	SectorsSnapshot    cid.Cid // new
	Proofs             []proofs.PoStProof
}
```
Ah, wait. I'm confusing myself. We do two things:

1. Terminate sectors. This is triggered from TerminateSectors and at deadline end, and is the part that actually marks sectors as dead.
2. Pay termination fines. This can be triggered from cron, but it just pays fines.

TL;DR: all we need to do is block manual calls to TerminateSectors during the "blackout" period. That's pretty reasonable.
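For concreteness, here's a minimal sketch of such a guard, assuming the blackout runs from one challenge window before the deadline opens until its challenge window closes; the names, constant, and shape are illustrative, not the actual patch:

```go
package main

import "fmt"

// Illustrative constant; the real value lives in the miner actor's
// policy (challenge windows are 60 epochs on mainnet).
const wPoStChallengeWindow = 60

// terminationBlockedAt sketches the guard discussed above: manual
// TerminateSectors calls are rejected from one challenge window before
// the deadline opens until its challenge window closes.
func terminationBlockedAt(current, openEpoch, closeEpoch int64) bool {
	blackoutStart := openEpoch - wPoStChallengeWindow
	return current >= blackoutStart && current < closeEpoch
}

func main() {
	openEpoch, closeEpoch := int64(1000), int64(1060)
	fmt.Println(terminationBlockedAt(990, openEpoch, closeEpoch)) // true: inside the blackout
	fmt.Println(terminationBlockedAt(900, openEpoch, closeEpoch)) // false: deadline still mutable
}
```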
Done. This also fixes the incentive mismatch where a rational miner would wait until missing data is first challenged before terminating a sector.
Next step: migration. This should be fairly straightforward.
The tricky part is that the "all proven sectors have proofs in the proofs AMT" invariant won't be true during this period. I was hoping we could use this as a global invariant, but life just isn't that simple.
Hm. Looks like the migration isn't fully implemented yet anyway. @anorth, should I pick that up, or just move on for now?
Re compaction and finality: could we simplify to make these windows completely non-overlapping? E.g., a challenge window of 18hrs and a compaction window of 6hrs? It would of course be more robust to allow complete overlap, but it doesn't look like that can be done without being quite complex. So if we can't do that, I think it might be worth removing the possibility of collision between those two actions ("sectors that have moved partitions will not be marked faulty, the miner will still be fined"). Edit: I see you mentioned that idea later.

Re migration: correct, that's a work in progress. It's probably better not to attempt it in this PR, but note it for follow-up.
Thanks, this is looking good.
I trust that all the TODOs will either be resolved here, or have an issue filed, or be rewritten as commentary about alternative possibilities, before merging.
Yes, we can do this. And it's probably the simplest approach (and lets us not snapshot sectors).
Force-pushed from dc3ddd1 to 63004e3 (Compare)
actors/builtin/miner/miner_state.go (Outdated)
```go
if deadline.LiveSectors == 0 {
	// TODO: do we still need to clear the post submissions here? I
	// think we're technically fine, but this is a strange
	// edge-case.
```
I think yes, they should be cleared.
It's a strange edge case that can be directly reached by a miner, simply by terminating a deadline's sectors. Unclear how this may interact with ChallengeWindowedPoSt, but I don't think we want the old snapshots to stick around!
So, I don't think this can happen anymore (after this patch). Given this patch:
- I can't "prove" no sectors. We reject those proofs.
- I can't terminate sectors during the challenge window (or the previous deadline).
Given this, I shouldn't need to touch this array, right?
I think it should be done for safety, regardless. If the snapshots aren't reset each EOD cron, they may be technically valid when submitted to ChallengeWindowedPoSt.
I guess I can just check without spending too much time. That, and log an error.
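A minimal sketch of that check-and-log approach, with stand-in types so it compiles on its own; the real field is a bitfield on the deadline's state, and `PartitionsPoSted` here is only illustrative:

```go
package main

import "fmt"

// Illustrative stand-in for the deadline state touched above.
type Deadline struct {
	LiveSectors      uint64
	PartitionsPoSted []uint64 // stand-in for a bitfield of proven partitions
}

// clearStaleSubmissions sketches the "check and log an error" idea: a
// deadline with no live sectors should never have recorded PoSt
// submissions, so treat any leftovers as a bug, log, and clear them.
func clearStaleSubmissions(dl *Deadline) {
	if dl.LiveSectors == 0 && len(dl.PartitionsPoSted) > 0 {
		fmt.Println("error: deadline has no live sectors but has PoSt submissions; clearing")
		dl.PartitionsPoSted = nil
	}
}

func main() {
	dl := &Deadline{LiveSectors: 0, PartitionsPoSted: []uint64{0}}
	clearStaleSubmissions(dl)
	fmt.Println(dl.PartitionsPoSted) // []
}
```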
Re-organize proof recording logic to move it inside RecordProvenSectors. This means we can test it in the deadline invariant check.
Currently, the payout is zero. However, the logic now exists and we can discuss the ramifications.
```go
// We don't take reward out of the penalty as the miner
// could end up receiving a substantial
// portion of their fee back as a reward.
penaltyTarget := big.Add(penaltyBase, rewardTarget)
```
This is not how consensus faults work, but I believe it's the "more correct" approach.
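A worked sketch of why the reward is added on top of the penalty, with made-up numbers (only the arithmetic matters): if the reward were carved out of the penalty, a miner disputing its own proof would recoup part of its fine.

```go
package main

import (
	"fmt"

	"github.com/filecoin-project/go-state-types/big"
)

func main() {
	// Made-up amounts, for illustration only.
	penaltyBase := big.NewInt(100)
	rewardTarget := big.NewInt(20)

	// If the reward were carved out of the penalty, a miner disputing
	// its own proof would pay 100 and receive 20 back: a net fine of 80.
	netIfCarvedOut := big.Sub(penaltyBase, rewardTarget)

	// Adding the reward on top (as the code above does) keeps the net
	// fine at the full base even when the miner pays itself the reward.
	penaltyTarget := big.Add(penaltyBase, rewardTarget)
	netWithTopUp := big.Sub(penaltyTarget, rewardTarget)

	fmt.Println(netIfCarvedOut, penaltyTarget, netWithTopUp) // 80 120 100
}
```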
```go
requestUpdatePower(rt, powerDelta)

if !toReward.IsZero() {
```
This check isn't present in consensus fault reports, and we may want to change that.
```go
// If we fail, log and burn the reward to make sure the balances remain correct.
if !code.IsSuccess() {
	rt.Log(rtt.ERROR, "failed to send reward")
	toBurn = big.Add(toBurn, toReward)
```
This "burn it" isn't present in consensus fault reporting, but it's necessary to ensure that pledge values remain consistent.
Alternatively, we could just fail the message. Honestly, not failing the message is a bit of a footgun as lotus will think "the message succeeded" and will just send it to the network instead of kicking it back to the user to ensure the user gets paid.
I'd rather the report succeeded even if we can't pay the reward. Burning it sounds good.
For something like this, I assume the submitter would just try again. Any reason we need it to succeed the first time?
Why would it succeed the second time? I don't know what might have gone wrong to cause this failure (low balance?) but it is likely to be permanent. I don't think we should allow inability to pay the reward to stop the fraud being reported at all.
Hm. I was thinking something like the actor that sent the message suddenly got deleted (not sure how this can happen...).
But this isn't going to fix the "low balance" issue. If the actor doesn't have enough funds for the reward, they'll fail on the next step (burning).
This is done.
Discussed out of band: We can do this, but it would require changing maximum bitfield sizes. They're currently sized to fit up to a partition's worth of sector numbers worst-case, not more. This is not a hard change, it's just a bit tricky to make sure we don't mess it up somewhere. I've also filed a proposal for the next upgrade to make this impossible: #1352

TL;DR: I'm going to leave things as they are unless we have a compelling reason to do otherwise.
```go
// NOTE: This also includes power that was
// activated at the end of the last challenge
// window, and power from sectors that have since
// expired.
```
IMO these comments belong in the data structure, more discoverable to callers.
Where? I could put it on top of the PartitionsSnapshot field, but it seems kind of obvious that the partition snapshot includes power that's no longer active. I put it here because the fact that we're "disputing" no-longer-active power is interesting.
```go
},
	exitcode.Ok)

rt.ExpectAbortContainsMessage(exitcode.ErrForbidden, "can only dispute window posts during the dispute window", func() {
```
Nit: you should be able to call actor.disputeWindowPoSt inside rt.ExpectAbortContainsMessage, and I would expect that to encapsulate all the expectation set-up above that you had to replicate. Reset() the mock afterwards, since some expectations won't have been reached.
Actually, this is a bit of a hole in our tests that we should probably fix. I'm trying to test that the expectations are met, even on the failure case. Otherwise, we can get gas mismatches on failure cases.
Fair. I hope that some kind of golden file approach in the future will abstract this away. Your call in the meantime.
Yeah, I agree that golden files will make this kind of thing unnecessary.
Force-pushed from b371ebf to 8bb8458 (Compare)
* The reward ensures that, at a minimum, the gas fees for disputing a proof are covered.
* The base penalty ensures that the dispute method cannot be used to "take an advance" on vesting rewards.
Force-pushed from 8bb8458 to d07f47f (Compare)
This patch implements two changes:

1. New Window PoSts, except those that restore faulty sectors, are optimistically accepted, assumed correct, and recorded in the state-tree for one proving period.
2. Optimistically accepted window posts can be disputed until `WPoStProofDisputeWindow` epochs after the challenge window in which they were submitted closes.

When a dispute successfully refutes an optimistically accepted Window PoSt, the miner is fined one IPF per active sector in the partition (at the moment when the proof was submitted) plus a flat fee of 20 FIL, all incorrectly proved sectors are marked faulty, and the disputer (the address that submitted the dispute) is rewarded a fixed `DisputeReward`.
Proposal: filecoin-project/FIPs#42
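To make the fine arithmetic concrete, here's a hedged sketch of the formula described above. The per-sector IPF and `DisputeReward` amounts below are made up; only the shape of the computation follows the description, and the real constants live in the miner actor's policy.

```go
package main

import (
	"fmt"

	"github.com/filecoin-project/go-state-types/big"
)

func main() {
	oneFIL := big.NewInt(1e18) // attoFIL per FIL

	// Assumed, illustrative values.
	ipf := big.Mul(big.NewInt(5), oneFIL)           // fine per active sector (made up)
	flatFee := big.Mul(big.NewInt(20), oneFIL)      // flat 20 FIL, per the description
	disputeReward := big.Mul(big.NewInt(4), oneFIL) // fixed DisputeReward (made up)

	// Active sectors in the disputed partition at submission time.
	activeSectors := int64(2349)

	// Fine = IPF * active sectors + flat fee; the disputer's reward is
	// fixed and independent of the partition's size.
	fine := big.Add(big.Mul(big.NewInt(activeSectors), ipf), flatFee)
	fmt.Println("fine:", fine, "reward:", disputeReward)
}
```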
TODO
(?) Refactor RecordProvenSectors to simplify it.

Changes
In addition to the changes below,
Change: SubmitWindowedPoSt
Proofs that do not recover faulty power are not checked on-chain. Instead, they are optimistically accepted, assumed correct, and recorded in the state tree, where they remain open to dispute.
As usual, skipped sectors are marked as faulty and "unproven" sectors are marked as "proven".
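A sketch of that branch, with stub functions so it compiles on its own; the names and shape are assumptions, not the exact patch:

```go
package main

import "fmt"

// Stubs so the sketch compiles; the real logic lives in the miner actor.
type PoStProof struct{ ProofBytes []byte }

func recordProofForDispute(partitions []uint64, proofs []PoStProof) {
	fmt.Println("recorded optimistically; open to dispute")
}

func verifyWindowedPoStOnChain(partitions []uint64, proofs []PoStProof) {
	fmt.Println("verified on-chain")
}

// submitWindowedPoSt sketches the change described above: proofs that
// recover faulty power are still verified synchronously; everything
// else is accepted optimistically and merely recorded.
func submitWindowedPoSt(recoversFaultyPower bool, partitions []uint64, proofs []PoStProof) {
	if recoversFaultyPower {
		verifyWindowedPoStOnChain(partitions, proofs)
	} else {
		recordProofForDispute(partitions, proofs)
	}
}

func main() {
	submitWindowedPoSt(false, []uint64{0}, []PoStProof{{}})
	submitWindowedPoSt(true, []uint64{1}, []PoStProof{{}})
}
```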
Notes:
Change: Deadline End
As usual, partitions without proofs are marked as faulty, "new" (unproven) sectors are activated if proofs for them are present, etc.
Immediately after processing all proof-related logic, and before terminating any sectors:

- the partitions are snapshotted (`PartitionsSnapshot`)
- the optimistically accepted proofs are snapshotted (`ProofsSnapshot`).

We then proceed to terminate sectors as usual.
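A sketch of that snapshot step, with stand-in types so it compiles on its own; the field names are modeled on the description above, not taken from the patch:

```go
package main

import "fmt"

// Cid stands in for cid.Cid so the sketch compiles on its own.
type Cid string

// Deadline uses assumed field names modeled on the description above.
type Deadline struct {
	Partitions                Cid // AMT root of live partitions
	OptimisticPoStSubmissions Cid // AMT root of this period's accepted proofs
	PartitionsSnapshot        Cid // frozen at deadline end, for disputes
	PoStSubmissionsSnapshot   Cid // frozen at deadline end, for disputes
}

// snapshotAtDeadlineEnd sketches the step described above: freeze the
// partitions and the optimistically accepted proofs before any sectors
// are terminated, then start a fresh proofs AMT for the next period.
func snapshotAtDeadlineEnd(dl *Deadline, emptyProofsRoot Cid) {
	dl.PartitionsSnapshot = dl.Partitions
	dl.PoStSubmissionsSnapshot = dl.OptimisticPoStSubmissions
	dl.OptimisticPoStSubmissions = emptyProofsRoot
}

func main() {
	dl := &Deadline{Partitions: "bafy-partitions", OptimisticPoStSubmissions: "bafy-proofs"}
	snapshotAtDeadlineEnd(dl, "bafy-empty")
	fmt.Printf("%+v\n", *dl)
}
```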
Change: CompactPartitions
As `CompactPartitions` can move sectors between partitions, it makes it difficult to mark these sectors as faulty when a proof is disputed. While the disputer could submit bitfields describing the new locations of the sectors in question, such a dispute could be defeated by repeatedly re-compacting.

To prevent this attack and simplify the disputer's job, partition compaction is now prohibited for `WPoStProofDisputePeriod` (currently 2x finality, or 15 deadlines) epochs after the deadline's last challenge window. This leaves a window of 960 epochs (8 hours) for partition compaction.
Proofs may not be disputed after this time has passed.
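A sketch of the resulting compaction check, using the window sizes quoted above; the constants and shape are assumptions, not the actual patch:

```go
package main

import "fmt"

// Illustrative constants based on the description above; the real
// values live in the miner actor's policy.
const (
	wPoStChallengeWindow = 60   // epochs per challenge window (assumed)
	wPoStDisputeWindow   = 1800 // "2x finality", per the description (assumed)
)

// compactionAllowedAt sketches the new rule: compaction is prohibited
// from one challenge window before the deadline opens until the
// dispute window for its last proofs has elapsed.
func compactionAllowedAt(current, openEpoch, closeEpoch int64) bool {
	prohibitedStart := openEpoch - wPoStChallengeWindow
	prohibitedEnd := closeEpoch + wPoStDisputeWindow
	return current < prohibitedStart || current >= prohibitedEnd
}

func main() {
	openEpoch, closeEpoch := int64(10000), int64(10060)
	fmt.Println(compactionAllowedAt(closeEpoch+wPoStDisputeWindow, openEpoch, closeEpoch)) // true: dispute window elapsed
	fmt.Println(compactionAllowedAt(closeEpoch+100, openEpoch, closeEpoch))                // false: proofs still disputable
}
```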
Questions
New: DisputeWindowedPoSt
This patch introduces a new method: `DisputeWindowedPoSt`. This method selects a proof from the ProofsSnapshot AMT and verifies it.

Questions
This might require extending the dispute window a bit (e.g., 2x finality) (done).
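Finally, a hedged end-to-end sketch of the dispute flow described above, with stand-in types so it compiles on its own; the error messages and shape are illustrative, not the actual method:

```go
package main

import (
	"errors"
	"fmt"
)

// Stand-in so the sketch compiles; the real proof types and verifier
// come from the actors' dependencies.
type PoStProof struct{ Valid bool }

func verifyPoSt(p PoStProof) bool { return p.Valid } // stand-in verifier

// disputeWindowedPoSt sketches the flow described above: re-verify a
// snapshotted, optimistically accepted proof; if it fails, the
// disputed sectors become faulty, the miner is fined, and the disputer
// is rewarded.
func disputeWindowedPoSt(proof PoStProof, withinDisputeWindow bool) error {
	if !withinDisputeWindow {
		return errors.New("can only dispute window posts during the dispute window")
	}
	if verifyPoSt(proof) {
		return errors.New("failed to dispute valid post")
	}
	fmt.Println("marking disputed sectors faulty, fining the miner, rewarding the disputer")
	return nil
}

func main() {
	fmt.Println(disputeWindowedPoSt(PoStProof{Valid: false}, true)) // dispute succeeds: <nil>
	fmt.Println(disputeWindowedPoSt(PoStProof{Valid: true}, true))  // dispute of a valid proof fails
}
```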