-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
itest: fix test flakes from open channel not found and tx not found in mempool #5611
itest: fix test flakes from open channel not found and tx not found in mempool #5611
Conversation
563df9c
to
9a7cd60
Compare
Will the channel update changes (haven't dove into the PR yet) address this flake?
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flake hunting continues!
Great set of findings here, did an initial pass over the set of changes and only found one or two things worth discussing/revising.
lntest/node.go
Outdated
@@ -1590,3 +1535,75 @@ func (hn *HarnessNode) WaitForBalance(expectedBalance btcutil.Amount, confirmed | |||
|
|||
return nil | |||
} | |||
|
|||
// outpoint returns the outpoint of the channel's funding transaction. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
outpoint
-> makeOutpoint
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
delete(hn.closeClients, op) | ||
} | ||
hn.handleClosedChannelUpdate(graphUpdate.ClosedChans) | ||
// TODO(yy): handle node updates too |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC the current itests we have for gossip/node updates just check the channel graph directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so. We use describe graph iirc.
lntest/node.go
Outdated
// "advertisingNode1": [ | ||
// policy1, policy2,... | ||
// ], | ||
// "advertisingNode2": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit: indentation is a bit off here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed.
lntest/node.go
Outdated
@@ -1222,6 +1243,10 @@ const ( | |||
// watchOpenChannel specifies that this is a request to watch an close | |||
// channel event. | |||
watchCloseChannel | |||
|
|||
// watchOpenChannel specifies that this is a request to watch an policy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
watch for a*
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed!
select { | ||
// Send a watch request every second. | ||
case <-ticker.C: | ||
hn.chanWatchRequests <- &chanWatchRequest{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why send a new watch request each time vs a single one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there is an easy way to check the want vs got policies inside handleChannelEdgeUpdates
. Unlike checking for open channels, where we just increment counts whenever we see a channel point and have the logic of deciding it being open or not inside the function, we have no knowledge about what policy to compare to. Plus I think it's a bit more explicit(imo) to code it this way.
@@ -1324,6 +1353,9 @@ func (hn *HarnessNode) lightningNetworkWatcher() { | |||
|
|||
case watchCloseChannel: | |||
hn.handleCloseChannelWatchRequest(watchRequest) | |||
|
|||
case watchPolicyUpdate: | |||
hn.handlePolicyUpdateWatchRequest(watchRequest) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the advantage of doing this here vs the graphUpdates
case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to handle the subscription and following checks in one place. And the lines of code decreased. And this is what I have in mind atm, to makeitest
better,
handle all lightning node related RPC calls at HarnessNode level
handle miner/backend related RPC calls at NetworkHarness level
handle assertions in assertions.go
Thus our test file would be better organized and the length should be greatly decreased.
@@ -1611,17 +1611,17 @@ func (s *server) Start() error { | |||
} | |||
cleanup = cleanup.add(s.utxoNursery.Stop) | |||
|
|||
if err := s.chainArb.Start(); err != nil { | |||
if err := s.breachArbiter.Start(); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh, looks like the very same bug fix is actually implemented in this very old PR: #1783
I wonder if we should cherry-pick it as is, as it attempts to address other known dependency startup issues like what you set out to fix here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out that pr is pretty outdated and some of the suggested changes were already made so I just added the needed changes here manually.
aca5ead
to
cc98939
Compare
I think so. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great set of improvements! Really like how thoroughly you approach those flake fixes 💯
Just a few minor comments, looks mostly good to me!
lntest/node.go
Outdated
@@ -1590,3 +1535,81 @@ func (hn *HarnessNode) WaitForBalance(expectedBalance btcutil.Amount, confirmed | |||
|
|||
return nil | |||
} | |||
|
|||
// makeOutpoint returns the outpoint of the channel's funding transaction. | |||
func makeOutpoint(chanPoint *lnrpc.ChannelPoint) (wire.OutPoint, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: use rpcPointToWirePoint
in lnd_channel_backup_test.go
? Or move/rename that one to this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Use this method in rpcPointToWirePoint
instead.
lntest/node.go
Outdated
IncludeUnannounced: include, | ||
}) | ||
if err != nil { | ||
hn.AddToLog("DescribeGraph got err: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to return here as we'd run into a nil pointer further down the line if we don't. Or pass in t
to fail the test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok the more I change the more I feel that, maybe we should use hn.AddToLog
to print to console instead of the log?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking the same... Because downloading the logs from Travis is a bit tedious anyway. Or we don't log but indeed start to panic in cases like this. That should make it very obvious why the itests fail...
lntest/node.go
Outdated
for _, newChan := range updates { | ||
op, err := makeOutpoint(newChan.ChanPoint) | ||
if err != nil { | ||
hn.AddToLog("failed to create outpoint for %v"+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to return here to make sure we don't use an invalid outpoint? Or pass in t
to fail the test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a return
here.
We could perhaps use panic
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, yeah, a panic could work here. Though we don't use panics anywhere else in our tests just yet.
Also nit: add space after %v
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I ended up printing the error to the console. Thought about using panic
which made me panic, being afraid of it would panic where it shouldn't panic😂
Also the nits are fixed!
cc98939
to
86ae217
Compare
Lots of the itest failed, weird, will address them later |
77703a2
to
d73417d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a rebase!
Otherwise LGTM barring the discussion re return vs. panic 🎉
lntest/node.go
Outdated
for _, newChan := range updates { | ||
op, err := makeOutpoint(newChan.ChanPoint) | ||
if err != nil { | ||
hn.AddToLog("failed to create outpoint for %v"+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, yeah, a panic could work here. Though we don't use panics anywhere else in our tests just yet.
Also nit: add space after %v
.
lntest/node.go
Outdated
IncludeUnannounced: include, | ||
}) | ||
if err != nil { | ||
hn.AddToLog("DescribeGraph got err: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking the same... Because downloading the logs from Travis is a bit tedious anyway. Or we don't log but indeed start to panic in cases like this. That should make it very obvious why the itests fail...
1e3557e
to
1dec6f3
Compare
1dec6f3
to
5eeb998
Compare
Needs a rebase. |
c4fbad0
to
334aec6
Compare
Rebased upon #5637 to see the cumulative itest results. |
334aec6
to
5614fad
Compare
0681eb7
to
81be9a9
Compare
Can be rebased now that the dependent PR is in! |
81be9a9
to
a4e2fab
Compare
Done! Now pending the build results! |
a4e2fab
to
25a1a7f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM ✏️
Just needs a rebase and this can land, death to flakes!
This commit removes the method waitForChannelUpdate, and uses node.WaitForChannelPolicyUpdate instead.
This commit adds the part of the changes made in this PR: lightningnetwork#1783. The origin PR is quite outdated, instead of rebasing it the relevant changes are taken out and put into this commit.
25a1a7f
to
87ab4de
Compare
Done! |
Depends #5637
This PR fixes the itest flakes from open channel not found and tx not found in mempool. FWIW, the node graph subscription is refactored so that the graph update stream is handled in the
node
level. We will use only one topology client to check for channel open/close/update.Meanwhile, I think the structure of
itest
can be more or less defined now, we could,HarnessNode
levelNetworkHarness
levelassertions.go
Thus our test file would be better organized and the length should be greatly decreased.
In addition, improved logging is added. We can now see the node's internal state, something like,