Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

itest: fix test flakes from open channel not found and tx not found in mempool #5611

Merged
merged 14 commits into from
Sep 17, 2021

Conversation

yyforyongyu
Copy link
Member

@yyforyongyu yyforyongyu commented Aug 8, 2021

Depends #5637
This PR fixes the itest flakes from open channel not found and tx not found in mempool. FWIW, the node graph subscription is refactored so that the graph update stream is handled in the node level. We will use only one topology client to check for channel open/close/update.

Meanwhile, I think the structure of itest can be more or less defined now, we could,

  • handle all lightning node related RPC calls at HarnessNode level
  • handle miner/backend related RPC calls at NetworkHarness level
  • handle assertions in assertions.go
    Thus our test file would be better organized and the length should be greatly decreased.

In addition, improved logging is added. We can now see the node's internal state, something like,

=== RUN   TestLightningNetworkDaemon/tranche00/28-of-82/btcd/invoice_routing_hints
    test_harness.go:91: Failed: (invoice routing hints): exited with error: 
        *errors.errorString timed out waiting for channel open alice-bob: channel:f97a3494fe9face5b75a55fa9ffa5439ed3c96baaa8cec11f5b5132269a4aeee:0 not opened before timeout: 
        node state: {
            "NodeID": 0,
            "Name": "Alice",
            "PubKey": "020696e7c9f64eecc0eb65c7f02094bbe0d291376f4160a8c29846698468cd249e",
            "OpenChans": {
                "188037d1670118bfaf6d99048f38ea100de8959e29133a65c98cb3d59b1d07cd:0": 2,
                "9c2a33d872dd201436ad2ecfcd6d9a440adfbe07a916d89d6b8a7d9e2ab9ad82:0": 2,
                "a53db446aec7066117c556595fce760dc2b2b02805991739ddca96af87dde415:0": 2,
                "c1f39c2265dd1346cc2f6b0cd777e1c7ac5f7e681c0cde7e7484db76472c43f8:0": 1,
                "f97a3494fe9face5b75a55fa9ffa5439ed3c96baaa8cec11f5b5132269a4aeee:0": 1
            },
            "ClosedChans": {},
            "PolicyUpdates": {
                "188037d1670118bfaf6d99048f38ea100de8959e29133a65c98cb3d59b1d07cd:0": {
                    "02bdd10b212a52bd97909c2c3b931765a0dc72001635917c3d19933c9152dbac8c": [
                        {
                            "time_lock_delta": 40,
                            "min_htlc": 1000,
                            "fee_base_msat": 1000,
                            "fee_rate_milli_msat": 1,
                            "max_htlc_msat": 99000000
                        }
                    ]
                },
                "9c2a33d872dd201436ad2ecfcd6d9a440adfbe07a916d89d6b8a7d9e2ab9ad82:0": {
                    "03965bafe134a54b218c0e821d0200f044713c20c0bc668520d5ac0bed989684a0": [
                        {
                            "time_lock_delta": 40,
                            "min_htlc": 1000,
                            "fee_base_msat": 1000,
                            "fee_rate_milli_msat": 1,
                            "max_htlc_msat": 99000000
                        }
                    ]
                },
                "a53db446aec7066117c556595fce760dc2b2b02805991739ddca96af87dde415:0": {
                    "03173454d3e554e64653369d4ea4a491ac7a5287f5aa77b40118e072b131459a0c": [
                        {
                            "time_lock_delta": 40,
                            "min_htlc": 1000,
                            "fee_base_msat": 1000,
                            "fee_rate_milli_msat": 1,
                            "max_htlc_msat": 99000000
                        }
                    ]
                }
            },
            "NodeCfg": {
                "LogFilenamePrefix": "invoice_routing_hints",
                "ExtraArgs": [
                    "--default-remote-max-htlcs=483"
                ],
                "HasSeed": false,
                "P2PPort": 5586,
                "RPCPort": 5587,
                "RESTPort": 5589,
                "ProfilePort": 0,
                "AcceptKeySend": true,
                "AcceptAMP": false,
                "FeeURL": "http://localhost:5584/fee-estimates.json"
            }
        }
        .../lnd/lntest/itest/lnd_routing_test.go:1312 (0x1b94def)
            testInvoiceRoutingHints: t.Fatalf("timed out waiting for channel open %s: %v",
        .../lnd/lntest/itest/test_harness.go:115 (0x1b2f98e)
            (*harnessTest).RunTestCase: testCase.test(h.lndHarness, h)
        .../lnd/lntest/itest/lnd_test.go:247 (0x1bd2d4f)
            TestLightningNetworkDaemon.func4: ht.RunTestCase(testCase)
        /usr/local/Cellar/go/1.16/libexec/src/testing/testing.go:1194 (0x111980f)
            tRunner: fn(t)
        /usr/local/Cellar/go/1.16/libexec/src/runtime/asm_amd64.s:1371 (0x10717e1)
            goexit: BYTE    $0x90   // NOP

@Roasbeef Roasbeef added this to the v0.14.0 milestone Aug 9, 2021
@Roasbeef Roasbeef modified the milestones: v0.14.0, flake hunting szn Aug 9, 2021
@Roasbeef Roasbeef added the itests Issues related to integration tests. label Aug 9, 2021
@Roasbeef
Copy link
Member

Will the channel update changes (haven't dove into the PR yet) address this flake?

    test_harness.go:91: Failed: (update channel status): exited with error: 
        *errors.errorString did not receive channel update
        C:/Users/travis/gopath/src/github.com/lightningnetwork/lnd/lntest/itest/assertions.go:722 (0x15220b0)
        	waitForChannelUpdate: t.Fatalf("did not receive channel update")
        C:/Users/travis/gopath/src/github.com/lightningnetwork/lnd/lntest/itest/lnd_channel_graph_test.go:203 (0x153dade)
        	testUpdateChanStatus: waitForChannelUpdate(
        C:/Users/travis/gopath/src/github.com/lightningnetwork/lnd/lntest/itest/test_harness.go:115 (0x152eed5)
        	(*harnessTest).RunTestCase: testCase.test(h.lndHarness, h)
        C:/Users/travis/gopath/src/github.com/lightningnetwork/lnd/lntest/itest/lnd_test.go:247 (0x15d32d4)
        	TestLightningNetworkDaemon.func4: ht.RunTestCase(testCase)
        C:/Users/travis/.gimme/versions/go1.16.3.windows.amd64/src/testing/testing.go:1193 (0xb0536f)
        	tRunner: fn(t)
        C:/Users/travis/.gimme/versions/go1.16.3.windows.amd64/src/runtime/asm_amd64.s:1371 (0xa452a1)
        	goexit: BYTE	$0x90	// NOP

Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flake hunting continues!

Great set of findings here, did an initial pass over the set of changes and only found one or two things worth discussing/revising.

routing/notifications.go Show resolved Hide resolved
lntest/node.go Show resolved Hide resolved
lntest/node.go Outdated
@@ -1590,3 +1535,75 @@ func (hn *HarnessNode) WaitForBalance(expectedBalance btcutil.Amount, confirmed

return nil
}

// outpoint returns the outpoint of the channel's funding transaction.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

outpoint -> makeOutpoint

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

lntest/node.go Show resolved Hide resolved
delete(hn.closeClients, op)
}
hn.handleClosedChannelUpdate(graphUpdate.ClosedChans)
// TODO(yy): handle node updates too
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC the current itests we have for gossip/node updates just check the channel graph directly?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so. We use describe graph iirc.

lntest/node.go Outdated
// "advertisingNode1": [
// policy1, policy2,...
// ],
// "advertisingNode2": [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit: indentation is a bit off here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed.

lntest/node.go Outdated
@@ -1222,6 +1243,10 @@ const (
// watchOpenChannel specifies that this is a request to watch an close
// channel event.
watchCloseChannel

// watchOpenChannel specifies that this is a request to watch an policy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

watch for a*

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed!

select {
// Send a watch request every second.
case <-ticker.C:
hn.chanWatchRequests <- &chanWatchRequest{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why send a new watch request each time vs a single one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is an easy way to check the want vs got policies inside handleChannelEdgeUpdates. Unlike checking for open channels, where we just increment counts whenever we see a channel point and have the logic of deciding it being open or not inside the function, we have no knowledge about what policy to compare to. Plus I think it's a bit more explicit(imo) to code it this way.

@@ -1324,6 +1353,9 @@ func (hn *HarnessNode) lightningNetworkWatcher() {

case watchCloseChannel:
hn.handleCloseChannelWatchRequest(watchRequest)

case watchPolicyUpdate:
hn.handlePolicyUpdateWatchRequest(watchRequest)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the advantage of doing this here vs the graphUpdates case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to handle the subscription and following checks in one place. And the lines of code decreased. And this is what I have in mind atm, to makeitest better,

handle all lightning node related RPC calls at HarnessNode level
handle miner/backend related RPC calls at NetworkHarness level
handle assertions in assertions.go
Thus our test file would be better organized and the length should be greatly decreased.

@@ -1611,17 +1611,17 @@ func (s *server) Start() error {
}
cleanup = cleanup.add(s.utxoNursery.Stop)

if err := s.chainArb.Start(); err != nil {
if err := s.breachArbiter.Start(); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heh, looks like the very same bug fix is actually implemented in this very old PR: #1783

I wonder if we should cherry-pick it as is, as it attempts to address other known dependency startup issues like what you set out to fix here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out that pr is pretty outdated and some of the suggested changes were already made so I just added the needed changes here manually.

@yyforyongyu yyforyongyu force-pushed the itest-flake-chan-open branch 3 times, most recently from aca5ead to cc98939 Compare August 10, 2021 19:27
@yyforyongyu
Copy link
Member Author

Will the channel update changes (haven't dove into the PR yet) address this flake?

    test_harness.go:91: Failed: (update channel status): exited with error: 
        *errors.errorString did not receive channel update
        C:/Users/travis/gopath/src/github.com/lightningnetwork/lnd/lntest/itest/assertions.go:722 (0x15220b0)
        	waitForChannelUpdate: t.Fatalf("did not receive channel update")
        C:/Users/travis/gopath/src/github.com/lightningnetwork/lnd/lntest/itest/lnd_channel_graph_test.go:203 (0x153dade)
        	testUpdateChanStatus: waitForChannelUpdate(
        C:/Users/travis/gopath/src/github.com/lightningnetwork/lnd/lntest/itest/test_harness.go:115 (0x152eed5)
        	(*harnessTest).RunTestCase: testCase.test(h.lndHarness, h)
        C:/Users/travis/gopath/src/github.com/lightningnetwork/lnd/lntest/itest/lnd_test.go:247 (0x15d32d4)
        	TestLightningNetworkDaemon.func4: ht.RunTestCase(testCase)
        C:/Users/travis/.gimme/versions/go1.16.3.windows.amd64/src/testing/testing.go:1193 (0xb0536f)
        	tRunner: fn(t)
        C:/Users/travis/.gimme/versions/go1.16.3.windows.amd64/src/runtime/asm_amd64.s:1371 (0xa452a1)
        	goexit: BYTE	$0x90	// NOP

I think so.

Copy link
Collaborator

@guggero guggero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great set of improvements! Really like how thoroughly you approach those flake fixes 💯

Just a few minor comments, looks mostly good to me!

lntest/node.go Outdated
@@ -1590,3 +1535,81 @@ func (hn *HarnessNode) WaitForBalance(expectedBalance btcutil.Amount, confirmed

return nil
}

// makeOutpoint returns the outpoint of the channel's funding transaction.
func makeOutpoint(chanPoint *lnrpc.ChannelPoint) (wire.OutPoint, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use rpcPointToWirePoint in lnd_channel_backup_test.go? Or move/rename that one to this file?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Use this method in rpcPointToWirePoint instead.

lntest/node.go Outdated Show resolved Hide resolved
lntest/node.go Outdated Show resolved Hide resolved
lntest/node.go Outdated Show resolved Hide resolved
lntest/node.go Outdated Show resolved Hide resolved
lntest/node.go Outdated
IncludeUnannounced: include,
})
if err != nil {
hn.AddToLog("DescribeGraph got err: %v", err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to return here as we'd run into a nil pointer further down the line if we don't. Or pass in t to fail the test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok the more I change the more I feel that, maybe we should use hn.AddToLog to print to console instead of the log?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the same... Because downloading the logs from Travis is a bit tedious anyway. Or we don't log but indeed start to panic in cases like this. That should make it very obvious why the itests fail...

lntest/node.go Outdated
for _, newChan := range updates {
op, err := makeOutpoint(newChan.ChanPoint)
if err != nil {
hn.AddToLog("failed to create outpoint for %v"+
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to return here to make sure we don't use an invalid outpoint? Or pass in t to fail the test?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a return here.
We could perhaps use panic instead?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah, a panic could work here. Though we don't use panics anywhere else in our tests just yet.

Also nit: add space after %v.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I ended up printing the error to the console. Thought about using panic which made me panic, being afraid of it would panic where it shouldn't panic😂
Also the nits are fixed!

lntest/node.go Outdated Show resolved Hide resolved
@yyforyongyu
Copy link
Member Author

Lots of the itest failed, weird, will address them later

Copy link
Collaborator

@guggero guggero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a rebase!

Otherwise LGTM barring the discussion re return vs. panic 🎉

lntest/node.go Outdated
for _, newChan := range updates {
op, err := makeOutpoint(newChan.ChanPoint)
if err != nil {
hn.AddToLog("failed to create outpoint for %v"+
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah, a panic could work here. Though we don't use panics anywhere else in our tests just yet.

Also nit: add space after %v.

lntest/node.go Outdated
IncludeUnannounced: include,
})
if err != nil {
hn.AddToLog("DescribeGraph got err: %v", err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the same... Because downloading the logs from Travis is a bit tedious anyway. Or we don't log but indeed start to panic in cases like this. That should make it very obvious why the itests fail...

@yyforyongyu yyforyongyu force-pushed the itest-flake-chan-open branch 2 times, most recently from 1e3557e to 1dec6f3 Compare August 24, 2021 02:42
@guggero
Copy link
Collaborator

guggero commented Aug 24, 2021

Needs a rebase.

@yyforyongyu yyforyongyu force-pushed the itest-flake-chan-open branch 2 times, most recently from c4fbad0 to 334aec6 Compare August 24, 2021 23:39
@yyforyongyu
Copy link
Member Author

Rebased upon #5637 to see the cumulative itest results.

@Roasbeef
Copy link
Member

Can be rebased now that the dependent PR is in!

@yyforyongyu
Copy link
Member Author

Can be rebased now that the dependent PR is in!

Done! Now pending the build results!

Copy link
Member

@Roasbeef Roasbeef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ✏️

Just needs a rebase and this can land, death to flakes!

@yyforyongyu
Copy link
Member Author

LGTM ✏️

Just needs a rebase and this can land, death to flakes!

Done!

@guggero guggero merged commit 583ccfe into lightningnetwork:master Sep 17, 2021
@yyforyongyu yyforyongyu deleted the itest-flake-chan-open branch September 18, 2021 00:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
flake fix itests Issues related to integration tests. P2 should be fixed if one has time
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants