itest: fix test flakes from open channel not found and tx not found in mempool #5611

yyforyongyu · 2021-08-08T10:21:34Z

Depends #5637
This PR fixes the itest flakes from open channel not found and tx not found in mempool. FWIW, the node graph subscription is refactored so that the graph update stream is handled in the node level. We will use only one topology client to check for channel open/close/update.

Meanwhile, I think the structure of itest can be more or less defined now, we could,

handle all lightning node related RPC calls at HarnessNode level
handle miner/backend related RPC calls at NetworkHarness level
handle assertions in assertions.go
Thus our test file would be better organized and the length should be greatly decreased.

In addition, improved logging is added. We can now see the node's internal state, something like,

=== RUN   TestLightningNetworkDaemon/tranche00/28-of-82/btcd/invoice_routing_hints
    test_harness.go:91: Failed: (invoice routing hints): exited with error: 
        *errors.errorString timed out waiting for channel open alice-bob: channel:f97a3494fe9face5b75a55fa9ffa5439ed3c96baaa8cec11f5b5132269a4aeee:0 not opened before timeout: 
        node state: {
            "NodeID": 0,
            "Name": "Alice",
            "PubKey": "020696e7c9f64eecc0eb65c7f02094bbe0d291376f4160a8c29846698468cd249e",
            "OpenChans": {
                "188037d1670118bfaf6d99048f38ea100de8959e29133a65c98cb3d59b1d07cd:0": 2,
                "9c2a33d872dd201436ad2ecfcd6d9a440adfbe07a916d89d6b8a7d9e2ab9ad82:0": 2,
                "a53db446aec7066117c556595fce760dc2b2b02805991739ddca96af87dde415:0": 2,
                "c1f39c2265dd1346cc2f6b0cd777e1c7ac5f7e681c0cde7e7484db76472c43f8:0": 1,
                "f97a3494fe9face5b75a55fa9ffa5439ed3c96baaa8cec11f5b5132269a4aeee:0": 1
            },
            "ClosedChans": {},
            "PolicyUpdates": {
                "188037d1670118bfaf6d99048f38ea100de8959e29133a65c98cb3d59b1d07cd:0": {
                    "02bdd10b212a52bd97909c2c3b931765a0dc72001635917c3d19933c9152dbac8c": [
                        {
                            "time_lock_delta": 40,
                            "min_htlc": 1000,
                            "fee_base_msat": 1000,
                            "fee_rate_milli_msat": 1,
                            "max_htlc_msat": 99000000
                        }
                    ]
                },
                "9c2a33d872dd201436ad2ecfcd6d9a440adfbe07a916d89d6b8a7d9e2ab9ad82:0": {
                    "03965bafe134a54b218c0e821d0200f044713c20c0bc668520d5ac0bed989684a0": [
                        {
                            "time_lock_delta": 40,
                            "min_htlc": 1000,
                            "fee_base_msat": 1000,
                            "fee_rate_milli_msat": 1,
                            "max_htlc_msat": 99000000
                        }
                    ]
                },
                "a53db446aec7066117c556595fce760dc2b2b02805991739ddca96af87dde415:0": {
                    "03173454d3e554e64653369d4ea4a491ac7a5287f5aa77b40118e072b131459a0c": [
                        {
                            "time_lock_delta": 40,
                            "min_htlc": 1000,
                            "fee_base_msat": 1000,
                            "fee_rate_milli_msat": 1,
                            "max_htlc_msat": 99000000
                        }
                    ]
                }
            },
            "NodeCfg": {
                "LogFilenamePrefix": "invoice_routing_hints",
                "ExtraArgs": [
                    "--default-remote-max-htlcs=483"
                ],
                "HasSeed": false,
                "P2PPort": 5586,
                "RPCPort": 5587,
                "RESTPort": 5589,
                "ProfilePort": 0,
                "AcceptKeySend": true,
                "AcceptAMP": false,
                "FeeURL": "http://localhost:5584/fee-estimates.json"
            }
        }
        .../lnd/lntest/itest/lnd_routing_test.go:1312 (0x1b94def)
            testInvoiceRoutingHints: t.Fatalf("timed out waiting for channel open %s: %v",
        .../lnd/lntest/itest/test_harness.go:115 (0x1b2f98e)
            (*harnessTest).RunTestCase: testCase.test(h.lndHarness, h)
        .../lnd/lntest/itest/lnd_test.go:247 (0x1bd2d4f)
            TestLightningNetworkDaemon.func4: ht.RunTestCase(testCase)
        /usr/local/Cellar/go/1.16/libexec/src/testing/testing.go:1194 (0x111980f)
            tRunner: fn(t)
        /usr/local/Cellar/go/1.16/libexec/src/runtime/asm_amd64.s:1371 (0x10717e1)
            goexit: BYTE    $0x90   // NOP

Roasbeef · 2021-08-10T01:21:28Z

Will the channel update changes (haven't dove into the PR yet) address this flake?

    test_harness.go:91: Failed: (update channel status): exited with error: 
        *errors.errorString did not receive channel update
        C:/Users/travis/gopath/src/github.com/lightningnetwork/lnd/lntest/itest/assertions.go:722 (0x15220b0)
        	waitForChannelUpdate: t.Fatalf("did not receive channel update")
        C:/Users/travis/gopath/src/github.com/lightningnetwork/lnd/lntest/itest/lnd_channel_graph_test.go:203 (0x153dade)
        	testUpdateChanStatus: waitForChannelUpdate(
        C:/Users/travis/gopath/src/github.com/lightningnetwork/lnd/lntest/itest/test_harness.go:115 (0x152eed5)
        	(*harnessTest).RunTestCase: testCase.test(h.lndHarness, h)
        C:/Users/travis/gopath/src/github.com/lightningnetwork/lnd/lntest/itest/lnd_test.go:247 (0x15d32d4)
        	TestLightningNetworkDaemon.func4: ht.RunTestCase(testCase)
        C:/Users/travis/.gimme/versions/go1.16.3.windows.amd64/src/testing/testing.go:1193 (0xb0536f)
        	tRunner: fn(t)
        C:/Users/travis/.gimme/versions/go1.16.3.windows.amd64/src/runtime/asm_amd64.s:1371 (0xa452a1)
        	goexit: BYTE	$0x90	// NOP

Roasbeef

Flake hunting continues!

Great set of findings here, did an initial pass over the set of changes and only found one or two things worth discussing/revising.

routing/notifications.go

lntest/node.go

Roasbeef · 2021-08-10T02:41:13Z

lntest/node.go

@@ -1590,3 +1535,75 @@ func (hn *HarnessNode) WaitForBalance(expectedBalance btcutil.Amount, confirmed

 	return nil
 }
+
+// outpoint returns the outpoint of the channel's funding transaction.


outpoint -> makeOutpoint

lntest/node.go

Roasbeef · 2021-08-10T02:46:04Z

lntest/node.go

-				delete(hn.closeClients, op)
-			}
+			hn.handleClosedChannelUpdate(graphUpdate.ClosedChans)
+			// TODO(yy): handle node updates too


IIRC the current itests we have for gossip/node updates just check the channel graph directly?

I think so. We use describe graph iirc.

Roasbeef · 2021-08-10T02:52:21Z

lntest/node.go

+//      "advertisingNode1": [
+//              policy1, policy2,...
+//         ],
+//          "advertisingNode2": [


Style nit: indentation is a bit off here.

Roasbeef · 2021-08-10T02:53:28Z

lntest/node.go

@@ -1222,6 +1243,10 @@ const (
 	// watchOpenChannel specifies that this is a request to watch an close
 	// channel event.
 	watchCloseChannel
+
+	// watchOpenChannel specifies that this is a request to watch an policy


watch for a*

Roasbeef · 2021-08-10T02:56:52Z

lntest/node.go

+		select {
+		// Send a watch request every second.
+		case <-ticker.C:
+			hn.chanWatchRequests <- &chanWatchRequest{


Why send a new watch request each time vs a single one?

I don't think there is an easy way to check the want vs got policies inside handleChannelEdgeUpdates. Unlike checking for open channels, where we just increment counts whenever we see a channel point and have the logic of deciding it being open or not inside the function, we have no knowledge about what policy to compare to. Plus I think it's a bit more explicit(imo) to code it this way.

Roasbeef · 2021-08-10T02:57:14Z

lntest/node.go

@@ -1324,6 +1353,9 @@ func (hn *HarnessNode) lightningNetworkWatcher() {

 			case watchCloseChannel:
 				hn.handleCloseChannelWatchRequest(watchRequest)
+
+			case watchPolicyUpdate:
+				hn.handlePolicyUpdateWatchRequest(watchRequest)


What's the advantage of doing this here vs the graphUpdates case?

I think it's better to handle the subscription and following checks in one place. And the lines of code decreased. And this is what I have in mind atm, to makeitest better,

handle all lightning node related RPC calls at HarnessNode level
handle miner/backend related RPC calls at NetworkHarness level
handle assertions in assertions.go
Thus our test file would be better organized and the length should be greatly decreased.

Roasbeef · 2021-08-10T03:06:48Z

server.go

@@ -1611,17 +1611,17 @@ func (s *server) Start() error {
 		}
 		cleanup = cleanup.add(s.utxoNursery.Stop)

-		if err := s.chainArb.Start(); err != nil {
+		if err := s.breachArbiter.Start(); err != nil {


Heh, looks like the very same bug fix is actually implemented in this very old PR: #1783

I wonder if we should cherry-pick it as is, as it attempts to address other known dependency startup issues like what you set out to fix here.

Turns out that pr is pretty outdated and some of the suggested changes were already made so I just added the needed changes here manually.

yyforyongyu · 2021-08-10T19:28:33Z

Will the channel update changes (haven't dove into the PR yet) address this flake?

    test_harness.go:91: Failed: (update channel status): exited with error: 
        *errors.errorString did not receive channel update
        C:/Users/travis/gopath/src/github.com/lightningnetwork/lnd/lntest/itest/assertions.go:722 (0x15220b0)
        	waitForChannelUpdate: t.Fatalf("did not receive channel update")
        C:/Users/travis/gopath/src/github.com/lightningnetwork/lnd/lntest/itest/lnd_channel_graph_test.go:203 (0x153dade)
        	testUpdateChanStatus: waitForChannelUpdate(
        C:/Users/travis/gopath/src/github.com/lightningnetwork/lnd/lntest/itest/test_harness.go:115 (0x152eed5)
        	(*harnessTest).RunTestCase: testCase.test(h.lndHarness, h)
        C:/Users/travis/gopath/src/github.com/lightningnetwork/lnd/lntest/itest/lnd_test.go:247 (0x15d32d4)
        	TestLightningNetworkDaemon.func4: ht.RunTestCase(testCase)
        C:/Users/travis/.gimme/versions/go1.16.3.windows.amd64/src/testing/testing.go:1193 (0xb0536f)
        	tRunner: fn(t)
        C:/Users/travis/.gimme/versions/go1.16.3.windows.amd64/src/runtime/asm_amd64.s:1371 (0xa452a1)
        	goexit: BYTE	$0x90	// NOP

I think so.

guggero

Great set of improvements! Really like how thoroughly you approach those flake fixes 💯

Just a few minor comments, looks mostly good to me!

guggero · 2021-08-11T08:32:34Z

lntest/node.go

@@ -1590,3 +1535,81 @@ func (hn *HarnessNode) WaitForBalance(expectedBalance btcutil.Amount, confirmed

 	return nil
 }
+
+// makeOutpoint returns the outpoint of the channel's funding transaction.
+func makeOutpoint(chanPoint *lnrpc.ChannelPoint) (wire.OutPoint, error) {


nit: use rpcPointToWirePoint in lnd_channel_backup_test.go? Or move/rename that one to this file?

Fixed. Use this method in rpcPointToWirePoint instead.

lntest/node.go

guggero · 2021-08-11T08:59:30Z

lntest/node.go

+		IncludeUnannounced: include,
+	})
+	if err != nil {
+		hn.AddToLog("DescribeGraph got err: %v", err)


I think we need to return here as we'd run into a nil pointer further down the line if we don't. Or pass in t to fail the test.

Ok the more I change the more I feel that, maybe we should use hn.AddToLog to print to console instead of the log?

I was thinking the same... Because downloading the logs from Travis is a bit tedious anyway. Or we don't log but indeed start to panic in cases like this. That should make it very obvious why the itests fail...

guggero · 2021-08-11T11:50:54Z

lntest/node.go

+	for _, newChan := range updates {
+		op, err := makeOutpoint(newChan.ChanPoint)
+		if err != nil {
+			hn.AddToLog("failed to create outpoint for %v"+


Do we need to return here to make sure we don't use an invalid outpoint? Or pass in t to fail the test?

Added a return here.
We could perhaps use panic instead?

Hmm, yeah, a panic could work here. Though we don't use panics anywhere else in our tests just yet.

Also nit: add space after %v.

So I ended up printing the error to the console. Thought about using panic which made me panic, being afraid of it would panic where it shouldn't panic😂
Also the nits are fixed!

lntest/node.go

yyforyongyu · 2021-08-12T05:49:55Z

Lots of the itest failed, weird, will address them later

guggero

Needs a rebase!

Otherwise LGTM barring the discussion re return vs. panic 🎉

guggero · 2021-08-23T12:03:07Z

lntest/node.go

+	for _, newChan := range updates {
+		op, err := makeOutpoint(newChan.ChanPoint)
+		if err != nil {
+			hn.AddToLog("failed to create outpoint for %v"+


Hmm, yeah, a panic could work here. Though we don't use panics anywhere else in our tests just yet.

Also nit: add space after %v.

guggero · 2021-08-23T12:06:34Z

lntest/node.go

+		IncludeUnannounced: include,
+	})
+	if err != nil {
+		hn.AddToLog("DescribeGraph got err: %v", err)


I was thinking the same... Because downloading the logs from Travis is a bit tedious anyway. Or we don't log but indeed start to panic in cases like this. That should make it very obvious why the itests fail...

guggero · 2021-08-24T08:46:30Z

Needs a rebase.

yyforyongyu · 2021-08-24T23:43:05Z

Rebased upon #5637 to see the cumulative itest results.

Roasbeef · 2021-09-14T03:36:49Z

Can be rebased now that the dependent PR is in!

yyforyongyu · 2021-09-14T03:50:15Z

Can be rebased now that the dependent PR is in!

Done! Now pending the build results!

Roasbeef

LGTM ✏️

Just needs a rebase and this can land, death to flakes!

This commit removes the method waitForChannelUpdate, and uses node.WaitForChannelPolicyUpdate instead.

This commit adds the part of the changes made in this PR: lightningnetwork#1783. The origin PR is quite outdated, instead of rebasing it the relevant changes are taken out and put into this commit.

yyforyongyu · 2021-09-16T23:51:42Z

LGTM ✏️

Just needs a rebase and this can land, death to flakes!

Done!

yyforyongyu force-pushed the itest-flake-chan-open branch from 563df9c to 9a7cd60 Compare August 8, 2021 10:52

yyforyongyu requested review from Roasbeef and guggero August 9, 2021 13:31

Roasbeef added this to the v0.14.0 milestone Aug 9, 2021

Roasbeef added the flake fix label Aug 9, 2021

Roasbeef modified the milestones: v0.14.0, flake hunting szn Aug 9, 2021

Roasbeef added the itests Issues related to integration tests. label Aug 9, 2021

Roasbeef requested changes Aug 10, 2021

View reviewed changes

yyforyongyu force-pushed the itest-flake-chan-open branch 3 times, most recently from aca5ead to cc98939 Compare August 10, 2021 19:27

guggero reviewed Aug 11, 2021

View reviewed changes

yyforyongyu force-pushed the itest-flake-chan-open branch from cc98939 to 86ae217 Compare August 12, 2021 02:21

yyforyongyu force-pushed the itest-flake-chan-open branch 2 times, most recently from 77703a2 to d73417d Compare August 20, 2021 10:32

yyforyongyu mentioned this pull request Aug 20, 2021

itest flake: revoked_uncooperative_close_retribution_zero_value_remote_output #5645

Closed

guggero approved these changes Aug 23, 2021

View reviewed changes

yyforyongyu force-pushed the itest-flake-chan-open branch 2 times, most recently from 1e3557e to 1dec6f3 Compare August 24, 2021 02:42

yyforyongyu requested a review from Roasbeef August 24, 2021 02:55

yyforyongyu force-pushed the itest-flake-chan-open branch from 1dec6f3 to 5eeb998 Compare August 24, 2021 04:07

yyforyongyu force-pushed the itest-flake-chan-open branch 2 times, most recently from c4fbad0 to 334aec6 Compare August 24, 2021 23:39

yyforyongyu force-pushed the itest-flake-chan-open branch from 334aec6 to 5614fad Compare August 25, 2021 01:07

yyforyongyu force-pushed the itest-flake-chan-open branch 3 times, most recently from 0681eb7 to 81be9a9 Compare September 13, 2021 23:35

yyforyongyu mentioned this pull request Sep 14, 2021

itest: tests randomly failed with neutrino backend #4376

Closed

yyforyongyu force-pushed the itest-flake-chan-open branch from 81be9a9 to a4e2fab Compare September 14, 2021 03:42

yyforyongyu force-pushed the itest-flake-chan-open branch from a4e2fab to 25a1a7f Compare September 15, 2021 09:23

Roasbeef approved these changes Sep 16, 2021

View reviewed changes

Roasbeef enabled auto-merge September 16, 2021 21:57

yyforyongyu added 14 commits September 17, 2021 07:50

routing: increase log level when notifying topology change

eadbd69

itest: add more verbose log and print node state

a102416

lntest: refactor handle update open channel

0701834

lntest: refactor handle close channel update

92cd665

itest: remove extra graph topology subscription

a58543d

itest: replace chanOpen bool with chanWatchType

d2277ac

itest: watch channel policy updates in harness node

87c13d3

itest: move tests by their category

06fa175

itest: remove the method waitForChannelUpdate

cdec34c

This commit removes the method waitForChannelUpdate, and uses node.WaitForChannelPolicyUpdate instead.

itest: fix typo

7038d0e

contractcourt+lnd: add debug log

64f4e21

lnd: change start/stop order of subsystems

e0e1bfb

This commit adds the part of the changes made in this PR: lightningnetwork#1783. The origin PR is quite outdated, instead of rebasing it the relevant changes are taken out and put into this commit.

itest: put node.CloseChannel inside wait

66dae6e

docs: add release note

87ab4de

yyforyongyu force-pushed the itest-flake-chan-open branch from 25a1a7f to 87ab4de Compare September 16, 2021 23:51

guggero disabled auto-merge September 17, 2021 07:51

guggero merged commit 583ccfe into lightningnetwork:master Sep 17, 2021

yyforyongyu deleted the itest-flake-chan-open branch September 18, 2021 00:55

yyforyongyu mentioned this pull request Nov 21, 2023

[hardening]: synchronously handle on-chain events during startup #8166

Open

itest: fix test flakes from open channel not found and tx not found in mempool #5611

itest: fix test flakes from open channel not found and tx not found in mempool #5611

Conversation

yyforyongyu commented Aug 8, 2021 • edited Loading

Roasbeef commented Aug 10, 2021

Roasbeef left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yyforyongyu commented Aug 10, 2021

guggero left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

yyforyongyu commented Aug 12, 2021

guggero left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

guggero commented Aug 24, 2021

yyforyongyu commented Aug 24, 2021

Roasbeef commented Sep 14, 2021

yyforyongyu commented Sep 14, 2021

Roasbeef left a comment

Choose a reason for hiding this comment

yyforyongyu commented Sep 16, 2021

yyforyongyu commented Aug 8, 2021 •

edited

Loading