Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

accounts/keystore: fix flaky test TestWalletNotifications #30053

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Halimao
Copy link
Contributor

@Halimao Halimao commented Jun 23, 2024

This PR fixes the checkEvents part of the flaky test TestWalletNotifications. There is another flaky failure for checkAccounts.

// checkEvents checks that all events in 'want' are present in 'have'. Events may be present multiple times.
func checkEvents(t *testing.T, want []walletEvent, have []walletEvent) {}

The original logic of checkEvents seems to ask for both wantEvents and haveEvents to keep a consistent order so that this function could work correctly.

But actually, the order of both is not consistent. See the failed log comment on #29830

keystore_test.go:455: can't find event with Kind=2 for 56fe5c503fb7f1850298b7d6896cfd8019b525e1

Before this patch, the stress test of TestWalletNotifications failed by checkAccounts and checkEvents

5s: 237 runs so far, 0 failures

/tmp/go-stress-20240623T163016-2051068115
--- FAIL: TestWalletNotifications (0.55s)
    keystore_test.go:427: wallet list doesn't match required accounts: have 519, want 518
    keystore_test.go:455: can't find event with Kind=0 for bd535fefc07673125b61b29febbc0cc078046a54
FAIL


ERROR: exit status 1

10s: 490 runs so far, 1 failures (0.20%)
15s: 745 runs so far, 1 failures (0.13%)
20s: 997 runs so far, 1 failures (0.10%)
25s: 1253 runs so far, 1 failures (0.08%)
30s: 1489 runs so far, 1 failures (0.07%)
35s: 1708 runs so far, 1 failures (0.06%)
40s: 1925 runs so far, 1 failures (0.05%)

/tmp/go-stress-20240623T163016-1363815113
--- FAIL: TestWalletNotifications (0.52s)
    keystore_test.go:427: wallet list doesn't match required accounts: have 539, want 538
    keystore_test.go:455: can't find event with Kind=0 for d96d2f8b7d825b01038487e904fda016b8a47a9a
FAIL

After this patch, the stress test of TestWalletNotifications only failed by checkAccounts.

halimao@Galaxy:keystore$ stress ./keystore.test -test.run=TestWalletNotifications -test.cpu=5
5s: 241 runs so far, 0 failures
10s: 493 runs so far, 0 failures
15s: 741 runs so far, 0 failures
20s: 988 runs so far, 0 failures
25s: 1238 runs so far, 0 failures
30s: 1467 runs so far, 0 failures
35s: 1679 runs so far, 0 failures

/tmp/go-stress-20240623T155222-1929018433
--- FAIL: TestWalletNotifications (0.65s)
    keystore_test.go:428: wallet list doesn't match required accounts: have 517, want 516
FAIL


ERROR: exit status 1

40s: 1879 runs so far, 1 failures (0.05%)
45s: 2092 runs so far, 1 failures (0.05%)
50s: 2302 runs so far, 1 failures (0.04%)
55s: 2517 runs so far, 1 failures (0.04%)
1m0s: 2728 runs so far, 1 failures (0.04%)
1m5s: 2934 runs so far, 1 failures (0.03%)

/tmp/go-stress-20240623T155222-687466893
--- FAIL: TestWalletNotifications (0.81s)
    keystore_test.go:428: wallet list doesn't match required accounts: have 531, want 530
FAIL

@MariusVanDerWijden
Copy link
Member

Could you dump out the difference between live and wallets in L427 where the remaining error occurs?
Would be great to fix the test once and for all

@Halimao
Copy link
Contributor Author

Halimao commented Jun 24, 2024

@MariusVanDerWijden I have Dumped out the difference between live and wallets, but this does a little help to find the bug.

It's more difficult to fix checkAccounts than checkEvents, I'm not sure if I can make it, but I will try it again.

@Halimao
Copy link
Contributor Author

Halimao commented Jun 25, 2024

@MariusVanDerWijden I have found the reason for checkAccounts issue.

  1. The following call chain will start watcher, and call scanAccounts after about 500ms:
    Keystore.NewAccount/Keystore.Delete -> Keystore.refreshWallets -> accountCache.accounts -> accountCache.maybeReload
  2. There is no lock to ensure Keystore.Delete/Keystore.Add and scanAccounts are mutually exclusive

image

If we only fix the flaky test issue, to avoid executing scanAccounts we can reduce the round number 1024 to 500, following is the stress test result.

4m15s: 22560 runs so far, 0 failures
4m20s: 22988 runs so far, 0 failures
4m25s: 23383 runs so far, 0 failures
4m30s: 23813 runs so far, 0 failures
4m35s: 24238 runs so far, 0 failures
4m40s: 24664 runs so far, 0 failures
4m45s: 25082 runs so far, 0 failures
4m50s: 25537 runs so far, 0 failures
4m55s: 25990 runs so far, 0 failures
5m0s: 26422 runs so far, 0 failures
5m5s: 26863 runs so far, 0 failures
5m10s: 27315 runs so far, 0 failures
5m15s: 27758 runs so far, 0 failures
5m20s: 28207 runs so far, 0 failures
5m25s: 28631 runs so far, 0 failures

@Halimao
Copy link
Contributor Author

Halimao commented Jun 25, 2024

@MariusVanDerWijden Upgraded to use file cache lock, to ensure delete account and scanAccounts add account won't process at the same time.

@Halimao Halimao changed the title accounts/keystore: fix checkEvents of TestWalletNotifications accounts/keystore: fix flaky test TestWalletNotifications Jun 25, 2024
@MariusVanDerWijden
Copy link
Member

Cool nice find, reaching into the cache to acquire the lock kinda feels a bit weird to me. Feels like we are crossing the boundaries of our abstraction. I will think a bit about it

@fjl
Copy link
Contributor

fjl commented Jun 25, 2024

I'm not sure about the changes in checkEvents. For some events, the order matters. For example, wallet addition and removal should be emitted in the correct order. We can't verify that anymore with the way you rewrote the function.

@Halimao
Copy link
Contributor Author

Halimao commented Jun 25, 2024

I'm not sure about the changes in checkEvents. For some events, the order matters. For example, wallet addition and removal should be emitted in the correct order. We can't verify that anymore with the way you rewrote the function.

Then there must be something wrong with the order of events sent by wallet addition and removal. All events were handled correctly indeed, no events were missing actually, but they were not in the expected order.

@rjl493456442 rjl493456442 self-assigned this Jun 25, 2024
@sgerhardt
Copy link

sgerhardt commented Jul 11, 2024

I wonder if the changes to accounts/keystore/keystore_test.go here fix the account issue you mentioned?

edit: nm, it looks like these changes now fix the checkAccounts issue as well - at first I just read the initial comment which was focused on checkEvents, but after reading the rest of the comments and pulling your changes and running the tests I don't see the error related to checkAccounts now either. 👍

@Halimao
Copy link
Contributor Author

Halimao commented Jul 12, 2024

I wonder if the changes to accounts/keystore/keystore_test.go here fix the account issue you mentioned?

There is no event missing.

All events were handled correctly indeed, no events were missing actually, but they were not in the expected order.

We need to find out why they are out of order.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants