Skip to content

Conversation

stevemilk
Copy link
Contributor

@stevemilk stevemilk commented May 28, 2024

This pull request fixes a flaky test TestUpdatedKeyfileContents in #29830

Problem Statement:
In this test, we start a watcher to monitor file changes and then update memory cache. If the first file change event is sent before the watcher is ready, the watcher may miss this event , resulting in a perpetual blockage and failure to update the cache. This is the root cause behind occasional errors such as "First replacement failed" encountered in TestUpdatedKeyfileContents.

Repro Steps:

  1. Add time.Sleep(500 * time.Millisecond) to func (w *watcher) loop() to delay the loop.
func (w *watcher) loop() {
	time.Sleep(500 * time.Millisecond)
	...
}
  1. Run TestUpdatedKeyfileContents.

Solution:
Use WaitGroup to ensure that the watcher loop starts before sending file change events.

@fjl fjl self-requested a review May 28, 2024 19:31
Copy link
Contributor

@fjl fjl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea!

@stevemilk
Copy link
Contributor Author

stevemilk commented May 29, 2024

After further investigation, I discovered a function called waitWatcherStart intended to ensure the watcher begins before file updates. However, this function fails in this test due to the following steps:

  1. NewKeyStore fails to initiate the watcher loop since the directory has not been created. Consequently, the watcher is halted, and watcher.runEnded is set to true.
  2. MakeDir creates the directory.
  3. The watcher loop restarts, and we wait for the watcher to commence. However, watcher.runEnded has been set to true, causing waitWatcherStart to immediately return true. This can occur before the loop actually begins.
  4. Consequently, the watcher may miss the file change event and become stuck in a perpetual block.

The initial two commits attempt to rectify the situation after the aforementioned issues arise. However, a simpler solution to fix this test is to create the directory before NewKeyStore. This ensures the watcher successfully starts on the first attempt, and waitWatcherStart functions as intended.

Considering TestUpdatedKeyfileContents aims to verify if the cache updates correctly when the keystore file changes, the straightforward fix is more appropriate. Nonetheless, the initial two commits remain valuable if, in practical usage, the directory is created after NewKeyStore.

@fjl I would like to hear your suggestions.

Copy link
Contributor

@holiman holiman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this fix fixes the issue, then LGTM. For this specific test, I don't think creating the keystore beforehand makes any difference for the test

@rjl493456442
Copy link
Member

I don't think creating the keystore beforehand makes any difference for the test

This will prevent the file watcher failure due to "non-existent folder" and fix the test accordingly.

@namiloh
Copy link

namiloh commented May 29, 2024

I don't think creating the keystore beforehand makes any difference for the test

This will prevent the file watcher failure due to "non-existent folder" and fix the test accordingly.

Right, yes I meant that it will not negatively impact the "things being tested"

@fjl fjl merged commit b8cf163 into ethereum:master May 29, 2024
@fjl fjl added this to the 1.14.4 milestone May 29, 2024
jorgemmsilva pushed a commit to iotaledger/go-ethereum that referenced this pull request Jun 17, 2024
Create the directory before NewKeyStore. This ensures the watcher successfully starts on
the first attempt, and waitWatcherStart functions as intended.
@Halimao Halimao mentioned this pull request Jun 23, 2024
8 tasks
gzliudan added a commit to gzliudan/XDPoSChain that referenced this pull request Jan 24, 2025
Create the directory before NewKeyStore. This ensures the watcher successfully starts on
the first attempt, and waitWatcherStart functions as intended.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants