-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
*: should return exitCode even if cmd isn't nil #16137
Conversation
For the pkg/expect package, if the process has been stopped but there is no `Close()` call, the `ExitCode()` won't return exit code correctly. The `ExitCode()` should check `exitErr` and return exit code if cmd isn't nil. And introduces `exitCode` to return correct exit code based on the process is signaled or exited. Signed-off-by: Wei Fu <fuweid89@gmail.com>
It seem that the failure is not related to this change. Updated: If the process exits, the Lines 152 to 201 in 31b20ef
Maybe we should wait for |
ping @serathius @ahrtr @jmhbnz @chaochn47 please take a look when you have time. Thanks |
Merging as this is test only and it's causing flakes. |
Let's say there is command which outputs one line and exit with error. There are three goroutines to acquire the lock: 1. ep.read() 2. ep.waitSaveExitErr() 3. ep.Expect() When ep.read goroutine reads the log but it doesn't acquire the lock in time, the ep.waitSaveExitErr acquires the lock and updates the `exitErr`. And then ep.Expect acquires lock but it doesn't see any log yet and then returns err. It's hard to reproduce it in local. Add the extra sleep can reproduce it. ```diff diff --git a/pkg/expect/expect.go b/pkg/expect/expect.go index a512a3ce4..602bea73f 100644 --- a/pkg/expect/expect.go +++ b/pkg/expect/expect.go @@ -128,6 +128,7 @@ func (ep *ExpectProcess) tryReadNextLine(r *bufio.Reader) error { printDebugLines := os.Getenv("EXPECT_DEBUG") != "" l, err := r.ReadString('\n') + time.Sleep(10 * time.Millisecond) ep.mu.Lock() defer ep.mu.Unlock() ``` See it once in Github Action [1]. In order to fix it, the patch introduces `readCloseCh` to wait for ep.read to get all the data and retry it. [1]: etcd-io#16137 (comment) Signed-off-by: Wei Fu <fuweid89@gmail.com>
Let's say there is command which outputs one line and exit with error. There are three goroutines to acquire the lock: 1. ep.read() 2. ep.waitSaveExitErr() 3. ep.Expect() When ep.read goroutine reads the log but it doesn't acquire the lock in time, the ep.waitSaveExitErr acquires the lock and updates the `exitErr`. And then ep.Expect acquires lock but it doesn't see any log yet and then returns err. It's hard to reproduce it in local. Add the extra sleep can reproduce it. ```diff diff --git a/pkg/expect/expect.go b/pkg/expect/expect.go index a512a3ce4..602bea73f 100644 --- a/pkg/expect/expect.go +++ b/pkg/expect/expect.go @@ -128,6 +128,7 @@ func (ep *ExpectProcess) tryReadNextLine(r *bufio.Reader) error { printDebugLines := os.Getenv("EXPECT_DEBUG") != "" l, err := r.ReadString('\n') + time.Sleep(10 * time.Millisecond) ep.mu.Lock() defer ep.mu.Unlock() ``` See it once in Github Action [1]. In order to fix it, the patch introduces `readCloseCh` to wait for ep.read to get all the data and retry it. [1]: etcd-io#16137 (comment) Signed-off-by: Wei Fu <fuweid89@gmail.com>
Let's say there is command which outputs one line and exit with error. There are three goroutines to acquire the lock: 1. ep.read() 2. ep.waitSaveExitErr() 3. ep.Expect() When ep.read goroutine reads the log but it doesn't acquire the lock in time, the ep.waitSaveExitErr acquires the lock and updates the `exitErr`. And then ep.Expect acquires lock but it doesn't see any log yet and then returns err. It's hard to reproduce it in local. Add the extra sleep can reproduce it. ```diff diff --git a/pkg/expect/expect.go b/pkg/expect/expect.go index a512a3ce4..602bea73f 100644 --- a/pkg/expect/expect.go +++ b/pkg/expect/expect.go @@ -128,6 +128,7 @@ func (ep *ExpectProcess) tryReadNextLine(r *bufio.Reader) error { printDebugLines := os.Getenv("EXPECT_DEBUG") != "" l, err := r.ReadString('\n') + time.Sleep(10 * time.Millisecond) ep.mu.Lock() defer ep.mu.Unlock() ``` See it once in Github Action [1]. In order to fix it, the patch introduces `readCloseCh` to wait for ep.read to get all the data and retry it. [1]: etcd-io#16137 (comment) Signed-off-by: Wei Fu <fuweid89@gmail.com>
Let's say there is command which outputs one line and exit with error. There are three goroutines to acquire the lock: 1. ep.read() 2. ep.waitSaveExitErr() 3. ep.Expect() When ep.read goroutine reads the log but it doesn't acquire the lock in time, the ep.waitSaveExitErr acquires the lock and updates the `exitErr`. And then ep.Expect acquires lock but it doesn't see any log yet and then returns err. It's hard to reproduce it in local. Add the extra sleep can reproduce it. ```diff diff --git a/pkg/expect/expect.go b/pkg/expect/expect.go index a512a3ce4..602bea73f 100644 --- a/pkg/expect/expect.go +++ b/pkg/expect/expect.go @@ -128,6 +128,7 @@ func (ep *ExpectProcess) tryReadNextLine(r *bufio.Reader) error { printDebugLines := os.Getenv("EXPECT_DEBUG") != "" l, err := r.ReadString('\n') + time.Sleep(10 * time.Millisecond) ep.mu.Lock() defer ep.mu.Unlock() ``` See it once in Github Action [1]. In order to fix it, the patch introduces `readCloseCh` to wait for ep.read to get all the data and retry it. [1]: etcd-io#16137 (comment) Signed-off-by: Wei Fu <fuweid89@gmail.com>
Let's say there is command which outputs one line and exit with error. There are three goroutines to acquire the lock: 1. ep.read() 2. ep.waitSaveExitErr() 3. ep.Expect() When ep.read goroutine reads the log but it doesn't acquire the lock in time, the ep.waitSaveExitErr acquires the lock and updates the `exitErr`. And then ep.Expect acquires lock but it doesn't see any log yet and then returns err. It's hard to reproduce it in local. Add the extra sleep can reproduce it. ```diff diff --git a/pkg/expect/expect.go b/pkg/expect/expect.go index a512a3ce4..602bea73f 100644 --- a/pkg/expect/expect.go +++ b/pkg/expect/expect.go @@ -128,6 +128,7 @@ func (ep *ExpectProcess) tryReadNextLine(r *bufio.Reader) error { printDebugLines := os.Getenv("EXPECT_DEBUG") != "" l, err := r.ReadString('\n') + time.Sleep(10 * time.Millisecond) ep.mu.Lock() defer ep.mu.Unlock() ``` See it once in Github Action [1]. In order to fix it, the patch introduces `readCloseCh` to wait for ep.read to get all the data and retry it. [1]: etcd-io#16137 (comment) Signed-off-by: Wei Fu <fuweid89@gmail.com>
For the pkg/expect package, if the process has been stopped but there is no
Close()
call, theExitCode()
won't return exit code correctly. TheExitCode()
should checkexitErr
and return exit code if cmd isn't nil.And introduces
exitCode
to return correct exit code based on the process is signaled or exited.Please read https://github.com/etcd-io/etcd/blob/main/CONTRIBUTING.md#contribution-flow.
Context: #16094 (comment)
cc @serathius @tjungblu
fixes: #16134