Fix AppState when Engine connection is terminated #6722
|
@joaopamaral please fix the code style. (simply run |
|
reopen to retest |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6722 +/- ##
=======================================
Coverage 0.00% 0.00%
=======================================
Files 684 687 +3
Lines 42282 42445 +163
Branches 5767 5793 +26
=======================================
- Misses 42282 42445 +163
|
abbb946 to
8409eac
Compare
# 🔍 Description
## Issue References 🔗
This issue was observed several times: the batch `state` was set to `ERROR`, but `appState` stayed in a non-terminal state (e.g. `RUNNING`) indefinitely, even after the application (in this case a YARN application) had finished.
```json
{
"id": "********",
"user": "****",
"batchType": "SPARK",
"name": "*********",
"appStartTime": 0,
"appId": "********",
"appUrl": "********",
"appState": "RUNNING",
"appDiagnostic": "",
"kyuubiInstance": "*********",
"state": "ERROR",
"createTime": 1725343207318,
"endTime": 1725343300986,
"batchInfo": {}
}
```
This seems to happen when an intermittent failure occurs during the monitoring step and the batch ends with `ERROR`, leaving the application metadata without a final update. The stale value can be misread as the application still running. Setting `appState` to `UNKNOWN` in this case avoids that misinterpretation.
## Describe Your Solution 🔧
This is a simple fix applied during the batch metadata update: if the batch state is `ERROR` and `appState` is not in a terminal state, `appState` is changed to `UNKNOWN`.
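The check can be sketched in a few lines. This is an illustrative Python sketch, not the Scala code from the patch: the function names and the set of terminal states (`FINISHED`, `FAILED`, `KILLED`, `NOT_FOUND`) are assumptions made for the example.

```python
# Assumed set of terminal application states; the real list lives in Kyuubi
# and may differ from this illustration.
TERMINAL_APP_STATES = frozenset({"FINISHED", "FAILED", "KILLED", "NOT_FOUND"})


def is_terminated(app_state: str) -> bool:
    """Return True if the application state can no longer change."""
    return app_state in TERMINAL_APP_STATES


def reconcile_app_state(batch_state: str, app_state: str) -> str:
    """Return the appState to store when the batch metadata is updated.

    If the batch ended with ERROR while the last observed appState is still
    non-terminal, that value is stale, so UNKNOWN is reported instead.
    """
    if batch_state == "ERROR" and not is_terminated(app_state):
        return "UNKNOWN"
    return app_state
```

A terminal `appState` such as `FAILED` is left untouched, and so is any `appState` of a batch that did not end in `ERROR`.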
## Types of changes 🔖
- [x] Bugfix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
## Test Plan 🧪
#### Behavior Without This Pull Request ⚰️
If an error occurs in a request from Kyuubi to the application (e.g. via the YARN client), the batch finishes in the `ERROR` state and the application keeps its last known state (e.g. `RUNNING`).
#### Behavior With This Pull Request 🎉
If an error occurs in a request from Kyuubi to the application (e.g. via the YARN client), the batch finishes in the `ERROR` state; if the application is still in a non-terminal state, it is forced to `UNKNOWN`.
#### Related Unit Tests
I tried to implement a unit test that reproduces this behavior but did not succeed. The test needs to force an exception in the engine request (e.g. `YarnClient.getApplication`), but only after the application has reached the `RUNNING` state; alternatively, it could block the connection between Kyuubi and the engine.
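One possible shape for such a test is a fake client that reports `RUNNING` for a few polls and then raises. The sketch below is a standalone Python illustration of that idea, not Kyuubi code: `FlakyAppClient`, `monitor`, and `apply_fix` are hypothetical names, and the polling loop is a simplified stand-in for the real monitoring step.

```python
# Assumed terminal states, for illustration only.
TERMINAL_APP_STATES = frozenset({"FINISHED", "FAILED", "KILLED", "NOT_FOUND"})


class FlakyAppClient:
    """Hypothetical fake: reports RUNNING a fixed number of times, then fails."""

    def __init__(self, healthy_polls: int):
        self._remaining = healthy_polls

    def get_app_state(self) -> str:
        if self._remaining > 0:
            self._remaining -= 1
            return "RUNNING"
        raise ConnectionError("simulated loss of the engine connection")


def monitor(client, max_polls: int = 10) -> dict:
    """Simplified monitoring loop that records the last observed appState."""
    batch_state, app_state = "RUNNING", "PENDING"
    try:
        for _ in range(max_polls):
            app_state = client.get_app_state()
            if app_state in TERMINAL_APP_STATES:
                batch_state = "FINISHED"
                break
    except ConnectionError:
        # The batch ends with ERROR; app_state keeps its last known value.
        batch_state = "ERROR"
    return {"state": batch_state, "appState": app_state}


def apply_fix(metadata: dict) -> dict:
    """Force a stale non-terminal appState to UNKNOWN when the batch is ERROR."""
    stale = (metadata["state"] == "ERROR"
             and metadata["appState"] not in TERMINAL_APP_STATES)
    return {**metadata, "appState": "UNKNOWN"} if stale else metadata
```

Calling `monitor(FlakyAppClient(healthy_polls=2))` reproduces the stale record described above (`state` is `ERROR`, `appState` is `RUNNING`), and passing that result through `apply_fix` yields `appState` `UNKNOWN`.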
---
# Checklist 📝
- [ ] This patch was not authored or co-authored using [Generative Tooling](https://www.apache.org/legal/generative-tooling.html)
**Be nice. Be informative.**
Closes #6722 from joaopamaral/fix/app-state-on-batch-error.
Closes #6722
8409eac [Wang, Fei] fix
da8c356 [Joao Amaral] format fix
73b77b3 [Joao Amaral] use isTerminated
64f96a2 [Joao Amaral] Remove test
1eb80ef [Joao Amaral] Remove test
13498fa [Joao Amaral] Remove test
60ce55e [Joao Amaral] add todo
3a3ba16 [Joao Amaral] Fix
215ac66 [Joao Amaral] Fix AppState when Engine connection is terminated
Lead-authored-by: Joao Amaral <7281460+joaopamaral@users.noreply.github.com>
Co-authored-by: Wang, Fei <fwang12@ebay.com>
Signed-off-by: Wang, Fei <fwang12@ebay.com>
(cherry picked from commit 27c734e)
Signed-off-by: Wang, Fei <fwang12@ebay.com>
|
thanks, merged to master/1.10.1/1.9.3 |