I have searched in the issues and found no similar issues.
Describe the bug
We've added some new nodes to our production environment, and these nodes sometimes experience RPC timeouts (that's a separate issue). This can cause many tasks to fail: some servers go a long time without receiving app heartbeats, so the app's data is cleared from those servers.
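We found an issue with `timeoutMs` in `ShuffleWriteClientImpl#sendAppHeartbeat`. The `timeoutMs` used for each individual heartbeat RPC is essentially the same as the `timeoutMs` that `ThreadUtils#executeTasks` applies to the RPCs sent to all servers together. This logic is flawed: a single slow server can use up the whole overall budget, leaving no time for the heartbeats to the other servers.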
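To make the problem concrete, here is a minimal Java simulation. It is not Uniffle code: it assumes that `sendAppHeartbeat` fans the heartbeat RPCs out over a bounded thread pool and that `ThreadUtils#executeTasks` behaves like `ExecutorService#invokeAll(tasks, timeoutMs, ...)`, which cancels every task that has not finished when the overall deadline expires. The class and method names (`HeartbeatTimeoutDemo`, `countDelivered`, `overallBudgetMs`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical simulation, not Uniffle code. Assumes the heartbeat fan-out behaves like
 * ExecutorService#invokeAll with an overall deadline that cancels unfinished tasks.
 */
public class HeartbeatTimeoutDemo {

  /** Simulated latency of a healthy server's heartbeat RPC. */
  static final long HEALTHY_RPC_MS = 50;

  /** Returns how many healthy servers received a heartbeat before the overall deadline. */
  public static int countDelivered(
      int servers, int poolSize, int slowServers, long perRpcTimeoutMs, long overallTimeoutMs) {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    try {
      List<Callable<Boolean>> tasks = new ArrayList<>();
      for (int i = 0; i < servers; i++) {
        boolean slow = i < slowServers;
        tasks.add(() -> {
          // A slow server hangs until its per-RPC timeout fires; a healthy one answers quickly.
          Thread.sleep(slow ? perRpcTimeoutMs : HEALTHY_RPC_MS);
          return !slow; // true means a healthy server got its heartbeat
        });
      }
      // Overall deadline for all RPCs; tasks still queued or running when it passes are cancelled.
      List<Future<Boolean>> futures =
          pool.invokeAll(tasks, overallTimeoutMs, TimeUnit.MILLISECONDS);
      int delivered = 0;
      for (Future<Boolean> f : futures) {
        try {
          if (f.get()) {
            delivered++;
          }
        } catch (CancellationException | ExecutionException e) {
          // Cancelled by the overall deadline: this server got no heartbeat in this round.
        }
      }
      return delivered;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return 0;
    } finally {
      pool.shutdownNow();
    }
  }

  /** One possible remedy (an assumption): budget one per-RPC timeout per "round" of the pool. */
  public static long overallBudgetMs(int servers, int poolSize, long perRpcTimeoutMs) {
    long rounds = (servers + poolSize - 1) / poolSize; // ceil(servers / poolSize)
    return rounds * perRpcTimeoutMs;
  }

  public static void main(String[] args) {
    int servers = 6, poolSize = 2, slowServers = 2;
    long perRpcTimeoutMs = 300;
    int flawed = countDelivered(servers, poolSize, slowServers, perRpcTimeoutMs, perRpcTimeoutMs);
    int budgeted = countDelivered(servers, poolSize, slowServers, perRpcTimeoutMs,
        overallBudgetMs(servers, poolSize, perRpcTimeoutMs));
    int healthy = servers - slowServers;
    System.out.println("overall = per-RPC timeout: " + flawed + " of " + healthy + " healthy servers");
    System.out.println("overall = rounds * per-RPC: " + budgeted + " of " + healthy + " healthy servers");
  }
}
```

With 6 servers, a pool of 2 threads, and 2 servers that hang until the 300 ms per-RPC timeout, the two slow RPCs occupy both threads for the entire overall budget, so the heartbeats queued for the 4 healthy servers are typically cancelled before they even start. Budgeting roughly one per-RPC timeout per round of the pool (`overallBudgetMs`) lets them through; this is only one possible remedy, and the actual fix may differ.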
Affects Version(s)
master
Uniffle Server Log Output
No response
Uniffle Engine Log Output
No response
Uniffle Server Configurations
No response
Uniffle Engine Configurations
No response
Additional context
No response
Are you willing to submit PR?
Yes I am willing to submit a PR!
xumanbu added a commit to xumanbu/incubator-uniffle that referenced this issue on Oct 16, 2024.