
Commit aeedd0b

Authored by Chris Elion
surface specific GRPC errors more visibly (#4930)
1 parent: 77c2504


2 files changed, 29 additions (+), 1 deletion (-)


com.unity.ml-agents/CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -58,6 +58,8 @@ and this project adheres to
   reduced the amount of memory allocated by approximately 25%. (#4887)
 - Removed several memory allocations that happened during inference with discrete actions. (#4922)
 - Properly catch permission errors when writing timer files. (#4921)
+- Unexpected gRPC exceptions during training are now logged before stopping training. If you see a
+  "noisy" log, please let us know! (#4930)
 
 #### ml-agents / ml-agents-envs / gym-unity (Python)
 - Fixed a bug that would cause an exception when `RunOptions` was deserialized via `pickle`. (#4842)

com.unity.ml-agents/Runtime/Communicator/RpcCommunicator.cs

Lines changed: 27 additions & 1 deletion
@@ -440,6 +440,7 @@ UnityInputProto Exchange(UnityOutputProto unityOutput)
             {
                 return null;
             }
+
             try
             {
                 var message = m_Client.Exchange(WrapMessage(unityOutput, 200));
@@ -455,8 +456,33 @@ UnityInputProto Exchange(UnityOutputProto unityOutput)
                 QuitCommandReceived?.Invoke();
                 return message.UnityInput;
             }
-            catch
+            catch (RpcException rpcException)
+            {
+                // Log more verbose errors if they're something the user can possibly do something about.
+                switch (rpcException.Status.StatusCode)
+                {
+                    case StatusCode.Unavailable:
+                        // This can happen when python disconnects. Ignore it to avoid noisy logs.
+                        break;
+                    case StatusCode.ResourceExhausted:
+                        // This happens if the message body is too large. There's no way to
+                        // gracefully handle this, but at least we can show the message and the
+                        // user can try to reduce the number of agents or observation sizes.
+                        Debug.LogError($"GRPC Exception: {rpcException.Message}. Disconnecting from trainer.");
+                        break;
+                    default:
+                        // Other unknown errors. Log at INFO level.
+                        Debug.Log($"GRPC Exception: {rpcException.Message}. Disconnecting from trainer.");
+                        break;
+                }
+                m_IsOpen = false;
+                QuitCommandReceived?.Invoke();
+                return null;
+            }
+            catch (Exception ex)
             {
+                // Fall-through for other error types
+                Debug.LogError($"GRPC Exception: {ex.Message}. Disconnecting from trainer.");
                 m_IsOpen = false;
                 QuitCommandReceived?.Invoke();
                 return null;
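The logging policy introduced by this commit can be pulled out into a pure function, which makes it easy to unit-test outside the Unity Editor. This is a minimal sketch, not the commit's code: `RpcStatus` is a hypothetical stand-in for the subset of `Grpc.Core.StatusCode` values the `switch` handles, and `RpcLogLevel` and `RpcErrorPolicy` are hypothetical names. The real communicator logs through `UnityEngine.Debug` directly and, whatever the status code, still sets `m_IsOpen = false` and invokes `QuitCommandReceived`.

```csharp
using System;

// Hypothetical stand-in for the Grpc.Core.StatusCode values used in this sketch.
public enum RpcStatus
{
    Unavailable,
    ResourceExhausted,
    DeadlineExceeded,
    Internal
}

// Hypothetical result type: how loudly a failed exchange should be reported.
public enum RpcLogLevel
{
    Silent, // expected disconnect; logging it would only add noise
    Info,   // unknown error, reported at INFO level
    Error   // something the user can act on (e.g. message too large)
}

public static class RpcErrorPolicy
{
    // Mirrors the switch inside the catch (RpcException) block of the diff.
    public static RpcLogLevel Classify(RpcStatus status)
    {
        switch (status)
        {
            case RpcStatus.Unavailable:
                // The Python trainer closed the connection.
                return RpcLogLevel.Silent;
            case RpcStatus.ResourceExhausted:
                // Message body too large: reduce agents or observation sizes.
                return RpcLogLevel.Error;
            default:
                return RpcLogLevel.Info;
        }
    }

    // Builds the same text the diff passes to Debug.Log / Debug.LogError.
    public static string Describe(string exceptionMessage)
    {
        return $"GRPC Exception: {exceptionMessage}. Disconnecting from trainer.";
    }
}
```

Keeping `Unavailable` silent matches the intent stated in the diff's comment: it is raised on an ordinary Python-side disconnect, so reporting it would make the new logging noisy.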
