Commit 1abde63

tushar00jain authored and facebook-github-bot committed
handle exception waiting for work
Summary: work.wait() can throw, so wrap it in a try/except and handle the failure gracefully by reporting the error to the manager, which causes should_commit to fail.

Differential Revision: D84880993
1 parent b3be7ad commit 1abde63

1 file changed: +11 -6 lines changed

torchft/manager.py

Lines changed: 11 additions & 6 deletions
@@ -1244,14 +1244,19 @@ def _assert_same_stream(self) -> None:
     def wait(self, timeout: Optional[timedelta] = None) -> bool:
         self._assert_same_stream()
 
-        with get_stream_context(self._stream):
-            self._work.wait()
-            self._set_future_callback()
+        try:
+            with get_stream_context(self._stream):
+                self._work.wait()
+                self._set_future_callback()
 
-        with get_stream_context(self._stream):
-            self._managed_fut_tail.wait()
+            with get_stream_context(self._stream):
+                self._managed_fut_tail.wait()
 
-        return True
+            return True
+        except Exception as e:
+            self._manager._logger.exception(f"got exception waiting for work {e}")
+            self._manager.report_error(e)
+            return False
 
     def block_current_stream(self, timeout: Optional[timedelta] = None) -> None:
         self._assert_same_stream()
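
For reviewers who want to see the effect of this change in isolation, here is a minimal sketch of the same pattern: a wait() that converts an exception into a reported error and a False return, so that a later should_commit call fails. SketchManager, SketchWork, and failing_wait are hypothetical stand-ins written for this illustration. They are not torchft classes, and the rule that should_commit returns False once an error has been reported is a simplifying assumption about the manager's behavior.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("wait_sketch")


class SketchManager:
    """Hypothetical stand-in for the manager; not torchft's Manager."""

    def __init__(self) -> None:
        self._error: Optional[Exception] = None

    def report_error(self, e: Exception) -> None:
        # Remember the first failure seen during the current step.
        if self._error is None:
            self._error = e

    def should_commit(self) -> bool:
        # Assumption: a step commits only if no error was reported.
        ok = self._error is None
        self._error = None  # reset for the next step
        return ok


class SketchWork:
    """Hypothetical wrapper around a blocking wait function."""

    def __init__(self, manager: SketchManager, wait_fn: Callable[[], None]) -> None:
        self._manager = manager
        self._wait_fn = wait_fn

    def wait(self) -> bool:
        try:
            self._wait_fn()
            return True
        except Exception as e:
            # Log with traceback, report to the manager, and signal failure
            # through the return value instead of propagating the exception.
            logger.exception("got exception waiting for work: %s", e)
            self._manager.report_error(e)
            return False


def failing_wait() -> None:
    # Simulates underlying work whose wait() raises.
    raise RuntimeError("simulated failure while waiting")


manager = SketchManager()

ok_result = SketchWork(manager, lambda: None).wait()
commit_after_ok = manager.should_commit()

bad_result = SketchWork(manager, failing_wait).wait()
commit_after_bad = manager.should_commit()
```

In this sketch the caller never sees the exception. It can either check the boolean returned by wait() or rely on should_commit to reject the step.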
