Make __del__ call shutdown and close in DataCollectors and Envs #209

OdedKrams · 2022-06-21T12:01:07Z

Rearrange the code and remove unneeded error handling so that the shutdown and close functions are called from the __del__ function.
From the unit tests it seems that:

- Envs don't need to be closed or deleted before going out of scope. __del__ and the GC do the job well.
- In DataCollectors there seems to be a race condition in the GC, so the internal Env may be deleted before the DataCollector. The DataCollector must therefore be deleted before going out of scope; its __del__ function deletes and removes all objects in the proper order. Without calling del collector at the end of the unit test, the test hangs at the end.
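
As a rough illustration of the pattern described above (not the actual TorchRL code), here is a minimal sketch in which `__del__` delegates to an idempotent shutdown/close. `FakeEnv`, `FakeCollector` and `run_demo` are hypothetical names introduced only for this example, and the ordering shown relies on CPython's reference counting.

```python
class FakeEnv:
    """Hypothetical stand-in for an environment that owns a resource."""

    def __init__(self, log):
        self._log = log
        self._closed = False

    def close(self):
        # Idempotent, so an explicit close() followed by __del__ is harmless.
        if self._closed:
            return
        self._closed = True
        self._log.append("env closed")

    def __del__(self):
        # Finalizer simply delegates to close().
        self.close()


class FakeCollector:
    """Hypothetical stand-in for a collector that wraps an environment."""

    def __init__(self, env, log):
        self.env = env
        self._log = log
        self._closed = False

    def shutdown(self):
        if self._closed:
            return
        self._closed = True
        self._log.append("collector shutdown")
        # Close the inner env from here so the order is explicit
        # instead of being left to the garbage collector.
        self.env.close()

    def __del__(self):
        self.shutdown()


def run_demo():
    log = []
    collector = FakeCollector(FakeEnv(log), log)
    # In CPython the refcount drops to zero here, so __del__ runs immediately.
    del collector
    return log
```

Calling `run_demo()` under CPython records the collector shutdown first and the env close second. Other interpreters, or objects caught in reference cycles, may finalize later or in a different order, which is the kind of uncertainty the description points at.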

vmoens · 2022-06-21T12:33:19Z

> In DataCollectors there seems to be a race condition in the GC, so the internal Env may be deleted before the DataCollector. The DataCollector must therefore be deleted before going out of scope; its __del__ function deletes and removes all objects in the proper order. Without calling del collector at the end of the unit test, the test hangs at the end.

I noticed that too, thanks for identifying the issue!
So what's your take on this?
It might be the reason why we decided to explicitly ask the user to close data collectors and envs: otherwise there may be a silent problem with their script (i.e. the script never terminates). This can have really bad consequences (e.g. a training script on a cluster that holds on to nodes for ages).

I see basically 2 options: either we make sure that this phenomenon never happens, or we revert and ask users to close the collector / env when it is no longer used, throwing an error otherwise.
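
One caveat worth keeping in mind for the second option is that, in Python, an exception raised inside `__del__` does not propagate to the caller; it is reported through `sys.unraisablehook` (Python 3.8+) and the script carries on. The sketch below is only an illustration of that behaviour under CPython. `StrictCollector` and `delete_without_shutdown` are hypothetical names, not TorchRL classes.

```python
import sys


class StrictCollector:
    """Hypothetical collector that complains if deleted while still open."""

    def __init__(self):
        self._closed = False

    def shutdown(self):
        self._closed = True

    def __del__(self):
        if not self._closed:
            raise RuntimeError("collector deleted without calling shutdown() first")


def delete_without_shutdown():
    """Delete an open collector and return the messages seen by the hook."""
    captured = []
    previous_hook = sys.unraisablehook
    # Store only the message, not the exception object, to avoid keeping
    # references to the object being finalized.
    sys.unraisablehook = lambda unraisable: captured.append(str(unraisable.exc_value))
    try:
        collector = StrictCollector()
        del collector  # __del__ raises, but nothing propagates to this frame
    finally:
        sys.unraisablehook = previous_hook
    return captured
```

In this sketch the function returns normally and the error only shows up in the hook, so an error raised from `__del__` alone would probably not stop a script; an explicit check at a well-defined point would be needed for that.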

vmoens

Let's think about what would be the behaviour of the test without the new feature.

vmoens · 2022-06-23T10:43:36Z

test/test_collector.py

+        pin_memory=False,
+    )
+    for i, d in enumerate(ccollector):
+        if i == 0:


we can just break when i == 2 or something

vmoens · 2022-06-23T10:43:48Z

test/test_collector.py

+            b2c = d
+        else:
+            break
+    with pytest.raises(AssertionError):


this is tested elsewhere

vmoens · 2022-06-23T10:44:29Z

test/test_collector.py

@@ -107,7 +107,7 @@ def env_fn(seed):
            break
    with pytest.raises(AssertionError):
        assert_allclose_td(b1, b2)
-    collector.shutdown()


Can we keep these, since we test the new behaviour somewhere else?
That way, if the new feature breaks for some reason, only that test will fail.

Sounds good

vmoens · 2022-06-23T10:45:37Z

test/test_env.py

@@ -129,7 +129,6 @@ def test_env_seed(env_name, frame_skip, seed=0):
        assert_allclose_td(td0a, td0c.select(*td0a.keys()))
    with pytest.raises(AssertionError):
        assert_allclose_td(td1a, td1c)
-    env.close()


I would keep these though, for the same reason as above.
But if we can test that things work properly in a separate function that's even better!

vmoens · 2022-06-23T10:46:57Z

test/test_collector.py

+        assert_allclose_td(b1c, b2c)
+
+    if should_shutdown:
+        ccollector.shutdown()


If the function hangs forever, the test will not fail (it will not close)
Do you think that without the new feature we'd get a meaningful error message with this?
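
One way to turn a hang into a visible failure, sketched here under assumptions rather than taken from the PR, is to run the scenario in a separate interpreter with a timeout. `run_with_timeout` is a hypothetical helper; it relies only on the standard-library `subprocess.run`, which kills the child and raises `subprocess.TimeoutExpired` when the timeout elapses.

```python
import subprocess
import sys


def run_with_timeout(script: str, timeout: float) -> str:
    """Run `script` in a fresh interpreter and report "ok", "failed" or "hung"."""
    try:
        completed = subprocess.run(
            [sys.executable, "-c", script],
            timeout=timeout,
            capture_output=True,
        )
    except subprocess.TimeoutExpired:
        # The child did not exit in time; subprocess.run has already killed it.
        return "hung"
    return "ok" if completed.returncode == 0 else "failed"
```

A test could then assert that a script which builds a collector and lets it go out of scope returns "ok" within some generous limit, so that a script that never terminates is reported as "hung" instead of blocking the whole test run.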

vmoens · 2022-06-23T10:48:12Z

torchrl/__init__.py

@@ -87,7 +87,7 @@ def _check_for_faulty_process(processes):
            break
    if terminate:
        raise RuntimeError(
-            "At least on process failed. Check for more infos in the log."
+            "At least one process failed. Check for more infos in the log."


good catch!

OdedKrams added 5 commits June 21, 2022 12:02

change parallel test to reproduce out of scope errors

22df519

replace collector shutdown call with del call

59f1751

remove all env close calls (del from gc will do that job)

070fe62

remove the __del__ function from _BatchedEnv to use the super one

98be0ba

remove unneeded 'pass'

bfe59b8

facebook-github-bot added the CLA Signed label Jun 21, 2022

pre commit formatting

be6d299

OdedKrams marked this pull request as ready for review June 21, 2022 12:08

vmoens changed the title ~~T123527109: TorchRL: Make __del__ call shutdown and close in DataCollectors and Envs~~ Make __del__ call shutdown and close in DataCollectors and Envs Jun 21, 2022

OdedKrams and others added 8 commits June 22, 2022 14:51

add a runtime error is trying to delete the _MultiDataCollector witho…

50fbe60

…ut shutdown before

fix error message and shutdown only the async data collector

c397e05

init

4673b4d

amend

e958a7e

Merge pull request #1 from vmoens/vince_addons

bc6489d

Introduce failure of processes when idle for a while

add test to check runtime with and without shutdown the collector

dc10caf

pre-commit format

f5e542c

remove unneeded raise

8c85310

vmoens added the enhancement label Jun 23, 2022

vmoens reviewed Jun 23, 2022

vmoens merged commit 44b2fe5 into pytorch:main Jun 24, 2022

vmoens pushed a commit that referenced this pull request Jun 24, 2022

Make __del__ call shutdown and close in DataCollectors and Envs (#209)

4c0d836

vmoens pushed a commit that referenced this pull request Jun 24, 2022

Make __del__ call shutdown and close in DataCollectors and Envs (#209)

d3e022c

vmoens linked an issue Jun 24, 2022 that may be closed by this pull request

Make __del__ call shutdown and close in DataCollectors and Envs #197

Closed

vmoens mentioned this pull request Jun 24, 2022

Make __del__ call shutdown and close in DataCollectors and Envs #197

Closed
