Skip to content

Fix some bugs in conftest.py, one in _destroy_dist_context and the other in distributed #2788

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Dec 1, 2022

Conversation

sadra-barikbin
Copy link
Collaborator

Description:

  1. One step in NCCL and Gloo single node distributed settings is to setup a port shared by the processes. This is done using a temporary file in which the rank zero writes the port number and the others read it. Currently this file isn't deleted at the end, so in the case a process other than rank 0 opens the file first, it reads the port being used in the past which results in timeout error in all ranks. This bug is not encountered in GitHub CI because disk resources are newly allocated, so the temporary file does not exist.
  2. To reduce setup and teardown time, I had set the scope of distributed fixture to module so as to have just one distributed setup and teardown for all distributed tests in a module but this caused some non-distributed tests to fail. It turned out that the teardown of the last instance of the fixture (we have an instance for nccl, one for gloo, one for gloo_cpu and...) takes place after running the last test of the module, even if it has not requested the fixture. Running the snippet below which is originally from pytest website shows this fact:
import pytest


@pytest.fixture(scope="module", params=["mod1", "mod2"])
def modarg(request):
    param = request.param
    print("  SETUP modarg", param)
    yield param
    print("  TEARDOWN modarg", param)


@pytest.fixture(scope="function", params=[1, 2])
def otherarg(request):
    param = request.param
    print("  SETUP otherarg", param)
    yield param
    print("  TEARDOWN otherarg", param)


def test_0(otherarg):
    print("  RUN test0 with otherarg", otherarg)


def test_1(modarg):
    print("  RUN test1 with modarg", modarg)

def test_11():
    print("hi")

def test_2(otherarg, modarg):
    print("  RUN test2 with otherarg {} and modarg {}".format(otherarg, modarg))

def test_3():
    print("there!")

Output:
image

So I changed the scope of the fixture from module to function to solve the issue.

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Nov 27, 2022

@sadra-barikbin do you think it would make sense and can help if we were using a class with fixture, something like

@pytest.mark.usefixtures("cleandir")
class TestDirectoryInit:
    def test_cwd_starts_empty(self):
        assert os.listdir(os.getcwd()) == []
        with open("myfile", "w") as f:
            f.write("hello")

    def test_cwd_again_starts_empty(self):
        assert os.listdir(os.getcwd()) == []

(in this example fixture is called for each function, but maybe there is a way to set it up once per class)

@sadra-barikbin sadra-barikbin force-pushed the Fix-two-bugs-in-conftest branch from 97a68aa to 63a62a3 Compare November 27, 2022 22:17
@sadra-barikbin sadra-barikbin changed the title Fix two bugs in conftest.py, one in _destroy_dist_context and the other in distributed Fix some bugs in conftest.py, one in _destroy_dist_context and the other in distributed Nov 27, 2022
@sadra-barikbin
Copy link
Collaborator Author

@sadra-barikbin do you think it would make sense and can help if we were using a class with fixture, something like

@pytest.mark.usefixtures("cleandir")
class TestDirectoryInit:
    def test_cwd_starts_empty(self):
        assert os.listdir(os.getcwd()) == []
        with open("myfile", "w") as f:
            f.write("hello")

    def test_cwd_again_starts_empty(self):
        assert os.listdir(os.getcwd()) == []

(in this example fixture is called for each function, but maybe there is a way to set it up once per class)

Yes we could and it helps. Teardown takes place exactly after the last test of the class:

import pytest


@pytest.fixture(scope="class", params=[1, 2])
def cls_fixt(request):
    param = request.param
    print("  SETUP cls_fixt", param)
    yield param
    print("  TEARDOWN cls_fixt", param)


@pytest.fixture(scope="function", params=[1, 2])
def fun_fixt(request):
    param = request.param
    print("  SETUP fun_fixt", param)
    yield param
    print("  TEARDOWN fun_fixt", param)

@pytest.mark.usefixtures("cls_fixt")
class Test2:
    def test_1(self):
        print("hello")
    def test_4(self, fun_fixt):
        print("world")

def test_2():
    print("hi")

def test_3(fun_fixt):
    print("there!")

Output:
image

By the way, current behavior of module scope does not seem to be correct or consistent, considering the behavior for class and function scopes. Am I wrong?

@sadra-barikbin
Copy link
Collaborator Author

In pytest they've considered this behavior, an inefficiency:
pytest-dev/pytest#3806

@sadra-barikbin sadra-barikbin force-pushed the Fix-two-bugs-in-conftest branch from 63a62a3 to 22692c1 Compare November 27, 2022 23:02
@sadra-barikbin sadra-barikbin force-pushed the Fix-two-bugs-in-conftest branch from 22692c1 to f1670ee Compare November 29, 2022 15:03
@sadra-barikbin
Copy link
Collaborator Author

The latest change that I introduced to distributed fixture is due to an error that came up when I removed scope='module'.

@vfdev-5
Copy link
Collaborator

vfdev-5 commented Nov 29, 2022

@sadra-barikbin CI is still failing somehow

@sadra-barikbin sadra-barikbin force-pushed the Fix-two-bugs-in-conftest branch from ef7dd05 to c54baf7 Compare November 30, 2022 18:48
Copy link
Collaborator

@vfdev-5 vfdev-5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sadra-barikbin , lgtm !

@sadra-barikbin
Copy link
Collaborator Author

Won't you merge it?

@vfdev-5 vfdev-5 merged commit 79c4235 into pytorch:master Dec 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants