Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent creation of local temp directory #1494

Merged
merged 7 commits into from
Nov 2, 2021

Conversation

aphedges
Copy link
Contributor

Fixes #1493.

This code essentially undoes the distutils behavior that causes the problem in the first place. It's hacky, but I can't think of a better solution.

I tested that this fix works both with and without libaio present on the system, and I compiled both with and without async_io enabled. I only tested on Linux and macOS, so I hope the code works properly on a Windows filesystem.

# Attempt to compile the C program into an object file.
cflags = shlex.split(os.environ.get('CFLAGS', ""))
objs = compiler.compile([filename],
output_dir=output_dir,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tjruwase and I were looking at this a bit today. If we just pass tempdir as the output_dir this seems to do the correct thing as well. We haven't tested this on Windows, but our support for Windows isn't comprehensive currently anyway. We also didn't test this on MacOS, if you have access. Not sure we need lines 215-217?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it would work as desired, though. If tempdir is /path/to/temp/dir/, then filename would be /path/to/temp/dir/test.c. The resulting output file would then be to /path/to/temp/dir/path/to/temp/dir/test.c, which has undesirable repetition. I think the following diff would work instead:

diff --git a/op_builder/builder.py b/op_builder/builder.py
index de8c940..10f6d36 100644
--- a/op_builder/builder.py
+++ b/op_builder/builder.py
@@ -212,7 +212,8 @@ class OpBuilder(ABC):
 
             # Attempt to compile the C program into an object file.
             cflags = shlex.split(os.environ.get('CFLAGS', ""))
-            objs = compiler.compile([filename],
+            objs = compiler.compile(['test.c'],
+                                    output_dir=tempdir,
                                     extra_preargs=self.strip_empty_entries(cflags))
 
             # Attempt to link the object file into an executable.

I know I tried the above while creating this PR, but I didn't use it for some reason I can't remember. If it works on both my Linux and macOS machine, do you want me to replace the current commit with it?

I also haven't tested it with Windows, so it's good to know fully Windows support isn't expected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aphedges, yes please do update the PR if the simpler change works in your testing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I somewhat wish I had written this down, but I wrote the code this way for a convoluted reason. Inside the call to distutils.ccompiler.CCompiler.compiler(), a dict is constructed to map object file paths to source file paths. The source file paths are the first argument to the function ([filename] in this case), and the object file paths are automatically constructed from them with the following steps:

  1. The original (source) extension is removed
  2. The drive is removed from the path
  3. A leading / is removed from the path, if present
  4. The new (object) extension is added
  5. The value of output_dir is added in front of the path

I at least remembered correctly about what would happen if I passed the arguments as [filename] and output_dir=tempdir. The function uses the directory path twice, which matches what I wrote in my previous comment.

The problem with passing ['test.c'] as the first argument is that output_dir is only used to construct the output path. Without the full input path, the file will not be found.

I feel that unintuitive, undocumented behavior like this is part of the reason why distutils is deprecated in the first place. PEP 632 (which deprecates the module) recommends using setuptools instead, although I don't know if setuptools is better with regards to this particular bug.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, this is subtle. Thank you for all the details here. It would be really awesome if we could switch this code path to use setuptools instead. Like you say though it’s hard to tell if this issue would be resolved there or not.

If you have anymore time to try using setuptools that would be amazing. If not I totally understand, we can defer changing to setuptools for another PR. Other wise I’d say we can merge this PR as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I first encountered this bug, I assumed it'd be a quick fix. I was so wrong...

Unfortunately, switching to setuptools might not fix it, at least if we do it the easy way. One of the first things I checked when I figured out it was a distutils issue was to see if it was fixed in the version of distutils vendored into setuptools. Unfortunately, it was not.

I was curious why the function works the way it does, so I just ran git blame on the original file. The code to strip the drive and leading / was added in python/cpython@10da45e, which has the associated issue 668662. At 18 years old, it's one of the more recent changes to the file. The issue fortunately points me towards a --build-temp flag that might be relevant, but like the rest of distutils, it's insufficiently documented.

I'm still new to Python packaging. Although I've been reading up on packaging recently, this PR was still a big learning experience. I do not think I am experienced enough to switch DeepSpeed from distutils to setuptools yet. distutils will still be in a supported version of Python for another 3 years, so it's not like there is a big hurry to switch. Should I make an issue so someone remembers to upgrade at some point?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's totally fair, thanks for all your work on investigating this @aphedges. It's very much appreciated :) We'll merge this current PR and work on investigating the setuptools approach at a later time.

@jeffra jeffra enabled auto-merge (squash) November 2, 2021 20:41
@jeffra jeffra merged commit 91defd7 into microsoft:master Nov 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Local tmp directory created when running ds_report
3 participants