-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent creation of local temp directory #1494
Conversation
# Attempt to compile the C program into an object file. | ||
cflags = shlex.split(os.environ.get('CFLAGS', "")) | ||
objs = compiler.compile([filename], | ||
output_dir=output_dir, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tjruwase and I were looking at this a bit today. If we just pass tempdir
as the output_dir this seems to do the correct thing as well. We haven't tested this on Windows, but our support for Windows isn't comprehensive currently anyway. We also didn't test this on MacOS, if you have access. Not sure we need lines 215-217?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it would work as desired, though. If tempdir
is /path/to/temp/dir/
, then filename
would be /path/to/temp/dir/test.c
. The resulting output file would then be to /path/to/temp/dir/path/to/temp/dir/test.c
, which has undesirable repetition. I think the following diff would work instead:
diff --git a/op_builder/builder.py b/op_builder/builder.py
index de8c940..10f6d36 100644
--- a/op_builder/builder.py
+++ b/op_builder/builder.py
@@ -212,7 +212,8 @@ class OpBuilder(ABC):
# Attempt to compile the C program into an object file.
cflags = shlex.split(os.environ.get('CFLAGS', ""))
- objs = compiler.compile([filename],
+ objs = compiler.compile(['test.c'],
+ output_dir=tempdir,
extra_preargs=self.strip_empty_entries(cflags))
# Attempt to link the object file into an executable.
I know I tried the above while creating this PR, but I didn't use it for some reason I can't remember. If it works on both my Linux and macOS machine, do you want me to replace the current commit with it?
I also haven't tested it with Windows, so it's good to know fully Windows support isn't expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aphedges, yes please do update the PR if the simpler change works in your testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I somewhat wish I had written this down, but I wrote the code this way for a convoluted reason. Inside the call to distutils.ccompiler.CCompiler.compiler()
, a dict
is constructed to map object file paths to source file paths. The source file paths are the first argument to the function ([filename]
in this case), and the object file paths are automatically constructed from them with the following steps:
- The original (source) extension is removed
- The drive is removed from the path
- A leading
/
is removed from the path, if present - The new (object) extension is added
- The value of
output_dir
is added in front of the path
I at least remembered correctly about what would happen if I passed the arguments as [filename]
and output_dir=tempdir
. The function uses the directory path twice, which matches what I wrote in my previous comment.
The problem with passing ['test.c']
as the first argument is that output_dir
is only used to construct the output path. Without the full input path, the file will not be found.
I feel that unintuitive, undocumented behavior like this is part of the reason why distutils
is deprecated in the first place. PEP 632 (which deprecates the module) recommends using setuptools
instead, although I don't know if setuptools
is better with regards to this particular bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wow, this is subtle. Thank you for all the details here. It would be really awesome if we could switch this code path to use setuptools instead. Like you say though it’s hard to tell if this issue would be resolved there or not.
If you have anymore time to try using setuptools that would be amazing. If not I totally understand, we can defer changing to setuptools for another PR. Other wise I’d say we can merge this PR as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I first encountered this bug, I assumed it'd be a quick fix. I was so wrong...
Unfortunately, switching to setuptools
might not fix it, at least if we do it the easy way. One of the first things I checked when I figured out it was a distutils
issue was to see if it was fixed in the version of distutils
vendored into setuptools
. Unfortunately, it was not.
I was curious why the function works the way it does, so I just ran git blame
on the original file. The code to strip the drive and leading /
was added in python/cpython@10da45e, which has the associated issue 668662. At 18 years old, it's one of the more recent changes to the file. The issue fortunately points me towards a --build-temp
flag that might be relevant, but like the rest of distutils
, it's insufficiently documented.
I'm still new to Python packaging. Although I've been reading up on packaging recently, this PR was still a big learning experience. I do not think I am experienced enough to switch DeepSpeed from distutils
to setuptools
yet. distutils
will still be in a supported version of Python for another 3 years, so it's not like there is a big hurry to switch. Should I make an issue so someone remembers to upgrade at some point?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's totally fair, thanks for all your work on investigating this @aphedges. It's very much appreciated :) We'll merge this current PR and work on investigating the setuptools
approach at a later time.
Fixes #1493.
This code essentially undoes the
distutils
behavior that causes the problem in the first place. It's hacky, but I can't think of a better solution.I tested that this fix works both with and without libaio present on the system, and I compiled both with and without async_io enabled. I only tested on Linux and macOS, so I hope the code works properly on a Windows filesystem.