Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enable async io op on powerpc architectures #1224

Merged
merged 3 commits into from
Jul 15, 2021

Conversation

adammoody
Copy link
Contributor

@adammoody adammoody commented Jul 13, 2021

Similar to the changes made for the CPU Adam op, this updates the asynchronous I/O op to build on PowerPC systems by using -mcpu=native instead of -march=native and by guarding the includes for cpuid.h and x86intrin.h.

This additionally moves the simd_width and cpu_arch methods from op_builder/cpu_adam.py to op_builder/builder.py, where both are also called from op_builder/async_io.py.

Note that the simd_width method in builder.py has changed slightly in that it returns -D__SCALAR__ as a fallback instead of an empty string. This was to copy the behavior used in cpu_adam.py. I think that change should be safe, but please double check me.

A couple of open items for simd_width:

  • It doesn't seem like anything actually uses -D__SCALAR__.
  • It looks like an empty string is still returned if lscpu cannot be found. Should that also be -D__SCALAR__ to be consistent?

@tjruwase
Copy link
Contributor

@adammoody, thanks.

@stas00, FYI. I recall there was an issue with empty string on some hardware.

@stas00
Copy link
Collaborator

stas00 commented Jul 13, 2021

Yes, except it was AVX.... which was "" on some hardware, which was invisible in the gcc command line printed, but was breaking gcc which was run via subprocess.

A better long-term solution is to change the builder to filter out any empty string options as these guarantee a problem and difficult debug time.

@adammoody
Copy link
Contributor Author

adammoody commented Jul 13, 2021

I see the async_io.py avoids appending empty strings here:

simd_width = self.simd_width()
if len(simd_width) > 0:
args.append(simd_width)

Perhaps that logic should be copied to cpu_adam, as well?

And it could be done for both CPU_ARCH and SIMD_WIDTH.

Or maybe just always add those values, but scan the list of args to drop any empty strings.

@tjruwase
Copy link
Contributor

@adammoody, are you able to apply the empty string filtering to this PR?

@adammoody
Copy link
Contributor Author

Yes, I'll rework this later today to filter out any empty strings.

@stas00
Copy link
Collaborator

stas00 commented Jul 13, 2021

Probably the best long term solution would be to do it here for all sub-classes at once:

def builder(self):

it will make the function a bit bloated, but can also wrap each of these calls to remove empty string elements in the list.

e.g. by adding a wrapper strip_empty_entries(self.cxx_args()) (for each of these calls) (strip_empty_entries needs to be written)

We can probably then remove this workaround: #1224 (comment)


and even better an upstream patch to torch.utils.cpp_extension.CppExtension to do the same, but it won't help much because DS has to support older pytorch versions.

@adammoody
Copy link
Contributor Author

adammoody commented Jul 14, 2021

I pushed those changes as separate commits. I took a first pass before I saw the later comments. The second commit is to align with those directions. I can rebase all of these into one commit later, but I thought it may be easier to review the incremental changes.

I found a three places where cxx_args is invoked from the builder classes. Please let me know if I missed any or got too many.

I also rearranged the order of flags in cpu_adam to align better with async_io. Also, please let me know if you'd like me to roll back that change to restore the original order. While doing that, I did notice that the async op is built with -O0 while cpu_adam uses -O3.

I also changed simd_width to return -D__SCALAR__ when lscpu is not found.

@adammoody
Copy link
Contributor Author

I decided to restore the option order and rebase the last few commits to reduce the number of changes.

@tjruwase
Copy link
Contributor

@adammoody, looks good. Right now the CI is failing because of formatting issues. Please see here for simple steps to fix these issues. Thanks!

@stas00
Copy link
Collaborator

stas00 commented Jul 14, 2021

What @tjruwase said, I use the manual:

pre-commit run --all-files

I'd include the wrapper for all 4 calls to future proof the buillder code:

sources=self.sources(),
include_dirs=self.include_paths(),
extra_compile_args={'cxx': self.cxx_args()},
extra_link_args=self.extra_ldflags())

but of course this can be done as the need arises.


While doing that, I did notice that the async op is built with -O0 while cpu_adam uses -O3.

Yes, I raises this issue a while back - I think we have these flags repeated more than twice too in the final gcc call - I forget which one of -OX takes precedence when repeated.

Basically need to look at the final gcc sequence and see if it needs to be cleaned up - remember we have these coming from pytorch too. Perhaps a separate issue/PR?

@adammoody
Copy link
Contributor Author

@adammoody, looks good. Right now the CI is failing because of formatting issues. Please see here for simple steps to fix these issues. Thanks!

Yes, thanks. I haven't integrated the format checking into my development environment yet. I'll do that next.

@adammoody
Copy link
Contributor Author

adammoody commented Jul 14, 2021

@stas00 , I just pushed a commit to also strip the other flags you mentioned. I also see some for cuda flags passed to nvcc, like:

extra_cuda_cflags=self.nvcc_args(),

I'm guessing we should probably also include those. Is that right?

@stas00
Copy link
Collaborator

stas00 commented Jul 14, 2021

I'd say Yes! since the specific builders override it

@adammoody
Copy link
Contributor Author

This strips empty strings from the nvcc flags now, as well.

@tjruwase tjruwase merged commit 89b0fb4 into microsoft:master Jul 15, 2021
@adammoody adammoody deleted the asyncio branch December 18, 2023 04:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants