-
Notifications
You must be signed in to change notification settings - Fork 4.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enable async io op on powerpc architectures #1224
Conversation
@adammoody, thanks. @stas00, FYI. I recall there was an issue with empty string on some hardware. |
Yes, except it was AVX.... which was A better long-term solution is to change the builder to filter out any empty string options as these guarantee a problem and difficult debug time. |
I see the async_io.py avoids appending empty strings here: DeepSpeed/op_builder/async_io.py Lines 46 to 48 in 54bed32
Perhaps that logic should be copied to cpu_adam, as well? And it could be done for both CPU_ARCH and SIMD_WIDTH. Or maybe just always add those values, but scan the list of args to drop any empty strings. |
@adammoody, are you able to apply the empty string filtering to this PR? |
Yes, I'll rework this later today to filter out any empty strings. |
Probably the best long term solution would be to do it here for all sub-classes at once: DeepSpeed/op_builder/builder.py Line 221 in 54bed32
it will make the function a bit bloated, but can also wrap each of these calls to remove empty string elements in the list. e.g. by adding a wrapper We can probably then remove this workaround: #1224 (comment) and even better an upstream patch to |
I pushed those changes as separate commits. I took a first pass before I saw the later comments. The second commit is to align with those directions. I can rebase all of these into one commit later, but I thought it may be easier to review the incremental changes. I found a three places where I also rearranged the order of flags in cpu_adam to align better with async_io. Also, please let me know if you'd like me to roll back that change to restore the original order. While doing that, I did notice that the async op is built with I also changed |
I decided to restore the option order and rebase the last few commits to reduce the number of changes. |
@adammoody, looks good. Right now the CI is failing because of formatting issues. Please see here for simple steps to fix these issues. Thanks! |
What @tjruwase said, I use the manual:
I'd include the wrapper for all 4 calls to future proof the buillder code: DeepSpeed/op_builder/builder.py Lines 224 to 227 in 54bed32
but of course this can be done as the need arises.
Yes, I raises this issue a while back - I think we have these flags repeated more than twice too in the final gcc call - I forget which one of -OX takes precedence when repeated. Basically need to look at the final gcc sequence and see if it needs to be cleaned up - remember we have these coming from pytorch too. Perhaps a separate issue/PR? |
Yes, thanks. I haven't integrated the format checking into my development environment yet. I'll do that next. |
@stas00 , I just pushed a commit to also strip the other flags you mentioned. I also see some for cuda flags passed to nvcc, like: DeepSpeed/op_builder/builder.py Line 274 in 54bed32
I'm guessing we should probably also include those. Is that right? |
I'd say Yes! since the specific builders override it |
This strips empty strings from the nvcc flags now, as well. |
Similar to the changes made for the CPU Adam op, this updates the asynchronous I/O op to build on PowerPC systems by using
-mcpu=native
instead of-march=native
and by guarding the includes forcpuid.h
andx86intrin.h
.This additionally moves the
simd_width
andcpu_arch
methods fromop_builder/cpu_adam.py
toop_builder/builder.py
, where both are also called fromop_builder/async_io.py
.Note that the
simd_width
method inbuilder.py
has changed slightly in that it returns-D__SCALAR__
as a fallback instead of an empty string. This was to copy the behavior used incpu_adam.py
. I think that change should be safe, but please double check me.A couple of open items for
simd_width
:-D__SCALAR__
.lscpu
cannot be found. Should that also be-D__SCALAR__
to be consistent?