enable async io op on powerpc architectures #1224

adammoody · 2021-07-13T08:03:57Z

Similar to the changes made for the CPU Adam op, this updates the asynchronous I/O op to build on PowerPC systems by using -mcpu=native instead of -march=native and by guarding the includes for cpuid.h and x86intrin.h.
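As a rough illustration of the flag switch described above, the sketch below picks the CPU tuning flag from the machine name. The helper name `cpu_arch_flag` and the use of `platform.machine()` are assumptions made for this example; the real `cpu_arch` method in op_builder may detect the architecture differently.

```python
import platform


def cpu_arch_flag(machine=None):
    """Hypothetical helper: return the GCC flag that targets the host CPU."""
    # Fall back to the running interpreter's machine name when none is given.
    machine = (machine or platform.machine()).lower()
    # Per the PR description, PowerPC builds use -mcpu=native
    # in place of -march=native.
    if machine.startswith(("ppc", "powerpc")):
        return "-mcpu=native"
    return "-march=native"


print(cpu_arch_flag("ppc64le"))
print(cpu_arch_flag("x86_64"))
```

Taking the machine name as an optional parameter keeps the selection logic testable on any host.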

This additionally moves the simd_width and cpu_arch methods from op_builder/cpu_adam.py to op_builder/builder.py, where both are also called from op_builder/async_io.py.

Note that the simd_width method in builder.py has changed slightly in that it returns -D__SCALAR__ as a fallback instead of an empty string. This was to copy the behavior used in cpu_adam.py. I think that change should be safe, but please double check me.

A couple of open items for simd_width:

- It doesn't seem like anything actually uses -D__SCALAR__.
- It looks like an empty string is still returned if lscpu cannot be found. Should that also be -D__SCALAR__ to be consistent?
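To make the fallback question concrete, here is a minimal sketch of a SIMD-width probe that returns -D__SCALAR__ on every fallback path, including a missing lscpu. The function name `simd_width_flag`, the `lscpu_output` parameter, and the AVX define names are assumptions for illustration; only -D__SCALAR__ and the use of lscpu come from the discussion above.

```python
import shutil
import subprocess


def simd_width_flag(lscpu_output=None):
    """Hypothetical sketch: map lscpu output to a SIMD-width define."""
    if lscpu_output is None:
        # No lscpu on this system: fall back to the scalar define
        # rather than returning an empty string.
        if shutil.which("lscpu") is None:
            return "-D__SCALAR__"
        lscpu_output = subprocess.check_output(["lscpu"]).decode("utf-8")
    text = lscpu_output.lower()
    # The AVX define names below are assumed for this example.
    if "avx512" in text:
        return "-D__AVX512__"
    if "avx2" in text:
        return "-D__AVX256__"
    # Anything else (for example PowerPC) gets the scalar fallback.
    return "-D__SCALAR__"


print(simd_width_flag("Flags: fpu sse avx2"))
print(simd_width_flag("Model name: POWER9"))
```

Because no path returns an empty string, callers never have to guard against appending "" to the compiler arguments.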

tjruwase · 2021-07-13T13:58:59Z

@adammoody, thanks.

@stas00, FYI. I recall there was an issue with empty string on some hardware.

stas00 · 2021-07-13T19:05:39Z

Yes, except it was the AVX flag, which was "" on some hardware. The empty string was invisible in the printed gcc command line, but it broke gcc when it was run via subprocess.

A better long-term solution is to change the builder to filter out any empty-string options, since these guarantee a problem and a difficult debugging session.
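The failure mode described here can be reproduced without gcc. The sketch below (an illustration, not DeepSpeed code) passes a flag list containing an empty string to a child Python process: the joined command line barely shows the empty entry, yet the child still receives it as a separate argument. The helper name `count_child_args` is made up for this example.

```python
import subprocess
import sys


def count_child_args(extra_args):
    """Run a child Python process and return how many arguments it received."""
    child_code = "import sys; print(len(sys.argv) - 1)"
    cmd = [sys.executable, "-c", child_code] + list(extra_args)
    out = subprocess.check_output(cmd)
    return int(out.decode().strip())


flags = ["-O3", "", "-g"]
# Printed form: the empty entry is only visible as a doubled space.
print(" ".join(flags))
# Actual argv: the empty string is still passed as its own argument.
print(count_child_args(flags))
```

A compiler that receives an empty argument this way is likely to treat it as an input file name, which would explain an error that does not match the printed command.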

adammoody · 2021-07-13T19:35:29Z

I see that async_io.py avoids appending empty strings here:

DeepSpeed/op_builder/async_io.py

Lines 46 to 48 in 54bed32

    
    simd_width = self.simd_width()
    if len(simd_width) > 0:
        args.append(simd_width)

Perhaps that logic should be copied to cpu_adam, as well?

And it could be done for both CPU_ARCH and SIMD_WIDTH.

Or maybe just always add those values, but scan the list of args to drop any empty strings.

tjruwase · 2021-07-13T19:42:28Z

@adammoody, are you able to apply the empty string filtering to this PR?

adammoody · 2021-07-13T20:05:00Z

Yes, I'll rework this later today to filter out any empty strings.

stas00 · 2021-07-13T20:24:36Z

Probably the best long term solution would be to do it here for all sub-classes at once:

DeepSpeed/op_builder/builder.py

Line 221 in 54bed32

def builder(self):

It will make the function a bit bloated, but we can wrap each of these calls to remove empty-string elements from the list.

E.g., by wrapping each of these calls as strip_empty_entries(self.cxx_args()); strip_empty_entries still needs to be written.

We can probably then remove this workaround: #1224 (comment)

Even better would be an upstream patch to torch.utils.cpp_extension.CppExtension to do the same, but it won't help much because DS has to support older pytorch versions.
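A minimal sketch of what such a wrapper could look like follows. Only the name strip_empty_entries and the idea of wrapping each call come from this comment; `ToyBuilder` and `builder_kwargs` are hypothetical stand-ins and not the DeepSpeed builder class.

```python
def strip_empty_entries(args):
    """Return a copy of args with empty-string entries removed."""
    return [arg for arg in args if len(arg) > 0]


class ToyBuilder:
    """Hypothetical stand-in for an op builder (not the DeepSpeed class)."""

    def cxx_args(self):
        # A subclass might return "" when a probe finds nothing to add.
        return ["-O3", "", "-std=c++14", ""]

    def extra_ldflags(self):
        return ["", "-laio"]

    def builder_kwargs(self):
        # Filter once, in the base class, so every subclass benefits.
        return {
            "extra_compile_args": {"cxx": strip_empty_entries(self.cxx_args())},
            "extra_link_args": strip_empty_entries(self.extra_ldflags()),
        }


print(ToyBuilder().builder_kwargs())
```

Filtering at the single point where the argument lists are handed to the extension keeps the individual builders free of per-flag length checks.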

adammoody · 2021-07-14T06:50:29Z

I pushed those changes as separate commits. I took a first pass before I saw the later comments. The second commit is to align with those directions. I can rebase all of these into one commit later, but I thought it may be easier to review the incremental changes.

I found three places where cxx_args is invoked from the builder classes. Please let me know if I missed any or got too many.

I also rearranged the order of flags in cpu_adam to align better with async_io. Also, please let me know if you'd like me to roll back that change to restore the original order. While doing that, I did notice that the async op is built with -O0 while cpu_adam uses -O3.

I also changed simd_width to return -D__SCALAR__ when lscpu is not found.

adammoody · 2021-07-14T07:09:18Z

I decided to restore the option order and rebase the last few commits to reduce the number of changes.

tjruwase · 2021-07-14T14:19:54Z

@adammoody, looks good. Right now the CI is failing because of formatting issues. Please see here for simple steps to fix these issues. Thanks!

stas00 · 2021-07-14T16:24:08Z

What @tjruwase said, I use the manual:

pre-commit run --all-files

I'd include the wrapper for all 4 calls to future-proof the builder code:

DeepSpeed/op_builder/builder.py

Lines 224 to 227 in 54bed32

    
    sources=self.sources(),
    include_dirs=self.include_paths(),
    extra_compile_args={'cxx': self.cxx_args()},
    extra_link_args=self.extra_ldflags())

but of course this can be done as the need arises.

> While doing that, I did notice that the async op is built with -O0 while cpu_adam uses -O3.

Yes, I raised this issue a while back. I think these flags are also repeated more than twice in the final gcc call, and I forget which of the -OX flags takes precedence when repeated.

Basically, we need to look at the final gcc sequence and see if it needs to be cleaned up; remember that we have these flags coming from pytorch too. Perhaps a separate issue/PR?
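On the precedence question: the GCC documentation states that when several -O options are given, the last one is the one that takes effect. The sketch below applies that rule to an argument list as a way to audit a final gcc sequence. The helper name `effective_opt_level` is made up for this example, and the flag list shown is illustrative rather than an actual DeepSpeed command line.

```python
import re

# Matches -O, -O0..-O9, -Os, -Og, -Oz and -Ofast as whole arguments.
_OPT_FLAG = re.compile(r"^-O(\d|s|g|z|fast)?$")


def effective_opt_level(args):
    """Return the last -O flag in args, or None if there is none."""
    effective = None
    for arg in args:
        if _OPT_FLAG.match(arg):
            # Later -O flags override earlier ones.
            effective = arg
    return effective


# Illustrative sequence: one -O3 from one source, one -O0 appended later.
print(effective_opt_level(["-O3", "-g", "-Wno-reorder", "-O0"]))
```

Under that rule, whichever component appends its flags last decides the optimization level, so the order in which the builder and pytorch contribute flags matters.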

adammoody · 2021-07-14T16:50:26Z

> @adammoody, looks good. Right now the CI is failing because of formatting issues. Please see here for simple steps to fix these issues. Thanks!

Yes, thanks. I haven't integrated the format checking into my development environment yet. I'll do that next.

adammoody · 2021-07-14T19:57:53Z

@stas00, I just pushed a commit to also strip the other flags you mentioned. I also see some CUDA flags passed to nvcc, like:

DeepSpeed/op_builder/builder.py

Line 274 in 54bed32

extra_cuda_cflags=self.nvcc_args(),

I'm guessing we should probably also include those. Is that right?

stas00 · 2021-07-14T20:02:56Z

I'd say yes, since the specific builders override it.

adammoody · 2021-07-14T22:54:59Z

This strips empty strings from the nvcc flags now, as well.

adammoody requested review from awan-10, cli99, conglongli, eltonzheng, jeffra, minjiaz, niumanar, RezaYazdaniAminabadi, samyam, ShadenSmith and tjruwase as code owners July 13, 2021 08:03

adammoody force-pushed the asyncio branch from 791a3f1 to 2cd5ed2 Compare July 14, 2021 07:05

enable async io op on powerpc architectures

38f34d8

adammoody force-pushed the asyncio branch from 2cd5ed2 to d74e2c3 Compare July 14, 2021 07:24

tjruwase approved these changes Jul 14, 2021


adammoody force-pushed the asyncio branch from f6deae1 to 276fa2a Compare July 14, 2021 19:45

drop any empty strings returned by cxx_args

5ffef82

adammoody force-pushed the asyncio branch from 61370df to 5ffef82 Compare July 14, 2021 22:51

Merge branch 'master' into asyncio

c9735f5

tjruwase merged commit 89b0fb4 into microsoft:master Jul 15, 2021

adammoody mentioned this pull request Jul 16, 2021

[FEEDSTOCK REQUEST] DeepSpeed v0.3.16 open-ce/open-ce#415

Closed

adammoody deleted the asyncio branch December 18, 2023 04:28
