Add _allgather_base & _reduce_scatter_base to dist backend #3919

Status: Open. hgt312 wants to merge 4 commits into master from dist_backend_patch.
Conversation

hgt312 (Collaborator) commented Aug 23, 2022

Add support for `dist._allgather_base` and `dist._reduce_scatter_base` in the PyTorch/XLA dist backend. These two ops are simpler, so we don't need to use concat/split when implementing them, and they are used by DeepSpeed/FSDP.
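For reference, a minimal, hypothetical usage sketch of the two private APIs this PR wires up (not taken from the PR itself). It assumes an initialized `xla` process group; the module-level spellings `dist._all_gather_base`/`dist._reduce_scatter_base` and the init incantation vary across PyTorch/XLA versions.

```python
import torch
import torch.distributed as dist
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_backend  # registers the `xla` dist backend

# Hypothetical sketch, not from this PR; init details vary by torch_xla version.
dist.init_process_group("xla", world_size=xm.xrt_world_size(), rank=xm.get_ordinal())
device = xm.xla_device()
world_size = dist.get_world_size()

# _all_gather_base gathers every rank's shard into one flat, pre-allocated
# output tensor, so no per-rank tensor list (and no concat) is needed.
shard = torch.ones(4, device=device)
gathered = torch.empty(4 * world_size, device=device)
dist._all_gather_base(gathered, shard)

# _reduce_scatter_base reduces a flat input across ranks and leaves only
# this rank's shard in `out`, so no split of the input is needed.
full = torch.ones(4 * world_size, device=device)
out = torch.empty(4, device=device)
dist._reduce_scatter_base(out, full)
```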

hgt312 (Collaborator, Author) commented Aug 23, 2022

@JackCaoG what's wrong with the CI?
cc @hjm-aws Please help review this PR.

JackCaoG (Collaborator) commented
A rebase should solve the issue. Can you fill out the description with what issue this PR intends to solve?

@hgt312 force-pushed the dist_backend_patch branch from 02ff40e to 5ba3aae on August 24, 2022 23:09
hjm-aws (Collaborator) commented Aug 25, 2022

@hgt312 Hi Guangtai, it seems your purpose is to have simpler all-gather and reduce-scatter APIs. In that case, your application code can directly call xm.all_gather or xm.reduce_scatter. Adding a new hidden API to torch.distributed doesn't provide much value in my opinion.
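For comparison, a sketch of the direct `xm` calls mentioned above (hypothetical, not from the thread); the `xm.reduce_scatter` argument order follows the torch_xla API of this era and may differ across versions.

```python
import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()
world_size = xm.xrt_world_size()

# all_gather concatenates each replica's tensor along `dim`.
shard = torch.ones(4, device=device)
gathered = xm.all_gather(shard, dim=0)  # shape: [4 * world_size]

# reduce_scatter sums across replicas and returns this replica's shard.
full = torch.ones(4 * world_size, device=device)
out = xm.reduce_scatter(xm.REDUCE_SUM, full, scale=1.0,
                        scatter_dim=0, shard_count=world_size)
```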

@hgt312 force-pushed the dist_backend_patch branch from 52647d3 to b2458c1 on August 29, 2022 16:27
Labels: none yet
Projects: none yet
3 participants