[RFC] add device abstraction to allow other device than CUDA be used #2221

Merged
merged 86 commits on Mar 7, 2023

Changes from 1 commit

Commits (86)
0a849d5
[device abstraction] add device abstraction to allow other device tha…
delock Aug 16, 2022
e4f40f0
Merge branch '202208-base' into 202208
delock Aug 24, 2022
4a216ea
[rebase-202208] additional changes needed when rebase to 202208
delock Aug 24, 2022
2137642
Merge branch '20220824-base' into 20220824
delock Aug 24, 2022
089657e
[rebase] cleanup direct cuda usage after merge
delock Aug 24, 2022
d5a8424
[precommit] fix pre-commit issues
delock Aug 25, 2022
96d0765
Merge branch 'master' into gma/device-abstraction
tjruwase Aug 30, 2022
ac64c7a
[pin_memory] make pin_memory select device type
delock Sep 1, 2022
02c3a57
Merge branch 'master' into gma/device-abstraction
delock Sep 8, 2022
522b24b
[downstream] merge from xpu support downstream
delock Sep 9, 2022
a3b1e02
Merge branch 'master' into gma/device-abstraction
tjruwase Sep 12, 2022
4557c33
Merge branch 'master' into gma/device-abstraction
tjruwase Sep 13, 2022
2ef7d6c
Merge branch 'up-master' into gma/merge-upstream-20220921
delock Sep 21, 2022
9656321
[device] port cuda device to literal_device() in new tests
delock Sep 21, 2022
65729e3
[accel_runtime] add pin_memory to accelerator runtime interface.
delock Sep 22, 2022
f94d53e
[accelerator abstraction] merge from #2320
delock Sep 26, 2022
6005abe
Merge branch 'up-master' into gma/device-abstraction
delock Sep 26, 2022
31c0997
change call site of literal_device, on_accel_device and accel_runtime…
delock Oct 12, 2022
1785c26
add new interface definition from olruwase/accelerator_abstraction
delock Oct 12, 2022
17203a4
[accelerator abstraction] remove name() from interface, device_name()…
delock Oct 14, 2022
e8daea6
merge with master (ec13da6ba7cabc44bb4745a64a208b8580792954)
delock Oct 14, 2022
cfd23ed
Merge branch 'up-master' into gma/device-abstraction
delock Oct 14, 2022
13bbbdf
[OpBuilder] Add op builder abstraction
delock Oct 23, 2022
06e39a5
Merge branch 'up-master' into gma/device-abstraction
delock Oct 23, 2022
257490f
convert op builder usage in merged code
delock Oct 23, 2022
c93b999
[OpBuilder] add create_op_builder interface in abstract_accelerator.py
delock Oct 23, 2022
9858d42
[OpBuilder] fix op builder usage in tests
delock Oct 23, 2022
68ce006
[OpBuilder] fix <op builder>.NAME usage in tests to follow op builder…
delock Oct 23, 2022
4b62dab
import get_accelerator from deepspeed.accelerator directly
delock Oct 23, 2022
c5b2070
[OpBuilder] remove unused function and sync with main
delock Oct 23, 2022
9532843
add missing get_accelerator import
delock Oct 25, 2022
0729695
fix obsolete name in CPU Adam which should be create_op_builder
delock Oct 25, 2022
be517d8
fix create_op_builder calls
delock Oct 25, 2022
3af870f
fix misuse of new accelerator abstraction interface in tests
delock Oct 25, 2022
8fa64b9
Merge from downstream for bug fixing
delock Oct 28, 2022
4873538
merge from downstream
delock Nov 3, 2022
61b10b0
remove SYCL_KERNEL specific code
delock Nov 4, 2022
457d281
Merge branch 'up-master(9cfcf7431a02a)' into gma/device-abstraction
delock Nov 8, 2022
fea4604
Merge branch 'up-master(6f77da1bae506)' into gma/device-abstraction
delock Nov 10, 2022
f80a907
Merge branch 'up-master(3ca9878d8e92a)' into gma/device-abstraction
delock Nov 10, 2022
3b0b14c
merge from downstream for bugs fixes
delock Nov 10, 2022
b375e46
Merge branch 'up-master(be5ec506bd5219a)' into gma/device-abstraction
delock Nov 11, 2022
18b3c95
fix torch.cuda in new files
delock Nov 11, 2022
97695f5
use OpBuilder name symbol, improve env_report, fix typo, fix get_acce…
delock Nov 13, 2022
93e157b
Merge branch 'master' into gma/device-abstraction
tjruwase Nov 13, 2022
b1c5384
fix missing () in get_accelerator for ds_attention.py
delock Nov 14, 2022
91fb948
import deepspeed.accelerator.get_accelerator only when torch_availabl…
delock Nov 14, 2022
8f89c2b
Merge branch 'up-master' into gma/device-abstraction
delock Dec 1, 2022
26e628d
Change reference of InferenceSpecializedBuilder to name string, Infer…
delock Dec 1, 2022
91f5cb2
convert new code with CUDA references
delock Dec 1, 2022
5a1ae0e
remove unneeded get_accelerator import in op_builder/__init__.py
delock Dec 1, 2022
05842b6
[setup] fix build error when pytorch is not installed in environment
delock Dec 1, 2022
24d2b38
Handle the case when torch is not installed during deepspeed installa…
delock Dec 1, 2022
c26e5d4
Merge branch 'master' into gma/device-abstraction
tjruwase Dec 2, 2022
4116ba5
Merge branch 'up-master' into gma/device-abstraction
delock Jan 8, 2023
bea648f
port new cuda specific code
delock Jan 8, 2023
94253d4
revert changes in __init__.py since new mechanism no longer requires …
delock Jan 8, 2023
2acad48
Merge branch 'up-master' into gma/device-abstraction
delock Jan 27, 2023
77af66a
use old op builder interface
delock Jan 27, 2023
8ec0905
Merge branch 'up-master' into gma/device-abstraction
delock Jan 27, 2023
bd9d275
remove bypass code in set_accelerator_visible
delock Jan 27, 2023
f1e75ff
revert changes in quantizer according to latest op builder interface
delock Jan 27, 2023
9860282
Merge branch 'master' into gma/device-abstraction
delock Jan 30, 2023
c26da46
port additional torch.cuda code in deepspeed
delock Jan 27, 2023
cb46cf4
Merge branch 'master' into gma/device-abstraction
delock Jan 31, 2023
b74a47c
Merge branch 'master' into gma/device-abstraction
delock Feb 3, 2023
6e55729
Merge branch 'master' into gma/device-abstraction
delock Feb 6, 2023
3c186d2
Merge branch 'master' into gma/device-abstraction
delock Feb 7, 2023
667c878
follow comments
delock Feb 9, 2023
d693dad
Merge branch 'up-master' into gma/device-abstraction
delock Feb 9, 2023
7a9e7ea
fix format
delock Feb 9, 2023
538148b
fix new code with cuda specific code
delock Feb 9, 2023
af8cee2
Merge branch 'master' into gma/device-abstraction
delock Feb 11, 2023
3dd816c
Merge branch 'master' into gma/device-abstraction
delock Feb 15, 2023
abf31b6
Merge branch 'master' into gma/device-abstraction
delock Feb 17, 2023
9539def
Merge branch 'master' into gma/device-abstraction
delock Feb 20, 2023
6ac4de4
Merge branch 'master' into gma/device-abstraction
delock Feb 22, 2023
b551304
Merge branch 'master' into gma/device-abstraction
delock Feb 22, 2023
238dc1e
port cuda specific code in module injection
delock Feb 23, 2023
da254d7
Merge branch 'master' into gma/device-abstraction
delock Feb 24, 2023
33ace54
Merge branch 'master' into gma/device-abstraction
delock Feb 26, 2023
3d572bb
Merge branch 'up-master' into gma/device-abstraction
delock Mar 1, 2023
4f9f6c2
add licensing message
delock Mar 1, 2023
e92fd92
Merge branch 'master' into gma/device-abstraction
delock Mar 2, 2023
136ba27
Merge branch 'master' into gma/device-abstraction
tjruwase Mar 7, 2023
9569b46
Merge branch 'master' into gma/device-abstraction
jeffra Mar 7, 2023
Merge branch 'up-master' into gma/device-abstraction
delock committed Dec 1, 2022
commit 8f89c2b09f5910f2d09db403afa7795cde779657
50 changes: 41 additions & 9 deletions deepspeed/accelerator/abstract_accelerator.py
@@ -6,13 +6,6 @@ class DeepSpeedAccelerator(ABC):
def __init__(self):
self._name = None
self._communication_backend_name = None
self.BFloat16Tensor = None
self.ByteTensor = None
self.DoubleTensor = None
self.FloatTensor = None
self.HalfTensor = None
self.IntTensor = None
self.LongTensor = None

# Device APIs
@abc.abstractmethod
@@ -24,7 +17,7 @@ def device(self, device_index):
...

@abc.abstractmethod
def set_device(self):
def set_device(self, device_index):
...

@abc.abstractmethod
@@ -65,7 +58,7 @@ def manual_seed_all(self, seed):
...

@abc.abstractmethod
def initial_seed(self):
def initial_seed(self, seed):
...

@abc.abstractmethod
@@ -181,6 +174,41 @@ def communication_backend_name(self):
...

# Tensor operations
@property
@abc.abstractmethod
def BFloat16Tensor(self):
...

@property
@abc.abstractmethod
def ByteTensor(self):
...

@property
@abc.abstractmethod
def DoubleTensor(self):
...

@property
@abc.abstractmethod
def FloatTensor(self):
...

@property
@abc.abstractmethod
def HalfTensor(self):
...

@property
@abc.abstractmethod
def IntTensor(self):
...

@property
@abc.abstractmethod
def LongTensor(self):
...

@abc.abstractmethod
def pin_memory(self, tensor):
...
@@ -189,6 +217,10 @@ def pin_memory(self, tensor):
def on_accelerator(self, tensor):
...

@abc.abstractmethod
def op_builder_dir(self):
...

@abc.abstractmethod
def create_op_builder(self, class_name):
...
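With the tensor types exposed as abstract properties, a backend no longer binds concrete torch types in __init__; a third-party device only has to satisfy the interface above. The fragment below is a minimal sketch of what a hypothetical non-CUDA backend could look like against this interface. The class name, the 'gloo' backend choice, and the plain CPU torch types are illustrative assumptions, not part of this PR, and most of the abstract methods are omitted for brevity.

import torch
from deepspeed.accelerator.abstract_accelerator import DeepSpeedAccelerator


class CPU_Accelerator(DeepSpeedAccelerator):
    # Sketch only: every remaining abstract method of DeepSpeedAccelerator
    # must also be implemented before this class can be instantiated.

    def __init__(self):
        super().__init__()
        self._name = 'cpu'
        self._communication_backend_name = 'gloo'

    # Device APIs
    def device_name(self, device_index=None):
        return 'cpu'

    def set_device(self, device_index):
        pass  # a single host device, nothing to select

    # Tensor type properties are resolved on access instead of in __init__
    @property
    def FloatTensor(self):
        return torch.FloatTensor

    @property
    def HalfTensor(self):
        return torch.HalfTensor

    def pin_memory(self, tensor):
        return tensor.pin_memory()

    def on_accelerator(self, tensor):
        return tensor.device.type == 'cpu'

    def op_builder_dir(self):
        # a real backend would point this at its own op_builder package
        return "deepspeed.ops.op_builder"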
39 changes: 32 additions & 7 deletions deepspeed/accelerator/cuda_accelerator.py
@@ -6,13 +6,6 @@ class CUDA_Accelerator(DeepSpeedAccelerator):
def __init__(self):
self._name = 'cuda'
self._communication_backend_name = 'nccl'
self.DoubleTensor = torch.cuda.DoubleTensor
self.LongTensor = torch.cuda.LongTensor
self.FloatTensor = torch.cuda.FloatTensor
self.BFloat16Tensor = torch.cuda.BFloat16Tensor
self.HalfTensor = torch.cuda.HalfTensor
self.IntTensor = torch.cuda.IntTensor
self.ByteTensor = torch.cuda.ByteTensor

# Device APIs
def device_name(self, device_index=None):
@@ -161,6 +154,35 @@ def communication_backend_name(self):
return self._communication_backend_name

# Tensor operations

@property
def BFloat16Tensor(self):
return torch.cuda.BFloat16Tensor

@property
def ByteTensor(self):
return torch.cuda.ByteTensor

@property
def DoubleTensor(self):
return torch.cuda.DoubleTensor

@property
def FloatTensor(self):
return torch.cuda.FloatTensor

@property
def HalfTensor(self):
return torch.cuda.HalfTensor

@property
def IntTensor(self):
return torch.cuda.IntTensor

@property
def LongTensor(self):
return torch.cuda.LongTensor

def pin_memory(self, tensor):
return tensor.pin_memory()

@@ -171,6 +193,9 @@ def on_accelerator(self, tensor):
else:
return False

def op_builder_dir(self):
return "deepspeed.ops.op_builder"

def create_op_builder(self, class_name):
from deepspeed.ops.op_builder import AsyncIOBuilder, CPUAdagradBuilder, CPUAdamBuilder, FusedAdamBuilder, FusedLambBuilder, QuantizerBuilder, SparseAttnBuilder, StochasticTransformerBuilder, TransformerBuilder, InferenceBuilder, UtilsBuilder
from deepspeed.ops.op_builder.builder_names import AsyncIOBuilder as AsyncIOBuilderName
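On the call side, these properties let code stay off torch.cuda entirely and go through the accelerator object instead. A small usage sketch, assuming a CUDA-capable machine with a CUDA build of PyTorch; the tensor values are made up, and every accelerator method used here appears in the interface above.

import torch
from deepspeed.accelerator import get_accelerator

accel = get_accelerator()  # resolves to CUDA_Accelerator on an NVIDIA system

# backend-native tensor type, looked up lazily through the new property
x = accel.HalfTensor([1.0, 2.0, 3.0])

# pin a host tensor for faster host-to-device copies
staging = accel.pin_memory(torch.empty(1024))

# device-agnostic placement instead of tensor.cuda()
y = torch.ones(4).to(accel.current_device_name())
assert accel.on_accelerator(y)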
10 changes: 5 additions & 5 deletions deepspeed/inference/engine.py
@@ -82,7 +82,7 @@ def __init__(self, model, config):
# This is a hack to remove the prepare_mask function on HF side for BLOOM architecture
self.remove_mask_prepare_for_bloom()

if get_accelerator().device_name() == 'cuda' and enable_cuda_graph:
if get_accelerator().device_name() == 'cuda' and config.enable_cuda_graph:
assert pkg_version.parse(torch.__version__) >= pkg_version.parse("1.10"), \
"If you want to use cuda graph, please upgrade torch to at least v1.10"

@@ -127,7 +127,7 @@ def __init__(self, model, config):
device = get_accelerator().current_device_name()
self.module.to(device)

if self.mp_world_size > 1:
if config.tensor_parallel.tp_size > 1:
_rng_state = get_accelerator().get_rng_state().to(
get_accelerator().current_device_name())
dist.broadcast(_rng_state, 0)
@@ -514,11 +514,11 @@ def forward(self, *inputs, **kwargs):
"""
start = None
if self.model_profile_enabled and get_accelerator().device_name(
) == 'cuda' and self.enable_cuda_graph:
) == 'cuda' and self._config.enable_cuda_graph:
get_accelerator().synchronize()
start = time.time()

if get_accelerator().device_name() == 'cuda' and self.enable_cuda_graph:
if get_accelerator().device_name() == 'cuda' and self._config.enable_cuda_graph:
if self.cuda_graph_created:
outputs = self._graph_replay(*inputs, **kwargs)
else:
@@ -527,7 +527,7 @@ def forward(self, *inputs, **kwargs):
else:
outputs = self.module(*inputs, **kwargs)

if self.model_profile_enabled and self.enable_cuda_graph:
if self.model_profile_enabled and self._config.enable_cuda_graph:
get_accelerator().synchronize()
duration = time.time() - start
self._model_times.append(duration)
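The inference-engine flags (enable_cuda_graph, tensor-parallel size) are now read from the config object rather than from attributes mirrored onto the engine. Paraphrased as a standalone guard and hedged: _create_cuda_graph is the helper name implied by the surrounding code but not shown in full in this condensed diff.

from deepspeed.accelerator import get_accelerator


def forward_with_optional_cuda_graph(engine, *inputs, **kwargs):
    # Mirrors the updated guard in InferenceEngine.forward: CUDA graphs only
    # make sense on the CUDA backend, so the accelerator name is checked
    # together with the config flag.
    if get_accelerator().device_name() == 'cuda' and engine._config.enable_cuda_graph:
        if not engine.cuda_graph_created:
            engine._create_cuda_graph(*inputs, **kwargs)
        return engine._graph_replay(*inputs, **kwargs)
    return engine.module(*inputs, **kwargs)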
2 changes: 1 addition & 1 deletion deepspeed/module_inject/replace_module.py
@@ -608,7 +608,7 @@ def _transpose(x):
get_accelerator().current_device_name())
new_module.attn_nb.data = attn_nb.to(
get_accelerator().current_device_name())
if moe_type == 'residual':
if config.moe_type == 'residual':
new_module.res_mlp.inter_w.data = _res_h4h_w.to(
get_accelerator().current_device_name())
new_module.res_mlp.inter_b.data = _res_h4h_b.to(
2 changes: 2 additions & 0 deletions deepspeed/runtime/engine.py
@@ -81,6 +81,8 @@
from deepspeed.accelerator import get_accelerator
from deepspeed.ops.op_builder.builder_names import UtilsBuilder

from deepspeed.inference.config import DtypeEnum

# Set to torch's distributed package or deepspeed.comm based inside DeepSpeedEngine init
dist = None

10 changes: 4 additions & 6 deletions deepspeed/utils/timer.py
@@ -195,12 +195,10 @@ def stop(self, report_speed=True):

curr_samples_sec = (self.batch_size * self.num_workers) / duration

if self.local_step_count % self.steps_per_output == 0:
if report_speed:
self.logging(
"{}/{}, RunningAvgSamplesPerSec={}, CurrSamplesPerSec={}, MemAllocated={}GB, MaxMemAllocated={}GB"
.format(
self.epoch_count,
if report_speed:
self.logging(
"{}/{}, RunningAvgSamplesPerSec={}, CurrSamplesPerSec={}, MemAllocated={}GB, MaxMemAllocated={}GB"
.format(self.epoch_count,
self.local_step_count,
self.avg_samples_per_sec(),
curr_samples_sec,
35 changes: 19 additions & 16 deletions op_builder/all_ops.py
@@ -1,23 +1,26 @@
"""
Copyright 2020 The Microsoft DeepSpeed Team
"""
import os
import pkgutil
import importlib
from deepspeed.accelerator import get_accelerator
from deepspeed.ops.op_builder.builder_names import CPUAdamBuilder, CPUAdagradBuilder, FusedAdamBuilder, FusedLambBuilder, SparseAttnBuilder, TransformerBuilder, StochasticTransformerBuilder, AsyncIOBuilder, UtilsBuilder, QuantizerBuilder, InferenceBuilder, SpatialInferenceBuilder

# TODO: infer this list instead of hard coded
# List of all available ops
__op_builders__ = [
get_accelerator().create_op_builder(CPUAdamBuilder),
get_accelerator().create_op_builder(CPUAdagradBuilder),
get_accelerator().create_op_builder(FusedAdamBuilder),
get_accelerator().create_op_builder(FusedLambBuilder),
get_accelerator().create_op_builder(SparseAttnBuilder),
get_accelerator().create_op_builder(TransformerBuilder),
get_accelerator().create_op_builder(StochasticTransformerBuilder),
get_accelerator().create_op_builder(AsyncIOBuilder),
get_accelerator().create_op_builder(UtilsBuilder),
get_accelerator().create_op_builder(QuantizerBuilder),
get_accelerator().create_op_builder(InferenceBuilder),
get_accelerator().create_op_builder(SpatialInferenceBuilder)
]

# reflect all builder names into __op_builders__
op_builder_dir = get_accelerator().op_builder_dir()
op_builder_module = importlib.import_module(op_builder_dir)
__op_builders__ = []

for _, module_name, _ in pkgutil.iter_modules([os.path.dirname(op_builder_module.__file__)]):
# avoid self references
if module_name != 'all_ops' and module_name != 'builder' and module_name != 'builder_names':
module = importlib.import_module("{}.{}".format(op_builder_dir, module_name))
for member_name in module.__dir__():
if member_name.endswith('Builder'):
# append builder to __op_builders__ list
builder = get_accelerator().create_op_builder(member_name)
__op_builders__.append(builder)

ALL_OPS = {op.name: op for op in __op_builders__ if op is not None}
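The hard-coded builder list in all_ops.py is replaced by reflection over whatever package the active accelerator reports via op_builder_dir(). The same discovery logic, reduced to a self-contained sketch; the function name is ours, while the skipped module names and the 'Builder' suffix convention are taken from the diff above.

import os
import importlib
import pkgutil


def discover_builder_names(package_name="deepspeed.ops.op_builder"):
    # Walk the op-builder package and collect every class name ending in
    # 'Builder', skipping the helper modules the loop above also skips.
    package = importlib.import_module(package_name)
    skip = {'all_ops', 'builder', 'builder_names'}
    names = set()
    for _, module_name, _ in pkgutil.iter_modules([os.path.dirname(package.__file__)]):
        if module_name in skip:
            continue
        module = importlib.import_module("{}.{}".format(package_name, module_name))
        names.update(m for m in dir(module) if m.endswith('Builder'))
    return sorted(names)


# On a full install this yields e.g. ['AsyncIOBuilder', 'CPUAdamBuilder', ...],
# and get_accelerator().create_op_builder(name) turns each name into a builder.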
38 changes: 25 additions & 13 deletions op_builder/builder_names.py
@@ -1,13 +1,25 @@
CPUAdamBuilder = "CPUAdamBuilder"
CPUAdagradBuilder = "CPUAdagradBuilder"
FusedAdamBuilder = "FusedAdamBuilder"
FusedLambBuilder = "FusedLambBuilder"
SparseAttnBuilder = "SparseAttnBuilder"
TransformerBuilder = "TransformerBuilder"
StochasticTransformerBuilder = "StochasticTransformerBuilder"
AsyncIOBuilder = "AsyncIOBuilder"
UtilsBuilder = "UtilsBuilder"
QuantizerBuilder = "QuantizerBuilder"
InferenceBuilder = "InferenceBuilder"
InferenceSpecializedBuilder = "InferenceSpecializedBuilder"
SpatialInferenceBuilder = "SpatialInferenceBuilder"
import sys
import os
import pkgutil
import importlib

# List of all available op builders from deepspeed op_builder

op_builder_dir = "deepspeed.ops.op_builder"
op_builder_module = importlib.import_module(op_builder_dir)
__op_builders__ = []

this_module = sys.modules[__name__]

# reflect all builder names into variable definition such as 'TransformerBuilder = "TransformerBuilder"'
for _, module_name, _ in pkgutil.iter_modules([os.path.dirname(op_builder_module.__file__)]):
# avoid self references
if module_name != 'all_ops' and module_name != 'builder' and module_name != 'builder_names':
module = importlib.import_module("{}.{}".format(op_builder_dir, module_name))
for member_name in module.__dir__():
if member_name.endswith(
'Builder'
) and member_name != "OpBuilder" and member_name != "CUDAOpBuilder":
# assign builder name to variable with same name
# the following is equivalent to i.e. TransformerBuilder = "TransformerBuilder"
this_module.__dict__[member_name] = member_name
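builder_names.py now generates its exported constants the same way: every discovered *Builder class (other than the OpBuilder and CUDAOpBuilder bases) becomes a module-level string equal to its own name. The trick in isolation, with a made-up discovered list so it runs without DeepSpeed installed:

import sys

# stand-in for the names the pkgutil loop above would discover
discovered = ["CPUAdamBuilder", "FusedAdamBuilder", "TransformerBuilder"]

this_module = sys.modules[__name__]
for member_name in discovered:
    # equivalent to hand-writing CPUAdamBuilder = "CPUAdamBuilder", etc.
    this_module.__dict__[member_name] = member_name

# Downstream imports keep working unchanged:
#   from deepspeed.ops.op_builder.builder_names import CPUAdamBuilder
#   get_accelerator().create_op_builder(CPUAdamBuilder)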
You are viewing a condensed version of this merge commit.