[RFC] add device abstraction to allow devices other than CUDA to be used #2221

Merged: 86 commits, Mar 7, 2023
Changes from 1 commit
Commits (86)
0a849d5
[device abstraction] add device abstraction to allow other device tha…
delock Aug 16, 2022
e4f40f0
Merge branch '202208-base' into 202208
delock Aug 24, 2022
4a216ea
[rebase-202208] additional changes needed when rebase to 202208
delock Aug 24, 2022
2137642
Merge branch '20220824-base' into 20220824
delock Aug 24, 2022
089657e
[rebase] cleanup direct cuda usage after merge
delock Aug 24, 2022
d5a8424
[precommit] fix pre-commit issues
delock Aug 25, 2022
96d0765
Merge branch 'master' into gma/device-abstraction
tjruwase Aug 30, 2022
ac64c7a
[pin_memory] make pin_memory select device type
delock Sep 1, 2022
02c3a57
Merge branch 'master' into gma/device-abstraction
delock Sep 8, 2022
522b24b
[downstream] merge from xpu support downstream
delock Sep 9, 2022
a3b1e02
Merge branch 'master' into gma/device-abstraction
tjruwase Sep 12, 2022
4557c33
Merge branch 'master' into gma/device-abstraction
tjruwase Sep 13, 2022
2ef7d6c
Merge branch 'up-master' into gma/merge-upstream-20220921
delock Sep 21, 2022
9656321
[device] port cuda device to literal_device() in new tests
delock Sep 21, 2022
65729e3
[accel_runtime] add pin_memory to accelerator runtime interface.
delock Sep 22, 2022
f94d53e
[accelerator abstraction] merge from #2320
delock Sep 26, 2022
6005abe
Merge branch 'up-master' into gma/device-abstraction
delock Sep 26, 2022
31c0997
change call site of literal_device, on_accel_device and accel_runtime…
delock Oct 12, 2022
1785c26
add new interface definition from olruwase/accelerator_abstraction
delock Oct 12, 2022
17203a4
[accelerator abstraction] remove name() from interface, device_name()…
delock Oct 14, 2022
e8daea6
merge with master (ec13da6ba7cabc44bb4745a64a208b8580792954)
delock Oct 14, 2022
cfd23ed
Merge branch 'up-master' into gma/device-abstraction
delock Oct 14, 2022
13bbbdf
[OpBuilder] Add op builder abstraction
delock Oct 23, 2022
06e39a5
Merge branch 'up-master' into gma/device-abstraction
delock Oct 23, 2022
257490f
convert op builder usage in merged code
delock Oct 23, 2022
c93b999
[OpBuilder] add create_op_builder interface in abstract_accelerator.py
delock Oct 23, 2022
9858d42
[OpBuilder] fix op builder usage in tests
delock Oct 23, 2022
68ce006
[OpBuilder] fix <op builder>.NAME usage in tests to follow op builder…
delock Oct 23, 2022
4b62dab
import get_accelerator from deepspeed.accelerator directly
delock Oct 23, 2022
c5b2070
[OpBuilder] remove unused function and sync with main
delock Oct 23, 2022
9532843
add missing get_accelerator import
delock Oct 25, 2022
0729695
fix obsolete name in CPU Adam which should be create_op_builder
delock Oct 25, 2022
be517d8
fix create_op_builder calls
delock Oct 25, 2022
3af870f
fix misuse of new accelerator abstraction interface in tests
delock Oct 25, 2022
8fa64b9
Merge from downstream for bug fixing
delock Oct 28, 2022
4873538
merge from downstream
delock Nov 3, 2022
61b10b0
remove SYCL_KERNEL specific code
delock Nov 4, 2022
457d281
Merge branch 'up-master(9cfcf7431a02a)' into gma/device-abstraction
delock Nov 8, 2022
fea4604
Merge branch 'up-master(6f77da1bae506)' into gma/device-abstraction
delock Nov 10, 2022
f80a907
Merge branch 'up-master(3ca9878d8e92a)' into gma/device-abstraction
delock Nov 10, 2022
3b0b14c
merge from downstream for bugs fixes
delock Nov 10, 2022
b375e46
Merge branch 'up-master(be5ec506bd5219a)' into gma/device-abstraction
delock Nov 11, 2022
18b3c95
fix torch.cuda in new files
delock Nov 11, 2022
97695f5
use OpBuilder name symbol, improve env_report, fix typo, fix get_acce…
delock Nov 13, 2022
93e157b
Merge branch 'master' into gma/device-abstraction
tjruwase Nov 13, 2022
b1c5384
fix missing () in get_accelerator for ds_attention.py
delock Nov 14, 2022
91fb948
import deepspeed.accelerator.get_accelerator only when torch_availabl…
delock Nov 14, 2022
8f89c2b
Merge branch 'up-master' into gma/device-abstraction
delock Dec 1, 2022
26e628d
Change reference of InferenceSpecializedBuilder to name string, Infer…
delock Dec 1, 2022
91f5cb2
convert new code with CUDA references
delock Dec 1, 2022
5a1ae0e
remove unneeded get_accelerator import in op_builder/__init__.py
delock Dec 1, 2022
05842b6
[setup] fix build error when pytorch is not installed in environment
delock Dec 1, 2022
24d2b38
Handle the case when torch is not installed during deepspeed installa…
delock Dec 1, 2022
c26e5d4
Merge branch 'master' into gma/device-abstraction
tjruwase Dec 2, 2022
4116ba5
Merge branch 'up-master' into gma/device-abstraction
delock Jan 8, 2023
bea648f
port new cuda specific code
delock Jan 8, 2023
94253d4
revert changes in __init__.py since new mechanism no longer requires …
delock Jan 8, 2023
2acad48
Merge branch 'up-master' into gma/device-abstraction
delock Jan 27, 2023
77af66a
use old op builder interface
delock Jan 27, 2023
8ec0905
Merge branch 'up-master' into gma/device-abstraction
delock Jan 27, 2023
bd9d275
remove bypass code in set_accelerator_visible
delock Jan 27, 2023
f1e75ff
revert changes in quantizer according to latest op builder interface
delock Jan 27, 2023
9860282
Merge branch 'master' into gma/device-abstraction
delock Jan 30, 2023
c26da46
port additional torch.cuda code in deepspeed
delock Jan 27, 2023
cb46cf4
Merge branch 'master' into gma/device-abstraction
delock Jan 31, 2023
b74a47c
Merge branch 'master' into gma/device-abstraction
delock Feb 3, 2023
6e55729
Merge branch 'master' into gma/device-abstraction
delock Feb 6, 2023
3c186d2
Merge branch 'master' into gma/device-abstraction
delock Feb 7, 2023
667c878
follow comments
delock Feb 9, 2023
d693dad
Merge branch 'up-master' into gma/device-abstraction
delock Feb 9, 2023
7a9e7ea
fix format
delock Feb 9, 2023
538148b
fix new code with cuda specific code
delock Feb 9, 2023
af8cee2
Merge branch 'master' into gma/device-abstraction
delock Feb 11, 2023
3dd816c
Merge branch 'master' into gma/device-abstraction
delock Feb 15, 2023
abf31b6
Merge branch 'master' into gma/device-abstraction
delock Feb 17, 2023
9539def
Merge branch 'master' into gma/device-abstraction
delock Feb 20, 2023
6ac4de4
Merge branch 'master' into gma/device-abstraction
delock Feb 22, 2023
b551304
Merge branch 'master' into gma/device-abstraction
delock Feb 22, 2023
238dc1e
port cuda specific code in module injection
delock Feb 23, 2023
da254d7
Merge branch 'master' into gma/device-abstraction
delock Feb 24, 2023
33ace54
Merge branch 'master' into gma/device-abstraction
delock Feb 26, 2023
3d572bb
Merge branch 'up-master' into gma/device-abstraction
delock Mar 1, 2023
4f9f6c2
add licensing message
delock Mar 1, 2023
e92fd92
Merge branch 'master' into gma/device-abstraction
delock Mar 2, 2023
136ba27
Merge branch 'master' into gma/device-abstraction
tjruwase Mar 7, 2023
9569b46
Merge branch 'master' into gma/device-abstraction
jeffra Mar 7, 2023
[accelerator abstraction] merge from #2320
delock committed Sep 26, 2022
commit f94d53e74ded5558ba04f50a7b619b2d8ce0817f
1 change: 1 addition & 0 deletions deepspeed/accelerator/__init__.py
@@ -1 +1,2 @@
from .device import literal_device, on_accel_device
from .abstract_accelerator import DeepSpeedAccelerator
149 changes: 149 additions & 0 deletions deepspeed/accelerator/abstract_accelerator.py
@@ -0,0 +1,149 @@
import abc
from abc import ABC


class DeepSpeedAccelerator(ABC):
    def __init__(self):
        self.name = None
        self.communication_backend = None
        self.BFloat16Tensor = None
        self.ByteTensor = None
        self.DoubleTensor = None
        self.FloatTensor = None
        self.HalfTensor = None
        self.IntTensor = None
        self.LongTensor = None

    # Device APIs
    @abc.abstractmethod
    def device(self, device_index):
        ...

    @abc.abstractmethod
    def set_device(self):
        ...

    @abc.abstractmethod
    def current_device(self):
        ...

    @abc.abstractmethod
    def device_count(self):
        ...

    @abc.abstractmethod
    def synchronize(self, device_index=None):
        ...

    # RNG APIs
    @abc.abstractmethod
    def set_rng_state(self, new_state, device_index=None):
        ...

    @abc.abstractmethod
    def get_rng_state(self, device_index=None):
        ...

    @abc.abstractmethod
    def manual_seed(self, seed):
        ...

    @abc.abstractmethod
    def manual_seed_all(self, seed):
        ...

    @abc.abstractmethod
    def initial_seed(self):
        ...

    @abc.abstractmethod
    def default_generator(self, device_index):
        ...

    # Streams/Events
    @abc.abstractmethod
    def Stream(self, device_index=None, priority=0, **kwargs):
        ...

    @abc.abstractmethod
    def StreamContext(self, stream):
        ...

    @abc.abstractmethod
    def current_stream(self, device_index=None):
        ...

    @abc.abstractmethod
    def default_stream(self, device_index=None):
        ...

    @abc.abstractmethod
    def Event(self, **kwargs):
        ...

    # Memory management
    @abc.abstractmethod
    def empty_cache(self):
        ...

    @abc.abstractmethod
    def memory_allocated(self, device_index=None):
        ...

    @abc.abstractmethod
    def max_memory_allocated(self, device_index=None):
        ...

    @abc.abstractmethod
    def reset_max_memory_allocated(self, device_index=None):
        ...

    @abc.abstractmethod
    def reset_max_memory_cached(self, device_index=None):
        ...

    @abc.abstractmethod
    def memory_stats(self, device_index=None):
        ...

    @abc.abstractmethod
    def reset_peak_memory_stats(self, device_index=None):
        ...

    @abc.abstractmethod
    def memory_reserved(self, device_index=None):
        ...

    @abc.abstractmethod
    def max_memory_reserved(self, device_index=None):
        ...

    @abc.abstractmethod
    def total_memory(self, device_index=None):
        ...

    # Misc
    @abc.abstractmethod
    def is_available(self):
        ...

    @abc.abstractmethod
    def range_push(self, msg):
        ...

    @abc.abstractmethod
    def range_pop(self, msg):
        ...

    @abc.abstractmethod
    def lazy_call(self, callback):
        ...

    # Data types
    @abc.abstractmethod
    def is_bf16_supported(self):
        ...

    @abc.abstractmethod
    def is_fp16_supported(self):
        ...
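
To make the interface concrete, the sketch below shows how another backend could subclass DeepSpeedAccelerator. The class name and the CPU-based stand-ins are illustrative assumptions, not code from this PR; a real backend (e.g. XPU) would delegate to its own runtime in the same way CUDA_Accelerator below delegates to torch.cuda.

-----------[sketch] example_backend.py (hypothetical, not part of this PR) -----------
import torch
from deepspeed.accelerator.abstract_accelerator import DeepSpeedAccelerator


class Example_Accelerator(DeepSpeedAccelerator):
    # Partial sketch: only a few methods are shown; a usable backend must
    # implement every abstract method declared in the interface above.
    def __init__(self):
        self.name = 'example'
        self.communication_backend = 'gloo'   # assumed backend for illustration
        self.FloatTensor = torch.FloatTensor  # CPU tensors as stand-ins
        self.HalfTensor = torch.HalfTensor

    # Device APIs
    def device(self, device_index=None):
        return torch.device('cpu')

    def set_device(self, device_index):
        pass  # single-device backend: nothing to select

    def current_device(self):
        return 0

    def device_count(self):
        return 1

    def synchronize(self, device_index=None):
        pass  # host execution is synchronous

    # Misc
    def is_available(self):
        return True
-----------[sketch] example_backend.py (hypothetical, not part of this PR) -----------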
123 changes: 123 additions & 0 deletions deepspeed/accelerator/cuda_accelerator.py
@@ -0,0 +1,123 @@
from deepspeed.accelerator.abstract_accelerator import DeepSpeedAccelerator
import torch.cuda


class CUDA_Accelerator(DeepSpeedAccelerator):
    def __init__(self):
        self.name = 'cuda'
        self.communication_backend = 'nccl'
        self.DoubleTensor = torch.cuda.DoubleTensor
        self.LongTensor = torch.cuda.LongTensor
        self.FloatTensor = torch.cuda.FloatTensor
        self.BFloat16Tensor = torch.cuda.BFloat16Tensor
        self.HalfTensor = torch.cuda.HalfTensor
        self.IntTensor = torch.cuda.IntTensor
        self.ByteTensor = torch.cuda.ByteTensor

    # Device APIs
    def device(self, device_index=None):
        return torch.cuda.device(device_index)

    def set_device(self, device_index):
        torch.cuda.set_device(device_index)

    def current_device(self):
        return torch.cuda.current_device()

    def device_count(self):
        return torch.cuda.device_count()

    def synchronize(self, device_index=None):
        return torch.cuda.synchronize(device_index)

    # RNG APIs
    def set_rng_state(self, new_state, device_index=None):
        return torch.cuda.set_rng_state(new_state, device_index)

    def get_rng_state(self, device_index=None):
        return torch.cuda.get_rng_state(device_index)

    def manual_seed(self, seed):
        return torch.cuda.manual_seed(seed)

    def manual_seed_all(self, seed):
        return torch.cuda.manual_seed_all(seed)

    def initial_seed(self, seed):
        return torch.cuda.initial_seed(seed)

    def default_generator(self, device_index):
        return torch.cuda.default_generators[device_index]

    # Streams/Events
    def Stream(self, device_index=None, priority=0, **kwargs):
        return torch.cuda.Stream(device_index, priority, **kwargs)

    def StreamContext(self, stream):
        return torch.cuda.StreamContext(stream)

    def current_stream(self, device_index=None):
        return torch.cuda.current_stream(device_index)

    def default_stream(self, device_index=None):
        return torch.cuda.default_stream(device_index)

    def Event(self, **kwargs):
        return torch.cuda.Event(**kwargs)

    # Memory management
    def empty_cache(self):
        return torch.cuda.empty_cache()

    def memory_allocated(self, device_index=None):
        return torch.cuda.memory_allocated(device_index)

    def max_memory_allocated(self, device_index=None):
        return torch.cuda.max_memory_allocated(device_index)

    def reset_max_memory_allocated(self, device_index=None):
        return torch.cuda.reset_max_memory_allocated(device_index)

    def reset_max_memory_cached(self, device_index=None):
        return torch.cuda.reset_max_memory_cached(device_index)

    def memory_stats(self, device_index=None):
        if hasattr(torch.cuda, 'memory_stats'):
            return torch.cuda.memory_stats(device_index)

    def reset_peak_memory_stats(self, device_index=None):
        if hasattr(torch.cuda, 'reset_peak_memory_stats'):
            return torch.cuda.reset_peak_memory_stats(device_index)

    def memory_reserved(self, device_index=None):
        if hasattr(torch.cuda, 'memory_reserved'):
            return torch.cuda.memory_reserved(device_index)

    def max_memory_reserved(self, device_index=None):
        if hasattr(torch.cuda, 'max_memory_reserved'):
            return torch.cuda.max_memory_reserved(device_index)

    def total_memory(self, device_index=None):
        return torch.cuda.get_device_properties(device_index).total_memory

    # Misc
    def is_available(self):
        return torch.cuda.is_available()

    def range_push(self, msg):
        if hasattr(torch.cuda.nvtx, 'range_push'):
            return torch.cuda.nvtx.range_push(msg)

    def range_pop(self, msg):
        if hasattr(torch.cuda.nvtx, 'range_pop'):
            return torch.cuda.nvtx.range_pop(msg)

    def lazy_call(self, callback):
        return torch.cuda._lazy_call(callback)

    # Data types
    def is_bf16_supported(self):
        return torch.cuda.is_bf16_supported()

    def is_fp16_supported(self):
        return torch.cuda.is_fp16_supported()
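
For reference, a generic sketch of the call-site conversion this abstraction enables is shown below; the exact call sites touched by the PR differ, so treat these lines as an assumed before/after rather than a diff from the PR (local_rank is a placeholder that would normally come from the launcher).

-----------[sketch] call_site_conversion.py (illustrative) -----------
import torch
from deepspeed.accelerator.real_accelerator import get_accelerator

local_rank = 0  # assumed; normally derived from the launcher environment

# before: torch.cuda.set_device(local_rank); torch.cuda.empty_cache()
get_accelerator().set_device(local_rank)
get_accelerator().empty_cache()

# before: device = torch.device(f'cuda:{local_rank}')
# The device string is built from the accelerator's name, so nothing is
# hard-coded to 'cuda' and another registered backend works unchanged.
device = torch.device(f'{get_accelerator().name}:{local_rank}')
-----------[sketch] call_site_conversion.py (illustrative) -----------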
71 changes: 71 additions & 0 deletions deepspeed/accelerator/real_accelerator.py
@@ -0,0 +1,71 @@
from .abstract_accelerator import DeepSpeedAccelerator

ds_accelerator = None


def _validate_accelerator(accel_obj):
    assert isinstance(accel_obj, DeepSpeedAccelerator), \
        f'{accel_obj.__class__.__name__} accelerator is not subclass of DeepSpeedAccelerator'

    assert accel_obj.is_available(), \
        f'{accel_obj.__class__.__name__} accelerator fails is_available() test'


def get_accelerator():
    global ds_accelerator
    if ds_accelerator is None:
        from deepspeed.accelerator.cuda_accelerator import CUDA_Accelerator
        ds_accelerator = CUDA_Accelerator()
        _validate_accelerator(ds_accelerator)
    return ds_accelerator


def set_accelerator(accel_obj):
    global ds_accelerator
    _validate_accelerator(accel_obj)
    ds_accelerator = accel_obj


'''
-----------[code] test_get.py -----------
from deepspeed.accelerator.real_accelerator import get_accelerator
my_accelerator = get_accelerator()
print(f'{my_accelerator.name=}')
print(f'{my_accelerator.communication_backend=}')
print(f'{my_accelerator.HalfTensor().device=}')
print(f'{my_accelerator.total_memory()=}')
-----------[code] test_get.py -----------

---[output] python test_get.py---------
my_accelerator.name='cuda'
my_accelerator.communication_backend='nccl'
my_accelerator.HalfTensor().device=device(type='cuda', index=0)
my_accelerator.total_memory()=34089730048
---[output] python test_get.py---------

**************************************************************************
-----------[code] test_set.py -----------
from deepspeed.accelerator.cuda_accelerator import CUDA_Accelerator
cu_accel = CUDA_Accelerator()
print(f'{id(cu_accel)=}')
from deepspeed.accelerator.real_accelerator import set_accelerator, get_accelerator
set_accelerator(cu_accel)

my_accelerator = get_accelerator()
print(f'{id(my_accelerator)=}')
print(f'{my_accelerator.name=}')
print(f'{my_accelerator.communication_backend=}')
print(f'{my_accelerator.HalfTensor().device=}')
print(f'{my_accelerator.total_memory()=}')
-----------[code] test_set.py -----------


---[output] python test_set.py---------
id(cu_accel)=139648165478304
my_accelerator=<deepspeed.accelerator.cuda_accelerator.CUDA_Accelerator object at 0x7f025f4bffa0>
my_accelerator.name='cuda'
my_accelerator.communication_backend='nccl'
my_accelerator.HalfTensor().device=device(type='cuda', index=0)
my_accelerator.total_memory()=34089730048
---[output] python test_set.py---------
'''
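
set_accelerator() is the hook that lets a non-CUDA backend take over before DeepSpeed starts touching devices. A hypothetical registration flow is sketched below; the backend package and class names are assumptions used only to illustrate the intended flow, not code from this PR.

-----------[sketch] register_backend.py (hypothetical) -----------
from deepspeed.accelerator.real_accelerator import set_accelerator, get_accelerator

# A downstream package would construct its own DeepSpeedAccelerator subclass
# and install it before any DeepSpeed code calls get_accelerator():
#
#   from some_xpu_package import XPU_Accelerator   # hypothetical import
#   set_accelerator(XPU_Accelerator())
#
# Every later get_accelerator() call then returns the registered object
# instead of the CUDA_Accelerator created lazily by default.
print(get_accelerator().name)  # 'cuda' here, or the registered backend's name
-----------[sketch] register_backend.py (hypothetical) -----------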