
DeepSpeed Communication Profiling and Logging #2012

Merged · 54 commits · Jul 25, 2022
Changes from 1 commit

Commits (54):
867a853  Staging comms v1 (#301)  (Quentin-Anthony, May 27, 2022)
c93fcfe  Delete stage1.py  (awan-10, May 27, 2022)
7f8ca01  Delete distributed.py  (awan-10, May 27, 2022)
977ee32  revert deepspeed/__init__.py logging calls  (Quentin-Anthony, May 28, 2022)
68eb9f4  Delete test.py  (Quentin-Anthony, May 28, 2022)
54796bb  Update comments and move custom comm ops to internal functions  (Quentin-Anthony, May 28, 2022)
c06c72d  Merge branch 'staging-comms-next' of https://github.com/microsoft/Dee…  (Quentin-Anthony, May 28, 2022)
f070a0c  Remove unnecessary print and update backend description  (Quentin-Anthony, May 28, 2022)
9976681  Relax assertion to allow Megatron-DeepSpeed MoE to use ZeRO 1  (Quentin-Anthony, May 31, 2022)
09063a3  Simplify ZeRO stage 1 check for previous commit  (Quentin-Anthony, May 31, 2022)
656b415  Remove misleading world_size prints  (Quentin-Anthony, May 31, 2022)
2e7129c  Add commslogger class, and introduce rough prototype comms logging  (Quentin-Anthony, Jun 1, 2022)
0023b3e  Clean up logger  (Quentin-Anthony, Jun 1, 2022)
e55c8e9  Add more robust arg checks  (Quentin-Anthony, Jun 3, 2022)
31c7dcf  Add labels to common collective calls for logger  (Quentin-Anthony, Jun 3, 2022)
8e23f50  Add more annotations  (Quentin-Anthony, Jun 3, 2022)
7998350  Fix up log_summary_new and fix logging bug for barrier  (Quentin-Anthony, Jun 7, 2022)
227874e  Clean up arg sweep logic and add isend/irecv  (Quentin-Anthony, Jun 7, 2022)
27c38f9  Merge branch 'master' into staging-comms-logging-v1  (Quentin-Anthony, Jun 13, 2022)
26e15ae  Clean up logging branch  (Quentin-Anthony, Jun 13, 2022)
3aa3e38  Unify naming and fix circular import  (Quentin-Anthony, Jun 13, 2022)
d2561dc  Fix deepspeed comm imports for logging.py  (Quentin-Anthony, Jun 13, 2022)
c85f3c1  Added comms config support, removed some log names  (Quentin-Anthony, Jun 14, 2022)
f70addb  Add comms config file  (Quentin-Anthony, Jun 14, 2022)
a153331  Add pydantic to requirements  (Quentin-Anthony, Jun 14, 2022)
351f384  Add configure non-op to old torch  (Quentin-Anthony, Jun 14, 2022)
bcb3afd  Update logging call for old torch  (Quentin-Anthony, Jun 14, 2022)
2f8320a  Add log_name placeholder args for old torch  (Quentin-Anthony, Jun 14, 2022)
95aa7d8  Add basic verbosity setup  (Quentin-Anthony, Jun 15, 2022)
93d1a31  Complete verbosity setup  (Quentin-Anthony, Jun 18, 2022)
4a6236d  move comms logging to separate file and clean up  (Quentin-Anthony, Jun 18, 2022)
393c90a  Change debug message design  (Quentin-Anthony, Jun 25, 2022)
527d1c8  refactor debug helper and clean up  (Quentin-Anthony, Jun 25, 2022)
40482a8  Refactor a bit and clean up prints  (Quentin-Anthony, Jun 25, 2022)
a6beecf  Merge branch 'master' into staging-comms-logging-v1  (Quentin-Anthony, Jun 25, 2022)
9343f87  config docs, remove old log_summary func, fix imports  (Quentin-Anthony, Jun 25, 2022)
c07bc13  Finished docs, added import, fixed non-debug calls  (Quentin-Anthony, Jun 25, 2022)
f5fd1f2  Ran pre-commit  (Quentin-Anthony, Jun 25, 2022)
1b31798  Removed old comments  (Quentin-Anthony, Jun 25, 2022)
298349d  Updated fn signatures for torch1.2  (Quentin-Anthony, Jun 27, 2022)
102ae1d  Remove lingering prof arg  (Quentin-Anthony, Jun 27, 2022)
2185f16  Merge branch 'master' into staging-comms-logging-v1  (jeffra, Jun 29, 2022)
4faf3b9  Update logging tutorial  (Quentin-Anthony, Jun 29, 2022)
6381187  Removed unnecessary imports and cleaned up comments  (Quentin-Anthony, Jun 30, 2022)
56dbd71  Take master's cleaner comms init logic  (Quentin-Anthony, Jun 30, 2022)
ae524f0  Fixed bw calculations and made all logging calls blocking  (Quentin-Anthony, Jul 20, 2022)
19bcf79  Added comms logging synch disclaimer  (Quentin-Anthony, Jul 20, 2022)
b9cb4d3  Merge branch 'master' into staging-comms-logging-v1  (Quentin-Anthony, Jul 21, 2022)
c6925a1  Added using_mpi flag for logging  (Quentin-Anthony, Jul 22, 2022)
5a0715c  Formatting  (Quentin-Anthony, Jul 22, 2022)
b4449a2  Merge branch 'master' of https://github.com/microsoft/DeepSpeed into …  (Quentin-Anthony, Jul 22, 2022)
b648979  Merge branch 'master' into staging-comms-logging-v1  (Quentin-Anthony, Jul 22, 2022)
9357a16  Merge branch 'master' into staging-comms-logging-v1  (Quentin-Anthony, Jul 25, 2022)
c85e323  Merge branch 'master' into staging-comms-logging-v1  (Quentin-Anthony, Jul 25, 2022)
Add more robust arg checks
Quentin-Anthony committed Jun 3, 2022
commit e55c8e937dcaab6c5394ecf704590b162f372e19
25 changes: 16 additions & 9 deletions deepspeed/comm/comm.py
@@ -87,7 +87,7 @@ class ReduceOp(Enum):
 # - Global profiling (profile all comms)
 # - Op-type profiling (e.g. profile all all_reduce comms)
 # - Op profiling (e.g. profile a specific all_reduce op)
-prof_all = False
+prof_all = True
 prof_op = None
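The three granularities listed in the comment above collapse into one per-call predicate, which appears verbatim at the end of the next hunk. A minimal sketch of that check; the helper name should_profile is mine, not part of the PR:

```python
# Module-level knobs as set in deepspeed/comm/comm.py at this commit.
prof_all = True   # global profiling: time every communication op
prof_op = None    # op-type profiling: e.g. 'all_reduce' to time only that op type

def should_profile(per_call_prof: bool, log_name: str) -> bool:
    """Hypothetical helper mirroring log_wrapper's check: profile when the caller
    passed prof=True, when global profiling is enabled, or when this op's
    log_name matches the configured op filter."""
    return per_call_prof or prof_all or log_name == prof_op

# Example: should_profile(False, 'all_reduce') is True here because prof_all is True.
```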


@@ -121,17 +121,24 @@ def log_wrapper(*args, **kwargs):
         # Need func args and their defaults
         func_args = get_default_args(func)
         func_args.update(kwargs)
-        tensor_pos = get_tensor_position(func)
+        if len(args) > 0:
+            tensor_pos = get_tensor_position(func)
+            tensor_arg = args[tensor_pos]
+        else:
+            if len(kwargs) > 0:
+                tensor_arg = func_args['tensor']
         # Get size of tensor to be communicated
         # set msg_size = 0 for barrier
-        msg_size = 0
-        # Sum of tensor sizes for list colls
-        if type(args[tensor_pos]) is list:
-            msg_size = sum(x.element_size() * x.nelement()
-                           for x in func_args['tensor_list'])
-        # msg_size = tensor size for most colls
+        if len(kwargs) == 0:
+            msg_size = 0
         else:
-            msg_size = args[tensor_pos].element_size() * args[tensor_pos].nelement()
+            # Sum of tensor sizes for list colls
+            if type(tensor_arg) is list:
+                msg_size = sum(x.element_size() * x.nelement()
+                               for x in func_args['tensor_list'])
+            # msg_size = tensor size for most colls
+            else:
+                msg_size = tensor_arg.element_size() * tensor_arg.nelement()
         # Start the timer if arg is set or it's a default
         if func_args['prof'] or prof_all or func_args['log_name'] == prof_op:
             timers(func_args['log_name']).start()
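Read on its own, the new arg checking amounts to: find the communicated tensor whether it was passed positionally or by keyword, treat calls with no keyword arguments as size 0 (barrier being the motivating case), and sum element sizes for list collectives. Below is a sketch of that logic, assuming get_default_args and get_tensor_position live in deepspeed/comm/utils.py as in this PR; the wrapper function name _message_size is illustrative:

```python
from deepspeed.comm.utils import get_default_args, get_tensor_position  # assumed layout

def _message_size(func, args, kwargs):
    """Illustrative restatement of the size accounting added to log_wrapper."""
    func_args = get_default_args(func)
    func_args.update(kwargs)

    # Locate the tensor argument, positional first, then keyword.
    tensor_arg = None
    if len(args) > 0:
        tensor_arg = args[get_tensor_position(func)]
    elif len(kwargs) > 0:
        tensor_arg = func_args['tensor']

    # No keyword arguments is treated as "nothing to measure" (e.g. barrier),
    # which also appears to be why the all_reduce benchmark later in this diff
    # switches to the keyword form all_reduce(tensor=mat).
    if len(kwargs) == 0:
        return 0
    if type(tensor_arg) is list:   # list collectives: sum the element sizes
        return sum(x.element_size() * x.nelement() for x in func_args['tensor_list'])
    return tensor_arg.element_size() * tensor_arg.nelement()
```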
5 changes: 4 additions & 1 deletion deepspeed/comm/utils.py
@@ -103,4 +103,7 @@ def get_tensor_position(func):
     # all_to_all and torch multiGPU colls
     elif 'input_tensor_list' in sig_params:
         arg = 'input_tensor_list'
-    return list(sig_params).index(arg)
+    if arg is None:
+        return -1
+    else:
+        return list(sig_params).index(arg)
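This makes get_tensor_position total: rather than failing for ops with no tensor-like parameter (barrier, most obviously), it now returns -1 and leaves the decision to the caller. The earlier branches of the function sit outside this hunk, so the sketch below infers them from the parameter names used in comm.py ('tensor', 'tensor_list', 'input_tensor_list'); treat it as an approximation rather than the exact source:

```python
import inspect

def get_tensor_position(func):
    """Return the index of func's tensor-like parameter, or -1 if there is none."""
    sig_params = inspect.signature(func).parameters
    arg = None
    if 'tensor' in sig_params:                # most point-to-point ops and collectives
        arg = 'tensor'
    elif 'tensor_list' in sig_params:         # list collectives, e.g. all_gather
        arg = 'tensor_list'
    elif 'input_tensor_list' in sig_params:   # all_to_all and torch multiGPU colls
        arg = 'input_tensor_list'
    return -1 if arg is None else list(sig_params).index(arg)

# A caller such as log_wrapper can now skip size accounting when the result is -1.
```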
10 changes: 5 additions & 5 deletions deepspeed/runtime/zero/stage3.py
@@ -102,11 +102,11 @@ def _apply_to_tensors_only(module, functional, backward_function, outputs):
     elif type(outputs) is torch.Tensor:
         return functional.apply(module, backward_function, outputs)
     else:
-        if not is_builtin_type(outputs):
-            logger.warning(
-                f"A module has unknown inputs or outputs type ({type(outputs)}) and the tensors embedded in it cannot be detected. "
-                "The ZeRO-3 hooks designed to trigger before or after backward pass of the module relies on knowing the input and "
-                "output tensors and therefore may not get triggered properly.")
+        #if not is_builtin_type(outputs):
+        # logger.warning(
+        # f"A module has unknown inputs or outputs type ({type(outputs)}) and the tensors embedded in it cannot be detected. "
+        # "The ZeRO-3 hooks designed to trigger before or after backward pass of the module relies on knowing the input and "
+        # "output tensors and therefore may not get triggered properly.")
         return outputs


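For context, the warning silenced above lives in the fallback branch of _apply_to_tensors_only, which walks a module's outputs and wraps every tensor it finds with the ZeRO-3 pre/post-backward hook class. A simplified sketch of that shape; the container handling is my assumption and is not shown in this hunk:

```python
import torch

def apply_to_tensors_only(module, functional, backward_function, outputs):
    """Simplified sketch: attach the hook class to each tensor found in outputs."""
    if isinstance(outputs, (tuple, list)):
        # Recurse into common containers so nested tensors still get hooks.
        return type(outputs)(
            apply_to_tensors_only(module, functional, backward_function, o)
            for o in outputs)
    if isinstance(outputs, torch.Tensor):
        return functional.apply(module, backward_function, outputs)
    # Anything else passes through untouched; tensors hidden inside unknown
    # containers get no hooks, which is what the silenced warning flagged.
    return outputs
```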
4 changes: 2 additions & 2 deletions tests/comm/all_reduce.py
@@ -14,7 +14,7 @@
 def timed_allreduce(mat):
     torch.cuda.synchronize()
     pre = time.perf_counter()
-    dist.all_reduce(mat)
+    dist.all_reduce(tensor=mat)
     #print('ignore me', mat[0][0]) # required due to lazy evaluation
     torch.cuda.synchronize()
     duration = time.perf_counter() - pre
@@ -62,7 +62,7 @@ def init_processes(fn, backend='nccl', use_deepspeed=False):
     print(f'local rank = {dist.get_local_rank()}')
     torch.cuda.set_device(dist.get_local_rank())
     dist.start_profiling_comms()
-    dist.set_comms_log_verbose(True)
+    #dist.set_comms_log_verbose(True)
     fn(dist.get_local_rank())


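Putting the test changes together: profiling is switched on through the dist wrapper, verbose per-op output stays off, and the collective is called with its tensor as a keyword so the logging wrapper can find it. A hedged end-to-end sketch; it assumes deepspeed.comm is importable as dist, one CUDA device per rank, and a process group already initialized by the launcher:

```python
import time
import torch
import deepspeed.comm as dist  # assumed import path for the new comm wrapper

def profiled_allreduce_demo(size=1024):
    torch.cuda.set_device(dist.get_local_rank())
    dist.start_profiling_comms()        # time subsequent communication calls
    # dist.set_comms_log_verbose(True)  # optional per-op debug output (left off in the test)

    mat = torch.ones(size, size, device='cuda')
    torch.cuda.synchronize()
    start = time.perf_counter()
    dist.all_reduce(tensor=mat)         # keyword form so log_wrapper sees func_args['tensor']
    torch.cuda.synchronize()
    print(f'rank {dist.get_local_rank()}: all_reduce took {time.perf_counter() - start:.6f} s')
```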