
Commit 8f7ce94

[hotfix] fix auto tensor placement policy (#753)

Parent: 84c6700

File tree: 2 files changed (+3, -3 lines)

colossalai/zero/sharded_model/sharded_model_v2.py

Lines changed: 1 addition & 2 deletions
```diff
@@ -53,10 +53,9 @@ class ShardedModelV2(nn.Module):
             If it's 'cpu', parameters, gradients and optimizer states will be offloaded to CPU, which means min CUDA memory will be used.
             If it's 'cuda', they won't be offloaded, which means max CUDA memory will be used.
             If it's 'auto', they are moving dynamically based on CPU and CUDA memory usage. It will utilize heterogeneous memory space evenly and well.
+            Note that 'auto' policy can only work well when no other processes use CUDA during your training.
             Defaults to 'cuda'.
-        offload_config (Optional[dict], optional): We currently only support CPU offload. Set to `{"device": "cpu"}` to enable CPU offload. Defaults to None.
         gradient_predivide_factor (Optional[float], optional): Gradient is divived by this value before reduce-scatter. Defaults to 1.0.
-        use_memory_tracer (bool, optional): Whether to use memoty tracer. Defaults to False.
         reuse_fp16_shard (bool, optional): Whether to reuse fp16 shard for param and grad.
             Enabling this can reduce GPU memory usage, but you have to make sure you disable it when using gradient accumulation.
             In this mode, grad will be fp16. Make sure your optimizer supports mixed precision (fp32 param and fp16 grad).
```
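For context, here is a minimal usage sketch of the `tensor_placement_policy` argument documented above. It is hedged: the `ZeroInitContext` and `ShardedModelV2` signatures shown follow ColossalAI documentation from this era and are assumptions, not verified against this exact revision.

```python
# Hedged usage sketch: constructor signatures are assumptions based on the
# docstring above and contemporaneous ColossalAI docs, not this exact revision.
import torch
import torch.nn as nn
from colossalai.zero.init_ctx import ZeroInitContext
from colossalai.zero.shard_utils import TensorShardStrategy
from colossalai.zero.sharded_model import ShardedModelV2

shard_strategy = TensorShardStrategy()

# Build the model inside the ZeRO init context so its parameters are sharded.
with ZeroInitContext(target_device=torch.device('cuda'),
                     shard_strategy=shard_strategy,
                     shard_param=True):
    model = nn.Sequential(nn.Linear(1024, 1024), nn.GELU(), nn.Linear(1024, 1024))

# 'auto' moves params/grads/optimizer states between CPU and CUDA dynamically;
# per the docstring note added above, it assumes no other process uses CUDA.
sharded_model = ShardedModelV2(model, shard_strategy, tensor_placement_policy='auto')
```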

colossalai/zero/utils/tensor_placement_policy.py

Lines changed: 2 additions & 1 deletion
```diff
@@ -45,7 +45,8 @@ class AutoTensorPlacementPolicy(TensorPlacementPolicy):
 
     def __init__(self, mem_stats_collector: Optional[MemStatsCollector] = None) -> None:
         super().__init__(None, mem_stats_collector=mem_stats_collector)
-        self._warmup_non_model_data_ratio: float = 0.2
+        # model data will use 1-self._warmup_non_model_data_ratio CUDA memory in warmup phase
+        self._warmup_non_model_data_ratio: float = 0.8
 
     def evict_tensors(self,
                       hold_cuda_tensor_list: List[StatefulTensor],
```
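To see why the value flips from 0.2 to 0.8, here is a hypothetical, self-contained illustration (not code from this repository) of the arithmetic the new comment describes: during warmup, model data may use at most `1 - warmup_non_model_data_ratio` of CUDA memory.

```python
# Hypothetical illustration (not repository code): how the warmup
# non-model-data ratio bounds the CUDA memory budget for model data.
def warmup_model_data_budget(total_cuda_mem: int,
                             warmup_non_model_data_ratio: float) -> int:
    """Reserve `warmup_non_model_data_ratio` of CUDA memory for non-model
    data (activations, temporary buffers) during warmup; model data
    (params, grads, optimizer states) gets the remaining 1 - ratio."""
    return int(total_cuda_mem * (1.0 - warmup_non_model_data_ratio))

# Before the fix (ratio = 0.2) model data could claim 80% of CUDA memory
# during warmup; after the fix (ratio = 0.8) it is capped at 20%, leaving
# headroom for activation peaks before real memory statistics are collected.
gib = 1024 ** 3
print(warmup_model_data_budget(16 * gib, 0.2) / gib)  # 12.8 GiB (old)
print(warmup_model_data_budget(16 * gib, 0.8) / gib)  # 3.2 GiB (new)
```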
