🐛 Describe the bug
Hi, I am trying to implement a custom shard policy with a different layer distribution, but it seems all built-in policies share the following inconsistent implementation:
In get_held_layers(), a policy uses self.distribute_layers() and self.get_stage_index(), both of which can be overridden in a subclass:
ColossalAI/colossalai/shardformer/policies/gpt2.py, lines 170 to 175 at 79718fa:

```python
layers_per_stage = self.distribute_layers(len(module.h), stage_manager.num_stages)
if stage_manager.is_first_stage():
    held_layers.append(module.wte)
    held_layers.append(module.wpe)
    held_layers.append(module.drop)
start_idx, end_idx = self.get_stage_index(layers_per_stage, stage_manager.stage)
```
But in set_pipeline_forward(), the policy calls the base-class Policy.distribute_layers() and Policy.get_stage_index() directly:
ColossalAI/colossalai/shardformer/policies/gpt2.py, lines 192 to 193 at 79718fa:

```python
layers_per_stage = Policy.distribute_layers(len(module.h), stage_manager.num_stages)
stage_index = Policy.get_stage_index(layers_per_stage, stage_manager.stage)
```
If these functions are overridden, the two code paths compute different layer distributions, which raises an error during the pipeline forward pass.
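A minimal sketch of one possible fix (assuming set_pipeline_forward() has no other reason to bypass subclass overrides) is to dispatch both call sites through the instance:

```python
# In set_pipeline_forward(), use instance dispatch instead of the Policy
# base class, so that subclass overrides apply consistently:
layers_per_stage = self.distribute_layers(len(module.h), stage_manager.num_stages)
stage_index = self.get_stage_index(layers_per_stage, stage_manager.stage)
```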
How to reproduce
I tested with examples/language/gpt/hybridparallelism/finetune.py.
For the hybrid_parallel plugin, add a custom policy:
```python
elif args.plugin == "hybrid_parallel":
    BATCH_SIZE = 128
    from typing import List

    from colossalai.shardformer.policies.base_policy import Policy
    from colossalai.shardformer.policies.gpt2 import GPT2ForSequenceClassificationPolicy

    class CustomGPT2Policy(GPT2ForSequenceClassificationPolicy):
        @staticmethod
        def distribute_layers(num_layers: int, num_stages: int) -> List[int]:
            # Hold 4 layers back, distribute the rest evenly, then give
            # the extra 4 layers to the first stage.
            layers_per_stage = Policy.distribute_layers(num_layers - 4, num_stages)
            layers_per_stage[0] += 4
            return layers_per_stage

    plugin = HybridParallelPlugin(
        tp_size=1,
        pp_size=4,
        num_microbatches=None,
        microbatch_size=8,
        zero_stage=0,
        precision="fp16",
        initial_scale=1,
        custom_policy=CustomGPT2Policy(),
    )
```
This distributes the layers slightly differently: the first stage holds 4 more layers than it otherwise would, as the sketch below illustrates.
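To see why this breaks, here is a small self-contained sketch comparing the distribution used by get_held_layers() with the one used by set_pipeline_forward(), for a 12-layer GPT-2 across 4 stages. The even split below is an approximation of the default Policy.distribute_layers(); the exact placement of remainder layers may differ upstream.

```python
from typing import List

def default_distribute(num_layers: int, num_stages: int) -> List[int]:
    # Approximation of Policy.distribute_layers: split layers as evenly
    # as possible across stages (remainder handling may differ upstream).
    quotient, remainder = divmod(num_layers, num_stages)
    return [quotient + (1 if i < remainder else 0) for i in range(num_stages)]

def custom_distribute(num_layers: int, num_stages: int) -> List[int]:
    # The override from the reproduction above.
    layers_per_stage = default_distribute(num_layers - 4, num_stages)
    layers_per_stage[0] += 4
    return layers_per_stage

# set_pipeline_forward() always uses the base-class version ...
print(default_distribute(12, 4))  # [3, 3, 3, 3]
# ... while get_held_layers() uses the override:
print(custom_distribute(12, 4))   # [6, 2, 2, 2]
# Stage 1 is told to run layers 3..5 but only holds layers 6..7, so the
# forward pass touches layers whose parameters this stage never materialized.
```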
This leads to the following error, presumably because the mismatched stage indices make a stage run layers it does not actually hold:
```
...
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/gpt2/modeling_gpt2.py", line 312, in forward
    query, key, value = self.c_attn(hidden_states).split(self.split_size, dim=2)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/pytorch_utils.py", line 107, in forward
    x = torch.addmm(self.bias, x.view(-1, x.size(-1)), self.weight)
TypeError: addmm(): argument 'input' (position 1) must be Tensor, not NoneType
```
Environment
torch 2.1.0+cu118