oahzxl commented Aug 17, 2023

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

📝 What does this PR do?

Summarize your work here.
If you have any plots/diagrams/screenshots/tables, please attach them here.

Refactor the code to better suit LLMs:

  1. Remove abandoned usages such as gpc.
  2. Use only the TP and EP FFNs.
  3. Add checkpoint I/O for the FFNs.
  4. Add more functions to the API.
💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests.
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

@github-actions

The code coverage for the changed files is 84%.

Name                                                                    Stmts   Miss  Cover
-------------------------------------------------------------------------------------------
colossalai/amp/naive_amp/mixed_precision_optimizer.py                      98     20    80%
colossalai/booster/booster.py                                              66      9    86%
colossalai/booster/plugin/__init__.py                                      11      0   100%
colossalai/booster/plugin/hybrid_parallel_plugin.py                       152     17    89%
colossalai/booster/plugin/pp_plugin_base.py                                 9      1    89%
colossalai/cluster/__init__.py                                              5      0   100%
colossalai/cluster/process_group_mesh.py                                   72      1    99%
colossalai/context/moe_context.py                                          51     24    53%
colossalai/engine/gradient_handler/__init__.py                              6      0   100%
colossalai/interface/optimizer.py                                          45      5    89%
colossalai/kernel/cuda_native/__init__.py                                   5      0   100%
colossalai/lazy/lazy_init.py                                              315     44    86%
colossalai/nn/layer/moe/__init__.py                                         6      0   100%
colossalai/nn/layer/moe/checkpoint.py                                      21     15    29%
colossalai/nn/layer/moe/experts.py                                        108     89    18%
colossalai/nn/layer/moe/layers.py                                         100     80    20%
colossalai/nn/layer/moe/utils.py                                           31     16    48%
colossalai/pipeline/p2p.py                                                102      7    93%
colossalai/pipeline/schedule/__init__.py                                    3      0   100%
colossalai/pipeline/schedule/_utils.py                                     50      5    90%
colossalai/pipeline/schedule/base.py                                       10      1    90%
colossalai/pipeline/schedule/one_f_one_b.py                               116      4    97%
colossalai/pipeline/stage_manager.py                                       68      4    94%
colossalai/shardformer/_utils.py                                           54     15    72%
colossalai/shardformer/layer/__init__.py                                    8      0   100%
colossalai/shardformer/layer/embedding.py                                 130     24    82%
colossalai/shardformer/layer/linear.py                                    181     53    71%
colossalai/shardformer/layer/normalization.py                              51     10    80%
colossalai/shardformer/layer/qkv_fused_linear.py                          292     70    76%
colossalai/shardformer/layer/utils.py                                      84     17    80%
colossalai/shardformer/modeling/bert.py                                   431    128    70%
colossalai/shardformer/modeling/blip2.py                                   53      1    98%
colossalai/shardformer/modeling/bloom.py                                  387    122    68%
colossalai/shardformer/modeling/chatglm2_6b/configuration_chatglm.py       30      0   100%
colossalai/shardformer/modeling/chatglm2_6b/modeling_chatglm.py           571    239    58%
colossalai/shardformer/modeling/chatglm.py                                149     34    77%
colossalai/shardformer/modeling/gpt2.py                                   293     83    72%
colossalai/shardformer/modeling/jit.py                                     19      3    84%
colossalai/shardformer/modeling/llama.py                                  204     65    68%
colossalai/shardformer/modeling/opt.py                                    285     65    77%
colossalai/shardformer/modeling/sam.py                                     94      6    94%
colossalai/shardformer/modeling/t5.py                                     297     74    75%
colossalai/shardformer/modeling/vit.py                                    149     23    85%
colossalai/shardformer/modeling/whisper.py                                 95     13    86%
colossalai/shardformer/policies/auto_policy.py                             27      2    93%
colossalai/shardformer/policies/base_policy.py                             87     11    87%
colossalai/shardformer/policies/bert.py                                   257      0   100%
colossalai/shardformer/policies/blip2.py                                   54      2    96%
colossalai/shardformer/policies/bloom.py                                  151      2    99%
colossalai/shardformer/policies/chatglm.py                                100      6    94%
colossalai/shardformer/policies/gpt2.py                                   181      1    99%
colossalai/shardformer/policies/llama.py                                  114      3    97%
colossalai/shardformer/policies/opt.py                                    140      2    99%
colossalai/shardformer/policies/sam.py                                     32      0   100%
colossalai/shardformer/policies/t5.py                                     182      5    97%
colossalai/shardformer/policies/vit.py                                    108      1    99%
colossalai/shardformer/policies/whisper.py                                 61      2    97%
colossalai/shardformer/shard/shard_config.py                               28      0   100%
colossalai/shardformer/shard/sharder.py                                    95      3    97%
colossalai/shardformer/shard/shardformer.py                                15      0   100%
colossalai/shardformer/shard/utils.py                                      11      0   100%
colossalai/tensor/d_tensor/api.py                                         149     24    84%
colossalai/tensor/moe_tensor/api.py                                        15      5    67%
colossalai/tensor/moe_tensor/moe_info.py                                    8      5    38%
colossalai/zero/low_level/low_level_optim.py                              330     30    91%
tests/kit/model_zoo/transformers/__init__.py                               12      0   100%
tests/kit/model_zoo/transformers/bert.py                                   50      0   100%
tests/kit/model_zoo/transformers/blip2.py                                  21      0   100%
tests/kit/model_zoo/transformers/bloom.py                                  36      0   100%
tests/kit/model_zoo/transformers/chatglm.py                                20      0   100%
tests/kit/model_zoo/transformers/gpt.py                                    39      0   100%
tests/kit/model_zoo/transformers/opt.py                                    32      0   100%
tests/kit/model_zoo/transformers/sam.py                                    14      0   100%
tests/kit/model_zoo/transformers/t5.py                                     25      0   100%
tests/kit/model_zoo/transformers/vit.py                                    24      0   100%
tests/kit/model_zoo/transformers/whisper.py                                23      0   100%
tests/test_booster/test_plugin/test_3d_plugin.py                           64      7    89%
tests/test_booster/test_plugin/test_gemini_plugin.py                       74     10    86%
tests/test_cluster/test_process_group_mesh.py                              86      1    99%
tests/test_fx/test_tracer/test_hf_model/hf_tracer_utils.py                 21      2    90%
tests/test_fx/test_tracer/test_hf_model/test_hf_bert.py                    17      1    94%
tests/test_fx/test_tracer/test_hf_model/test_hf_gpt.py                     17      1    94%
tests/test_lazy/test_models.py                                             14      1    93%
tests/test_pipeline/test_p2p_communication.py                              44      1    98%
tests/test_pipeline/test_schedule/test_oneF_oneB.py                        80      2    98%
tests/test_pipeline/test_schedule/test_pipeline_schedule_utils.py          40      0   100%
tests/test_pipeline/test_stage_manager.py                                  52      1    98%
tests/test_shardformer/test_layer/test_embedding.py                        37      1    97%
tests/test_shardformer/test_layer/test_gpt2_qkv_fused_linear_1d.py         89      1    99%
tests/test_shardformer/test_layer/test_layernorm.py                        35      1    97%
tests/test_shardformer/test_layer/test_linear_1d.py                       110      1    99%
tests/test_shardformer/test_layer/test_qkv_fused_linear_1d.py              51      1    98%
tests/test_shardformer/test_layer/test_vocab_parallel_embedding_1d.py      39      1    97%
tests/test_shardformer/test_model/_utils.py                               142     21    85%
tests/test_shardformer/test_model/test_shard_bert.py                       62      1    98%
tests/test_shardformer/test_model/test_shard_blip2.py                      40      1    98%
tests/test_shardformer/test_model/test_shard_bloom.py                      59      1    98%
tests/test_shardformer/test_model/test_shard_chatglm.py                    60      1    98%
tests/test_shardformer/test_model/test_shard_gpt2.py                       65      1    98%
tests/test_shardformer/test_model/test_shard_llama.py                      62      1    98%
tests/test_shardformer/test_model/test_shard_opt.py                        62      1    98%
tests/test_shardformer/test_model/test_shard_sam.py                        39      1    97%
tests/test_shardformer/test_model/test_shard_t5.py                         59      1    98%
tests/test_shardformer/test_model/test_shard_vit.py                        61      1    98%
tests/test_shardformer/test_model/test_shard_whisper.py                    46      1    98%
tests/test_shardformer/test_shard_utils.py                                 21      0   100%
tests/test_shardformer/test_with_torch_ddp.py                              52      1    98%
tests/test_utils/test_flash_attention.py                                   92      8    91%
-------------------------------------------------------------------------------------------
TOTAL                                                                    9540   1565    84%
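
The headline percentage follows from the TOTAL row. A minimal sketch of the arithmetic, assuming Cover is the share of statements that were executed (Stmts minus Miss, divided by Stmts); `coverage_percent` is a hypothetical helper name, not part of any coverage tool.

```python
def coverage_percent(stmts: int, miss: int) -> float:
    """Return the percentage of statements that were executed."""
    if stmts <= 0:
        raise ValueError("stmts must be positive")
    return 100.0 * (stmts - miss) / stmts


# Values copied from the TOTAL row above.
total = coverage_percent(stmts=9540, miss=1565)
print(f"{total:.1f}%")  # 83.6%, which rounds to the reported 84%
```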

@github-actions

The code coverage for the changed files is 19%.

Name                                                                   Stmts   Miss  Cover
------------------------------------------------------------------------------------------
colossalai/amp/naive_amp/mixed_precision_optimizer.py                     98     80    18%
colossalai/booster/booster.py                                             66     18    73%
colossalai/booster/plugin/__init__.py                                     11      0   100%
colossalai/booster/plugin/hybrid_parallel_plugin.py                      152    103    32%
colossalai/booster/plugin/pp_plugin_base.py                                9      1    89%
colossalai/cluster/__init__.py                                             5      0   100%
colossalai/cluster/process_group_mesh.py                                  72     46    36%
colossalai/context/moe_context.py                                         53     26    51%
colossalai/engine/gradient_handler/__init__.py                             6      0   100%
colossalai/interface/optimizer.py                                         45     18    60%
colossalai/kernel/cuda_native/__init__.py                                  5      0   100%
colossalai/lazy/lazy_init.py                                             315    246    22%
colossalai/nn/layer/moe/__init__.py                                        6      0   100%
colossalai/nn/layer/moe/checkpoint.py                                     40     22    45%
colossalai/nn/layer/moe/experts.py                                        72     55    24%
colossalai/nn/layer/moe/layers.py                                        122     99    19%
colossalai/nn/layer/moe/routers.py                                       133    110    17%
colossalai/nn/layer/moe/utils.py                                          39     22    44%
colossalai/pipeline/p2p.py                                               102     81    21%
colossalai/pipeline/schedule/__init__.py                                   3      0   100%
colossalai/pipeline/schedule/_utils.py                                    50     38    24%
colossalai/pipeline/schedule/base.py                                      10      2    80%
colossalai/pipeline/schedule/one_f_one_b.py                              116     97    16%
colossalai/pipeline/stage_manager.py                                      68     46    32%
colossalai/shardformer/_utils.py                                          54     48    11%
colossalai/shardformer/layer/__init__.py                                   8      8     0%
colossalai/shardformer/layer/embedding.py                                130    130     0%
colossalai/shardformer/layer/linear.py                                   181    181     0%
colossalai/shardformer/layer/normalization.py                             51     51     0%
colossalai/shardformer/layer/qkv_fused_linear.py                         292    292     0%
colossalai/shardformer/layer/utils.py                                     84     84     0%
colossalai/shardformer/modeling/bert.py                                  431    431     0%
colossalai/shardformer/modeling/blip2.py                                  53     53     0%
colossalai/shardformer/modeling/bloom.py                                 387    387     0%
colossalai/shardformer/modeling/chatglm2_6b/configuration_chatglm.py      30      0   100%
colossalai/shardformer/modeling/chatglm2_6b/modeling_chatglm.py          571    240    58%
colossalai/shardformer/modeling/chatglm.py                               149    149     0%
colossalai/shardformer/modeling/gpt2.py                                  293    293     0%
colossalai/shardformer/modeling/jit.py                                    19     19     0%
colossalai/shardformer/modeling/llama.py                                 204    204     0%
colossalai/shardformer/modeling/opt.py                                   285    285     0%
colossalai/shardformer/modeling/sam.py                                    94     94     0%
colossalai/shardformer/modeling/t5.py                                    297    297     0%
colossalai/shardformer/modeling/vit.py                                   149    149     0%
colossalai/shardformer/modeling/whisper.py                                95     95     0%
colossalai/shardformer/policies/auto_policy.py                            27     14    48%
colossalai/shardformer/policies/base_policy.py                            87     41    53%
colossalai/shardformer/policies/bert.py                                  257    257     0%
colossalai/shardformer/policies/blip2.py                                  54     54     0%
colossalai/shardformer/policies/bloom.py                                 151    151     0%
colossalai/shardformer/policies/chatglm.py                               100    100     0%
colossalai/shardformer/policies/gpt2.py                                  181    181     0%
colossalai/shardformer/policies/llama.py                                 114    114     0%
colossalai/shardformer/policies/opt.py                                   140    140     0%
colossalai/shardformer/policies/sam.py                                    32     32     0%
colossalai/shardformer/policies/t5.py                                    182    182     0%
colossalai/shardformer/policies/vit.py                                   108    108     0%
colossalai/shardformer/policies/whisper.py                                61     61     0%
colossalai/shardformer/shard/shard_config.py                              28      9    68%
colossalai/shardformer/shard/sharder.py                                   95     70    26%
colossalai/shardformer/shard/shardformer.py                               15      5    67%
colossalai/shardformer/shard/utils.py                                     11      8    27%
colossalai/tensor/d_tensor/api.py                                        149    112    25%
colossalai/tensor/moe_tensor/api.py                                       20      7    65%
colossalai/tensor/moe_tensor/moe_info.py                                  10      7    30%
colossalai/zero/low_level/low_level_optim.py                             330     30    91%
tests/kit/model_zoo/transformers/__init__.py                              12      0   100%
tests/kit/model_zoo/transformers/bert.py                                  50      0   100%
tests/kit/model_zoo/transformers/blip2.py                                 21      0   100%
tests/kit/model_zoo/transformers/bloom.py                                 36      0   100%
tests/kit/model_zoo/transformers/chatglm.py                               20      0   100%
tests/kit/model_zoo/transformers/gpt.py                                   39      0   100%
tests/kit/model_zoo/transformers/opt.py                                   32      0   100%
tests/kit/model_zoo/transformers/sam.py                                   14      0   100%
tests/kit/model_zoo/transformers/t5.py                                    25      0   100%
tests/kit/model_zoo/transformers/vit.py                                   24      0   100%
tests/kit/model_zoo/transformers/whisper.py                               23      0   100%
tests/test_shardformer/test_model/_utils.py                              142    142     0%
tests/test_shardformer/test_model/test_shard_bert.py                      62     62     0%
tests/test_shardformer/test_model/test_shard_blip2.py                     40     40     0%
tests/test_shardformer/test_model/test_shard_bloom.py                     59     59     0%
tests/test_shardformer/test_model/test_shard_chatglm.py                   60     60     0%
tests/test_shardformer/test_model/test_shard_gpt2.py                      65     65     0%
tests/test_shardformer/test_model/test_shard_llama.py                     62     62     0%
tests/test_shardformer/test_model/test_shard_opt.py                       62     62     0%
tests/test_shardformer/test_model/test_shard_sam.py                       39     39     0%
tests/test_shardformer/test_model/test_shard_t5.py                        59     59     0%
tests/test_shardformer/test_model/test_shard_vit.py                       61     61     0%
tests/test_shardformer/test_model/test_shard_whisper.py                   46     46     0%
tests/test_shardformer/test_shard_utils.py                                21     21     0%
tests/test_shardformer/test_with_torch_ddp.py                             52     52     0%
------------------------------------------------------------------------------------------
TOTAL                                                                   8733   7113    19%

@github-actions

The code coverage for the changed files is 83%.

Name                                                                    Stmts   Miss  Cover
-------------------------------------------------------------------------------------------
colossalai/amp/naive_amp/mixed_precision_optimizer.py                      98     20    80%
colossalai/booster/booster.py                                              66      9    86%
colossalai/booster/plugin/__init__.py                                      11      0   100%
colossalai/booster/plugin/hybrid_parallel_plugin.py                       152     17    89%
colossalai/booster/plugin/pp_plugin_base.py                                 9      1    89%
colossalai/cluster/__init__.py                                              5      0   100%
colossalai/cluster/process_group_mesh.py                                   72      1    99%
colossalai/context/moe_context.py                                          53     26    51%
colossalai/engine/gradient_handler/__init__.py                              6      0   100%
colossalai/interface/optimizer.py                                          45      5    89%
colossalai/kernel/cuda_native/__init__.py                                   5      0   100%
colossalai/lazy/lazy_init.py                                              315     44    86%
colossalai/nn/layer/moe/__init__.py                                         6      0   100%
colossalai/nn/layer/moe/checkpoint.py                                      40     22    45%
colossalai/nn/layer/moe/experts.py                                         72     55    24%
colossalai/nn/layer/moe/layers.py                                         122     99    19%
colossalai/nn/layer/moe/routers.py                                        133    110    17%
colossalai/nn/layer/moe/utils.py                                           39     22    44%
colossalai/pipeline/p2p.py                                                102      7    93%
colossalai/pipeline/schedule/__init__.py                                    3      0   100%
colossalai/pipeline/schedule/_utils.py                                     50      5    90%
colossalai/pipeline/schedule/base.py                                       10      1    90%
colossalai/pipeline/schedule/one_f_one_b.py                               116      4    97%
colossalai/pipeline/stage_manager.py                                       68      4    94%
colossalai/shardformer/_utils.py                                           54     15    72%
colossalai/shardformer/layer/__init__.py                                    8      0   100%
colossalai/shardformer/layer/embedding.py                                 130     24    82%
colossalai/shardformer/layer/linear.py                                    181     53    71%
colossalai/shardformer/layer/normalization.py                              51     10    80%
colossalai/shardformer/layer/qkv_fused_linear.py                          292     70    76%
colossalai/shardformer/layer/utils.py                                      84     17    80%
colossalai/shardformer/modeling/bert.py                                   431    128    70%
colossalai/shardformer/modeling/blip2.py                                   53      1    98%
colossalai/shardformer/modeling/bloom.py                                  387    122    68%
colossalai/shardformer/modeling/chatglm2_6b/configuration_chatglm.py       30      0   100%
colossalai/shardformer/modeling/chatglm2_6b/modeling_chatglm.py           571    239    58%
colossalai/shardformer/modeling/chatglm.py                                149     34    77%
colossalai/shardformer/modeling/gpt2.py                                   293     83    72%
colossalai/shardformer/modeling/jit.py                                     19      3    84%
colossalai/shardformer/modeling/llama.py                                  204     65    68%
colossalai/shardformer/modeling/opt.py                                    285     65    77%
colossalai/shardformer/modeling/sam.py                                     94      6    94%
colossalai/shardformer/modeling/t5.py                                     297     74    75%
colossalai/shardformer/modeling/vit.py                                    149     23    85%
colossalai/shardformer/modeling/whisper.py                                 95     13    86%
colossalai/shardformer/policies/auto_policy.py                             27      2    93%
colossalai/shardformer/policies/base_policy.py                             87     11    87%
colossalai/shardformer/policies/bert.py                                   257      0   100%
colossalai/shardformer/policies/blip2.py                                   54      2    96%
colossalai/shardformer/policies/bloom.py                                  151      2    99%
colossalai/shardformer/policies/chatglm.py                                100      6    94%
colossalai/shardformer/policies/gpt2.py                                   181      1    99%
colossalai/shardformer/policies/llama.py                                  114      3    97%
colossalai/shardformer/policies/opt.py                                    140      2    99%
colossalai/shardformer/policies/sam.py                                     32      0   100%
colossalai/shardformer/policies/t5.py                                     182      5    97%
colossalai/shardformer/policies/vit.py                                    108      1    99%
colossalai/shardformer/policies/whisper.py                                 61      2    97%
colossalai/shardformer/shard/shard_config.py                               28      0   100%
colossalai/shardformer/shard/sharder.py                                    95      3    97%
colossalai/shardformer/shard/shardformer.py                                15      0   100%
colossalai/shardformer/shard/utils.py                                      11      0   100%
colossalai/tensor/d_tensor/api.py                                         149     24    84%
colossalai/tensor/moe_tensor/api.py                                        20      7    65%
colossalai/tensor/moe_tensor/moe_info.py                                   10      7    30%
colossalai/zero/low_level/low_level_optim.py                              330     30    91%
tests/kit/model_zoo/transformers/__init__.py                               12      0   100%
tests/kit/model_zoo/transformers/bert.py                                   50      0   100%
tests/kit/model_zoo/transformers/blip2.py                                  21      0   100%
tests/kit/model_zoo/transformers/bloom.py                                  36      0   100%
tests/kit/model_zoo/transformers/chatglm.py                                20      0   100%
tests/kit/model_zoo/transformers/gpt.py                                    39      0   100%
tests/kit/model_zoo/transformers/opt.py                                    32      0   100%
tests/kit/model_zoo/transformers/sam.py                                    14      0   100%
tests/kit/model_zoo/transformers/t5.py                                     25      0   100%
tests/kit/model_zoo/transformers/vit.py                                    24      0   100%
tests/kit/model_zoo/transformers/whisper.py                                23      0   100%
tests/test_booster/test_plugin/test_3d_plugin.py                           64      7    89%
tests/test_booster/test_plugin/test_gemini_plugin.py                       74     10    86%
tests/test_cluster/test_process_group_mesh.py                              86      1    99%
tests/test_fx/test_tracer/test_hf_model/hf_tracer_utils.py                 21      2    90%
tests/test_fx/test_tracer/test_hf_model/test_hf_bert.py                    17      1    94%
tests/test_fx/test_tracer/test_hf_model/test_hf_gpt.py                     17      1    94%
tests/test_lazy/test_models.py                                             14      1    93%
tests/test_pipeline/test_p2p_communication.py                              44      1    98%
tests/test_pipeline/test_schedule/test_oneF_oneB.py                        80      2    98%
tests/test_pipeline/test_schedule/test_pipeline_schedule_utils.py          40      0   100%
tests/test_pipeline/test_stage_manager.py                                  52      1    98%
tests/test_shardformer/test_layer/test_embedding.py                        37      1    97%
tests/test_shardformer/test_layer/test_gpt2_qkv_fused_linear_1d.py         89      1    99%
tests/test_shardformer/test_layer/test_layernorm.py                        35      1    97%
tests/test_shardformer/test_layer/test_linear_1d.py                       110      1    99%
tests/test_shardformer/test_layer/test_qkv_fused_linear_1d.py              51      1    98%
tests/test_shardformer/test_layer/test_vocab_parallel_embedding_1d.py      39      1    97%
tests/test_shardformer/test_model/_utils.py                               142     21    85%
tests/test_shardformer/test_model/test_shard_bert.py                       62      1    98%
tests/test_shardformer/test_model/test_shard_blip2.py                      40      1    98%
tests/test_shardformer/test_model/test_shard_bloom.py                      59      1    98%
tests/test_shardformer/test_model/test_shard_chatglm.py                    60      1    98%
tests/test_shardformer/test_model/test_shard_gpt2.py                       65      1    98%
tests/test_shardformer/test_model/test_shard_llama.py                      62      1    98%
tests/test_shardformer/test_model/test_shard_opt.py                        62      1    98%
tests/test_shardformer/test_model/test_shard_sam.py                        39      1    97%
tests/test_shardformer/test_model/test_shard_t5.py                         59      1    98%
tests/test_shardformer/test_model/test_shard_vit.py                        61      1    98%
tests/test_shardformer/test_model/test_shard_whisper.py                    46      1    98%
tests/test_shardformer/test_shard_utils.py                                 21      0   100%
tests/test_shardformer/test_with_torch_ddp.py                              52      1    98%
tests/test_utils/test_flash_attention.py                                   92      8    91%
-------------------------------------------------------------------------------------------
TOTAL                                                                    9695   1679    83%
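
The files that pull the total down are easier to spot once the report is parsed. A minimal sketch, assuming the report keeps the four-column layout shown above; `parse_rows` and `below_threshold` are hypothetical helper names, and the sample rows are copied from this report.

```python
import re
from typing import List, NamedTuple


class Row(NamedTuple):
    name: str
    stmts: int
    miss: int
    cover: int  # percentage as printed in the report


# One report row: path, statement count, missed count, "NN%".
ROW_RE = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)\s+(\d+)%$")


def parse_rows(report: str) -> List[Row]:
    """Parse per-file rows, skipping the header, separators, and TOTAL."""
    rows = []
    for line in report.splitlines():
        match = ROW_RE.match(line.strip())
        if match and match.group(1) != "TOTAL":
            name, stmts, miss, cover = match.groups()
            rows.append(Row(name, int(stmts), int(miss), int(cover)))
    return rows


def below_threshold(rows: List[Row], threshold: int) -> List[Row]:
    """Return rows whose coverage is strictly below the threshold."""
    return [row for row in rows if row.cover < threshold]


SAMPLE = """\
Name                                   Stmts   Miss  Cover
----------------------------------------------------------
colossalai/nn/layer/moe/checkpoint.py     40     22    45%
colossalai/nn/layer/moe/experts.py        72     55    24%
colossalai/nn/layer/moe/layers.py        122     99    19%
colossalai/nn/layer/moe/routers.py       133    110    17%
colossalai/nn/layer/moe/utils.py          39     22    44%
colossalai/pipeline/p2p.py               102      7    93%
----------------------------------------------------------
TOTAL                                   9695   1679    83%
"""

for row in below_threshold(parse_rows(SAMPLE), threshold=50):
    print(f"{row.name}: {row.cover}%")
```

On these sample rows, only the five files under colossalai/nn/layer/moe fall below the 50% threshold.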

ver217 merged commit c690fff into hpcaitech:feature/moe Aug 25, 2023
oahzxl deleted the moe branch September 5, 2023 03:10
oahzxl added a commit to oahzxl/ColossalAI that referenced this pull request Sep 15, 2023
* polish code

* rename

* refactor code

* fix test

* refactor code

* update flash attention version

* Support TP (#6)

* add tp test

* update tp test

* update

* remove fa dependency

* update dependency

* update softmax

* update checkpointio

* update processgroupmesh

* update name

* update param

* add keep vars
oahzxl added a commit to oahzxl/ColossalAI that referenced this pull request Oct 26, 2023
* polish code

* rename

* refactor code

* fix test

* refactor code

* update flash attention version

* Support TP (#6)

* add tp test

* update tp test

* update

* remove fa dependency

* update dependency

* update softmax

* update checkpointio

* update processgroupmesh

* update name

* update param

* add keep vars