Optimization for llm/gpt-3 #6570
Conversation
Replace parameters with config in MHA; replace the GPTEmbedding ParamAttr initializer with _init_weights; modify the fuse_attention_qkv parameter.
Thanks for your contribution!
Codecov Report
@@           Coverage Diff            @@
##           develop    #6570   +/-   ##
========================================
  Coverage    62.94%   62.94%
========================================
  Files          531      531
  Lines        77727    77727
========================================
  Hits         48923    48923
  Misses       28804    28804
llm/gpt-3/modeling.py
Outdated
need_weights=False, #
weight_attr=None, #
bias_attr=None, #
do_recompute=False,
Take a look at how these parameters are used; they should all be removable.
kdim=None, #
vdim=None, #
need_weights=False, #
weight_attr=None, #
bias_attr=None, #
do_recompute=False,
Removed kdim, vdim, need_weights, and bias_attr; kept weight_attr and do_recompute as part of the TransformerDecoderLayer parameter interface.
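For illustration, a minimal sketch of the trimmed constructor after this change; the class name and remaining arguments are assumed from the surrounding diff, not taken from the merged code:

```python
import paddle.nn as nn


class MultiHeadAttention(nn.Layer):
    # Hypothetical sketch: kdim / vdim / need_weights / bias_attr are dropped;
    # weight_attr and do_recompute remain as the TransformerDecoderLayer-facing knobs.
    def __init__(self, config, weight_attr=None, do_recompute=False):
        super().__init__()
        self.config = config
        self.weight_attr = weight_attr
        self.do_recompute = do_recompute
```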
llm/gpt-3/modeling.py
Outdated
embed_dim = config.hidden_size
self.embed_dim = config.hidden_size
self.kdim = kdim if kdim is not None else config.hidden_size
self.vdim = vdim if vdim is not None else config.hidden_size
kdim and vdim should not be passed in separately here; just use config.hidden_size directly.
Both places now use hidden_size directly; kdim and vdim have been removed.
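As a small illustration of the result, every projection is now sized from the same hidden size (the literal value below is only a placeholder for config.hidden_size):

```python
import paddle.nn as nn

hidden_size = 1024  # placeholder for config.hidden_size; kdim/vdim no longer exist
q_proj = nn.Linear(hidden_size, hidden_size)
k_proj = nn.Linear(hidden_size, hidden_size)
v_proj = nn.Linear(hidden_size, hidden_size)
out_proj = nn.Linear(hidden_size, hidden_size)
```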
llm/gpt-3/modeling.py
Outdated
need_weights=False, #
weight_attr=None, #
bias_attr=None, #
do_recompute=False,
if num_partitions > 1:
if config.tensor_parallel_degree > 1:
assert self.num_heads % config.tensor_parallel_degree == 0
self.num_heads = self.num_heads // config.tensor_parallel_degree
llm/gpt-3/modeling.py
Outdated
if isinstance(layer, (nn.Linear,
                      nn.Embedding,
                      fleet.meta_parallel.VocabParallelEmbedding)):
    # In the dygraph mode, use the `set_value` to reset the parameter directly,
This is incomplete; refer to the llama implementation.
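For reference, a sketch of a fuller initialization hook modeled on the llama modeling code in this repo; the exact class list and the initializer_range field are assumptions here, not the merged code:

```python
import paddle
import paddle.nn as nn
from paddle.distributed import fleet


def _init_weights(self, layer):
    # Hypothetical llama-style sketch: also cover the tensor-parallel linear/embedding layers.
    if isinstance(
        layer,
        (
            nn.Linear,
            nn.Embedding,
            fleet.meta_parallel.VocabParallelEmbedding,
            fleet.meta_parallel.ColumnParallelLinear,
            fleet.meta_parallel.RowParallelLinear,
        ),
    ):
        # In dygraph mode, `set_value` resets the parameter tensor in place.
        if isinstance(layer.weight, paddle.Tensor):
            layer.weight.set_value(
                paddle.normal(
                    mean=0.0,
                    std=self.config.initializer_range,  # assumed config field
                    shape=layer.weight.shape,
                )
            )
```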
@@ -682,6 +669,17 @@ def get_tensor_parallel_split_mappings(num_layers):
    "layers.0.linear2.weight": partial(fn, is_column=False),
}

if config.fuse_attention_qkv:
    base_actions["layers.0.self_attn.qkv_proj.weight"] = partial(fn, is_column=True)
Please verify forward-pass precision: single card vs. tp=2.
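One possible way to check this, assuming both runs dump the logits of the same batch to .npy files (the file names below are placeholders, not scripts in this repo):

```python
# Hypothetical precision check: compare logits from a single-card forward pass
# against logits gathered from a tensor_parallel_degree=2 run of the same inputs.
import numpy as np

single = np.load("logits_single.npy")  # placeholder dump from the 1-GPU run
tp2 = np.load("logits_tp2.npy")        # placeholder dump from the tp=2 run
print("max abs diff:", np.abs(single - tp2).max())
assert np.allclose(single, tp2, atol=1e-5), "forward precision mismatch between single card and tp=2"
```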
llm/gpt-3/modeling.py
Outdated
self.head_dim = embed_dim // num_heads
assert self.head_dim * num_heads == self.embed_dim, "embed_dim must be divisible by num_heads"
self.use_flash_attn = config.use_flash_attn if flash_attention else None
Rename this to `use_flash_attention`.
llm/gpt-3/README.md
Outdated
Use the script below to continue training on top of llama-7b.
Note:
1. Training requires the paddle develop build; install the missing wheel packages, e.g. `pip install tool_helpers visualdl==2.5.3`.
2. `use_flash_attn` needs to be enabled on A100 machines; otherwise the loss may become abnormal (quickly dropping to 0.00x, abnormally small). A cuda11.8 environment is recommended.
2. `use_flash_attn` needs to be enabled on A100 machines; otherwise the loss may become abnormal (quickly dropping to 0.00x, abnormally small). A cuda11.8 environment is recommended.
2. `use_flash_attention` needs to be enabled on A100 machines; otherwise the loss may become abnormal (quickly dropping to 0.00x, abnormally small). A cuda11.8 environment is recommended.
llm/gpt-3/README.md
Outdated
export PYTHONPATH="../../PaddleNLP/"
export FLAGS_cudnn_deterministic=True
log_dir="log"
rm -rf $log_dir

python -u -m paddle.distributed.launch \
    --gpus "0" \
    --gpus "6,7" \
--gpus "6,7" \ | |
--gpus "0" \ |
llm/gpt-3/modeling.py
Outdated
if config.tensor_parallel_degree > 1:
    assert config.num_attention_heads % config.tensor_parallel_degree == 0
    config.num_attention_heads = config.num_attention_heads // config.tensor_parallel_degree
config.num_attention_heads = config.num_attention_heads // config.tensor_parallel_degree
self.num_attention_heads = config.num_attention_heads // config.tensor_parallel_degree
Where the original variable gets modified, assign the result to a new variable instead; do not modify config directly.
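A minimal sketch of the suggested pattern, written here as a hypothetical helper so the shared config object is never mutated:

```python
def split_heads_for_tensor_parallel(config):
    # Hypothetical helper: compute the per-rank head count locally
    # instead of overwriting config.num_attention_heads.
    num_attention_heads = config.num_attention_heads
    if config.tensor_parallel_degree > 1:
        assert num_attention_heads % config.tensor_parallel_degree == 0
        num_attention_heads = num_attention_heads // config.tensor_parallel_degree
    return num_attention_heads  # assign this to self.num_attention_heads in the layer
```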
llm/gpt-3/modeling.py
Outdated
@@ -270,10 +233,10 @@ def gen_cache(self, key, value=None, type=Cache):
    return self.StaticCache(k, v)
elif value is None:  # incremental_state
    k = layers.fill_constant_batch_size_like(
        input=key, shape=[-1, self.num_heads, 0, self.head_dim], dtype=key.dtype, value=0
        input=key, shape=[-1, self.config.num_attention_heads, 0, self.head_dim], dtype=key.dtype, value=0
input=key, shape=[-1, self.config.num_attention_heads, 0, self.head_dim], dtype=key.dtype, value=0
input=key, shape=[-1, self.num_attention_heads, 0, self.head_dim], dtype=key.dtype, value=0
llm/gpt-3/modeling.py
Outdated
)
v = layers.fill_constant_batch_size_like(
    input=key, shape=[-1, self.num_heads, 0, self.head_dim], dtype=key.dtype, value=0
    input=key, shape=[-1, self.config.num_attention_heads, 0, self.head_dim], dtype=key.dtype, value=0
input=key, shape=[-1, self.config.num_attention_heads, 0, self.head_dim], dtype=key.dtype, value=0
input=key, shape=[-1, self.num_attention_heads, 0, self.head_dim], dtype=key.dtype, value=0
llm/gpt-3/modeling.py
Outdated
# Recompute defaults to False and is controlled by Trainer
self.enable_recompute = False

config.use_flash_attention = config.use_flash_attention if flash_attention else None
config.use_flash_attention = config.use_flash_attention if flash_attention else None
self.use_flash_attention = config.use_flash_attention if flash_attention else None
llm/gpt-3/modeling.py
Outdated
out = paddle.matmul(weights, v)

# combine heads
out = tensor.transpose(out, perm=[0, 2, 1, 3])
out = tensor.reshape(x=out, shape=[0, 0, -1])

return (out, weights) if self.need_weights else out
return (out, weights) if self.config.need_weights else out
Referring to this spot: replace `self.config.need_weights`
with an `output_attentions` parameter on the `forward` function.
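A sketch of how the tail of the attention forward pass could honor an output_attentions flag instead of reading need_weights from config; the free function below is only illustrative, not the merged interface:

```python
import paddle


def attention_output(weights, v, output_attentions=False):
    # Hypothetical sketch: the caller passes output_attentions into forward();
    # attention weights are only returned when it is set.
    out = paddle.matmul(weights, v)                  # [bs, heads, q_len, head_dim]
    out = paddle.transpose(out, perm=[0, 2, 1, 3])   # combine heads
    out = paddle.reshape(out, shape=[0, 0, -1])      # [bs, q_len, heads * head_dim]
    return (out, weights) if output_attentions else out
```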
LGTM
PR types
Function optimization
PR changes
APIs and Docs
Description
Optimization for llm/gpt-3
Update the README.md file.
Replace self. parameters with config. parameters in modeling.py.
Replace the ParamAttr initializer with _init_weights for GPTPretrainedModel.
Use output_attentions (need_weights) to control attention weights output.