Add PP-MiniLM #1403
Conversation
PP-MiniLM combines distillation, pruning, quantization and high-performance inference techniques, offering high accuracy, fast inference and a small number of parameters:

- High accuracy: the 6-layer, 768-hidden-size model is more accurate than same-size models from Huawei and Tencent;
Don't use company names here; use model names.
# PP-MiniLM: a small Chinese model

PP-MiniLM is a small Chinese model whose architecture follows ERNIE. This example currently covers task-agnostic distillation of a six-transformer-layer model, plus pruning and quantization with PaddleSlim to further speed up inference.
Should the model introduction highlight our improvements over the MiniLMv2 strategy? @tianxin1860
As discussed offline, present it in the order 1. fast inference, 2. good model quality, 3. small parameter count, and cover our improvements over MiniLMv2 under the "good model quality" item.
| -------------------- | ------------- | ------ | ------- | ----- | ----- | ------- | ----- | ----- | ----- | ----- | ---------- |
| bert-base-chinese | 102.27M | | TODO | | | | | | | | |
| TinyBERT(6l-768d) | 59.7M | | 1.00x | 72.22 | 55.82 | 58.10 | 79.53 | 74.00 | 75.99 | 80.57 | 70.89 |
| Tencent RoBERTa 6l-768d | 59.7M | | 1.00x | 69.74 | 66.36 | 59.95 | 77.00 | 71.39 | 71.05 | 82.83 | 71.19 |
UER-py RoBERTa xxxx; drop the company name.
### Data

Internal Baidu business data, split into 64 files under the dataset directory.
Remove the mention of internal business data. Could the CLUESmall dataset be used as the data example instead?
Or just briefly describe how the data is organized.
The distillation method of PP-MiniLM:

Use the 20th layer of the large-size teacher model to distill the sample-wise relations between q and q, k and k, and v and v of the 6th layer of the 6-layer student model. That is, after unifying head_num, q, k and v are re-arranged,
What is the large-size teacher model? Should a concrete model be named as an example?
This paragraph needs to be rewritten and summarized.
Not especially clear.
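The q/k/v relation idea discussed above can be sketched as follows. This is a minimal NumPy illustration, not the repository's Paddle code; the batch size, sequence length, hidden sizes and the number of relation heads are all assumed values chosen only to show why unifying head_num makes teacher and student relation matrices directly comparable:

```python
import numpy as np

def self_relation(x, num_relation_heads):
    """Re-split hidden states into num_relation_heads and compute the
    scaled dot-product self-relation matrix [batch, heads, seq, seq]."""
    batch, seq_len, hidden = x.shape
    head_dim = hidden // num_relation_heads
    x = x.reshape(batch, seq_len, num_relation_heads, head_dim)
    x = x.transpose(0, 2, 1, 3)  # [batch, heads, seq, head_dim]
    return x @ x.transpose(0, 1, 3, 2) / np.sqrt(head_dim)

# Illustrative sizes: teacher hidden=1024, student hidden=768, seq_len=4.
teacher_q = np.random.rand(1, 4, 1024)
student_q = np.random.rand(1, 4, 768)

# Re-splitting into the same number of relation heads (64 here) makes both
# relation matrices [1, 64, 4, 4], so teacher and student relations can be
# compared directly even though their per-head dims differ (16 vs 12).
r_teacher = self_relation(teacher_q, 64)
r_student = self_relation(student_q, 64)
```

In the actual method a distillation loss (e.g. KL divergence) is then computed between the two relation matrices; the sketch stops at the shape-compatibility step.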
### Performance testing

We run the inference/infer.py script on a single NVIDIA 16G T4 GPU to benchmark the quantized model.
NVIDIA Tesla T4 (the T4 only comes in 16G, so there is no need to call that out).
```shell
cd inference

python infer.py --task_name ${task} --model_path ../quantization/${task}_quant_models/${algo}${bs}/int8 --int8 --use_trt --collect_shape # generate the shape range info file
```
The collect shape step could use a stronger explanation: running once with --collect_shape records the tensor shape ranges that TensorRT dynamic-shape inference needs into a shape range info file, and a second run without the flag performs the actual inference.
### Requirements

This step depends on paddle 2.2.1. To see a more pronounced speedup, test on a T-series GPU (this example uses a T4). On V-series GPUs, which do not support the int8 tensor core, the speedup will fall short of the numbers in the tables of this document.
Pay attention to correct spelling and capitalization of English terms: Int8 Tensor Core.
For a more pronounced speedup, we recommend testing on an NVIDIA Tensor Core GPU (e.g. T4, A10, A100).
float32 inference script:

```shell
python infer.py --task_name ${task} --model_path $MODEL_PATH --use_trt --collect_shape
```
The collect shape step needs a standalone explanation; otherwise it will confuse users.
config.tensorrt_engine_enabled()))
if args.collect_shape:
    config.collect_shape_range_info(
        os.path.dirname(args.model_path) + "/" + args.task_name +
Paths should be assembled with the os.path.join API; hard-coding "/" breaks Windows compatibility.
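A hedged sketch of the suggested fix; the path and file-name suffix below are illustrative stand-ins, not the script's actual arguments:

```python
import os

# Hypothetical values mirroring infer.py's arguments (not the actual script).
model_path = os.path.join("quantization", "afqmc_quant_models", "int8")
task_name = "afqmc"

# os.path.join picks the platform's separator, so the same code works on
# Windows and Linux, unlike hard-coding "/" via string concatenation.
shape_info_file = os.path.join(
    os.path.dirname(model_path), task_name + "_shape_range_info.pbtxt")
```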
### Data

This experiment uses the classification datasets in CLUE; on Linux the dataset is downloaded automatically to `~/.paddlenlp/datasets/Clue/` when the script is launched.
Suggest: "This experiment is based on the CLUE dataset; running the Fine-tune script downloads it automatically to the *** directory."
Fine-tune the general model `GENERAL_MODEL_DIR` obtained from the first-stage general distillation over the following hyper-parameter ranges
Suggest: "Run a Grid Search over the following hyper-parameter ranges on the small model GENERAL_MODEL_DIR produced by the first-stage distillation."
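Such a Grid Search can be sketched as follows; the ranges below are placeholders (the actual ranges live in the fine-tuning script and are not reproduced here):

```python
import itertools

# Placeholder hyper-parameter ranges; the real ones are defined in the
# fine-tuning script for GENERAL_MODEL_DIR.
learning_rates = [1e-5, 2e-5, 3e-5, 5e-5]
batch_sizes = [16, 32, 64]

# Each (learning_rate, batch_size) pair corresponds to one fine-tuning run;
# the checkpoint with the best dev-set accuracy is kept.
grid = list(itertools.product(learning_rates, batch_sizes))
```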
cd ofa
```

In our experiments, with the model width compressed to 3/4 of the original, accuracy is essentially unchanged (-0.15).
Watch the wording: under the 6L768H setting, compressing the model width to 3/4 of the original is almost lossless in accuracy.
### Launch scripts for compression and distillation
Should we clarify the relationship between compression, pruning and distillation, and when each applies? Using "compression and distillation" in the title may mislead.
"cmnli": Accuracy,
"cluewsc2020": Accuracy,
"csl": Accuracy,
"xnli": Accuracy,
Would it be better to drop the datasets that are not part of CLUE?
#print(origin_model_new.state_dict().keys())
#print("=====================")
#for name, params in origin_model_new.named_parameters():
#    print(name, params.name)
Same as above.
export CUDA_VISIBLE_DEVICES=$6
export TASK_NAME=$1
export BATCH_SIZE=$3
export SEQ_LEN=$5
export PRE_EPOCHS=$4
export LR=$2
export STUDENT_DIR=$7
Could the variables be parsed in $1, $2, ... order?
do

python quant_post.py --task_name ${task} --input_dir ${MODEL_DIR}/${task}/0.75/sub_static
Is 0.75 used directly as a directory name here?
'target']['span1_text'], example['target']['span2_text'], example[
    'target']['span1_index'], example['target']['span2_index']
text_list = list(text)
# print(text)
Redundant comment.
s_head_dim, t_head_dim = s.shape[3], t.shape[3]

if alpha + beta == 1.0:
    loss1 = 0.0
Could loss1, loss2 and loss3 be renamed to something meaningful?
| ----- | ----- | ------- | ----- | ----- | ----- | ----- | ---------- |
| 74.28 | 57.33 | 61.72 | 81.06 | 76.20 | 86.51 | 78.77 | 73.70 |

### You can export the fine-tuned model like this for direct deployment
This doesn't work as a title; titles should be as concise as possible.
# PP-MiniLM: a small Chinese model

PP-MiniLM is a small Chinese model whose architecture follows ERNIE. This example currently covers task-agnostic distillation of a six-transformer-layer model, plus pruning and quantization with PaddleSlim to further speed up inference.
Please reconsider whether it is appropriate for us to call the model "distinctive" ourselves. Also, "this example" may no longer fit; now that it is our own model, "this model" or "this solution" reads better.
### How it works

The distillation method of PP-MiniLM:
Could we also mention the original MiniLM here? That would also show where our name comes from.
After the run finishes, the model is saved under `ofa_models/CLUEWSC2020/0.75/best_model/`

### Export the pruned model:
A colon in a heading doesn't look right either.
### Requirements

This experiment, run on 8 NVIDIA V100 32G GPUs, takes about 2-3 days of training. With limited resources, you can skip this step and directly download the model it produces.
This doesn't really count as an environment requirement.
PP-MiniLM is a small Chinese model whose architecture follows ERNIE. This example currently covers task-agnostic distillation of a six-transformer-layer model, plus pruning and quantization with PaddleSlim to further speed up inference.

PP-MiniLM combines distillation, pruning, quantization and high-performance inference techniques, offering high accuracy, fast inference and a small number of parameters:
This sentence and the one above it can be merged.
| Tencent RoBERTa 6l-768d | 59.7M | | 1.00x | 69.74 | 66.36 | 59.95 | 77.00 | 71.39 | 71.05 | 82.83 | 71.19 |
| PP-MiniLM 6l-768d | 59.7M | | 1.00x | 74.28 | 57.33 | 61.72 | 81.06 | 76.2 | 86.51 | 78.77 | 73.70 |
| PP-MiniLM, pruned | 49.1M (after pruning) | | 1.15x | 73.82 | 57.33 | 61.60 | 81.38 | 76.20 | 85.52 | 79.00 | 73.55 |
| PP-MiniLM, quantized | 49.2M (after quantization) | | 2.18x | 73.61 | 57.18 | 61.49 | 81.26 | 76.31 | 84.54 | 77.67 | 73.15 |
Is the quantized model larger than the output of the previous step?
- `num_relation_heads`: the number of relation heads, typically 64 for a large-size teacher model and 48 for a base-size teacher model.
- `teacher_model_type`: the teacher model type; currently only 'ernie' and 'roberta' are supported.
- `teacher_layer_index`: the teacher model layer used during distillation
- `student_layer_index`: the student model layer used during distillation
You mean which layer is selected, right? "number of layers" could be misread.
@@ -0,0 +1,305 @@
# PP-MiniLM Chinese small model

The PP-MiniLM extra-small Chinese model example aims to provide a high-accuracy, high-performance small-model solution unifying training and inference.
Suggest: "The PP-MiniLM extra-small Chinese model example aims to provide high-accuracy, high-performance small models and a training/inference solution."
The current solution draws on industry-leading Task Agnostic model distillation, pruning and quantization techniques, so the small model combines three strengths: fast inference, good model quality and a small number of parameters.

- Fast inference: we integrate PaddleSlim's pruning and quantization techniques to further compress the small model, raising inference speed to 2.18x the original;
Suggest: "Fast inference: relying on PaddleSlim's pruning and quantization to further compress the small model, the quantized PP-MiniLM achieves a GPU inference speedup of up to 3.56x over Bert-base;"
- High accuracy: building on the Multi-Head Self-Attention Relation Distillation technique proposed by MiniLMv2, we further optimized the algorithm by introducing sample-wise relation distillation. Our 6-layer, 768-hidden-size model beats same-size TinyBERT and UER-py RoBERTa models by 2.66% and 1.51% average accuracy on CLUE.
Suggest: "High accuracy: building on the Multi-Head Self-Attention Relation Distillation technique proposed by MiniLMv2, we further optimized the algorithm by introducing sample-wise relation distillation; on CLUE, the 6-layer PP-MiniLM scores 0.23% above the 12-layer Bert-base-chinese, and 2.66% and 1.51% above the same-size TinyBERT and UER-py RoBERTa;"
- Small parameter count: with PaddleSlim pruning, the model width is compressed by 1/4 at almost no accuracy cost (-0.15).
Suggest: "Small parameter count: with PaddleSlim pruning, the hidden width is compressed by 1/4 at almost no accuracy cost (-0.15%), reducing the parameter count by 28%;"
Fixed; thanks a lot!
| Model | #Params | #FLOPs | Speedup | AFQMC | TNEWS | IFLYTEK | CMNLI | OCNLI | WSC | CSL | CLUE avg. |
| ----------------------- | ------- | ------ | ------- | ----- | ----- | ------- | ----- | ----- | ----- | ----- | ---------- |
| Bert<sub>base</sub> | 102.3M | 10.87B | 1.00x | 74.17 | 57.17 | 61.14 | 81.14 | 75.08 | 80.26 | 81.47 | 72.92 |
BERT: capitalize BERT consistently when it is used as the model name.
Thanks, fixed.
### Environment

This experiment runs on 8 NVIDIA Tesla V100 32G GPUs, with a training period of about 2-3 days. With limited resources, you can directly [download PP-MiniLM (6L768H)](https://bj.bcebos.com/paddlenlp/models/transformers/ppminilm/6l-768h) for fine-tuning on downstream tasks.
Does this require a manual download? Could we tell users to use the from_pretrained API, which downloads automatically?
No manual download needed; I've added an example of loading via from_pretrained, and added the ppminilm configurations to modeling.py and tokenizer.py.
@@ -0,0 +1,12 @@
for task in afqmc tnews iflytek cmnli ocnli cluewsc2020 csl
Should a copyright notice be added?
Thanks for the reminder; I've added the copyright header to all the shell scripts.
LGTM for inference api
#### Requirements

This step depends on PaddlePaddle 2.2.1 built with the inference library. You can choose a suitable Python inference library for your machine from the [PaddlePaddle website](https://paddleinference.paddlepaddle.org.cn/user_guides/download_lib.html).
Thanks for the reminder; updated.
## Importing PP-MiniLM

PP-MiniLM is a small Chinese pretrained model with 6 Transformer encoder layers and a hidden size of 768, produced by task-agnostic distillation with `roberta-wwm-ext-large` as the teacher. On seven classification tasks of the [CLUE benchmark](https://github.com/CLUEbenchmark/CLUE), its accuracy exceeds BERT<sub>base</sub>, TinyBERT<sub>6</sub>, UER-py RoBERTa L6-H768 and RBT6.
roberta-wwm-ext-large is the teacher and a 6-layer ERNIE is the student, right? Mentioning ERNIE explicitly would make this clearer.
Thanks for the suggestion; the 6-layer ERNIE is now mentioned.
LGTM
LGTM
Thank you all :)🙏
PR types
New features
PR changes
Models & Docs
Description
TODO:
1. Update the README with the QA test results for UER-py; test CSL under CUDA 10.2 and paddle 2.2.1.