add RoFormer Model #804

Merged: 16 commits, Aug 2, 2021
82 changes: 82 additions & 0 deletions examples/language_model/roformer/README.md
@@ -0,0 +1,82 @@
# RoFormer

## Model Introduction

[RoFormer](https://arxiv.org/pdf/2104.09864.pdf) (RoFormer: Enhanced Transformer with Rotary Position Embedding) is an MLM-pretrained language model equipped with rotary position embedding (RoPE). RoPE is a relative position encoding method with good theoretical properties: its core idea is to multiply the contextual embeddings (the q and k vectors in a transformer) by rotation matrices that depend on the absolute position, and one can prove that the inner product of the contextual embeddings then depends only on the relative position.
RoPE is the only relative position embedding that can be used with linear attention. For more details, see the [paper](https://arxiv.org/pdf/2104.09864.pdf) or the [original blog post](https://kexue.fm/archives/8265). EleutherAI has also published a [blog post](https://blog.eleuther.ai/rotary-embeddings/) with intuitive explanations and experiments on RoPE.
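For intuition, here is a minimal NumPy sketch of the RoPE idea (illustrative only, not the model's actual implementation): each pair of dimensions is rotated by an angle proportional to the absolute position, and dot products between rotated vectors depend only on the positional offset.

```python
# Minimal RoPE sketch for intuition; not the model's actual implementation.
import numpy as np

def rope(x, pos, base=10000):
    # x: vector of even dimension d; pos: absolute position (int)
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)  # per-pair rotation frequencies
    angle = pos * theta                        # rotation angle for each 2-D pair
    cos, sin = np.cos(angle), np.sin(angle)
    x1, x2 = x[0::2], x[1::2]
    out = np.empty_like(x)
    out[0::2] = x1 * cos - x2 * sin            # rotate each (x1, x2) pair
    out[1::2] = x1 * sin + x2 * cos
    return out

q, k = np.random.randn(64), np.random.randn(64)
# The inner product depends only on the relative offset m - n:
a = rope(q, 10) @ rope(k, 7)      # offset 3
b = rope(q, 103) @ rope(k, 100)   # offset 3 again
assert np.allclose(a, b)          # same offset, same attention score
```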

This project is an open-source implementation of RoFormer on Paddle 2.0 and includes fine-tuning code for the `THUCNews classification task`.

## Quick Start

### Converting the Pretrained Model

The pretrained model can be converted from huggingface/transformers as follows (written for the roformer model; adjust as needed for other models):

1. Download the roformer model weights from huggingface.co.
2. Run convert.py with the appropriate arguments.
3. Example:
suppose we want to convert the weights at https://huggingface.co/junnyu/roformer_chinese_base
- (1) First download the pytorch_model.bin file from https://huggingface.co/junnyu/roformer_chinese_base/tree/main; suppose we save it as `./roformer_chinese_base/pytorch_model.bin`
- (2) Run convert.py:
```bash
python convert.py \
--pytorch_checkpoint_path ./roformer_chinese_base/pytorch_model.bin \
--paddle_dump_path ./roformer_chinese_base/model_state.pdparams
```
- (3) The converted weights are now at `./roformer_chinese_base/model_state.pdparams` (a quick sanity check follows below)
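
As a quick sanity check that the conversion succeeded, the converted file can be inspected with core Paddle APIs:

```python
# Quick sanity check on the converted file (core paddle APIs only).
import paddle

state_dict = paddle.load("./roformer_chinese_base/model_state.pdparams")
for name, value in list(state_dict.items())[:5]:
    print(name, value.shape)  # names should carry the "roformer." / "cls." prefixes
```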

### Testing the Pretrained MLM
```bash
python test_mlm.py --model_name roformer-chinese-base --text 今天[MASK]很好,我想去公园玩!
# paddle: 今天[天气||天||阳光||太阳||空气]很好,我想去公园玩!
python test_mlm.py --model_name roformer-chinese-base --text 北京是[MASK]的首都!
# paddle: 北京是[中国||谁||中华人民共和国||我们||中华民族]的首都!
python test_mlm.py --model_name roformer-chinese-char-base --text 今天[MASK]很好,我想去公园玩!
# paddle: 今天[天||气||都||风||人]很好,我想去公园玩!
python test_mlm.py --model_name roformer-chinese-char-base --text 北京是[MASK]的首都!
# paddle: 北京是[谁||我||你||他||国]的首都!
```
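
Under the hood, `test_mlm.py` boils down to a masked-token top-k lookup. The sketch below illustrates the idea; the class and attribute names (`RoFormerForMaskedLM`, `RoFormerTokenizer`, `mask_token_id`) are assumptions based on this PR's model code, so verify them against the actual module:

```python
# Illustrative MLM sketch; class/attribute names are assumptions, not verified.
import paddle
from paddlenlp.transformers import RoFormerForMaskedLM, RoFormerTokenizer

tokenizer = RoFormerTokenizer.from_pretrained("roformer-chinese-base")
model = RoFormerForMaskedLM.from_pretrained("roformer-chinese-base")
model.eval()

ids = tokenizer("今天[MASK]很好,我想去公园玩!")["input_ids"]
logits = model(paddle.to_tensor([ids]))          # [1, seq_len, vocab_size]

mask_pos = ids.index(tokenizer.mask_token_id)    # locate the [MASK] slot
_, top5 = paddle.topk(logits[0, mask_pos], k=5)  # five most likely fillers
print(tokenizer.convert_ids_to_tokens(top5.tolist()))
```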

### THUCNews Classification Task Data

The datasets for the THUCNews classification task are provided through paddlenlp's API, so no manual preparation is needed; they are downloaded automatically when `run_thucnews.py` is run for fine-tuning.
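
To inspect the data outside the fine-tuning script, something like the following should work; the dataset key `thucnews` is an assumption here, so check how `run_thucnews.py` actually loads it:

```python
# The dataset key "thucnews" is an assumption; see run_thucnews.py for the real one.
from paddlenlp.datasets import load_dataset

train_ds, dev_ds = load_dataset("thucnews", splits=("train", "dev"))
print(len(train_ds), train_ds[0])  # a raw example: text plus its label
```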

### Running Fine-tuning

Launch fine-tuning as follows:

```shell
unset CUDA_VISIBLE_DEVICES
python -m paddle.distributed.launch --gpus "0" run_thucnews.py \
--model_type roformer \
--model_name_or_path roformer-chinese-base \
--max_seq_length 256 \
--batch_size 64 \
--learning_rate 2e-5 \
--num_train_epochs 3 \
--logging_steps 1 \
--save_steps 500 \
--output_dir ./tmp/ \
--device gpu \
--use_amp False
```

The arguments are:
- `model_type`: the model type; set this to roformer to use the RoFormer model.
- `model_name_or_path`: a specific model configuration, which determines the pretrained weights and the tokenizer used during pretraining. A local directory can also be given if the model files are stored locally. Note: the pretrained weights behind `roformer-chinese-base` and similar names were converted from [huggingface/transformers](https://github.com/huggingface/transformers); see the converter in this directory for details.
- `max_seq_length`: the maximum sequence length; longer inputs are truncated.
- `batch_size`: the number of samples **per card** in each iteration.
- `learning_rate`: the base learning rate; it is multiplied by the value produced by the learning rate scheduler to obtain the current learning rate.
- `num_train_epochs`: the number of training epochs.
- `logging_steps`: the interval, in steps, between log messages.
- `save_steps`: the interval, in steps, between model saves and evaluations.
- `output_dir`: the directory where the model is saved.
- `device`: the device to train on; 'gpu' uses a GPU, 'xpu' uses a Baidu Kunlun card, and 'cpu' uses the CPU.
- `use_amp`: whether to enable automatic mixed precision training (a minimal sketch follows below).
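
For reference, `--use_amp True` roughly corresponds to wrapping each training step with Paddle's AMP utilities. A minimal, self-contained sketch (the model and batch here are stand-ins, not the script's actual code):

```python
# Minimal sketch of what --use_amp enables; run_thucnews.py differs in details.
import paddle

model = paddle.nn.Linear(16, 2)  # stand-in for the RoFormer classifier
optimizer = paddle.optimizer.AdamW(learning_rate=2e-5,
                                   parameters=model.parameters())
criterion = paddle.nn.CrossEntropyLoss()
scaler = paddle.amp.GradScaler(init_loss_scaling=2**15)

x = paddle.randn([64, 16])        # one batch (batch_size=64)
y = paddle.randint(0, 2, [64])

with paddle.amp.auto_cast():      # run the forward pass in mixed precision
    loss = criterion(model(x), y)
scaled = scaler.scale(loss)       # scale the loss to avoid fp16 underflow
scaled.backward()
scaler.minimize(optimizer, scaled)  # unscale gradients and apply the update
optimizer.clear_grad()
```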

Fine-tuning `roformer-chinese-base` on the THUCNews classification task yields the following result on the validation set:

| Task | Metric | Result |
|:-----:|:----------------------------:|:-----------------:|
| THUCNews | Accuracy | 0.98 |
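
A fine-tuned checkpoint saved under `--output_dir` can then be reloaded for prediction. A sketch, where the class name `RoFormerForSequenceClassification` is an assumption based on this PR and the exact checkpoint sub-directory depends on how `save_steps` names saves:

```python
# Reload a fine-tuned checkpoint for prediction; the class name is an assumption.
import paddle
from paddlenlp.transformers import (RoFormerForSequenceClassification,
                                    RoFormerTokenizer)

tokenizer = RoFormerTokenizer.from_pretrained("./tmp/")
model = RoFormerForSequenceClassification.from_pretrained("./tmp/")
model.eval()

ids = tokenizer("昨晚的篮球比赛十分精彩")["input_ids"]
logits = model(paddle.to_tensor([ids]))
print(paddle.argmax(logits, axis=-1).item())  # predicted THUCNews label id
```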
77 changes: 77 additions & 0 deletions examples/language_model/roformer/convert.py
@@ -0,0 +1,77 @@
# Copyright (c) 2021 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

from collections import OrderedDict
import argparse

# Maps name fragments from the huggingface/transformers checkpoint to their
# counterparts in the Paddle implementation.
huggingface_to_paddle = {
    "embeddings.LayerNorm": "embeddings.layer_norm",
    "encoder.layer": "encoder.layers",
    "attention.self.query": "self_attn.q_proj",
    "attention.self.key": "self_attn.k_proj",
    "attention.self.value": "self_attn.v_proj",
    "attention.output.dense": "self_attn.out_proj",
    "intermediate.dense": "linear1",
    "output.dense": "linear2",
    "attention.output.LayerNorm": "norm1",
    "output.LayerNorm": "norm2",
    "predictions.decoder.": "predictions.decoder_",
    "predictions.transform.dense": "predictions.transform",
    "predictions.transform.LayerNorm": "predictions.layer_norm",
}


def convert_pytorch_checkpoint_to_paddle(pytorch_checkpoint_path,
                                         paddle_dump_path):
    # torch and paddle are imported lazily so that the name mapping above can
    # be inspected without either framework installed.
    import torch
    import paddle

    pytorch_state_dict = torch.load(pytorch_checkpoint_path, map_location="cpu")
    paddle_state_dict = OrderedDict()
    for k, v in pytorch_state_dict.items():
        # Skip parameters that have no counterpart in the Paddle model.
        if k == "cls.predictions.bias" or "encoder.embed_positions." in k:
            continue
        # torch.nn.Linear stores weights as [out_features, in_features] while
        # paddle.nn.Linear expects [in_features, out_features], so Linear
        # weights must be transposed; embedding and LayerNorm weights keep
        # their original layout.
        if k[-7:] == ".weight":
            if ".embeddings." not in k and ".LayerNorm." not in k:
                v = v.transpose(0, 1)
        oldk = k
        # Apply every name-fragment mapping that matches this parameter.
        for huggingface_name, paddle_name in huggingface_to_paddle.items():
            k = k.replace(huggingface_name, paddle_name)

        # The Paddle model prefixes backbone parameters with "roformer.".
        if "roformer." not in k and "cls." not in k:
            k = "roformer." + k

        print(f"Converting: {oldk} => {k}")
        paddle_state_dict[k] = v.data.numpy()

    paddle.save(paddle_state_dict, paddle_dump_path)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--pytorch_checkpoint_path",
        default="roformer_chinese_base/pytorch_model.bin",
        type=str,
        required=True,
        help="Path to the PyTorch checkpoint file.")
    parser.add_argument(
        "--paddle_dump_path",
        default="roformer_chinese_base/model_state.pdparams",
        type=str,
        required=True,
        help="Path where the converted Paddle weights will be written.")
    args = parser.parse_args()
    convert_pytorch_checkpoint_to_paddle(args.pytorch_checkpoint_path,
                                         args.paddle_dump_path)