
Commit 4bb5dc6
Author: yunfan
Message: update
Parent: 65efd48

21 files changed: +296 -2096 lines

README.md

Lines changed: 31 additions & 12 deletions
@@ -22,29 +22,48 @@ The architecture of CPT is a variant of the full Transformer and consists of three parts:
 2. **Understanding Decoder** (U-Dec): a shallow Transformer encoder with fully-connected self-attention, which is designed for NLU tasks. The input of U-Dec is the output of S-Enc.
 3. **Generation Decoder** (G-Dec): a Transformer decoder with masked self-attention, which is designed for generation tasks in an auto-regressive fashion. G-Dec utilizes the output of S-Enc with cross-attention.
 
-## Downloads & Usage
+## Pre-Trained Models
+We provide the pre-trained weights of CPT and Chinese BART together with the source code; both can be used directly in Huggingface-Transformers.
 
-Coming soon.
+- **`Chinese BART-base`**: 6-layer encoder, 6-layer decoder, 12 heads, 768 model dim.
+- **`Chinese BART-large`**: 12-layer encoder, 12-layer decoder, 16 heads, 1024 model dim.
+- **`CPT-base`**: 10-layer S-Enc, 2-layer U-Dec/G-Dec, 12 heads, 768 model dim.
+- **`CPT-large`**: 20-layer S-Enc, 4-layer U-Dec/G-Dec, 16 heads, 1024 model dim.
 
-## Chinese BART
+The pre-trained weights can be downloaded here:
+| Model | `MODEL_NAME` |
+| --- | --- |
+| **`Chinese BART-base`** | [fnlp/bart-base-chinese](https://huggingface.co/fnlp/bart-base-chinese) |
+| **`Chinese BART-large`** | [fnlp/bart-large-chinese](https://huggingface.co/fnlp/bart-large-chinese) |
+| **`CPT-base`** | [fnlp/cpt-base](https://huggingface.co/fnlp/cpt-base) |
+| **`CPT-large`** | [fnlp/cpt-large](https://huggingface.co/fnlp/cpt-large) |
 
-We also provide a pre-trained Chinese BART as a byproduct. The BART models is pre-trained with the same corpora, tokenization and hyper-parameters of CPT.
 
-#### Load with Huggingface-Transformers
-
-Chinese BART is available in **base** and **large** versions, and can be loaded with Huggingface-Transformers. The example code is as follows, where `MODEL_NAME` is `fnlp/bart-base-chinese` or `fnlp/bart-large-chinese` for **base** or **large** size of BART, respectively.
+To use CPT, import `finetune/modeling_cpt.py`, which defines the CPT architecture, into your project.
+Then load the pre-trained models as in the following examples, where `MODEL_NAME` is the corresponding model identifier from the table above.
 
+For CPT:
 ```python
->>> tokenizer = BertTokenizer.from_pretrained("MODEL_NAME")
->>> model = BartForConditionalGeneration.from_pretrained("MODEL_NAME")
+from modeling_cpt import BertTokenizer, CPTForConditionalGeneration
+tokenizer = BertTokenizer.from_pretrained("MODEL_NAME")
+model = CPTForConditionalGeneration.from_pretrained("MODEL_NAME")
+print(model)
 ```
 
-The checkpoints of Chinese BART can be downloaded here.
+For Chinese BART:
+```python
+from transformers import BertTokenizer, BartForConditionalGeneration
+tokenizer = BertTokenizer.from_pretrained("MODEL_NAME")
+model = BartForConditionalGeneration.from_pretrained("MODEL_NAME")
+print(model)
+```
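
Either of the loaded models can then be used through the standard Hugging Face `generate` API. Below is a minimal usage sketch building on the snippets above; the input sentence is only illustrative, `fnlp/bart-base-chinese` is shown, and a CPT checkpoint can be swapped in the same way.

```python
import torch
from transformers import BertTokenizer, BartForConditionalGeneration

tokenizer = BertTokenizer.from_pretrained("fnlp/bart-base-chinese")
model = BartForConditionalGeneration.from_pretrained("fnlp/bart-base-chinese")
model.eval()

# Encode an illustrative sentence with a masked span for the model to fill in.
inputs = tokenizer("北京是[MASK]的首都", return_tensors="pt")

# Standard beam-search decoding via the Hugging Face `generate` API.
with torch.no_grad():
    output_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=20)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```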
 
-- [fnlp/bart-base-chinese](https://huggingface.co/fnlp/bart-base-chinese): 6 layers encoder, 6 layers decoder, 12 heads and 768 model dim.
-- [fnlp/bart-large-chinese](https://huggingface.co/fnlp/bart-large-chinese): 12 layers encoder, 12 layers decoder, 16 heads and 1024 model dim.
+## Pre-Training
+Pre-training code and examples can be found [here](pretrain/README.md).
 
 
+## Fine-Tuning
+Fine-tuning code and examples can be found [here](finetune/README.md).
 
 ## Citation

finetune/REAMDE.md

Lines changed: 25 additions & 1 deletion
@@ -1 +1,25 @@
-# Fine-Tuning of CPT
+# Fine-Tuning CPT
+
+This repo contains the fine-tuning code for CPT on multiple NLU and NLG tasks, such as text classification, machine reading comprehension (MRC), sequence labeling and text generation.
+
+## Requirements
+- pytorch==1.8.1
+- transformers==4.2.0
+
+## Run
+The code and running examples are listed in the corresponding folders of the fine-tuning tasks.
+
+- **`classification`**: [Fine-tuning](classification/REAMDE.md) for sequence classification with either external classifiers or prompt-based learning.
+- **`cws`**: [Fine-tuning](cws/REAMDE.md) for Chinese Word Segmentation with external classifiers.
+- **`generation`**: [Fine-tuning](generation/REAMDE.md) for abstractive summarization and data-to-text generation.
+- **`mrc`**: [Fine-tuning](mrc/REAMDE.md) for span-based Machine Reading Comprehension with external classifiers.
+- **`ner`**: [Fine-tuning](ner/REAMDE.md) for Named Entity Recognition.
+
+You can also fine-tune CPT on other tasks by adding `modeling_cpt.py` to your project and loading CPT with the following code.
+
+```python
+from modeling_cpt import BertTokenizer, CPTForConditionalGeneration
+tokenizer = BertTokenizer.from_pretrained("MODEL_NAME")
+model = CPTForConditionalGeneration.from_pretrained("MODEL_NAME")
+print(model)
+```
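
For tasks not covered by the folders above, the loaded model can be fine-tuned like any Hugging Face seq2seq model. The following is a minimal single-step sketch, assuming `CPTForConditionalGeneration` follows the standard interface that accepts `labels` and returns a cross-entropy loss; the texts and hyper-parameters are only illustrative.

```python
import torch
from transformers import BertTokenizer
from modeling_cpt import CPTForConditionalGeneration

tokenizer = BertTokenizer.from_pretrained("fnlp/cpt-base")
model = CPTForConditionalGeneration.from_pretrained("fnlp/cpt-base")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One illustrative (source, target) pair; real fine-tuning would iterate over a dataset.
src = tokenizer("输入文本", return_tensors="pt", max_length=512, truncation=True)
tgt = tokenizer("目标文本", return_tensors="pt", max_length=128, truncation=True)

model.train()
outputs = model(input_ids=src["input_ids"],
                attention_mask=src["attention_mask"],
                labels=tgt["input_ids"])
outputs.loss.backward()   # seq2seq cross-entropy over the target tokens
optimizer.step()
optimizer.zero_grad()
```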

finetune/__init__.py

Whitespace-only changes.

finetune/classification/REAMDE.md

Lines changed: 78 additions & 0 deletions
@@ -0,0 +1,78 @@
# Fine-tuning CPT for Sequence Classification

## Dataset
The **CLUE** datasets can be downloaded [HERE](https://github.com/CLUEbenchmark/CLUE).

## Train and Evaluate
To train and evaluate **CPT$_u$**, **CPT$_g$** and **CPT$_{ug}$**, run `run_clue_classifier.py` with the argument `--cls_mode` set to `1`, `2` and `3`, respectively. The following example script runs the base version of **CPT$_u$** on the **AFQMC** dataset.

```bash
export MODEL_TYPE=cpt-base
export MODEL_NAME=fnlp/cpt-base
export CLUE_DATA_DIR=/path/to/clue_data_dir
export TASK_NAME=afqmc
export CLS_MODE=1
python run_clue_classifier.py \
    --model_type=$MODEL_TYPE \
    --model_name_or_path=$MODEL_NAME \
    --cls_mode=$CLS_MODE \
    --task_name=$TASK_NAME \
    --do_train=True \
    --do_predict=1 \
    --no_tqdm=False \
    --data_dir=$CLUE_DATA_DIR/${TASK_NAME}/ \
    --max_seq_length=512 \
    --per_gpu_train_batch_size=16 \
    --gradient_accumulation_steps 1 \
    --per_gpu_eval_batch_size=64 \
    --weight_decay=0.1 \
    --adam_epsilon=1e-6 \
    --adam_beta1=0.9 \
    --adam_beta2=0.999 \
    --max_grad_norm=1.0 \
    --learning_rate=1e-5 \
    --power=1.0 \
    --num_train_epochs=5.0 \
    --warmup_steps=0.1 \
    --logging_steps=200 \
    --save_steps=999999 \
    --output_dir=output/ft/$MODEL_TYPE/${TASK_NAME}/ \
    --overwrite_output_dir=True \
    --seed=42
```
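
Under the hood, `--cls_mode` is written onto the model config before the classifier is instantiated (see `get_model` in the `run_clue_classifier.py` diff later in this commit). A minimal sketch of that step, assuming the `CPTConfig` and `CPTForSequenceClassification` classes exported by `finetune/modeling_cpt.py`; the checkpoint and label count are only illustrative.

```python
from modeling_cpt import CPTConfig, CPTForSequenceClassification

# cls_mode follows the mapping described above: 1 = CPT_u, 2 = CPT_g, 3 = CPT_ug.
config = CPTConfig.from_pretrained("fnlp/cpt-base", num_labels=2)
config.cls_mode = 1

model = CPTForSequenceClassification.from_pretrained("fnlp/cpt-base", config=config)
print(model.config.cls_mode)
```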

## Prompt-based Fine-Tuning
To train and evaluate **CPT$_{u+p}$** and **CPT$_{g+p}$**, run `run_clue_prompt.py` with the argument `--cls_mode` set to `1` and `2`, respectively. The following example script runs the base version of **CPT$_{u+p}$** on the **AFQMC** dataset.

```bash
export MODEL_TYPE=cpt-base
export MODEL_NAME=fnlp/cpt-base
export CLUE_DATA_DIR=/path/to/clue_data_dir
export TASK_NAME=afqmc
export NUM_TRAIN=-1
export PATTERN_IDS=0
export CLS_MODE=1
python run_clue_prompt.py \
    --pattern_ids $PATTERN_IDS \
    --cls_mode 1 \
    --data_dir=$CLUE_DATA_DIR/${TASK_NAME}/ \
    --model_type $MODEL_TYPE \
    --model_name_or_path $MODEL_NAME \
    --max_seq_length 512 \
    --task_name $TASK_NAME \
    --output_dir output/prompt/$MODEL_TYPE/${TASK_NAME}/ \
    --train_examples $NUM_TRAIN \
    --weight_decay 0.1 \
    --learning_rate 1e-5 \
    --power 1.0 \
    --warmup_steps 0.1 \
    --split_examples_evenly \
    --num_train_epochs 5 \
    --eval_steps 200 \
    --per_gpu_train_batch_size 16 \
    --gradient_accumulation_steps 1 \
    --per_gpu_eval_batch_size 32 \
    --do_train \
    --do_eval
```
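
Conceptually, the prompt-based variants score label words at a `[MASK]` position instead of training a separate classification head. A rough sketch of that scoring step, assuming `CPTForMaskedLM` (imported by `run_clue_prompt.py`) follows the standard masked-LM interface; the pattern and verbalizer below are purely illustrative and are not the templates defined in `prompt.py`.

```python
import torch
from transformers import BertTokenizer
from modeling_cpt import CPTForMaskedLM

tokenizer = BertTokenizer.from_pretrained("fnlp/cpt-base")
model = CPTForMaskedLM.from_pretrained("fnlp/cpt-base")
model.eval()

# Illustrative AFQMC-style pattern: do the two sentences ask the same thing?
text_a, text_b = "花呗如何还款", "花呗怎么还钱"
prompt_text = f"{text_a}?[MASK],{text_b}"      # hypothetical pattern
verbalizer = {"是": 1, "否": 0}                  # hypothetical label words: same / different

inputs = tokenizer(prompt_text, return_tensors="pt")
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()

with torch.no_grad():
    logits = model(input_ids=inputs["input_ids"],
                   attention_mask=inputs["attention_mask"]).logits[0, mask_pos]

scores = {w: logits[tokenizer.convert_tokens_to_ids(w)].item() for w in verbalizer}
print(scores, verbalizer[max(scores, key=scores.get)])
```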

finetune/classification/run_clue_classifier.py

Lines changed: 11 additions & 8 deletions
@@ -43,6 +43,9 @@
 from transformers import glue_processors as processors
 from transformers.models.bert.tokenization_bert import BertTokenizer
 from data_processors import clue_output_modes, clue_processors
+
+import sys
+sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), '..'))
 from modeling_cpt import CPTForSequenceClassification, CPTConfig
 
 
@@ -620,14 +623,19 @@ def get_dataset(args, task, tokenizer, part='train'):
     return dataset
 
 def get_model(args, model_name_or_path, num_labels):
-    if 'cpt' in model_name_or_path:
+    tokenizer = BertTokenizer.from_pretrained(
+        args.tokenizer_name if args.tokenizer_name else model_name_or_path,
+        do_lower_case=args.do_lower_case,
+        cache_dir=args.cache_dir if args.cache_dir else None,
+    )
+    if 'cpt' in args.model_type:
         config = CPTConfig.from_pretrained(
             model_name_or_path,
             num_labels=num_labels,
             finetuning_task=args.task_name,
             cache_dir=args.cache_dir if args.cache_dir else None)
         # config.consist_lambda = args.consist_lambda
-        config.cls_mode = args.ft_mode
+        config.cls_mode = args.cls_mode
         model = CPTForSequenceClassification.from_pretrained(
             model_name_or_path,
             from_tf=bool(".ckpt" in model_name_or_path),
@@ -641,11 +649,6 @@ def get_model(args, model_name_or_path, num_labels):
             finetuning_task=args.task_name,
             cache_dir=args.cache_dir if args.cache_dir else None,
         )
-        tokenizer = AutoTokenizer.from_pretrained(
-            args.tokenizer_name if args.tokenizer_name else model_name_or_path,
-            do_lower_case=args.do_lower_case,
-            cache_dir=args.cache_dir if args.cache_dir else None,
-        )
         model = AutoModelForSequenceClassification.from_pretrained(
             pretrained_model_name_or_path=args.config_name if args.config_name else model_name_or_path,
             from_tf=bool(".ckpt" in model_name_or_path),
@@ -778,7 +781,7 @@ def main():
         type=str2bool, default=True,
         help="Evaluate all checkpoints starting with the same prefix as model_name ending and ending with step number",
     )
-    parser.add_argument("--ft_mode", default=1, type=int, help="CPT fine-tune `mode`")
+    parser.add_argument("--cls_mode", default=1, type=int, help="CPT fine-tune `mode`")
    parser.add_argument("--no_cuda", action="store_true", help="Avoid using CUDA when available")
    parser.add_argument("--no_tqdm", type=str2bool, default=False, help="Avoid using tqdm when available")
    parser.add_argument("--sample_tokenize", type=str2bool, default=False, help="using sampling when tokenize")

finetune/classification/run_clue_prompt.py

Lines changed: 4 additions & 2 deletions
@@ -10,10 +10,12 @@
 from prompt import prompt, templates, log
 import json
 from transformers import glue_processors, WEIGHTS_NAME
-import sys
-sys.path.append('..')
 from data_processors import clue_processors
+
+import sys
+sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), '..'))
 from modeling_cpt import CPTForMaskedLM
+
 import glob
 
 import torch.multiprocessing

finetune/cws/REAMDE.md

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
# Fine-tuning CPT for CWS

## Dataset
The **MSR** and **PKU** datasets are from **SIGHAN2005** and can be downloaded [HERE](http://sighan.cs.uchicago.edu/bakeoff2005/).

## Train and Evaluate

To train and evaluate CPT on the CWS datasets, run `run_cws.py`. The following example script runs the base version of **CPT$_u$** on the **MSR** dataset.

```bash
export MODEL_TYPE=cpt-base
export MODEL_NAME=fnlp/cpt-base
export DATA_DIR=/path/to/cws_data_dir
python run_cws.py \
    --bert_name=$MODEL_NAME \
    --data_dir=$DATA_DIR \
    --dataset=msr \
    --lr=2e-5 \
    --batch_size=16 \
    --epoch=10
```
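
"External classifiers" here means a token-level tagging head on top of the pre-trained model; `run_cws.py` wraps `CPTModel` in a `CWSModel` for exactly this. The following schematic sketch is not the repo's `CWSModel`; it assumes `CPTModel` exposes `last_hidden_state` and `config.d_model` like other Hugging Face encoder-decoder models, and uses a 4-tag BMES scheme only as one common labeling choice for CWS.

```python
import torch.nn as nn
from modeling_cpt import CPTModel

class ToyCWSTagger(nn.Module):
    """Schematic segmentation tagger: contextual hidden states -> per-token tag logits."""

    def __init__(self, model_name: str, num_tags: int = 4):  # e.g. B/M/E/S tags
        super().__init__()
        self.encoder = CPTModel.from_pretrained(model_name)
        self.classifier = nn.Linear(self.encoder.config.d_model, num_tags)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        return self.classifier(hidden)  # (batch, seq_len, num_tags)
```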

finetune/cws/run_cws.py

Lines changed: 5 additions & 2 deletions
@@ -16,15 +16,18 @@
 
 from model import CWSModel
 from utils import DataTrainingArguments, ModelArguments, load_json
+
+import sys
+sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), '..'))
 from modeling_cpt import CPTModel
 
 parser = argparse.ArgumentParser()
-parser.add_argument("--bert_name",default='/remote-home/share/yfshao/bart-zh/arch24-4-new-iter10w',type=str)
+parser.add_argument("--bert_name",default='/path/to/model/',type=str)
 parser.add_argument("--dataset", default="msr",type=str)
 parser.add_argument("--lr",default=2e-5,type=float)
 parser.add_argument("--batch_size",default='16',type=str)
 parser.add_argument("--epoch",default='10',type=str)
-parser.add_argument("--data_dir",default="../../data",type='str')
+parser.add_argument("--data_dir",default="/path/to/dataset/",type='str')
 args = parser.parse_args()
 arg_dict=args.__dict__

finetune/generation/REAMDE.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
# Fine-tuning CPT for Text Generation

finetune/generation/run_gen.py

Lines changed: 6 additions & 3 deletions
@@ -16,16 +16,19 @@
 from transformers.trainer_utils import is_main_process
 from datasets import load_metric,Dataset
 from utils import DataTrainingArguments, ModelArguments, load_json
+
+import sys
+sys.path.append(os.path.join(os.path.dirname(os.path.abspath(__file__)), '..'))
 from modeling_cpt import CPTModel, CPTForConditionalGeneration
 
 
 parser = argparse.ArgumentParser()
-parser.add_argument("--bert_name",default='/path/to/cpt/',type=str)
+parser.add_argument("--bert_name",default='/path/to/model',type=str)
 parser.add_argument("--dataset", default="lcsts",type=str)
 parser.add_argument("--lr",default=2e-5,type=float)
 parser.add_argument("--batch_size",default='50',type=str)
 parser.add_argument("--epoch",default='5',type=str)
-parser.add_argument("--data_dir",default="/path/to/dataset/",type='str')
+parser.add_argument("--data_dir",default="/path/to/dataset/",type=str)
 args = parser.parse_args()
 arg_dict=args.__dict__
 
@@ -275,4 +278,4 @@ def on_evaluate(self, args, state, control, **kwargs):
 test_preds = [pred.strip() for pred in test_preds]
 output_test_preds_file = os.path.join(training_args.output_dir, "test_generations.txt")
 with open(output_test_preds_file, "w",encoding='UTF-8') as writer:
-    writer.write("\n".join(test_preds))
+    writer.write("\n".join(test_preds))

finetune/modeling_cpt.py

Lines changed: 2 additions & 22 deletions
@@ -50,24 +50,15 @@
 
 from torch.nn import LayerNorm
 
-# For cuda fused ops
-# from megatron.model import LayerNorm
-# from megatron.model.transformer import ParallelMLP
-# from megatron.model.fused_bias_gelu import bias_gelu_impl
-# from megatron import mpu
-# from megatron import get_args
-# from megatron.model.enums import AttnMaskType, LayerType, AttnType
-
-
 logger = logging.get_logger(__name__)
 
-_CHECKPOINT_FOR_DOC = "fudannlp/cpt-large"
+_CHECKPOINT_FOR_DOC = "fnlp/cpt-large"
 _CONFIG_FOR_DOC = "CPTConfig"
 _TOKENIZER_FOR_DOC = "CPTTokenizer"
 
 
 CPT_PRETRAINED_MODEL_ARCHIVE_LIST = [
-    "fudannlp/cpt-large",
+    "fnlp/cpt-large",
 ]
 
 
@@ -114,17 +105,6 @@ def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Optional[int] = None):
 
     return inverted_mask.masked_fill(inverted_mask.bool(), torch.finfo(dtype).min)
 
-# def _expand_mask(mask: torch.Tensor, dtype: torch.dtype, tgt_len: Optional[int] = None):
-#     """
-#     Expands attention_mask from `[bsz, seq_len]` to `[bsz, 1, tgt_seq_len, src_seq_len]`.
-#     """
-#     bsz, src_len = mask.size()
-#     tgt_len = tgt_len if tgt_len is not None else src_len
-
-#     expanded_mask = mask[:, None, None, :].expand(bsz, 1, tgt_len, src_len).to(dtype)
-
-#     inverted_mask = (expanded_mask < 0.5)
-#     return inverted_mask
 def attention_mask_func(attention_scores, attention_mask):
     return attention_scores + attention_mask

finetune/mrc/REAMDE.md

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
# Fine-tuning CPT for Machine Reading Comprehension

## Dataset
The **CMRC2018** dataset can be downloaded [HERE](https://github.com/CLUEbenchmark/CLUE), and **DRCD** can be downloaded [HERE](https://github.com/DRCKnowledgeTeam/DRCD).

## Train and Evaluate
To train and evaluate **CPT$_u$**, **CPT$_g$** and **CPT$_{ug}$**, run `run_mrc.py` with the argument `--cls_mode` set to `1`, `2` and `3`, respectively. The following example script runs the base version of **CPT$_u$** on the **DRCD** dataset.

```bash
export MODEL_TYPE=cpt-base
export MODEL_NAME=fnlp/cpt-base
export CLUE_DATA_DIR=/path/to/mrc_data_dir
export TASK_NAME=drcd
export CLS_MODE=1
python run_mrc.py \
    --fp16 \
    --model_type $MODEL_TYPE \
    --train_epochs=5 \
    --do_train=1 \
    --do_predict=1 \
    --n_batch=16 \
    --gradient_accumulation_steps 4 \
    --lr=3e-5 \
    --dropout=0.2 \
    --CLS_MODE=$CLS_MODE \
    --warmup_rate=0.1 \
    --weight_decay_rate=0.01 \
    --max_seq_length=512 \
    --eval_steps=200 \
    --task_name=$TASK_NAME \
    --init_restore_dir=$MODEL_NAME \
    --train_dir=$CLUE_DATA_DIR/$TASK_NAME/train_features.json \
    --train_file=$CLUE_DATA_DIR/$TASK_NAME/train.json \
    --dev_dir1=$CLUE_DATA_DIR/$TASK_NAME/dev_examples.json \
    --dev_dir2=$CLUE_DATA_DIR/$TASK_NAME/dev_features.json \
    --dev_file=$CLUE_DATA_DIR/$TASK_NAME/dev.json \
    --test_file=$CLUE_DATA_DIR/$TASK_NAME/test.json \
    --test_dir1=$CLUE_DATA_DIR/$TASK_NAME/test_examples_$MODEL_TYPE.json \
    --test_dir2=$CLUE_DATA_DIR/$TASK_NAME/test_features_$MODEL_TYPE.json \
    --checkpoint_dir=output/$MODEL_TYPE/$TASK_NAME/
```
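
For span-based MRC, the external classifier predicts answer start and end positions over the passage tokens, and the answer is the highest-scoring valid span. A generic post-processing sketch of that step, independent of the actual implementation in `run_mrc.py`:

```python
import torch

def extract_best_span(start_logits: torch.Tensor,
                      end_logits: torch.Tensor,
                      max_answer_len: int = 30):
    """Pick the (start, end) pair maximizing start_logit + end_logit with start <= end."""
    best_score, best_span = float("-inf"), (0, 0)
    for start in range(start_logits.size(0)):
        end_limit = min(start + max_answer_len, end_logits.size(0))
        for end in range(start, end_limit):
            score = start_logits[start].item() + end_logits[end].item()
            if score > best_score:
                best_score, best_span = score, (start, end)
    return best_span, best_score

# Example with random logits standing in for model outputs over a 20-token passage.
start_logits, end_logits = torch.randn(20), torch.randn(20)
print(extract_best_span(start_logits, end_logits))
```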
