Merge pull request #47 from yuzhou03/bert-docs

add Bert model && case readme
FlagOpen · Apr 20, 2023 · e3c7c07
2 parents 6a5be21 + 564f1c7
commit e3c7c07

Showing 6 changed files with 245 additions and 131 deletions.
diff --git a/training/benchmarks/bert/paddle/readme.md b/training/benchmarks/bert/README.md
@@ -1,131 +1,90 @@
-
-### Model Checkpoint Download
-
-● Download link:
-`https://drive.google.com/drive/u/0/folders/1oQF4diVHNPCclykwdvQJw8n_VIWwV0PT`
-
-
-```
-File list:
-tf1_ckpt
-vocab.txt
-bert_config.json
-```
-
-
-● Model format conversion:
-
-```
-git clone https://github.com/mlcommons/training_results_v1.0.git
-cd training_results_v1.0/NVIDIA/benchmarks/bert/implementations/pytorch/
-docker build --pull -t mlperf-nvidia:language_model .
-```
-
-Start the container with the checkpoint directory mounted at /cks:
-
-```
-python convert_tf_checkpoint.py --tf_checkpoint /cks/model.ckpt-28252.index --bert_config_path /cks/bert_config.json --output_checkpoint model.ckpt-28252.pt
-```
-
-### Test Dataset Download
-
-● Download link: `https://drive.google.com/drive/folders/1cywmDnAsrP5-2vsr8GDc6QUc7VWe-M3v`
-
-```
-File list:
-results_text.tar.gz
-bert_reference_results_text_md5.txt
-```
-
-● Dataset format conversion:
-
-```
-cd /data && tar xf results_text.tar.gz
-cd results4
-md5sum --check ../bert_reference_results_text_md5.txt
-cd ..
-cp training_results_v1.0/NVIDIA/benchmarks/bert/implementations/pytorch/input_preprocessing/* ./
-```
-
-Start the container again with the data directory mounted at /data:
-
-```
-cd /data
-./parallel_create_hdf5.sh
-mkdir -p 2048_shards_uncompressed
-python3 ./chop_hdf5_files.py
-mkdir eval_set_uncompressed
-
-python3 create_pretraining_data.py \
-  --input_file=results4/eval.txt \
-  --output_file=eval_all \
-  --vocab_file=vocab.txt \
-  --do_lower_case=True \
-  --max_seq_length=512 \
-  --max_predictions_per_seq=76 \
-  --masked_lm_prob=0.15 \
-  --random_seed=12345 \
-  --dupe_factor=10
-
-python3 pick_eval_samples.py \
-  --input_hdf5_file=eval_all.hdf5 \
-  --output_hdf5_file=eval_set_uncompressed/part_eval_10k.hdf5 \
-  --num_examples_to_pick=10000
-```
-
-> Note: for details, see https://github.com/mlcommons/training_results_v1.0/tree/master/NVIDIA/benchmarks/bert/implementations/pytorch
-
-### Paddle Version Run Guide
-
-Single-GPU run commands:
-● Dependency: paddlepaddle-gpu
-
-```
-python -m pip install paddlepaddle-gpu==2.4.0rc0 -i https://pypi.tuna.tsinghua.edu.cn/simple
-```
-
-● Bash environment variables:
-```
-export MASTER_ADDR=user_ip
-export MASTER_PORT=user_port
-export WORLD_SIZE=1
-export NODE_RANK=0
-export CUDA_VISIBLE_DEVICES=0,1  # indices of the GPUs to use
-export RANK=0
-export LOCAL_RANK=0
-```
-Example:
-```
-export MASTER_ADDR=10.21.226.184
-export MASTER_PORT=29501
-export WORLD_SIZE=1
-export NODE_RANK=0
-export CUDA_VISIBLE_DEVICES=0,1  # indices of the GPUs to use
-export RANK=0
-export LOCAL_RANK=0
-```
-
-● Run the script:
-
-From this directory:
-
-```
-python run_pretraining.py \
-  --data_dir data_path \
-  --extern_config_dir config_path \
-  --extern_config_file config_file.py
-```
-
-Example:
-```
-python run_pretraining.py \
-  --data_dir /ssd2/yangjie40/data_config \
-  --extern_config_dir /ssd2/yangjie40/flagperf/training/nvidia/bert-pytorch/config \
-  --extern_config_file config_A100x1x2.py
-```
-
-
-### License
-
-This project is licensed under the Apache 2.0 license.
-Parts of this project's code are based on the MLCommons implementation (https://github.com/mlcommons/training_results_v1.0/tree/master/NVIDIA).
+## Model Information
+### Model Introduction
+
+BERT stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications.
+BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7 percentage point absolute improvement), MultiNLI accuracy to 86.7% (4.6 percentage point absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement).
+
+Please refer to this paper for a detailed description of BERT:
+[BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)
+
+
+### Model Code Source
+[BERT MLPerf](https://github.com/mlcommons/training_results_v1.0/tree/master/NVIDIA/benchmarks/bert/implementations)
+
+
+### Model Checkpoint Download
+
+● Download link:
+`https://drive.google.com/drive/u/0/folders/1oQF4diVHNPCclykwdvQJw8n_VIWwV0PT`
+
+```
+File list:
+tf1_ckpt
+vocab.txt
+bert_config.json
+```
+
+● Model format conversion:
+
+```
+git clone https://github.com/mlcommons/training_results_v1.0.git
+cd training_results_v1.0/NVIDIA/benchmarks/bert/implementations/pytorch/
+docker build --pull -t mlperf-nvidia:language_model .
+```
+
+Start the container with the checkpoint directory mounted at /cks:
+
+```
+python convert_tf_checkpoint.py --tf_checkpoint /cks/model.ckpt-28252.index --bert_config_path /cks/bert_config.json --output_checkpoint model.ckpt-28252.pt
+```
+
+### Test Dataset Download
+
+● Download link: `https://drive.google.com/drive/folders/1cywmDnAsrP5-2vsr8GDc6QUc7VWe-M3v`
+
+```
+File list:
+results_text.tar.gz
+bert_reference_results_text_md5.txt
+```
+
+● Dataset format conversion:
+
+```
+cd /data && tar xf results_text.tar.gz
+cd results4
+md5sum --check ../bert_reference_results_text_md5.txt
+cd ..
+cp training_results_v1.0/NVIDIA/benchmarks/bert/implementations/pytorch/input_preprocessing/* ./
+```
+
+Start the container again with the data directory mounted at /data:
+
+```
+cd /data
+./parallel_create_hdf5.sh
+mkdir -p 2048_shards_uncompressed
+python3 ./chop_hdf5_files.py
+mkdir eval_set_uncompressed
+
+python3 create_pretraining_data.py \
+  --input_file=results4/eval.txt \
+  --output_file=eval_all \
+  --vocab_file=vocab.txt \
+  --do_lower_case=True \
+  --max_seq_length=512 \
+  --max_predictions_per_seq=76 \
+  --masked_lm_prob=0.15 \
+  --random_seed=12345 \
+  --dupe_factor=10
+
+python3 pick_eval_samples.py \
+  --input_hdf5_file=eval_all.hdf5 \
+  --output_hdf5_file=eval_set_uncompressed/part_eval_10k.hdf5 \
+  --num_examples_to_pick=10000
+```
+
+### Framework and Chip Support
+|            | PyTorch | Paddle | TensorFlow2 |
+| ---------- | ------- | ------ | ----------- |
+| Nvidia GPU | N/A | [✅](../../nvidia/bert-paddle/README.md) | N/A |
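
Note on the dataset verification step (`md5sum --check ../bert_reference_results_text_md5.txt`) in the README above: on a host without `md5sum`, the same check can be approximated in Python. This is a minimal sketch, not part of the PR. The function names `file_md5` and `check_manifest` are hypothetical, and the manifest is assumed to use the usual `md5sum` line format (`<hex digest>  <relative path>`).

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_manifest(manifest_path, base_dir):
    """Compare files under base_dir against an md5sum-style manifest.

    Returns the relative paths that are missing or whose digest differs.
    """
    failures = []
    base = Path(base_dir)
    for line in Path(manifest_path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, name = line.split(maxsplit=1)
        # md5sum marks binary-mode entries with a leading '*'
        name = name.strip().lstrip("*")
        target = base / name
        if not target.is_file() or file_md5(target) != expected.lower():
            failures.append(name)
    return failures
```

Following the README's layout, a call such as `check_manifest("/data/bert_reference_results_text_md5.txt", "/data/results4")` should return an empty list when every file matches.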
diff --git a/training/nvidia/bert-paddle/README.md b/training/nvidia/bert-paddle/README.md
@@ -0,0 +1,87 @@
+
+### Model Checkpoint Download
+[Model Checkpoint Download](../../benchmarks/bert/README.md#模型checkpoint下载)
+
+
+### Test Dataset Download
+[Test Dataset Download](../../benchmarks/bert/README.md#测试数据集下载)
+
+
+### Paddle Version Run Guide
+
+Single-GPU run commands:
+● Dependency: paddlepaddle-gpu
+
+```
+python -m pip install paddlepaddle-gpu==2.4.0rc0 -i https://pypi.tuna.tsinghua.edu.cn/simple
+```
+
+● Bash environment variables:
+```
+export MASTER_ADDR=user_ip
+export MASTER_PORT=user_port
+export WORLD_SIZE=1
+export NODE_RANK=0
+export CUDA_VISIBLE_DEVICES=0,1  # indices of the GPUs to use
+export RANK=0
+export LOCAL_RANK=0
+```
+Example:
+```
+export MASTER_ADDR=10.21.226.184
+export MASTER_PORT=29501
+export WORLD_SIZE=1
+export NODE_RANK=0
+export CUDA_VISIBLE_DEVICES=0,1  # indices of the GPUs to use
+export RANK=0
+export LOCAL_RANK=0
+```
+
+● Run the script:
+
+From this directory:
+
+```
+python run_pretraining.py \
+  --data_dir data_path \
+  --extern_config_dir config_path \
+  --extern_config_file config_file.py
+```
+
+Example:
+```
+python run_pretraining.py \
+  --data_dir /ssd2/yangjie40/data_config \
+  --extern_config_dir /ssd2/yangjie40/flagperf/training/nvidia/bert-pytorch/config \
+  --extern_config_file config_A100x1x2.py
+```
+
+
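+### NVIDIA GPU Configuration and Run Information Reference
+#### Environment Configuration
+- ##### Hardware Environment
+    - Machine and accelerator model: NVIDIA_A100-SXM4-40GB
+    - Multi-node network type and bandwidth: InfiniBand, 200 Gb/s
+- ##### Software Environment
+    - OS version: Ubuntu 20.04
+    - OS kernel version: 5.4.0-113-generic
+    - Accelerator driver version: 470.129.06
+    - Docker version: 20.10.16
+    - Training framework version: paddle-2.4.0-rc
+    - Dependency versions:
+      - cuda: cuda_11.2.r11.2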
+
+
+### Run Results
+| Training resources | Config file     | Run time (s) | Target accuracy | Converged accuracy | Steps | Throughput (samples/s) |
+| ------------------ | --------------- | ------------ | --------------- | ------------------ | ----- | ---------------------- |
+| 1 node, 1 GPU      | config_A100x1x1 | N/A          | 0.67            | N/A                | N/A   | N/A                    |
+| 1 node, 2 GPUs     | config_A100x1x2 | N/A          | 0.67            | N/A                | N/A   | N/A                    |
+| 1 node, 4 GPUs     | config_A100x1x4 | 1715.28      | 0.67            | 0.6809             | 6250  | 180.07                 |
+| 1 node, 8 GPUs     | config_A100x1x8 | 1315.42      | 0.67            | 0.6818             | 4689  | 355.63                 |
+
+### License
+
+This project is licensed under the Apache 2.0 license.
+
+Parts of this project's code are based on the MLCommons implementation (https://github.com/mlcommons/training_results_v1.0/tree/master/NVIDIA/benchmarks/).
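
Note on the run guide in the README above: the environment variables and the `run_pretraining.py` flags can also be assembled programmatically, which avoids shell quoting and comment pitfalls. This is a minimal sketch under stated assumptions, not part of the PR. The helper name `build_launch` and its parameters are hypothetical; only the variable names and the three command-line flags come from the README.

```python
import os


def build_launch(data_dir, config_dir, config_file,
                 master_addr, master_port, gpu_ids=(0,)):
    """Return (env, argv) for a single-node run_pretraining.py launch.

    The environment variables mirror the README's single-node example.
    """
    env = dict(os.environ)
    env.update({
        "MASTER_ADDR": master_addr,
        "MASTER_PORT": str(master_port),
        "WORLD_SIZE": "1",
        "NODE_RANK": "0",
        # Comma-separated GPU indices, with no trailing comment text
        "CUDA_VISIBLE_DEVICES": ",".join(str(i) for i in gpu_ids),
        "RANK": "0",
        "LOCAL_RANK": "0",
    })
    argv = [
        "python", "run_pretraining.py",
        "--data_dir", data_dir,
        "--extern_config_dir", config_dir,
        "--extern_config_file", config_file,
    ]
    return env, argv
```

The returned pair can be passed to `subprocess.run(argv, env=env, check=True)` from the directory that contains `run_pretraining.py`.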
diff --git a/training/nvidia/bert-paddle/config/config_A100x1x1.py b/training/nvidia/bert-paddle/config/config_A100x1x1.py
@@ -0,0 +1,17 @@
+target_mlm_accuracy = 0.67
+gradient_accumulation_steps = 1
+max_steps = 10000000
+start_warmup_step = 0
+warmup_proportion = 0
+warmup_steps = 2000
+
+learning_rate = 1e-4
+weight_decay_rate = 0.01
+opt_lamb_beta_1 = 0.9
+opt_lamb_beta_2 = 0.999
+train_batch_size = 12
+eval_batch_size = train_batch_size
+max_samples_termination = 450000000
+cache_eval_data = False
+
+seed = 9031
diff --git a/training/nvidia/bert-paddle/config/config_A100x1x2.py b/training/nvidia/bert-paddle/config/config_A100x1x2.py
@@ -0,0 +1,17 @@
+target_mlm_accuracy = 0.67
+gradient_accumulation_steps = 1
+max_steps = 10000000
+start_warmup_step = 0
+warmup_proportion = 0
+warmup_steps = 2000
+
+learning_rate = 1e-4
+weight_decay_rate = 0.01
+opt_lamb_beta_1 = 0.9
+opt_lamb_beta_2 = 0.999
+train_batch_size = 12
+eval_batch_size = train_batch_size
+max_samples_termination = 450000000
+cache_eval_data = False
+
+seed = 9031
diff --git a/training/nvidia/bert-paddle/config/config_A100x1x4.py b/training/nvidia/bert-paddle/config/config_A100x1x4.py
@@ -0,0 +1,17 @@
+target_mlm_accuracy = 0.67
+gradient_accumulation_steps = 1
+max_steps = 10000000
+start_warmup_step = 0
+warmup_proportion = 0
+warmup_steps = 2000
+
+learning_rate = 1e-4
+weight_decay_rate = 0.01
+opt_lamb_beta_1 = 0.9
+opt_lamb_beta_2 = 0.999
+train_batch_size = 12
+eval_batch_size = train_batch_size
+max_samples_termination = 450000000
+cache_eval_data = False
+
+seed = 9031
diff --git a/training/nvidia/bert-paddle/config/config_A100x2x8.py b/training/nvidia/bert-paddle/config/config_A100x2x8.py
@@ -0,0 +1,17 @@
+target_mlm_accuracy = 0.67
+gradient_accumulation_steps = 1
+max_steps = 10000
+start_warmup_step = 0
+warmup_proportion = 0
+warmup_steps = 2000
+
+learning_rate = 1e-4
+weight_decay_rate = 0.01
+opt_lamb_beta_1 = 0.9
+opt_lamb_beta_2 = 0.999
+train_batch_size = 12
+eval_batch_size = train_batch_size
+max_samples_termination = 4500000
+cache_eval_data = False
+
+seed = 9031
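
Note relating the configs above to the README results table: the number of training samples a run consumed can be estimated from its step count. This is a rough sketch, assuming `train_batch_size = 12` is a per-GPU value and that `config_A100x1x8` (which is not part of this diff) uses the same batch size. The helper `samples_processed` is hypothetical.

```python
def samples_processed(steps, per_device_batch, num_devices, grad_accum_steps=1):
    """Training samples consumed after `steps` optimizer steps."""
    return steps * per_device_batch * num_devices * grad_accum_steps


# (steps, GPUs) are taken from the README results table.
# Batch size 12 and gradient_accumulation_steps 1 come from the configs.
# Assumption: the batch size is per GPU, not global.
RUNS = {
    "config_A100x1x4": (6250, 4),
    "config_A100x1x8": (4689, 8),
}

TOTALS = {
    name: samples_processed(steps, 12, gpus)
    for name, (steps, gpus) in RUNS.items()
}
# TOTALS == {"config_A100x1x4": 300000, "config_A100x1x8": 450144}
```

Under these assumptions, the 4-GPU run consumed far fewer samples than the `max_samples_termination = 450000000` limit in `config_A100x1x4.py`, which suggests it stopped on reaching `target_mlm_accuracy` rather than on the sample limit.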