Merge pull request #47 from yuzhou03/bert-docs
add Bert model && case readme
Showing 6 changed files with 245 additions and 131 deletions.
221 changes: 90 additions & 131 deletions
training/benchmarks/bert/paddle/readme.md → training/benchmarks/bert/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,131 +1,90 @@ | ||
|
||
### Model checkpoint download | ||
|
||
● Download link: | ||
`https://drive.google.com/drive/u/0/folders/1oQF4diVHNPCclykwdvQJw8n_VIWwV0PT` | ||
|
||
|
||
``` | ||
File list: | ||
tf1_ckpt | ||
vocab.txt | ||
bert_config.json | ||
``` | ||
|
||
|
||
● Convert the model format: | ||
|
||
``` | ||
git clone https://github.com/mlcommons/training_results_v1.0.git | ||
cd training_results_v1.0/NVIDIA/benchmarks/bert/implementations/pytorch/ | ||
docker build --pull -t mlperf-nvidia:language_model . | ||
``` | ||
|
||
Start the container with the checkpoint directory mounted at /cks, then run: | ||
|
||
``` | ||
python convert_tf_checkpoint.py --tf_checkpoint /cks/model.ckpt-28252.index --bert_config_path /cks/bert_config.json --output_checkpoint model.ckpt-28252.pt | ||
``` | ||
|
||
### Test dataset download | ||
|
||
● Download link: `https://drive.google.com/drive/folders/1cywmDnAsrP5-2vsr8GDc6QUc7VWe-M3v` | ||
|
||
``` | ||
File list: | ||
results_text.tar.gz | ||
bert_reference_results_text_md5.txt | ||
``` | ||
|
||
● Convert the dataset format: | ||
|
||
``` | ||
cd /data && tar xf results_text.tar.gz | ||
cd results4 | ||
md5sum --check ../bert_reference_results_text_md5.txt | ||
cd .. | ||
cp training_results_v1.0/NVIDIA/benchmarks/bert/implementations/pytorch/input_preprocessing/* ./ | ||
``` | ||
|
||
Start the container again with the data directory mounted at /data, then run: | ||
|
||
``` | ||
cd /data | ||
./parallel_create_hdf5.sh | ||
mkdir -p 2048_shards_uncompressed | ||
python3 ./chop_hdf5_files.py | ||
mkdir eval_set_uncompressed | ||
python3 create_pretraining_data.py \ | ||
--input_file=results4/eval.txt \ | ||
--output_file=eval_all \ | ||
--vocab_file=vocab.txt \ | ||
--do_lower_case=True \ | ||
--max_seq_length=512 \ | ||
--max_predictions_per_seq=76 \ | ||
--masked_lm_prob=0.15 \ | ||
--random_seed=12345 \ | ||
--dupe_factor=10 | ||
python3 pick_eval_samples.py \ | ||
--input_hdf5_file=eval_all.hdf5 \ | ||
--output_hdf5_file=eval_set_uncompressed/part_eval_10k.hdf5 \ | ||
--num_examples_to_pick=10000 | ||
``` | ||
|
||
> Note: for details, see https://github.com/mlcommons/training_results_v1.0/tree/master/NVIDIA/benchmarks/bert/implementations/pytorch | ||
### Running the Paddle version | ||
|
||
Single-GPU run commands: | ||
● Dependency: paddlepaddle-gpu | ||
|
||
``` | ||
python -m pip install paddlepaddle-gpu==2.4.0rc0 -i https://pypi.tuna.tsinghua.edu.cn/simple | ||
``` | ||
|
||
● Bash environment variables: | ||
``` | ||
export MASTER_ADDR=user_ip | ||
export MASTER_PORT=user_port | ||
export WORLD_SIZE=1 | ||
export NODE_RANK=0 | ||
export CUDA_VISIBLE_DEVICES=0,1  # indices of the available GPUs | ||
export RANK=0 | ||
export LOCAL_RANK=0 | ||
``` | ||
example: | ||
``` | ||
export MASTER_ADDR=10.21.226.184 | ||
export MASTER_PORT=29501 | ||
export WORLD_SIZE=1 | ||
export NODE_RANK=0 | ||
export CUDA_VISIBLE_DEVICES=0,1  # indices of the available GPUs | ||
export RANK=0 | ||
export LOCAL_RANK=0 | ||
``` | ||
|
||
● Run the script: | ||
|
||
From this directory: | ||
|
||
``` | ||
python run_pretraining.py \ | ||
--data_dir data_path \ | ||
--extern_config_dir config_path \ | ||
--extern_config_file config_file.py | ||
``` | ||
|
||
example: | ||
``` | ||
python run_pretraining.py \ | ||
--data_dir /ssd2/yangjie40/data_config \ | ||
--extern_config_dir /ssd2/yangjie40/flagperf/training/nvidia/bert-pytorch/config \ | ||
--extern_config_file config_A100x1x2.py | ||
``` | ||
|
||
|
||
### License | ||
|
||
This project is licensed under the Apache 2.0 license. | ||
Parts of this project's code are based on MLCommons: https://github.com/mlcommons/training_results_v1.0/tree/master/NVIDIA | ||
## Model information | ||
### Model introduction | ||
|
||
BERT stands for Bidirectional Encoder Representations from Transformers. Unlike recent language representation models, BERT is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. | ||
BERT is conceptually simple and empirically powerful. It obtains new state-of-the-art results on eleven natural language processing tasks, including pushing the GLUE score to 80.5% (7.7% point absolute improvement), MultiNLI accuracy to 86.7% (4.6% absolute improvement), SQuAD v1.1 question answering Test F1 to 93.2 (1.5 point absolute improvement) and SQuAD v2.0 Test F1 to 83.1 (5.1 point absolute improvement). | ||
|
||
Please refer to this paper for a detailed description of BERT: | ||
[BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805) | ||
|
||
|
||
### Model code source | ||
[Bert MLPerf](https://github.com/mlcommons/training_results_v1.0/tree/master/NVIDIA/benchmarks/bert/implementations) | ||
|
||
|
||
### Model checkpoint download | ||
|
||
● Download link: | ||
`https://drive.google.com/drive/u/0/folders/1oQF4diVHNPCclykwdvQJw8n_VIWwV0PT` | ||
|
||
``` | ||
File list: | ||
tf1_ckpt | ||
vocab.txt | ||
bert_config.json | ||
``` | ||
|
||
● Convert the model format: | ||
|
||
``` | ||
git clone https://github.com/mlcommons/training_results_v1.0.git | ||
cd training_results_v1.0/NVIDIA/benchmarks/bert/implementations/pytorch/ | ||
docker build --pull -t mlperf-nvidia:language_model . | ||
``` | ||
|
||
Start the container with the checkpoint directory mounted at /cks, then run: | ||
|
||
``` | ||
python convert_tf_checkpoint.py --tf_checkpoint /cks/model.ckpt-28252.index --bert_config_path /cks/bert_config.json --output_checkpoint model.ckpt-28252.pt | ||
``` | ||
|
||
### Test dataset download | ||
|
||
● Download link: `https://drive.google.com/drive/folders/1cywmDnAsrP5-2vsr8GDc6QUc7VWe-M3v` | ||
|
||
``` | ||
File list: | ||
results_text.tar.gz | ||
bert_reference_results_text_md5.txt | ||
``` | ||
|
||
● Convert the dataset format: | ||
|
||
``` | ||
cd /data && tar xf results_text.tar.gz | ||
cd results4 | ||
md5sum --check ../bert_reference_results_text_md5.txt | ||
cd .. | ||
cp training_results_v1.0/NVIDIA/benchmarks/bert/implementations/pytorch/input_preprocessing/* ./ | ||
``` | ||
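The `md5sum --check` step above can also be reproduced in Python, for example on a host without GNU coreutils. This is a minimal sketch, assuming the manifest uses the usual `<md5>  <filename>` line format; `md5_of_file` and `check_manifest` are hypothetical helper names, not part of the benchmark code.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_manifest(manifest_path, base_dir):
    """Verify every "<md5>  <filename>" line of a manifest against base_dir.

    Returns a dict mapping each file name to True (digest matches) or
    False (digest differs or the file is missing).
    """
    results = {}
    for line in Path(manifest_path).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*").strip()  # md5sum marks binary mode with '*'
        target = Path(base_dir) / name
        results[name] = target.is_file() and md5_of_file(target) == expected.lower()
    return results
```

With the layout used above, a call such as `check_manifest("/data/bert_reference_results_text_md5.txt", "/data/results4")` would report which extracted files match the reference digests.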
|
||
Start the container again with the data directory mounted at /data, then run: | ||
|
||
``` | ||
cd /data | ||
./parallel_create_hdf5.sh | ||
mkdir -p 2048_shards_uncompressed | ||
python3 ./chop_hdf5_files.py | ||
mkdir eval_set_uncompressed | ||
python3 create_pretraining_data.py \ | ||
--input_file=results4/eval.txt \ | ||
--output_file=eval_all \ | ||
--vocab_file=vocab.txt \ | ||
--do_lower_case=True \ | ||
--max_seq_length=512 \ | ||
--max_predictions_per_seq=76 \ | ||
--masked_lm_prob=0.15 \ | ||
--random_seed=12345 \ | ||
--dupe_factor=10 | ||
python3 pick_eval_samples.py \ | ||
--input_hdf5_file=eval_all.hdf5 \ | ||
--output_hdf5_file=eval_set_uncompressed/part_eval_10k.hdf5 \ | ||
--num_examples_to_pick=10000 | ||
``` | ||
|
||
### Framework and chip support | ||
| | PyTorch | Paddle | TensorFlow2 | | ||
| ---- | ---- | ---- | ---- | | ||
| Nvidia GPU | N/A | [✅](../../nvidia/bert-paddle/README.md) | N/A | |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,87 @@ | ||
|
||
### Model checkpoint download | ||
[Model checkpoint download](../../benchmarks/bert/README.md#模型checkpoint下载) | ||
|
||
|
||
### Test dataset download | ||
[Test dataset download](../../benchmarks/bert/README.md#测试数据集下载) | ||
|
||
|
||
### Running the Paddle version | ||
|
||
Single-GPU run commands: | ||
● Dependency: paddlepaddle-gpu | ||
|
||
``` | ||
python -m pip install paddlepaddle-gpu==2.4.0rc0 -i https://pypi.tuna.tsinghua.edu.cn/simple | ||
``` | ||
|
||
● Bash environment variables: | ||
``` | ||
export MASTER_ADDR=user_ip | ||
export MASTER_PORT=user_port | ||
export WORLD_SIZE=1 | ||
export NODE_RANK=0 | ||
export CUDA_VISIBLE_DEVICES=0,1  # indices of the available GPUs | ||
export RANK=0 | ||
export LOCAL_RANK=0 | ||
``` | ||
example: | ||
``` | ||
export MASTER_ADDR=10.21.226.184 | ||
export MASTER_PORT=29501 | ||
export WORLD_SIZE=1 | ||
export NODE_RANK=0 | ||
export CUDA_VISIBLE_DEVICES=0,1  # indices of the available GPUs | ||
export RANK=0 | ||
export LOCAL_RANK=0 | ||
``` | ||
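As a rough illustration of how a launcher could consume these variables, the following Python sketch reads them into a small dataclass. `DistEnv` and `read_dist_env` are hypothetical names; how `run_pretraining.py` actually handles the environment is not shown in this commit and may differ.

```python
import os
from dataclasses import dataclass, field


@dataclass
class DistEnv:
    """Distributed-training settings taken from environment variables."""
    master_addr: str
    master_port: int
    world_size: int
    node_rank: int
    rank: int
    local_rank: int
    visible_devices: list = field(default_factory=list)


def read_dist_env(environ=None):
    """Parse the variables exported above; `environ` defaults to os.environ."""
    env = os.environ if environ is None else environ
    # A value that is not a comma-separated list of integers raises ValueError.
    devices = [int(d) for d in env.get("CUDA_VISIBLE_DEVICES", "").split(",") if d.strip()]
    return DistEnv(
        master_addr=env["MASTER_ADDR"],
        master_port=int(env["MASTER_PORT"]),
        world_size=int(env.get("WORLD_SIZE", "1")),
        node_rank=int(env.get("NODE_RANK", "0")),
        rank=int(env.get("RANK", "0")),
        local_rank=int(env.get("LOCAL_RANK", "0")),
        visible_devices=devices,
    )
```

Parsing the values up front makes a malformed setting fail immediately instead of surfacing later inside the training loop.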
|
||
● Run the script: | ||
|
||
From this directory: | ||
|
||
``` | ||
python run_pretraining.py \ | ||
--data_dir data_path \ | ||
--extern_config_dir config_path \ | ||
--extern_config_file config_file.py | ||
``` | ||
|
||
example: | ||
``` | ||
python run_pretraining.py \ | ||
--data_dir /ssd2/yangjie40/data_config \ | ||
--extern_config_dir /ssd2/yangjie40/flagperf/training/nvidia/bert-pytorch/config \ | ||
--extern_config_file config_A100x1x2.py | ||
``` | ||
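The `--extern_config_dir` and `--extern_config_file` flags point at a plain Python file of assignments. One way such a file could be loaded and merged over defaults is sketched below; `load_extern_config` is a hypothetical helper, and the mechanism actually used by `run_pretraining.py` is an assumption here, not something this commit shows.

```python
import importlib.util
import os


def load_extern_config(config_dir, config_file, defaults):
    """Execute a Python config file and overlay its public names on defaults."""
    path = os.path.join(config_dir, config_file)
    spec = importlib.util.spec_from_file_location("extern_config", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the assignments in the file

    merged = dict(defaults)
    for name in dir(module):
        if not name.startswith("_"):  # skip dunder and private names
            merged[name] = getattr(module, name)
    return merged
```

Because the file is executed as ordinary Python, a line such as `eval_batch_size = train_batch_size` resolves to the value assigned earlier in the same file.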
|
||
|
||
### Nvidia GPU configuration and run reference | ||
#### Environment | ||
- ##### Hardware | ||
- Machine and accelerator model: NVIDIA_A100-SXM4-40GB | ||
- Multi-node network type and bandwidth: InfiniBand, 200Gb/s | ||
- ##### Software | ||
- OS version: Ubuntu 20.04 | ||
- OS kernel version: 5.4.0-113-generic | ||
- Accelerator driver version: 470.129.06 | ||
- Docker version: 20.10.16 | ||
- Training framework version: paddle-2.4.0-rc | ||
- Dependency versions: | ||
- cuda: cuda_11.2.r11.2 | ||
|
||
|
||
### Results | ||
| Training resources | Config file | Run time (s) | Target accuracy | Converged accuracy | Steps | Throughput (samples/s) | | ||
| -------- | --------------- | ----------- | -------- | -------- | ------- | ---------------- | | ||
| 1 node, 1 GPU | config_A100x1x1 | N/A | 0.67 | N/A | N/A | N/A | | ||
| 1 node, 2 GPUs | config_A100x1x2 | N/A | 0.67 | N/A | N/A | N/A | | ||
| 1 node, 4 GPUs | config_A100x1x4 | 1715.28 | 0.67 | 0.6809 | 6250 | 180.07 | | ||
| 1 node, 8 GPUs | config_A100x1x8 | 1315.42 | 0.67 | 0.6818 | 4689 | 355.63 | | ||
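The two completed rows allow a quick scaling check. The sketch below computes the throughput speedup from 4 to 8 GPUs and the corresponding scaling efficiency, using only the samples/s figures from the table; `scaling_efficiency` is a hypothetical helper name.

```python
def scaling_efficiency(base_gpus, base_throughput, gpus, throughput):
    """Return (speedup, efficiency) of a run relative to a baseline run."""
    speedup = throughput / base_throughput
    ideal = gpus / base_gpus  # perfect linear scaling
    return speedup, speedup / ideal


# Throughput values (samples/s) taken from the 4-GPU and 8-GPU rows.
speedup, efficiency = scaling_efficiency(4, 180.07, 8, 355.63)
print(f"speedup: {speedup:.3f}x, efficiency: {efficiency:.1%}")
# prints "speedup: 1.975x, efficiency: 98.7%"
```

Wall-clock time improves less than throughput (1715.28 s versus 1315.42 s) because the runs also differ in the number of steps taken to reach the target accuracy.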
|
||
### License | ||
|
||
This project is licensed under the Apache 2.0 license. | ||
|
||
Parts of this project's code are based on MLCommons: https://github.com/mlcommons/training_results_v1.0/tree/master/NVIDIA/benchmarks/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
target_mlm_accuracy = 0.67 | ||
gradient_accumulation_steps = 1 | ||
max_steps = 10000000 | ||
start_warmup_step = 0 | ||
warmup_proportion = 0 | ||
warmup_steps = 2000 | ||
|
||
learning_rate = 1e-4 | ||
weight_decay_rate = 0.01 | ||
opt_lamb_beta_1 = 0.9 | ||
opt_lamb_beta_2 = 0.999 | ||
train_batch_size = 12 | ||
eval_batch_size = train_batch_size | ||
max_samples_termination = 450000000 | ||
cache_eval_data = False | ||
|
||
seed = 9031 |
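To make the warmup settings above concrete, here is a minimal sketch of a linear learning-rate warmup driven by `learning_rate`, `warmup_steps` and `start_warmup_step`. It assumes that, with `warmup_proportion = 0`, `warmup_steps` is the value that takes effect, and it does not model any decay after warmup; the scheduler actually used by the benchmark may behave differently. `warmup_lr` is a hypothetical function name.

```python
def warmup_lr(step, base_lr=1e-4, warmup_steps=2000, start_warmup_step=0):
    """Linear warmup from 0 to base_lr, then constant (decay not modelled)."""
    progressed = step - start_warmup_step
    if progressed <= 0:
        return 0.0
    if progressed >= warmup_steps:
        return base_lr
    return base_lr * progressed / warmup_steps


# Halfway through warmup the rate is half of the configured learning_rate.
for step in (0, 500, 1000, 2000, 5000):
    print(step, warmup_lr(step))
```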
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
target_mlm_accuracy = 0.67 | ||
gradient_accumulation_steps = 1 | ||
max_steps = 10000000 | ||
start_warmup_step = 0 | ||
warmup_proportion = 0 | ||
warmup_steps = 2000 | ||
|
||
learning_rate = 1e-4 | ||
weight_decay_rate = 0.01 | ||
opt_lamb_beta_1 = 0.9 | ||
opt_lamb_beta_2 = 0.999 | ||
train_batch_size = 12 | ||
eval_batch_size = train_batch_size | ||
max_samples_termination = 450000000 | ||
cache_eval_data = False | ||
|
||
seed = 9031 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
target_mlm_accuracy = 0.67 | ||
gradient_accumulation_steps = 1 | ||
max_steps = 10000000 | ||
start_warmup_step = 0 | ||
warmup_proportion = 0 | ||
warmup_steps = 2000 | ||
|
||
learning_rate = 1e-4 | ||
weight_decay_rate = 0.01 | ||
opt_lamb_beta_1 = 0.9 | ||
opt_lamb_beta_2 = 0.999 | ||
train_batch_size = 12 | ||
eval_batch_size = train_batch_size | ||
max_samples_termination = 450000000 | ||
cache_eval_data = False | ||
|
||
seed = 9031 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
target_mlm_accuracy = 0.67 | ||
gradient_accumulation_steps = 1 | ||
max_steps = 10000 | ||
start_warmup_step = 0 | ||
warmup_proportion = 0 | ||
warmup_steps = 2000 | ||
|
||
learning_rate = 1e-4 | ||
weight_decay_rate = 0.01 | ||
opt_lamb_beta_1 = 0.9 | ||
opt_lamb_beta_2 = 0.999 | ||
train_batch_size = 12 | ||
eval_batch_size = train_batch_size | ||
max_samples_termination = 4500000 | ||
cache_eval_data = False | ||
|
||
seed = 9031 |
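The names `target_mlm_accuracy`, `max_steps` and `max_samples_termination` suggest three independent stopping conditions. The sketch below shows one plausible way to combine them, using the values from the config above as defaults; the exact logic in the training loop is an assumption, and `should_stop` is a hypothetical function name.

```python
def should_stop(eval_accuracy, global_step, samples_seen,
                target_mlm_accuracy=0.67, max_steps=10000,
                max_samples_termination=4500000):
    """Return the reason training should stop, or None to keep going."""
    if eval_accuracy >= target_mlm_accuracy:
        return "converged"      # masked-LM accuracy target reached
    if global_step >= max_steps:
        return "max_steps"      # step budget exhausted
    if samples_seen >= max_samples_termination:
        return "max_samples"    # sample budget exhausted
    return None


print(should_stop(eval_accuracy=0.6809, global_step=6250, samples_seen=300000))
```

Checking convergence first means a run that reaches the target on its final allowed step is still reported as converged rather than as having run out of budget.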