BERT pre-training

This repo contains DeepSpeed's version of BERT for pre-training.

Using DeepSpeed's optimized transformer kernels as the building block, we achieved the fastest BERT training record at the time: 44 minutes on 1,024 NVIDIA V100 GPUs, compared with the previous best published result of 67 minutes on the same number and generation of GPUs.

The fastest BERT training record reported above was achieved using internal datasets, which were not publicly available at the time of this release. However, the DeepSpeed BERT model can also be pre-trained using publicly available datasets from NVIDIA. Instructions for preparing the datasets are available here. In addition, this repo provides the following three files to perform the complete pre-training of DeepSpeed BERT using the NVIDIA datasets.

  1. ds_train_bert_nvidia_data_bsz64k_seq128.sh script for phase 1 training
  2. ds_train_bert_nvidia_data_bsz32k_seq512.sh script for phase 2 training
  3. bert_large_lamb_nvidia_data.json for configuring the model, datasets, hyper-parameters, and other settings
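
The configuration file combines DeepSpeed runtime settings with the model and data options. As a rough illustration of its shape only, a minimal DeepSpeed config using the LAMB optimizer might look like the sketch below; the key names are standard DeepSpeed config options, but the exact fields and values in bert_large_lamb_nvidia_data.json may differ:

```json
{
  "train_batch_size": 65536,
  "train_micro_batch_size_per_gpu": 64,
  "steps_per_print": 100,
  "optimizer": {
    "type": "Lamb",
    "params": {
      "lr": 0.011,
      "weight_decay": 0.01
    }
  },
  "gradient_clipping": 1.0,
  "fp16": {
    "enabled": true
  }
}
```

The large train_batch_size here reflects the LAMB optimizer's tolerance for very large batches (64K tokens-per-step in phase 1), which is what makes the two-phase schedule above practical.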

The scripts assume that the datasets are available under the path /workspace/bert. For reference, the default settings of these scripts and the configuration file will pre-train a model that achieves EM/F1 fine-tuning scores of 83.57/90.62 on SQuAD.
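
Under that assumption, the end-to-end run reduces to launching the two phase scripts in order. The sketch below only echoes the expected dataset location; the actual launch commands are left commented out, since they require the prepared NVIDIA datasets and a multi-GPU environment:

```shell
#!/bin/sh
# Default dataset location assumed by the training scripts in this repo.
DATA_DIR=/workspace/bert

# Phase 1: sequence length 128, then phase 2: sequence length 512,
# which continues from the phase 1 checkpoint.
# bash ds_train_bert_nvidia_data_bsz64k_seq128.sh
# bash ds_train_bert_nvidia_data_bsz32k_seq512.sh

echo "Expecting pre-training datasets under: $DATA_DIR"
```

Adjust DATA_DIR (or the paths inside the scripts) if your datasets live elsewhere.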