add doc for new bert example (#2224)
conglongli authored Aug 17, 2022
1 parent 7d8ad45 commit 5e42cc8
Showing 1 changed file with 4 additions and 0 deletions.

docs/_tutorials/bert-pretraining.md
@@ -4,6 +4,10 @@ excerpt: ""
tags: training pre-training
---

**Note:**
On 08/15/2022, we added another BERT pre-training/fine-tuning example at [github.com/microsoft/Megatron-DeepSpeed/tree/main/examples/bert_with_pile](https://github.com/microsoft/Megatron-DeepSpeed/tree/main/examples/bert_with_pile), which includes a README.md describing how to use it. Compared to the example described below, the new example in Megatron-DeepSpeed adds support for ZeRO and tensor-slicing model parallelism (thus supporting larger model scales) and uses the public and richer [Pile dataset](https://github.com/EleutherAI/the-pile) (users can also supply their own data), together with some changes to the model architecture and training hyperparameters as described in [this paper](https://arxiv.org/abs/1909.08053). As a result, the BERT models trained by the new example achieve better MNLI results than the original BERT, but with a slightly different model architecture and larger computation requirements. If you want to train a larger-scale or better-quality BERT-style model, we recommend following the new example in Megatron-DeepSpeed. If your goal is to strictly reproduce the original BERT model, we recommend following the example under DeepSpeedExamples/bing_bert as described below. Either way, the tutorial below helps explain how to integrate DeepSpeed into a pre-training codebase, regardless of which BERT example you use; a minimal sketch of that integration pattern follows this note.
{: .notice--info}
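
For orientation, here is a minimal, hypothetical sketch of that integration pattern. It is not code from either example: the toy model and dataset stand in for BERT and its data pipeline, and the inline config dict stands in for the JSON file that the tutorial's scripts pass via `--deepspeed_config`.

```python
import torch
import deepspeed

# Toy stand-ins for the real BERT model and dataset used in the examples.
class ToyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 1)

    def forward(self, x, y):
        return torch.nn.functional.mse_loss(self.linear(x), y)

dataset = torch.utils.data.TensorDataset(
    torch.randn(64, 16), torch.randn(64, 1))

# Normally this lives in a JSON file passed via --deepspeed_config.
ds_config = {
    "train_batch_size": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

model = ToyModel()

# deepspeed.initialize wraps the model in an engine that owns the optimizer,
# the dataloader, and any features (fp16, ZeRO, etc.) enabled in ds_config.
engine, _, loader, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    training_data=dataset,
    config=ds_config,
)

for x, y in loader:
    loss = engine(x.to(engine.device), y.to(engine.device))
    engine.backward(loss)  # replaces loss.backward()
    engine.step()          # replaces optimizer.step() and zero_grad()
```

Such a script is launched with the `deepspeed` launcher (e.g. `deepspeed train.py`), which sets up the distributed environment that `deepspeed.initialize` expects.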

In this tutorial we will apply DeepSpeed to pre-train the BERT
(**B**idirectional **E**ncoder **R**epresentations from **T**ransformers),
which is widely used for many Natural Language Processing (NLP) tasks. The
