
Commit 5ec39fc

Merge pull request NVIDIA#739 from swethmandava/master

Perf number correction for BERT TF SQuAD Fp32

2 parents: 0b455ff + 002bcd8

File tree

1 file changed: +8 −8 lines changed


TensorFlow/LanguageModeling/BERT/README.md

Lines changed: 8 additions & 8 deletions
```diff
@@ -273,7 +273,7 @@ Note: Not using BookCorpus can potentially change final accuracy on a few downst
 
 4. Download the pretrained models from NGC.
 
-We have uploaded checkpoints that have been [fine tuned](https://ngc.nvidia.com/catalog/models/nvidia:bert_tf_v1_1_large_fp16_384) and [pre-trained](https://ngc.nvidia.com/catalog/models/nvidia:bert_tf_pretraining_lamb_16n) for various configurations on the NGC Model Registry. Our data download scripts, by default download some of them but you can browse and download the relevant checkpoints directly from the [NGC model catalog](https://ngc.nvidia.com/catalog/models). Download them to the `data/download/nvidia_pretrained/` to easily access them in your scripts.
+We have uploaded checkpoints that have been [fine tuned](https://ngc.nvidia.com/catalog/models/nvidia:bert_tf_v1_1_large_fp16_384) and [pre-trained](https://ngc.nvidia.com/catalog/models/nvidia:bert_tf_pretraining_lamb_16n) for various configurations on the NGC Model Registry. Our data download scripts, by default download some of them but you can browse and download the relevant checkpoints directly from the [NGC model catalog](https://ngc.nvidia.com/catalog/models). Download them to the `data/download/nvidia_pretrained/` to easily access them in your scripts.
 
 5. Start an interactive session in the NGC container to run training/inference.
 
@@ -839,9 +839,9 @@ Our results were obtained by running the `scripts/run_squad.sh` training script
 
 | **GPUs** | **Batch size / GPU: mixed precision, FP32** | **Throughput - mixed precision** | **Throughput - FP32** | **Throughput speedup (FP32 to mixed precision)** | **Weak scaling - FP32** | **Weak scaling - mixed precision** |
 |----------|---------------------------------------------|----------------------------------|-----------------------|--------------------------------------------------|-------------------------|------------------------------------|
-| 1 | 24, 10 | 51.02 | 31.33 | 1.63 | 1.00 | 1.00 |
-| 4 | 24, 10 | 181.37 | 94.19 | 1.93 | 3.55 | 3.01 |
-| 8 | 24, 10 | 314.6 | 155.53 | 2.02 | 6.17 | 4.96 |
+| 1 | 24, 10 | 51.02 | 10.42 | 4.90 | 1.00 | 1.00 |
+| 4 | 24, 10 | 181.37 | 39.77 | 4.56 | 3.55 | 3.82 |
+| 8 | 24, 10 | 314.6 | 79.37 | 3.96 | 6.17 | 7.62 |
 
 Note: The respective values for FP32 runs that use a batch size of 24 are not available due to out of memory errors that arise.
 
@@ -889,10 +889,10 @@ Our results were obtained by running the `scripts/run_squad.sh` training script
 
 | **GPUs** | **Batch size / GPU: mixed precision, FP32** | **Throughput - mixed precision** | **Throughput - FP32** | **Throughput speedup (FP32 to mixed precision)** | **Weak scaling - FP32** | **Weak scaling - mixed precision** |
 |----------|---------------------------------------------|----------------------------------|-----------------------|--------------------------------------------------|-------------------------|------------------------------------|
-| 1 | 24, 10 | 55.28 | 32.72 | 1.69 | 1.00 | 1.00 |
-| 4 | 24, 10 | 199.53 | 100.73 | 1.98 | 3.61 | 3.08 |
-| 8 | 24, 10 | 341.55 | 168.92 | 2.02 | 6.18 | 5.16 |
-| 16 | 24, 10 | 683.37 | 249.54 | 2.74 | 12.36 | 7.63 |
+| 1 | 24, 10 | 55.28 | 11.15 | 4.96 | 1.00 | 1.00 |
+| 4 | 24, 10 | 199.53 | 42.91 | 4.65 | 3.61 | 3.85 |
+| 8 | 24, 10 | 341.55 | 85.08 | 4.01 | 6.18 | 7.63 |
+| 16 | 24, 10 | 683.37 | 156.29 | 4.37 | 12.36 | 14.02 |
 
 Note: The respective values for FP32 runs that use a batch size of 24 are not available due to out of memory errors that arise.
 
```
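The corrected speedup column follows directly from the two throughput columns: speedup = mixed-precision throughput / FP32 throughput. A quick sanity check (not part of the commit; values copied from the corrected `+` rows above) confirms the new ratios are internally consistent:

```python
# Sanity-check the corrected "Throughput speedup" column:
# speedup = mixed-precision throughput / FP32 throughput,
# rounded to two decimals as reported in the tables.
rows = [
    # (throughput mixed, throughput FP32, reported speedup)
    (51.02, 10.42, 4.90),
    (181.37, 39.77, 4.56),
    (314.6, 79.37, 3.96),
    (55.28, 11.15, 4.96),
    (199.53, 42.91, 4.65),
    (341.55, 85.08, 4.01),
    (683.37, 156.29, 4.37),
]

for mixed, fp32, reported in rows:
    computed = round(mixed / fp32, 2)
    # Allow only rounding-level slack between computed and reported values.
    assert abs(computed - reported) < 0.005, (mixed, fp32, computed, reported)

print("all corrected speedup ratios are consistent")
```

Note that the original FP32 numbers implied speedups of only ~1.6–2.7x; the corrected FP32 throughputs yield the ~4–5x ratios shown in the new rows.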
