
Commit cef4bab: Adding BERT/TF Inference with TensorRT; minor README fixes
1 parent a2281e3


41 files changed (+2949, -405 lines)

TensorFlow/LanguageModeling/BERT/.gitignore

Lines changed: 4 additions & 0 deletions
@@ -127,3 +127,7 @@ dmypy.json
 
 # Pyre type checker
 .pyre/
+
+# TensorRT
+*.engine
+models/

TensorFlow/LanguageModeling/BERT/.gitmodules

Lines changed: 0 additions & 4 deletions
This file was deleted.

TensorFlow/LanguageModeling/BERT/Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-ARG FROM_IMAGE_NAME=nvcr.io/nvidia/tensorflow:19.08-py3
+ARG FROM_IMAGE_NAME=nvcr.io/nvidia/tensorflow:19.10-py3
 
 FROM ${FROM_IMAGE_NAME}

TensorFlow/LanguageModeling/BERT/README.md

Lines changed: 13 additions & 5 deletions
@@ -27,6 +27,7 @@ This repository provides a script and recipe to train the BERT model for TensorF
 * [Fine tuning](#fine-tuning)
 * [Multi-node](#multi-node)
 * [Inference process](#inference-process)
+* [Inference Process With TensorRT](#inference-process-with-tensorrt)
 * [Deploying the BERT model using TensorRT Inference Server](#deploying-the-bert-model-using-tensorrt-inference-server)
 * [BioBERT](#biobert)
 - [Performance](#performance)
@@ -615,6 +616,9 @@ I0312 23:14:00.550973 140287431493376 run_squad.py:1397] 0 Inference Performance
 {"exact_match": 83.69914853358561, "f1": 90.8477003317459}
 ```
 
+### Inference Process With TensorRT
+NVIDIA TensorRT is a platform for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high-throughput for deep learning inference applications. More information on how to perform inference using TensorRT can be found in the subfolder [./trt/README.md](trt/README.md)
+
 ### Deploying the BERT model using TensorRT Inference Server
 
 The [NVIDIA TensorRT Inference Server](https://github.com/NVIDIA/tensorrt-inference-server) provides a datacenter and cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or gRPC endpoint, allowing remote clients to request inferencing for any number of GPU or CPU models being managed by the server. More information on how to perform inference using `TensorRT Inference Server` can be found in the subfolder `./trtis/README.md`.
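For orientation alongside the new TensorRT section above, the generic TensorRT Python flow for consuming a serialized `*.engine` file (the artifact type the new `.gitignore` entries cover) looks roughly like the sketch below. The engine path, binding handling, and static-shape assumption are illustrative only and are not taken from this repo's `trt/` scripts, which `trt/README.md` documents.

```python
# A minimal sketch, assuming an engine already built with static shapes
# (explicit batch); file path and output handling are hypothetical.
import pycuda.autoinit          # noqa: F401 -- initializes a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def load_engine(path):
    """Deserialize a *.engine file produced by the TensorRT builder."""
    with open(path, "rb") as f, trt.Runtime(TRT_LOGGER) as runtime:
        return runtime.deserialize_cuda_engine(f.read())

engine = load_engine("models/bert_large_seq384.engine")   # hypothetical path
context = engine.create_execution_context()
stream = cuda.Stream()

# One pinned host buffer and one device buffer per binding (inputs + outputs).
host_bufs, dev_bufs, bindings = [], [], []
for idx in range(engine.num_bindings):
    shape = engine.get_binding_shape(idx)
    dtype = trt.nptype(engine.get_binding_dtype(idx))
    host_bufs.append(cuda.pagelocked_empty(trt.volume(shape), dtype))
    dev_bufs.append(cuda.mem_alloc(host_bufs[idx].nbytes))
    bindings.append(int(dev_bufs[idx]))

# Real code would fill the input host buffers with tokenized input_ids,
# segment_ids and input_mask here, before the copy to the device.
for idx in range(engine.num_bindings):
    if engine.binding_is_input(idx):
        cuda.memcpy_htod_async(dev_bufs[idx], host_bufs[idx], stream)
context.execute_async_v2(bindings=bindings, stream_handle=stream.handle)
for idx in range(engine.num_bindings):
    if not engine.binding_is_input(idx):
        cuda.memcpy_dtoh_async(host_bufs[idx], dev_bufs[idx], stream)
stream.synchronize()   # output host buffers now hold the logits
```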
@@ -675,6 +679,7 @@ Our results were obtained by running the `scripts/run_pretraining_lamb.sh` train
 
 | **DGX System** | **Nodes** | **Precision** | **Batch Size/GPU: Phase1, Phase2** | **Accumulation Steps: Phase1, Phase2** | **Time to Train (Hrs)** | **Final Loss** |
 |----------------|-----------|---------------|------------------------------------|----------------------------------------|----------------|-------------------------|
+| DGX1 | 1 | FP16 | 16, 4 |512,1024| 299.86| 1.67 |
 | DGX1 | 4 | FP16 | 16, 4 |128, 256| 62.49 | 1.72 |
 | DGX1 | 16 | FP16 | 16, 4 | 32, 64 | 16.58 | 1.76 |
 | DGX1 | 32 | FP16 | 16, 2 | 16, 64 | 9.85 | 1.71 |
@@ -728,7 +733,7 @@ The following tables compare `F1` scores across 5 different training runs with d
 
 ###### Pre-training training performance: single-node on 16G
 
-Our results were obtained by running the `scripts/run_pretraining_lamb.sh` training script in the TensorFlow 19.08-py3 NGC container on NVIDIA DGX-1 with 8x V100 16G GPUs. Performance (in sentences per second) is the steady state throughput.
+Our results were obtained by running the `scripts/run_pretraining_lamb.sh` training script in the TensorFlow 19.08-py3 NGC container on NVIDIA DGX-1 with 8x V100 16G GPUs. Performance (in sentences per second) is the steady state throughput with number of accumulation steps set to 1.
 
 
 | **GPUs** | **Sequence Length**| **Batch size / GPU: mixed precision, FP32** | **Throughput - mixed precision** | **Throughput - FP32** | **Throughput speedup (FP32 to mixed precision)** | **Weak scaling - mixed precision** | **Weak scaling - FP32** |
@@ -744,7 +749,7 @@ Note: The respective values for FP32 runs that use a batch size of 16, 4 in sequ
 
 ###### Pre-training training performance: multi-node on 16G
 
-Our results were obtained by running the `run.sub` training script in the TensorFlow 19.08-py3 NGC container using multiple NVIDIA DGX-1 with 8x V100 16G GPUs. Performance (in sentences per second) is the steady state throughput.
+Our results were obtained by running the `run.sub` training script in the TensorFlow 19.08-py3 NGC container using multiple NVIDIA DGX-1 with 8x V100 16G GPUs. Performance (in sentences per second) is the steady state throughput with number of accumulation steps set to 1.
 
 | **Nodes** | **Sequence Length**| **Batch size / GPU: mixed precision, FP32** | **Throughput - mixed precision** | **Throughput - FP32** | **Throughput speedup (FP32 to mixed precision)** | **Weak scaling - mixed precision** | **Weak scaling - FP32** |
 |:-------:|:-----:|:-------:|:-------:|:-------:|:-------------:|:------:|:------:|
@@ -777,7 +782,7 @@ To achieve these same results, follow the [Quick Start Guide](#quick-start-guide
 
 ###### Pre-training training performance: single-node on 32G
 
-Our results were obtained by running the `scripts/run_pretraining_lamb.sh` training script in the TensorFlow 19.08-py3 NGC container on NVIDIA DGX-1 with 8x V100 32G GPUs. Performance (in sentences per second) is the steady state throughput.
+Our results were obtained by running the `scripts/run_pretraining_lamb.sh` training script in the TensorFlow 19.08-py3 NGC container on NVIDIA DGX-1 with 8x V100 32G GPUs. Performance (in sentences per second) is the steady state throughput with number of accumulation steps set to 1.
 
 | **GPUs** | **Sequence Length**| **Batch size / GPU: mixed precision, FP32** | **Throughput - mixed precision** | **Throughput - FP32** | **Throughput speedup (FP32 to mixed precision)** | **Weak scaling - mixed precision** | **Weak scaling - FP32** |
 |:-------:|:-----:|:-------:|:-------:|:-------:|:-------------:|:------:|:------:|
@@ -809,7 +814,7 @@ To achieve these same results, follow the [Quick Start Guide](#quick-start-guide
 
 ###### Pre-training training performance: single-node on DGX-2 32G
 
-Our results were obtained by running the `scripts/run_pretraining_lamb.sh` training script in the TensorFlow 19.08-py3 NGC container on NVIDIA DGX-2 with 16x V100 32G GPUs. Performance (in sentences per second) is the steady state throughput.
+Our results were obtained by running the `scripts/run_pretraining_lamb.sh` training script in the TensorFlow 19.08-py3 NGC container on NVIDIA DGX-2 with 16x V100 32G GPUs. Performance (in sentences per second) is the steady state throughput with number of accumulation steps set to 1.
 
 | **GPUs** | **Sequence Length**| **Batch size / GPU: mixed precision, FP32** | **Throughput - mixed precision** | **Throughput - FP32** | **Throughput speedup (FP32 to mixed precision)** | **Weak scaling - mixed precision** | **Weak scaling - FP32** |
 |:-------:|:-----:|:-------:|:-------:|:-------:|:-------------:|:------:|:------:|
@@ -826,7 +831,7 @@ Note: The respective values for FP32 runs that use a batch size of 48, 8 in sequ
 
 ###### Pre-training training performance: multi-node on DGX-2H 32G
 
-Our results were obtained by running the `run.sub` training script in the TensorFlow 19.08-py3 NGC container using multiple NVIDIA DGX-2 with 16x V100 32G GPUs. Performance (in sentences per second) is the steady state throughput.
+Our results were obtained by running the `run.sub` training script in the TensorFlow 19.08-py3 NGC container using multiple NVIDIA DGX-2 with 16x V100 32G GPUs. Performance (in sentences per second) is the steady state throughput with number of accumulation steps set to 1.
 
 
 | **Nodes** | **Sequence Length**| **Batch size / GPU: mixed precision, FP32** | **Throughput - mixed precision** | **Throughput - FP32** | **Throughput speedup (FP32 to mixed precision)** | **Weak scaling - mixed precision** | **Weak scaling - FP32** |
@@ -1131,6 +1136,9 @@ To achieve these same results, follow the [Quick Start Guide](#quick-start-guide
 
 ### Changelog
 
+January 2020
+- Added inference with TensorRT
+
 November 2019
 - Pre-training and Finetuning on BioMedical tasks and corpus
 
TensorFlow/LanguageModeling/BERT/biobert/README.md

Lines changed: 12 additions & 7 deletions
@@ -85,9 +85,10 @@ To download and preprocess pre-training data as well as the required vocab files
 bash biobert/scripts/biobert_data_download.sh
 ```
 
-Datasets for finetuning can be obtained from this [repository](https://github.com/ncbi-nlp/BLUE_Benchmark/releases/tag/0.1)
+Datasets for finetuning for NER can be obtained from this [repository](https://github.com/ncbi-nlp/BLUE_Benchmark/releases/tag/0.1)
+Datasets for finetuning for RE can be obtained from this [repository](https://github.com/arwhirang/recursive_chemprot/tree/master/Demo/tree_LSTM/data)
 
-Place them in `/workspace/bert/data/biobert/` to be automatically picked up by our scripts.
+Place them both in `/workspace/bert/data/biobert/` to be automatically picked up by our scripts.
 
 4. Start an interactive session in the NGC container to run training/inference.
 
@@ -431,11 +432,15 @@ Our results were obtained by running the `scripts/run_pretraining_lamb.sh` train
 
 | **DGX System** | **Nodes** | **Precision** | **Batch Size/GPU: Phase1, Phase2** | **Accumulation Steps: Phase1, Phase2** | **Time to Train (Hrs)** | **Final Loss** |
 |----------------|-----------|---------------|------------------------------------|----------------------------------------|----------------|-------------------------|
-| DGX2H | 4 | FP16 | 128, 16 | 8, 32 | 19.14 | 0.88 |
-| DGX2H | 16 | FP16 | 128, 16 | 2, 8 | 4.81 | 0.86 |
-| DGX2H | 32 | FP16 | 128, 16 | 1, 4 | 2.65 | 0.87 |
-
-#### Fine-tuning accuracy
+| DGX2H | 4 | FP16 | 128, 16 | 8, 32 | 19.14 | 0.88 |
+| DGX2H | 16 | FP16 | 128, 16 | 2, 8 | 4.81 | 0.86 |
+| DGX2H | 32 | FP16 | 128, 16 | 1, 4 | 2.65 | 0.87 |
+| DGX1 | 1 | FP16 | 64, 8 |128,512| 174.58 | 0.87 |
+| DGX1 | 4 | FP16 | 64, 8 |32, 128| 57.71 | 0.85 |
+| DGX1 | 16 | FP16 | 64, 8 |8, 32 | 12.62 | 0.87 |
+| DGX1 | 32 | FP16 | 64, 8 |4, 16 | 6.97 | 0.87 |
+
+###### Fine-tuning accuracy
 
 | **Task** | **F1** | **Precision** | **Recall** |
 |:-------:|:----:|:----:|:----:|

TensorFlow/LanguageModeling/BERT/biobert/re_eval.py

Lines changed: 4 additions & 3 deletions
@@ -12,7 +12,7 @@
 args = parser.parse_args()
 
 
-testdf = pd.read_csv(args.answer_path, sep="\t", index_col=0)
+testdf = pd.read_csv(args.answer_path, sep="\t", header=None)
 preddf = pd.read_csv(args.output_path, sep="\t", header=None)
 
 
@@ -37,9 +37,10 @@
 pred_class = [np.argmax(v) for v in pred]
 str_to_int_mapper = dict()
 
-for i,v in enumerate(sorted(testdf["label"].unique())):
+testdf.iloc[:,3] = testdf.iloc[:, 3].fillna("False")
+for i,v in enumerate(sorted(testdf.iloc[:,3].unique())):
     str_to_int_mapper[v] = i
-test_answer = [str_to_int_mapper[v] for v in testdf["label"]]
+test_answer = [str_to_int_mapper[v] for v in testdf.iloc[:,3]]
 
 p,r,f,s = sklearn.metrics.precision_recall_fscore_support(y_pred=pred_class, y_true=test_answer, labels=[0,1,2,3,4], average="micro")
 results = dict()
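The `header=None` and `iloc[:,3]` change above implies the ChemProt tree_LSTM TSVs carry no header row, keep the relation label in the fourth column, and leave that cell empty for the no-relation class. A toy illustration of that assumed layout and the resulting label mapping (column contents here are invented, not taken from the dataset):

```python
# Toy data in the assumed layout: no header, label in column 3, blank = "False".
import io
import pandas as pd

tsv = (
    "id1\tsome sentence\textra\t\n"        # empty label -> no-relation class
    "id2\tanother sentence\textra\tCPR:4\n"
)
testdf = pd.read_csv(io.StringIO(tsv), sep="\t", header=None)

# Same normalization as the patched re_eval.py: missing labels become "False",
# then every distinct label string gets an integer class id.
testdf.iloc[:, 3] = testdf.iloc[:, 3].fillna("False")
str_to_int_mapper = {v: i for i, v in enumerate(sorted(testdf.iloc[:, 3].unique()))}
test_answer = [str_to_int_mapper[v] for v in testdf.iloc[:, 3]]
print(str_to_int_mapper)   # {'CPR:4': 0, 'False': 1}
print(test_answer)         # [1, 0]
```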
TensorFlow/LanguageModeling/BERT/biobert/scripts/biobert_finetune_inference_benchmark.sh

Lines changed: 1 addition & 1 deletion
@@ -134,7 +134,7 @@ elif [ "$task" = "ner_bc5cdr-disease" ] ; then
 done
 
 elif [ "$task" = "rel_chemprot" ] ; then
-DATASET_DIR=/workspace/bert/data/biobert/ChemProt
+DATASET_DIR=/workspace/bert/data/biobert/chemprot-data_treeLSTM
 
 LOGFILE="${OUTPUT_DIR}/${task}_training_benchmark_bert_${bert_model}.log"
 
TensorFlow/LanguageModeling/BERT/biobert/scripts/biobert_finetune_train_benchmark.sh

Lines changed: 1 addition & 1 deletion
@@ -150,7 +150,7 @@ elif [ "$task" = "ner_bc5cdr-disease" ] ; then
 done
 
 elif [ "$task" = "rel_chemprot" ] ; then
-DATASET_DIR=/workspace/bert/data/biobert/ChemProt
+DATASET_DIR=/workspace/bert/data/biobert/chemprot-data_treeLSTM
 LOGFILE="${OUTPUT_DIR}/${task}_training_benchmark_bert_${bert_model}_gpu_${num_gpu}.log"
 
 echo "Training performance benchmarking for BERT $bert_model from $BERT_DIR" >> $LOGFILE

TensorFlow/LanguageModeling/BERT/biobert/scripts/rel_chemprot.sh

Lines changed: 2 additions & 2 deletions
@@ -3,7 +3,7 @@
 echo "Container nvidia build = " $NVIDIA_BUILD_ID
 
 init_checkpoint=${1:-"/results/biobert_tf_uncased_base/model.ckpt-4340"}
-train_batch_size=${2:-64}
+train_batch_size=${2:-8}
 learning_rate=${3:-1.5e-6}
 cased=${4:-false}
 precision=${5:-"fp16"}

@@ -35,7 +35,7 @@ printf -v TAG "tf_bert_biobert_rel_chemprot_%s_%s_gbs%d" "$bert_model" "$precisi
 DATESTAMP=`date +'%y%m%d%H%M%S'`
 
 
-DATASET_DIR=/workspace/bert/data/biobert/ChemProt
+DATASET_DIR=/workspace/bert/data/biobert/chemprot-data_treeLSTM
 OUTPUT_DIR=/results/${TAG}_${DATESTAMP}
 mkdir -p ${OUTPUT_DIR}
 
TensorFlow/LanguageModeling/BERT/biobert/scripts/run_biobert_finetuning_inference.sh

Lines changed: 1 addition & 1 deletion
@@ -94,7 +94,7 @@ elif [ "$task" = "ner_bc5cdr-disease" ] ; then
 
 elif [ "$task" = "rel_chemprot" ] ; then
 printf -v TAG "tf_bert_biobert_rel_chemprot_inference_%s_%s_" "$bert_model" "$precision"
-DATASET_DIR=/workspace/bert/data/biobert/ChemProt
+DATASET_DIR=/workspace/bert/data/biobert/chemprot-data_treeLSTM
 OUTPUT_DIR=/results/${TAG}_${DATESTAMP}
 
 python3 /workspace/bert/run_re.py \
