NVIDIA TensorRT is a platform for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that deliver low latency and high throughput for inference applications. More information on how to perform inference using TensorRT can be found in the subfolder [./trt/README.md](trt/README.md).
### Deploying the BERT model using TensorRT Inference Server
The [NVIDIA TensorRT Inference Server](https://github.com/NVIDIA/tensorrt-inference-server) provides a datacenter and cloud inferencing solution optimized for NVIDIA GPUs. The server provides an inference service via an HTTP or gRPC endpoint, allowing remote clients to request inferencing for any number of GPU or CPU models being managed by the server. More information on how to perform inference using `TensorRT Inference Server` can be found in the subfolder `./trtis/README.md`.
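As a quick sanity check, the server's HTTP endpoint can be probed before issuing any inference requests. The sketch below assumes the server is running locally with its default HTTP port (8000) and uses the v1 REST API shipped with that TensorRT Inference Server generation; see `./trtis/README.md` for the full client walkthrough.

```bash
# Assumes a TensorRT Inference Server running locally on the default HTTP
# port; both endpoints belong to the v1 REST API of that release.
curl localhost:8000/api/health/ready   # returns 200 once the server can accept requests
curl localhost:8000/api/status         # text-protobuf status of every loaded model
```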
###### Pre-training training performance: single-node on 16G
Our results were obtained by running the `scripts/run_pretraining_lamb.sh` training script in the TensorFlow 19.08-py3 NGC container on NVIDIA DGX-1 with 8x V100 16G GPUs. Performance (in sentences per second) is the steady-state throughput with the number of accumulation steps set to 1.
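For illustration, a single-node throughput run can be launched as sketched below. The positional argument order and the values shown are assumptions, not the benchmarked configuration; consult the header of `scripts/run_pretraining_lamb.sh` for the authoritative argument list before running.

```bash
#!/usr/bin/env bash
# Hypothetical single-node benchmark launch; the argument order and defaults
# are assumptions -- verify against the header of scripts/run_pretraining_lamb.sh.
train_batch_size_phase1=64
train_batch_size_phase2=8
eval_batch_size=8
learning_rate_phase1=7.5e-4
learning_rate_phase2=5e-4
precision=fp16                    # fp16 enables mixed precision; use fp32 for the baseline
use_xla=true
num_gpus=8
warmup_steps_phase1=2000
warmup_steps_phase2=200
train_steps=7820
save_checkpoints_steps=100
num_accumulation_steps_phase1=1   # accumulation steps set to 1, matching the tables above
num_accumulation_steps_phase2=1
bert_model=large

bash scripts/run_pretraining_lamb.sh \
  "$train_batch_size_phase1" "$train_batch_size_phase2" "$eval_batch_size" \
  "$learning_rate_phase1" "$learning_rate_phase2" "$precision" "$use_xla" "$num_gpus" \
  "$warmup_steps_phase1" "$warmup_steps_phase2" "$train_steps" "$save_checkpoints_steps" \
  "$num_accumulation_steps_phase1" "$num_accumulation_steps_phase2" "$bert_model"
```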
###### Pre-training training performance: multi-node on 16G
Our results were obtained by running the `run.sub` training script in the TensorFlow 19.08-py3 NGC container using multiple NVIDIA DGX-1 systems with 8x V100 16G GPUs each. Performance (in sentences per second) is the steady-state throughput with the number of accumulation steps set to 1.
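A multi-node launch typically goes through a cluster scheduler. The sketch below assumes a SLURM cluster, which `run.sub` targets, and hypothetical environment variable names for the batch size and accumulation steps; `run.sub` generally needs site-specific adaptation, so inspect it to see which variables it actually consumes.

```bash
# Hypothetical SLURM submission for a 4-node (4 x 8 GPU) run; the environment
# variable names are assumptions -- check run.sub for the real ones.
BATCHSIZE=16 LEARNING_RATE=8.12e-4 NUM_ACCUMULATION_STEPS=1 PHASE=1 \
  sbatch -N 4 --ntasks-per-node=8 run.sub
```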
###### Pre-training training performance: single-node on 32G
Our results were obtained by running the `scripts/run_pretraining_lamb.sh` training script in the TensorFlow 19.08-py3 NGC container on NVIDIA DGX-1 with 8x V100 32G GPUs. Performance (in sentences per second) is the steady-state throughput with the number of accumulation steps set to 1.
###### Pre-training training performance: single-node on DGX-2 32G
Our results were obtained by running the `scripts/run_pretraining_lamb.sh` training script in the TensorFlow 19.08-py3 NGC container on NVIDIA DGX-2 with 16x V100 32G GPUs. Performance (in sentences per second) is the steady-state throughput with the number of accumulation steps set to 1.
###### Pre-training training performance: multi-node on DGX-2H 32G
Our results were obtained by running the `run.sub` training script in the TensorFlow 19.08-py3 NGC container using multiple NVIDIA DGX-2 systems with 16x V100 32G GPUs each. Performance (in sentences per second) is the steady-state throughput with the number of accumulation steps set to 1.