This document has instructions for running BERT Base SQuAD1.1 inference using Intel-optimized PyTorch.
Follow the link to install Conda and build PyTorch, IPEX, TorchVision, Jemalloc, and TCMalloc.
- Install Intel OpenMP: `conda install intel-openmp`
- Install datasets: `pip install datasets`
- If you are running on Sapphire Rapids (SPR), set this environment variable to enable AMX: `export DNNL_MAX_CPU_ISA=AVX512_CORE_AMX`
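
The AMX setting above only makes sense on hardware that supports it. A minimal sketch, assuming a Linux host where the kernel lists the `amx_tile` flag in `/proc/cpuinfo` on AMX-capable CPUs; `has_amx` is a hypothetical helper, not part of the Model Zoo:

```shell
# Sketch: export the AMX ISA setting only when the CPU advertises AMX.
# has_amx is a hypothetical helper; it scans a cpuinfo-style file for
# the amx_tile flag (assumption: Linux reports AMX this way).
has_amx() {
    # $1: path to a cpuinfo file (defaults to /proc/cpuinfo)
    grep -qw 'amx_tile' "${1:-/proc/cpuinfo}" 2>/dev/null
}

if has_amx; then
    export DNNL_MAX_CPU_ISA=AVX512_CORE_AMX
    echo "AMX detected: DNNL_MAX_CPU_ISA=${DNNL_MAX_CPU_ISA}"
else
    echo "AMX not detected: leaving DNNL_MAX_CPU_ISA unchanged"
fi
```

On a machine without AMX the variable is left alone, so oneDNN keeps choosing the ISA on its own.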

| DataType | Throughput | Latency | Accuracy |
|---|---|---|---|
| FP32 | `bash run_multi_instance_throughput.sh fp32` | `bash run_multi_instance_realtime.sh fp32` | `bash run_accuracy.sh fp32` |
| BF16 | `bash run_multi_instance_throughput.sh bf16` | `bash run_multi_instance_realtime.sh bf16` | `bash run_accuracy.sh bf16` |
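
The table can also be read as a lookup from a mode and a precision to a command. The sketch below encodes that mapping; `quickstart_cmd` is a hypothetical helper written for illustration, and it only prints the commands (a dry run) rather than executing them:

```shell
# Hypothetical helper: map a mode and a precision to the quickstart
# command listed in the table. It prints the command; it does not run it.
quickstart_cmd() {
    qs_mode="$1"
    qs_prec="$2"
    case "$qs_prec" in
        fp32|bf16) ;;
        *) echo "unsupported precision: $qs_prec" >&2; return 1 ;;
    esac
    case "$qs_mode" in
        throughput) echo "bash run_multi_instance_throughput.sh $qs_prec" ;;
        latency)    echo "bash run_multi_instance_realtime.sh $qs_prec" ;;
        accuracy)   echo "bash run_accuracy.sh $qs_prec" ;;
        *) echo "unsupported mode: $qs_mode" >&2; return 1 ;;
    esac
}

# Dry run: list every command in the table.
for precision in fp32 bf16; do
    for mode in throughput latency accuracy; do
        quickstart_cmd "$mode" "$precision"
    done
done
```

To actually launch a run from the quickstart directory, the printed command could be passed to `eval`, for example `eval "$(quickstart_cmd latency bf16)"`.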
Follow the instructions above to set up your bare-metal environment, download and preprocess the dataset, and complete the model-specific setup. Once setup is done, use the Model Zoo to run a quickstart script. Make sure the `OUTPUT_DIR` environment variable points to an output directory.
# Clone the model zoo repo and set the MODEL_DIR
git clone https://github.com/IntelAI/models.git
cd models
export MODEL_DIR=$(pwd)
# Clone the Transformers repo in the BERT base inference directory
cd quickstart/language_modeling/pytorch/bert_base/inference/cpu
git clone https://github.com/huggingface/transformers.git
cd transformers
git checkout v4.10.0
git apply ../enable_ipex_for_bert-base.diff
pip install -e ./
cd ..
# Env vars
export OUTPUT_DIR=<path to an output directory>
# Run a quickstart script (for example, FP32 multi-instance realtime inference)
bash run_multi_instance_realtime.sh fp32
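
Because the quickstart scripts expect `OUTPUT_DIR` to be set, a small pre-flight check can catch a missing or unwritable directory before a long run starts. This is a sketch only; `check_output_dir` is a hypothetical helper, not something the Model Zoo provides:

```shell
# Hypothetical pre-flight check: fail early when OUTPUT_DIR is unset,
# otherwise create the directory and confirm it is writable.
check_output_dir() {
    if [ -z "${OUTPUT_DIR:-}" ]; then
        echo "OUTPUT_DIR is not set" >&2
        return 1
    fi
    mkdir -p "$OUTPUT_DIR" && [ -w "$OUTPUT_DIR" ]
}

# Usage (from the quickstart directory):
#   check_output_dir && bash run_multi_instance_realtime.sh fp32
```

Chaining with `&&` means the quickstart script starts only if the check passes.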