# Adds the container packages for BERT large inference and training (PyTorch SPR) (#83)

* Add specs, docs, and quickstarts for BERT inference and training
* Add build and run scripts
* Update mount paths
* update base FROM
* Update spec to add quickstarts
* update wrapper to include run.sh
* Update path
* Update pip install -y
* Update bert installs
* Regenerate dockerfile
* Update dockerfile for bert train
* Update installs
* Doc updates
* Update dockerfile and run after testing training
* remove bert inf files from dockerfile
* Small doc updates
* Add shm-size 8G
* Fix error message
* Fix env var usages in build.sh
* Regenerate dockerfiles
* update conda activate partial
* Add build tools
* quickstart script updates
* Clarify dataset download instructions and switch CHECKPOINT_DIR to CONFIG_FILE
* Update quickstart and docs to have phase 2 use checkpoints from phase 1
* Fix script
Showing 32 changed files with 1,269 additions and 1 deletion.
**`dockerfiles/pytorch/pytorch-spr-bert-large-inference.Dockerfile`** (93 additions, 0 deletions)
```dockerfile
# Copyright (c) 2020-2021 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
#
# THIS IS A GENERATED DOCKERFILE.
#
# This file was assembled from multiple pieces, whose use is documented
# throughout. Please refer to the TensorFlow dockerfiles documentation
# for more information.

ARG PYTORCH_IMAGE="model-zoo"
ARG PYTORCH_TAG="pytorch-ipex-spr"

FROM ${PYTORCH_IMAGE}:${PYTORCH_TAG} AS intel-optimized-pytorch

RUN yum --enablerepo=extras install -y epel-release && \
    yum install -y \
        ca-certificates \
        git \
        wget \
        make \
        cmake \
        gcc-c++ \
        gcc \
        autoconf \
        bzip2 \
        tar

RUN source activate pytorch && \
    pip install matplotlib Pillow pycocotools && \
    pip install yacs opencv-python cityscapesscripts transformers && \
    conda install -y libopenblas && \
    mkdir -p /workspace/installs && \
    cd /workspace/installs && \
    wget https://github.com/gperftools/gperftools/releases/download/gperftools-2.7.90/gperftools-2.7.90.tar.gz && \
    tar -xzf gperftools-2.7.90.tar.gz && \
    cd gperftools-2.7.90 && \
    ./configure --prefix=$HOME/.local && \
    make && \
    make install && \
    rm -rf /workspace/installs/

ARG PACKAGE_DIR=model_packages

ARG PACKAGE_NAME="pytorch-spr-bert-large-inference"

ARG MODEL_WORKSPACE

# ${MODEL_WORKSPACE} and below needs to be owned by root:root rather than the current UID:GID
# this allows the default user (root) to work in k8s single-node, multi-node
RUN umask 002 && mkdir -p ${MODEL_WORKSPACE} && chgrp root ${MODEL_WORKSPACE} && chmod g+s+w,o+s+r ${MODEL_WORKSPACE}

ADD --chown=0:0 ${PACKAGE_DIR}/${PACKAGE_NAME}.tar.gz ${MODEL_WORKSPACE}

RUN chown -R root ${MODEL_WORKSPACE}/${PACKAGE_NAME} && chgrp -R root ${MODEL_WORKSPACE}/${PACKAGE_NAME} && chmod -R g+s+w ${MODEL_WORKSPACE}/${PACKAGE_NAME} && find ${MODEL_WORKSPACE}/${PACKAGE_NAME} -type d | xargs chmod o+r+x

WORKDIR ${MODEL_WORKSPACE}/${PACKAGE_NAME}

ARG BERT_DIR="/workspace/pytorch-spr-bert-large-inference/models/bert"

RUN source activate pytorch && \
    cd ${BERT_DIR} && \
    cd bert && \
    pip install -r examples/requirements.txt && \
    pip install -e . && \
    conda install -c conda-forge "llvm-openmp"

FROM intel-optimized-pytorch AS release
COPY --from=intel-optimized-pytorch /root/conda /root/conda
COPY --from=intel-optimized-pytorch /workspace/lib/ /workspace/lib/
COPY --from=intel-optimized-pytorch /root/.local/ /root/.local/

ENV DNNL_MAX_CPU_ISA="AVX512_CORE_AMX"

ENV PATH="~/conda/bin:${PATH}"
ENV LD_PRELOAD="/workspace/lib/jemalloc/lib/libjemalloc.so:$LD_PRELOAD"
ENV MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:9000000000,muzzy_decay_ms:9000000000"
ENV BASH_ENV=/root/.bash_profile
WORKDIR /workspace/
RUN yum install -y numactl mesa-libGL && \
    yum clean all && \
    echo "source activate pytorch" >> /root/.bash_profile
```
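The `umask 002` plus `chmod g+s+w,o+s+r` step above gives the workspace a setgid, group-writable mode, which is what the "works in k8s single-node, multi-node" comment relies on: files created later inherit the directory's group. A minimal sketch of the resulting mode bits, using a scratch directory instead of `${MODEL_WORKSPACE}` (the `chgrp root` step is skipped here since it requires root):

```shell
#!/bin/sh
# Reproduce the workspace permission setup on a scratch directory.
# WORKSPACE is a stand-in for ${MODEL_WORKSPACE}.
WORKSPACE=$(mktemp -d)/workspace
umask 002                     # new directories get mode 775 instead of 755
mkdir -p "$WORKSPACE"
chmod g+s+w,o+r "$WORKSPACE"  # setgid + group write, world read
stat -c '%A' "$WORKSPACE"     # shows the drwxrwsr-x mode on Linux
```

The lowercase `s` in the group triad confirms both the setgid bit and group execute are set.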
**`dockerfiles/pytorch/pytorch-spr-bert-large-training.Dockerfile`** (99 additions, 0 deletions)
```dockerfile
# Copyright (c) 2020-2021 Intel Corporation
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ============================================================================
#
# THIS IS A GENERATED DOCKERFILE.
#
# This file was assembled from multiple pieces, whose use is documented
# throughout. Please refer to the TensorFlow dockerfiles documentation
# for more information.

ARG PYTORCH_IMAGE="model-zoo"
ARG PYTORCH_TAG="pytorch-ipex-spr"

FROM ${PYTORCH_IMAGE}:${PYTORCH_TAG} AS intel-optimized-pytorch

RUN yum --enablerepo=extras install -y epel-release && \
    yum install -y \
        ca-certificates \
        git \
        wget \
        make \
        cmake \
        gcc-c++ \
        gcc \
        autoconf \
        bzip2 \
        tar

RUN source activate pytorch && \
    pip install matplotlib Pillow pycocotools && \
    pip install yacs opencv-python cityscapesscripts transformers && \
    conda install -y libopenblas && \
    mkdir -p /workspace/installs && \
    cd /workspace/installs && \
    wget https://github.com/gperftools/gperftools/releases/download/gperftools-2.7.90/gperftools-2.7.90.tar.gz && \
    tar -xzf gperftools-2.7.90.tar.gz && \
    cd gperftools-2.7.90 && \
    ./configure --prefix=$HOME/.local && \
    make && \
    make install && \
    rm -rf /workspace/installs/

ARG PACKAGE_DIR=model_packages

ARG PACKAGE_NAME="pytorch-spr-bert-large-training"

ARG MODEL_WORKSPACE

# ${MODEL_WORKSPACE} and below needs to be owned by root:root rather than the current UID:GID
# this allows the default user (root) to work in k8s single-node, multi-node
RUN umask 002 && mkdir -p ${MODEL_WORKSPACE} && chgrp root ${MODEL_WORKSPACE} && chmod g+s+w,o+s+r ${MODEL_WORKSPACE}

ADD --chown=0:0 ${PACKAGE_DIR}/${PACKAGE_NAME}.tar.gz ${MODEL_WORKSPACE}

RUN chown -R root ${MODEL_WORKSPACE}/${PACKAGE_NAME} && chgrp -R root ${MODEL_WORKSPACE}/${PACKAGE_NAME} && chmod -R g+s+w ${MODEL_WORKSPACE}/${PACKAGE_NAME} && find ${MODEL_WORKSPACE}/${PACKAGE_NAME} -type d | xargs chmod o+r+x

WORKDIR ${MODEL_WORKSPACE}/${PACKAGE_NAME}

ARG BERT_DIR="/workspace/pytorch-spr-bert-large-training/models/bert/bert"

RUN source activate pytorch && \
    cd ${BERT_DIR} && \
    pip install --upgrade pip && \
    pip install -r examples/requirements.txt && \
    pip install -e . && \
    pip install datasets accelerate tfrecord && \
    conda install openblas && \
    conda install faiss-cpu -c pytorch && \
    pip install transformers==4.9.0

RUN cd .. && \
    rm -rf ${BERT_DIR}

FROM intel-optimized-pytorch AS release
COPY --from=intel-optimized-pytorch /root/conda /root/conda
COPY --from=intel-optimized-pytorch /workspace/lib/ /workspace/lib/
COPY --from=intel-optimized-pytorch /root/.local/ /root/.local/

ENV DNNL_MAX_CPU_ISA="AVX512_CORE_AMX"

ENV PATH="~/conda/bin:${PATH}"
ENV LD_PRELOAD="/workspace/lib/jemalloc/lib/libjemalloc.so:$LD_PRELOAD"
ENV MALLOC_CONF="oversize_threshold:1,background_thread:true,metadata_thp:auto,dirty_decay_ms:9000000000,muzzy_decay_ms:9000000000"
ENV BASH_ENV=/root/.bash_profile
WORKDIR /workspace/
RUN yum install -y numactl mesa-libGL && \
    yum clean all && \
    echo "source activate pytorch" >> /root/.bash_profile
```
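Both Dockerfiles end by setting `BASH_ENV=/root/.bash_profile` and appending `source activate pytorch` to that profile. Because bash sources the file named by `BASH_ENV` even for non-interactive shells, every command executed inside the container starts with the conda environment active. A small sketch of the mechanism with a throwaway profile (the marker variable is made up for illustration):

```shell
#!/bin/sh
# Demonstrate the BASH_ENV mechanism with a temporary profile file.
profile=$(mktemp)
echo 'export CONDA_ENV_ACTIVE=pytorch' > "$profile"

# A non-interactive bash (like the one docker uses for RUN/CMD) still
# sources $BASH_ENV before running its command string:
BASH_ENV="$profile" bash -c 'echo "active env: $CONDA_ENV_ACTIVE"'
```

Without `BASH_ENV`, a `docker run … bash -c 'python …'` would use the base Python instead of the IPEX conda environment.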
**`...art/language_modeling/pytorch/bert_large/inference/cpu/.docs/container_build.md`** (26 additions, 0 deletions)
## Build the container

The <model name> <mode> package has scripts and a Dockerfile that are
used to build a workload container that runs the model. This container
uses the PyTorch/IPEX container as its base, so ensure that you have built
the `pytorch-ipex-spr.tar.gz` container prior to building this model container.

Use `docker images` to verify that you have the base container built. For example:
```
$ docker images | grep pytorch-ipex-spr
model-zoo     pytorch-ipex-spr     fecc7096a11e     40 minutes ago     8.31GB
```

To build the <model name> <mode> container, extract the package and
run the `build.sh` script.
```
# Extract the package
tar -xzf <package name>
cd <package dir>

# Build the container
./build.sh
```

After the build completes, you should have a container called
`<docker image>` that will be used to run the model.
**`quickstart/language_modeling/pytorch/bert_large/inference/cpu/.docs/description.md`** (5 additions, 0 deletions)
<!-- 10. Description -->
## Description

This document has instructions for running <model name> <mode> using
Intel-optimized PyTorch.
**`quickstart/language_modeling/pytorch/bert_large/inference/cpu/.docs/docker_spr.md`** (30 additions, 0 deletions)
## Run the model

Download the pretrained model from huggingface and set the `PRETRAINED_MODEL` environment
variable to point to the downloaded file.
```
wget https://cdn.huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad-pytorch_model.bin -O pytorch_model.bin
export PRETRAINED_MODEL=$(pwd)/pytorch_model.bin
```

Once you have the pretrained model and have [built the container](#build-the-container),
use the `run.sh` script from the container package to run <model name> <mode> in docker.
Set environment variables to specify the precision to run and an output directory.
By default, the `run.sh` script will run the `inference_realtime.sh` quickstart script.
To run a different script, specify the name of the script using the `SCRIPT` environment
variable.
```
# Navigate to the container package directory
cd <package dir>

# Set the required environment vars
export PRETRAINED_MODEL=<path to the downloaded model>
export PRECISION=<specify the precision to run>
export OUTPUT_DIR=<directory where log files will be written>

# Run the container with the inference_realtime.sh quickstart script
./run.sh

# Use the SCRIPT env var to run a different quickstart script
SCRIPT=accuracy.sh ./run.sh
```
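The `SCRIPT` override described above implies that `run.sh` falls back to `inference_realtime.sh` when the variable is unset; a hypothetical sketch of that default-selection logic (the packaged `run.sh` may differ):

```shell
#!/bin/sh
# Hypothetical default-selection logic like run.sh's: use SCRIPT if the
# caller exported it, otherwise fall back to the realtime quickstart.
unset SCRIPT                            # simulate a caller that did not override
SCRIPT=${SCRIPT:-inference_realtime.sh}
echo "quickstart script: $SCRIPT"
```

Invoking `SCRIPT=accuracy.sh ./run.sh` would make the parameter expansion keep `accuracy.sh` instead.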
**`quickstart/language_modeling/pytorch/bert_large/inference/cpu/.docs/license.md`** (4 additions, 0 deletions)
<!--- 80. License -->
## License

Licenses can be found in the model package, in the `licenses` directory.
**`quickstart/language_modeling/pytorch/bert_large/inference/cpu/.docs/quickstart.md`** (8 additions, 0 deletions)
<!--- 40. Quick Start Scripts -->
## Quick Start Scripts

| Script name | Description |
|-------------|-------------|
| `inference_realtime.sh` | Runs multi-instance realtime inference using 4 cores per instance for the specified precision (fp32, int8, or bf16) using the [huggingface pretrained model](https://cdn.huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad-pytorch_model.bin). |
| `inference_throughput.sh` | Runs multi-instance batch inference using 1 instance per socket for the specified precision (fp32, int8, or bf16) using the [huggingface pretrained model](https://cdn.huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad-pytorch_model.bin). |
| `accuracy.sh` | Measures the inference accuracy for the specified precision (fp32, int8, or bf16) using the [huggingface pretrained model](https://cdn.huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad-pytorch_model.bin). |
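Each script in the table accepts one of three precisions, so a quickstart wrapper might validate the `PRECISION` env var along these lines (a sketch, not the packaged scripts' actual code):

```shell
#!/bin/sh
# Hypothetical guard for the PRECISION values the quickstart scripts accept.
PRECISION=bf16   # example value; fp32 and int8 are also accepted
case "$PRECISION" in
  fp32|int8|bf16)
    echo "precision ok: $PRECISION" ;;
  *)
    echo "unsupported precision: $PRECISION (use fp32, int8, or bf16)" >&2
    exit 1 ;;
esac
```

Failing fast here gives a clearer message than an error from deep inside the model code.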
**`quickstart/language_modeling/pytorch/bert_large/inference/cpu/.docs/title.md`** (2 additions, 0 deletions)
<!--- 0. Title -->
# PyTorch <model name> <mode>
**`...art/language_modeling/pytorch/bert_large/inference/cpu/.docs/wrapper_package.md`** (16 additions, 0 deletions)
## Model Package

The model package includes the Dockerfile and scripts needed to build and
run <model name> <mode> in a container.
```
<package dir>
├── README.md
├── build.sh
├── licenses
│   ├── LICENSE
│   └── third_party
├── model_packages
│   └── <package name>
├── <package dir>.Dockerfile
└── run.sh
```
**`quickstart/language_modeling/pytorch/bert_large/inference/cpu/README_SPR.md`** (98 additions, 0 deletions)
<!--- 0. Title -->
# PyTorch BERT Large inference

<!-- 10. Description -->
## Description

This document has instructions for running BERT Large inference using
Intel-optimized PyTorch.

## Model Package

The model package includes the Dockerfile and scripts needed to build and
run BERT Large inference in a container.
```
pytorch-spr-bert-large-inference
├── README.md
├── build.sh
├── licenses
│   ├── LICENSE
│   └── third_party
├── model_packages
│   └── pytorch-spr-bert-large-inference.tar.gz
├── pytorch-spr-bert-large-inference.Dockerfile
└── run.sh
```

<!--- 40. Quick Start Scripts -->
## Quick Start Scripts

| Script name | Description |
|-------------|-------------|
| `inference_realtime.sh` | Runs multi-instance realtime inference using 4 cores per instance for the specified precision (fp32, int8, or bf16) using the [huggingface pretrained model](https://cdn.huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad-pytorch_model.bin). |
| `inference_throughput.sh` | Runs multi-instance batch inference using 1 instance per socket for the specified precision (fp32, int8, or bf16) using the [huggingface pretrained model](https://cdn.huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad-pytorch_model.bin). |
| `accuracy.sh` | Measures the inference accuracy for the specified precision (fp32, int8, or bf16) using the [huggingface pretrained model](https://cdn.huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad-pytorch_model.bin). |
## Build the container

The BERT Large inference package has scripts and a Dockerfile that are
used to build a workload container that runs the model. This container
uses the PyTorch/IPEX container as its base, so ensure that you have built
the `pytorch-ipex-spr.tar.gz` container prior to building this model container.

Use `docker images` to verify that you have the base container built. For example:
```
$ docker images | grep pytorch-ipex-spr
model-zoo     pytorch-ipex-spr     fecc7096a11e     40 minutes ago     8.31GB
```

To build the BERT Large inference container, extract the package and
run the `build.sh` script.
```
# Extract the package
tar -xzf pytorch-spr-bert-large-inference.tar.gz
cd pytorch-spr-bert-large-inference

# Build the container
./build.sh
```

After the build completes, you should have a container called
`model-zoo:pytorch-bert-large-inference` that will be used to run the model.

## Run the model

Download the pretrained model from huggingface and set the `PRETRAINED_MODEL` environment
variable to point to the downloaded file.
```
wget https://cdn.huggingface.co/bert-large-uncased-whole-word-masking-finetuned-squad-pytorch_model.bin -O pytorch_model.bin
export PRETRAINED_MODEL=$(pwd)/pytorch_model.bin
```

Once you have the pretrained model and have [built the container](#build-the-container),
use the `run.sh` script from the container package to run BERT Large inference in docker.
Set environment variables to specify the precision to run and an output directory.
By default, the `run.sh` script will run the `inference_realtime.sh` quickstart script.
To run a different script, specify the name of the script using the `SCRIPT` environment
variable.
```
# Navigate to the container package directory
cd pytorch-spr-bert-large-inference

# Set the required environment vars
export PRETRAINED_MODEL=<path to the downloaded model>
export PRECISION=<specify the precision to run>
export OUTPUT_DIR=<directory where log files will be written>

# Run the container with the inference_realtime.sh quickstart script
./run.sh

# Use the SCRIPT env var to run a different quickstart script
SCRIPT=accuracy.sh ./run.sh
```
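Since `run.sh` depends on all three exported variables above, a small pre-flight check (a sketch, not part of the package) can catch a missing one before the container starts:

```shell
#!/bin/sh
# Hypothetical pre-flight check for the env vars run.sh expects.
# Example values are filled in here so the check passes.
export PRETRAINED_MODEL=/tmp/pytorch_model.bin
export PRECISION=bf16
export OUTPUT_DIR=/tmp/bert-logs

missing=0
for v in PRETRAINED_MODEL PRECISION OUTPUT_DIR; do
  eval "val=\${$v:-}"
  if [ -z "$val" ]; then
    echo "missing required env var: $v" >&2
    missing=1
  fi
done
[ "$missing" -eq 0 ] && echo "all required env vars are set"
```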
<!--- 80. License -->
## License

Licenses can be found in the model package, in the `licenses` directory.