Upgrade to pytorch 2.0.1 conda package (nv-morpheus#1015)
- Update docker containers and conda recipe/environments to install pytorch 2.0.1 conda package
- Also update torch version for training-tuning scripts/notebooks

Closes nv-morpheus#1008

Authors:
  - Eli Fajardo (https://github.com/efajardo-nv)

Approvers:
  - Michael Demoret (https://github.com/mdemoret-nv)

URL: nv-morpheus#1015
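
One way to sanity-check the upgrade described above after rebuilding an environment is to inspect the reported version string. The sketch below is not part of this commit: `parse_torch_version` and `matches_expected` are hypothetical helper names, and it assumes a purely numeric release followed by an optional local tag, as in the `2.0.1+cu118` pins in this commit's requirements files. Conda builds may report the version without a local tag.

```python
from typing import Optional, Tuple


def parse_torch_version(version: str) -> Tuple[Tuple[int, ...], Optional[str]]:
    """Split a version string such as '2.0.1+cu118' into ((2, 0, 1), 'cu118')."""
    # Everything before '+' is the release; everything after is the local tag.
    release, _, local = version.partition("+")
    numbers = tuple(int(part) for part in release.split("."))
    return numbers, (local or None)


def matches_expected(version: str, expected_release: Tuple[int, ...] = (2, 0, 1)) -> bool:
    """Return True when the release part equals the expected release tuple."""
    release, _local = parse_torch_version(version)
    return release == expected_release
```

In an environment where PyTorch is installed, `matches_expected(torch.__version__)` would perform the check against the real package.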
efajardo-nv authored Jul 5, 2023
1 parent 99134c7 commit f71b570
Showing 13 changed files with 64 additions and 133 deletions.
2 changes: 2 additions & 0 deletions ci/conda/recipes/morpheus/meta.yaml
@@ -92,6 +92,8 @@ outputs:
   - pluggy 1.0.*
   - pyarrow * *_cuda # Ensure we get a CUDA build. Version determined by cuDF
   - python
+  - pytorch 2.0.1
+  - pytorch-cuda
   - scikit-learn 1.2.2.*
   - tqdm 4.*
   - typing_utils 0.1.*
1 change: 1 addition & 0 deletions ci/conda/recipes/run_conda_build.sh
@@ -100,6 +100,7 @@ CONDA_ARGS_ARRAY+=("-c" "${CONDA_CHANNEL_ALIAS:+"${CONDA_CHANNEL_ALIAS%/}/"}rapi
 CONDA_ARGS_ARRAY+=("-c" "${CONDA_CHANNEL_ALIAS:+"${CONDA_CHANNEL_ALIAS%/}/"}nvidia/label/cuda-11.8.0")
 CONDA_ARGS_ARRAY+=("-c" "${CONDA_CHANNEL_ALIAS:+"${CONDA_CHANNEL_ALIAS%/}/"}nvidia")
 CONDA_ARGS_ARRAY+=("-c" "${CONDA_CHANNEL_ALIAS:+"${CONDA_CHANNEL_ALIAS%/}/"}nvidia/label/dev")
+CONDA_ARGS_ARRAY+=("-c" "pytorch")
 CONDA_ARGS_ARRAY+=("-c" "conda-forge")
 
 if hasArg morpheus; then
1 change: 1 addition & 0 deletions docker/Dockerfile
@@ -244,6 +244,7 @@ RUN --mount=type=bind,from=conda_bld_morpheus,source=/opt/conda/conda-bld,target
 -c nvidia/label/cuda-11.8.0 \
 -c nvidia/label/dev \
 -c nvidia \
+-c pytorch \
 -c conda-forge morpheus &&\
 # Install runtime dependencies that are pip-only
 /opt/conda/bin/mamba env update -n morpheus --file docker/conda/environments/cuda${CUDA_MAJOR_VER}.${CUDA_MINOR_VER}_runtime.yml
3 changes: 3 additions & 0 deletions docker/conda/environments/cuda11.8_dev.yml
@@ -19,6 +19,7 @@ channels:
   - nvidia/label/cuda-11.8.0
   - nvidia
   - nvidia/label/dev # For pre-releases of MRC. Should still default to full releases if available
+  - pytorch
   - conda-forge
 dependencies:
   ####### Morpheus Dependencies (keep sorted!) #######
@@ -82,6 +83,8 @@ dependencies:
   - python-confluent-kafka=1.7.0
   - python-graphviz
   - python=3.10
+  - pytorch=2.0.1
+  - pytorch-cuda
   - rapidjson=1.1.0
   - scikit-build=0.17.1
   - scikit-learn=1.2.2
5 changes: 0 additions & 5 deletions docker/conda/environments/requirements.txt
@@ -1,7 +1,3 @@
-# Note, to include this when using setup.py or pip, set the variable:
-# PIP_FIND_LINKS=https://download.pytorch.org/whl/cu116/torch_stable.html
---find-links https://download.pytorch.org/whl/cu116/torch_stable.html
-
 ####### Pip-only runtime dependencies (keep sorted!) #######
 # Packages listed here should also be listed in setup.py
 ipywidgets
@@ -10,6 +6,5 @@ jupyterlab
 nvidia-pyindex
 # Duplicated in conda dev to ensure parity with libprotobuf
 protobuf==4.21.*
-torch==1.13.1+cu116
 tritonclient[all]==2.17.*
 websockets
Original file line number Diff line number Diff line change
@@ -113,56 +113,59 @@
 " </thead>\n",
 " <tbody>\n",
 " <tr>\n",
-" <th>4655</th>\n",
+" <th>4812</th>\n",
 " <td>&lt;NA&gt;</td>\n",
 " <td>&lt;NA&gt;</td>\n",
-" <td>193.106.31.130 - - [11/Aug/2019:19:54:28 +0200...</td>\n",
-" <td>193.106.31.130</td>\n",
+" <td>85.25.236.93 - - [14/Jan/2017:15:16:03 +0100] ...</td>\n",
+" <td>85.25.236.93</td>\n",
 " <td>-</td>\n",
 " <td>-</td>\n",
-" <td>-</td>\n",
-" <td>Mozilla/4.0 (compatible; MSIE 7.0; Windows NT ...</td>\n",
-" <td>IE</td>\n",
-" <td>Windows</td>\n",
-" <td>Vista</td>\n",
-" <td>1.0</td>\n",
-" <td>POST</td>\n",
-" <td>/administrator/index.php</td>\n",
-" <td>4481</td>\n",
+" <td>http://www.almhuette-raith.at/index.php?option...</td>\n",
+" <td>Mozilla/5.0 (X11; U; Linux Core i7-4980HQ; de;...</td>\n",
+" <td>JobboerseBot</td>\n",
+" <td>Linux</td>\n",
+" <td>&lt;NA&gt;</td>\n",
+" <td>1.1</td>\n",
+" <td>GET</td>\n",
+" <td>/images/stories/slideshow/almhuette_raith_03.jpg</td>\n",
+" <td>87782</td>\n",
 " <td>200</td>\n",
-" <td>[11/Aug/2019:19:54:28 +0200]</td>\n",
+" <td>[14/Jan/2017:15:16:03 +0100]</td>\n",
 " </tr>\n",
 " </tbody>\n",
 "</table>\n",
 "</div>"
 ],
 "text/plain": [
 " error_level error_message \\\n",
-"4655 <NA> <NA> \n",
+"4812 <NA> <NA> \n",
 "\n",
-" raw remote_host \\\n",
-"4655 193.106.31.130 - - [11/Aug/2019:19:54:28 +0200... 193.106.31.130 \n",
+" raw remote_host \\\n",
+"4812 85.25.236.93 - - [14/Jan/2017:15:16:03 +0100] ... 85.25.236.93 \n",
 "\n",
-" remote_logname remote_user request_header_referer \\\n",
-"4655 - - - \n",
+" remote_logname remote_user \\\n",
+"4812 - - \n",
+"\n",
+" request_header_referer \\\n",
+"4812 http://www.almhuette-raith.at/index.php?option... \n",
 "\n",
 " request_header_user_agent \\\n",
-"4655 Mozilla/4.0 (compatible; MSIE 7.0; Windows NT ... \n",
+"4812 Mozilla/5.0 (X11; U; Linux Core i7-4980HQ; de;... \n",
 "\n",
 " request_header_user_agent__browser__family \\\n",
-"4655 IE \n",
+"4812 JobboerseBot \n",
 "\n",
 " request_header_user_agent__os__family \\\n",
-"4655 Windows \n",
+"4812 Linux \n",
 "\n",
 " request_header_user_agent__os__version_string request_http_ver \\\n",
-"4655 Vista 1.0 \n",
+"4812 <NA> 1.1 \n",
 "\n",
-" request_method request_url response_bytes_clf status \\\n",
-"4655 POST /administrator/index.php 4481 200 \n",
+" request_method request_url \\\n",
+"4812 GET /images/stories/slideshow/almhuette_raith_03.jpg \n",
 "\n",
-" time_received \n",
-"4655 [11/Aug/2019:19:54:28 +0200] "
+" response_bytes_clf status time_received \n",
+"4812 87782 200 [14/Jan/2017:15:16:03 +0100] "
 ]
 },
 "execution_count": 3,
@@ -283,7 +286,7 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"MAX_SEQ_LEN = 128\n",
+"MAX_SEQ_LEN = 256\n",
 "STRIDE = 12"
 ]
 },
@@ -335,7 +338,7 @@
 "name": "stderr",
 "output_type": "stream",
 "text": [
-"/opt/conda/envs/morpheus/lib/python3.8/site-packages/cudf/core/subword_tokenizer.py:189: UserWarning: When truncation is not True, the behavior currently differs from HuggingFace as cudf always returns overflowing tokens\n",
+"/opt/conda/envs/morpheus/lib/python3.10/site-packages/cudf/core/subword_tokenizer.py:189: UserWarning: When truncation is not True, the behavior currently differs from HuggingFace as cudf always returns overflowing tokens\n",
 " warnings.warn(warning_msg)\n"
 ]
 }
@@ -468,10 +471,10 @@
 "name": "stderr",
 "output_type": "stream",
 "text": [
-"Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForTokenClassification: ['cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.seq_relationship.bias', 'cls.predictions.transform.dense.weight']\n",
+"Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForTokenClassification: ['cls.seq_relationship.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias']\n",
 "- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).\n",
 "- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).\n",
-"Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.weight', 'classifier.bias']\n",
+"Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']\n",
 "You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.\n"
 ]
 }
@@ -520,44 +523,23 @@
 },
 {
 "cell_type": "code",
-"execution_count": 21,
+"execution_count": null,
 "metadata": {},
 "outputs": [
 {
 "name": "stderr",
 "output_type": "stream",
 "text": [
-"Epoch: 50%|█████ | 1/2 [00:38<00:38, 38.73s/it]"
-]
-},
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"Train loss: 0.2076284834630277\n"
-]
-},
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
-"Epoch: 100%|██████████| 2/2 [01:17<00:00, 38.85s/it]"
+"Epoch: 0%| | 0/2 [00:00<?, ?it/s]/opt/conda/envs/morpheus/lib/python3.10/site-packages/torch/nn/parallel/_functions.py:68: UserWarning: Was asked to gather along dimension 0, but all input tensors were scalars; will instead unsqueeze and return a vector.\n",
+" warnings.warn('Was asked to gather along dimension 0, but all '\n",
+"Epoch: 50%|█████ | 1/2 [00:38<00:38, 38.39s/it]"
 ]
 },
 {
 "name": "stdout",
 "output_type": "stream",
 "text": [
-"Train loss: 0.008250679652531925\n",
-"CPU times: user 1min 16s, sys: 896 ms, total: 1min 17s\n",
-"Wall time: 1min 17s\n"
-]
-},
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
-"\n"
+"Train loss: 0.3851298079825938\n"
 ]
 }
 ],
@@ -601,7 +583,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 22,
+"execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
@@ -611,62 +593,9 @@
 },
 {
 "cell_type": "code",
-"execution_count": 23,
+"execution_count": null,
 "metadata": {},
-"outputs": [
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
-"/opt/conda/envs/morpheus/lib/python3.8/site-packages/seqeval/metrics/sequence_labeling.py:171: UserWarning: [PAD] seems not to be NE tag.\n",
-" warnings.warn('{} seems not to be NE tag.'.format(chunk))\n"
-]
-},
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-"f1 score: 0.997863\n",
-"Accuracy score: 0.999263\n"
-]
-},
-{
-"name": "stderr",
-"output_type": "stream",
-"text": [
-"/opt/conda/envs/morpheus/lib/python3.8/site-packages/seqeval/metrics/v1.py:57: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.\n",
-" _warn_prf(average, modifier, msg_start, len(result))\n",
-"/opt/conda/envs/morpheus/lib/python3.8/site-packages/seqeval/metrics/v1.py:57: UndefinedMetricWarning: Recall and F-score are ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.\n",
-" _warn_prf(average, modifier, msg_start, len(result))\n"
-]
-},
-{
-"name": "stdout",
-"output_type": "stream",
-"text": [
-" precision recall f1-score support\n",
-"\n",
-" PAD] 0.000 0.000 0.000 0\n",
-" error_level 1.000 1.000 1.000 90\n",
-" error_message 1.000 1.000 1.000 90\n",
-" remote_host 1.000 1.000 1.000 890\n",
-" request_header_referer 1.000 0.996 0.998 476\n",
-" request_header_user_agent 1.000 1.000 1.000 1005\n",
-"request_header_user_agent__os__version_string 0.000 0.000 0.000 19\n",
-" request_http_ver 1.000 1.000 1.000 890\n",
-" request_method 1.000 1.000 1.000 890\n",
-" request_url 1.000 0.990 0.995 890\n",
-" response_bytes_clf 1.000 1.000 1.000 888\n",
-" status 1.000 1.000 1.000 888\n",
-" time_received 0.998 1.000 0.999 952\n",
-"\n",
-" micro avg 0.999 0.996 0.998 7968\n",
-" macro avg 0.846 0.845 0.846 7968\n",
-" weighted avg 0.997 0.996 0.997 7968\n",
-"\n"
-]
-}
-],
+"outputs": [],
 "source": [
 "# Mapping id to label\n",
 "id2label={label2id[key] : key for key in label2id.keys()}\n",
@@ -730,18 +659,16 @@
 },
 {
 "cell_type": "code",
-"execution_count": 24,
+"execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "if torch.cuda.device_count() > 1:\n",
-" model.module.config.id2label = id2label\n",
-" model.module.config.label2id = label2id\n",
-" model.module.save_pretrained('log_parsing_apache_morpheus')\n",
-"else:\n",
-" model.config.id2label = id2label\n",
-" model.config.label2id = label2id\n",
-" model.save_pretrained('log_parsing_apache_morpheus')"
+" model = model.module\n",
+"\n",
+"model.config.id2label = id2label\n",
+"model.config.label2id = label2id\n",
+"model.save_pretrained('log_parsing_apache_morpheus')"
 ]
 },
 {
@@ -753,7 +680,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 25,
+"execution_count": null,
 "metadata": {
 "tags": []
 },
@@ -773,7 +700,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 26,
+"execution_count": null,
 "metadata": {
 "tags": []
 },
@@ -784,7 +711,7 @@
 },
 {
 "cell_type": "code",
-"execution_count": 27,
+"execution_count": null,
 "metadata": {
 "tags": []
 },
@@ -840,7 +767,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.8.15"
+"version": "3.10.12"
 }
 },
 "nbformat": 4,
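
The notebook's save cell above changes from a duplicated if/else to a single unwrap step: when the model is wrapped for multi-GPU use, the underlying model is read from `.module` and the label mappings and save call then run once. A wrapper such as `torch.nn.DataParallel` exposes the wrapped model this way. The torch-free sketch below illustrates the pattern only; `unwrap_model`, `save_with_labels`, `FakeModel`, and `FakeWrapper` are hypothetical names, the config is a plain dict rather than a real model config, and the duck-typed `module` lookup is an assumption standing in for the notebook's `torch.cuda.device_count() > 1` check.

```python
class FakeModel:
    """Stand-in for a model object with a mutable config and a save method."""

    def __init__(self):
        self.config = {}
        self.saved_to = None

    def save_pretrained(self, path):
        # Record the target path instead of writing files.
        self.saved_to = path


class FakeWrapper:
    """Stand-in for a multi-GPU wrapper that exposes the wrapped model as .module."""

    def __init__(self, module):
        self.module = module


def unwrap_model(model):
    """Return model.module when the attribute exists, otherwise the model itself."""
    return getattr(model, "module", model)


def save_with_labels(model, id2label, label2id, path):
    """Attach label mappings to the unwrapped model's config and save it once."""
    model = unwrap_model(model)
    model.config["id2label"] = id2label
    model.config["label2id"] = label2id
    model.save_pretrained(path)
    return model
```

Unwrapping first keeps a single code path for both the single-GPU and multi-GPU cases, which is the simplification the diff makes.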
Original file line number Diff line number Diff line change
@@ -1,2 +1,3 @@
+onnx==1.14.0
 seqeval==1.2.2
 transformers==4.22.2
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
 numpy==1.22.4
 pandas==1.3.5
+onnx==1.14.0
 scikit_learn==1.1.3
 tqdm==4.64.1
 transformers==4.24.0
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
 numpy==1.22.4
+onnx==1.14.0
 pandas==1.3.5
 scikit_learn==1.1.3
 tqdm==4.64.1
Original file line number Diff line number Diff line change
@@ -6,4 +6,4 @@ matplotlib==3.4.2
 numpy==1.22.0
 pandas==1.0.1
 scikit_learn==1.0.2
-torch==1.13.1+cu116
+torch==2.0.1+cu118
Original file line number Diff line number Diff line change
@@ -2,4 +2,4 @@ cudf==22.8.1
 numpy==1.22.4
 onnxruntime==1.13.1
 scipy==1.9.1
-torch==1.13.1+cu116
+torch==2.0.1+cu118
Original file line number Diff line number Diff line change
@@ -2,4 +2,4 @@ cudf==22.8.1
 numpy==1.22.4
 onnxruntime==1.13.1
 scipy==1.9.1
-torch==1.13.1+cu116
+torch==2.0.1+cu118