
ABP nvsmi sample data generation #1108

Merged (11 commits) on Aug 31, 2023
16 changes: 14 additions & 2 deletions examples/abp_nvsmi_detection/README.md
@@ -48,6 +48,18 @@ Each line in the output represents the GPU metrics at a single point in time. As

In this example we will be using the `examples/data/nvsmi.jsonlines` dataset that is known to contain mining behavior profiles. The dataset is in the `.jsonlines` format which means each new line represents a new JSON object. In order to parse this data, it must be ingested, split by lines into individual JSON objects, and parsed into cuDF dataframes. This will all be handled by Morpheus.
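The line-splitting and JSON parsing that Morpheus performs on `.jsonlines` input can be sketched with plain pandas (the two records below are fabricated for illustration; the real dataset has many more columns):

```python
import io

import pandas as pd

# Two sample records in .jsonlines form: one JSON object per line.
raw = (
    '{"nvidia_smi_log.gpu.utilization.gpu_util": 98, "label": 1}\n'
    '{"nvidia_smi_log.gpu.utilization.gpu_util": 3, "label": 0}\n'
)

# lines=True parses the buffer record by record, producing one row per line.
df = pd.read_json(io.StringIO(raw), lines=True)
print(df.shape)  # (2, 2)
```

In the pipeline itself this step is handled by Morpheus's file source and deserialize stages, and the resulting frames are cuDF rather than pandas.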

#### Generating your own dataset

This example can easily be applied to datasets generated from your own NVIDIA GPU devices. If NetQ is not deployed in your environment, the provided `nvsmi_data_extract.py` script uses [pyNVML](https://pypi.org/project/nvidia-ml-py/) and [pandas](https://pandas.pydata.org/) to generate data similar to NetQ's output. `pyNVML` contains the Python bindings for the NVIDIA Management Library (NVML), the same library used by `nvidia-smi`.

`pyNVML` and `pandas` come preinstalled on the Morpheus release and development Docker images. Otherwise, they will need to be installed before running the script.

Run the following to start generating your dataset:
```bash
python nvsmi_data_extract.py
```
This will write a new entry to an output file named `nvsmi.jsonlines` once per second until you press Ctrl+C to exit. Use the `--interval-ms` and `--output-file` options to change the sampling interval and the output path.

## Pipeline Architecture

The pipeline we will be using in this example is a simple feed-forward linear pipeline where the data from each stage flows on to the next. Simple linear pipelines with no custom stages, like this example, can be configured via the Morpheus CLI or using the Python library. In this example we will be using the Morpheus CLI.
@@ -123,7 +135,7 @@ morpheus --log_level=DEBUG \
If successful, the following should be displayed:
```bash
Configuring Pipeline via CLI
Loaded columns. Current columns: [['nvidia_smi_log.gpu.pci.tx_util', 'nvidia_smi_log.gpu.pci.rx_util', 'nvidia_smi_log.gpu.fb_memory_usage.used', 'nvidia_smi_log.gpu.fb_memory_usage.free', 'nvidia_smi_log.gpu.bar1_memory_usage.total', 'nvidia_smi_log.gpu.bar1_memory_usage.used', 'nvidia_smi_log.gpu.bar1_memory_usage.free', 'nvidia_smi_log.gpu.utilization.gpu_util', 'nvidia_smi_log.gpu.utilization.memory_util', 'nvidia_smi_log.gpu.temperature.gpu_temp', 'nvidia_smi_log.gpu.temperature.gpu_temp_max_threshold', 'nvidia_smi_log.gpu.temperature.gpu_temp_slow_threshold', 'nvidia_smi_log.gpu.temperature.gpu_temp_max_gpu_threshold', 'nvidia_smi_log.gpu.temperature.memory_temp', 'nvidia_smi_log.gpu.temperature.gpu_temp_max_mem_threshold', 'nvidia_smi_log.gpu.power_readings.power_draw', 'nvidia_smi_log.gpu.clocks.graphics_clock', 'nvidia_smi_log.gpu.clocks.sm_clock', 'nvidia_smi_log.gpu.clocks.mem_clock', 'nvidia_smi_log.gpu.clocks.video_clock', 'nvidia_smi_log.gpu.applications_clocks.graphics_clock', 'nvidia_smi_log.gpu.applications_clocks.mem_clock', 'nvidia_smi_log.gpu.default_applications_clocks.graphics_clock', 'nvidia_smi_log.gpu.default_applications_clocks.mem_clock', 'nvidia_smi_log.gpu.max_clocks.graphics_clock', 'nvidia_smi_log.gpu.max_clocks.sm_clock', 'nvidia_smi_log.gpu.max_clocks.mem_clock', 'nvidia_smi_log.gpu.max_clocks.video_clock', 'nvidia_smi_log.gpu.max_customer_boost_clocks.graphics_clock']]
Loaded columns. Current columns: [['nvidia_smi_log.gpu.fb_memory_usage.used', 'nvidia_smi_log.gpu.fb_memory_usage.free', 'nvidia_smi_log.gpu.utilization.gpu_util', 'nvidia_smi_log.gpu.utilization.memory_util', 'nvidia_smi_log.gpu.temperature.gpu_temp', 'nvidia_smi_log.gpu.temperature.gpu_temp_max_threshold', 'nvidia_smi_log.gpu.temperature.gpu_temp_slow_threshold', 'nvidia_smi_log.gpu.power_readings.power_draw', 'nvidia_smi_log.gpu.clocks.graphics_clock', 'nvidia_smi_log.gpu.clocks.sm_clock', 'nvidia_smi_log.gpu.clocks.mem_clock', 'nvidia_smi_log.gpu.applications_clocks.graphics_clock', 'nvidia_smi_log.gpu.applications_clocks.mem_clock', 'nvidia_smi_log.gpu.default_applications_clocks.graphics_clock', 'nvidia_smi_log.gpu.default_applications_clocks.mem_clock', 'nvidia_smi_log.gpu.max_clocks.graphics_clock', 'nvidia_smi_log.gpu.max_clocks.sm_clock', 'nvidia_smi_log.gpu.max_clocks.mem_clock']]
Starting pipeline via CLI... Ctrl+C to Quit
Config:
{
@@ -133,7 +145,7 @@ Config:
],
"debug": false,
"edge_buffer_size": 128,
"feature_length": 29,
"feature_length": 18,
"fil": {
"feature_columns": [
"nvidia_smi_log.gpu.pci.tx_util",
73 changes: 73 additions & 0 deletions examples/abp_nvsmi_detection/nvsmi_data_extract.py
@@ -0,0 +1,73 @@
# SPDX-FileCopyrightText: Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import argparse
import time

import pandas as pd
from pynvml.smi import NVSMI_QUERY_GPU
from pynvml.smi import nvidia_smi


def main():
    query_opts = NVSMI_QUERY_GPU.copy()

    # Remove the timestamp and supported clocks from the query
    del query_opts["timestamp"]
    del query_opts["supported-clocks"]

    nvsmi = nvidia_smi.getInstance()

    with open(args.output_file, "w", encoding="UTF-8") as f:

        while True:

            device_query = nvsmi.DeviceQuery(list(query_opts.values()))

            output_dicts = []

            # Flatten the GPUs to allow for a new row per GPU
            for gpu in device_query["gpu"]:
                single_gpu = device_query.copy()

                # Overwrite the GPU list with a single GPU
                single_gpu["gpu"] = gpu

                output_dicts.append(single_gpu)

            df = pd.json_normalize(output_dicts, record_prefix="nvidia_smi_log")

            # Rename the id column to match the XML-converted output from NetQ
            df.rename(columns={"gpu.id": "gpu.@id", "count": "attached_gpus"}, inplace=True)

            df.rename(columns=lambda x: "nvidia_smi_log" + "." + x, inplace=True)

            # Add the current timestamp
            df.insert(0, "timestamp", time.time())

            df.to_json(f, orient="records", lines=True)

            f.flush()

            time.sleep(args.interval_ms / 1000.0)


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description=__doc__)
    parser.add_argument('--interval-ms', type=int, default=1000, help='interval in ms between writes to output file')
    parser.add_argument("--output-file", default='nvsmi.jsonlines', help='output file to save dataset')
    args = parser.parse_args()

    main()
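The flattening loop above turns one `DeviceQuery` result into one row per GPU. It can be illustrated with a mocked query result (the two-GPU dict below is fabricated for the example; real `DeviceQuery` output contains many more nested fields):

```python
import pandas as pd

# Mocked DeviceQuery output: one dict holding a list of per-GPU readings.
device_query = {
    "count": 2,
    "gpu": [
        {"id": "0000:01:00.0", "utilization": {"gpu_util": 97}},
        {"id": "0000:02:00.0", "utilization": {"gpu_util": 2}},
    ],
}

# Copy the query once per GPU so each output row carries a single GPU's readings.
output_dicts = []
for gpu in device_query["gpu"]:
    single_gpu = device_query.copy()
    single_gpu["gpu"] = gpu
    output_dicts.append(single_gpu)

# json_normalize flattens the nested dicts into dotted column names.
df = pd.json_normalize(output_dicts)
df.rename(columns=lambda x: "nvidia_smi_log." + x, inplace=True)
print(df["nvidia_smi_log.gpu.utilization.gpu_util"].tolist())  # [97, 2]
```

The shallow `copy()` is safe here because the shared `"gpu"` key is overwritten in each copy before the row is emitted.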
3 changes: 0 additions & 3 deletions models/abp-models/abp-nvsmi-xgb-20210310.bst

This file was deleted.

3 changes: 3 additions & 0 deletions models/abp-models/abp-nvsmi-xgb-20230831.bst
Git LFS file not shown
13 changes: 1 addition & 12 deletions models/data/columns_fil.txt
@@ -1,29 +1,18 @@
nvidia_smi_log.gpu.pci.tx_util
nvidia_smi_log.gpu.pci.rx_util
nvidia_smi_log.gpu.fb_memory_usage.used
nvidia_smi_log.gpu.fb_memory_usage.free
nvidia_smi_log.gpu.bar1_memory_usage.total
nvidia_smi_log.gpu.bar1_memory_usage.used
nvidia_smi_log.gpu.bar1_memory_usage.free
nvidia_smi_log.gpu.utilization.gpu_util
nvidia_smi_log.gpu.utilization.memory_util
nvidia_smi_log.gpu.temperature.gpu_temp
nvidia_smi_log.gpu.temperature.gpu_temp_max_threshold
nvidia_smi_log.gpu.temperature.gpu_temp_slow_threshold
nvidia_smi_log.gpu.temperature.gpu_temp_max_gpu_threshold
nvidia_smi_log.gpu.temperature.memory_temp
nvidia_smi_log.gpu.temperature.gpu_temp_max_mem_threshold
nvidia_smi_log.gpu.power_readings.power_draw
nvidia_smi_log.gpu.clocks.graphics_clock
nvidia_smi_log.gpu.clocks.sm_clock
nvidia_smi_log.gpu.clocks.mem_clock
nvidia_smi_log.gpu.clocks.video_clock
nvidia_smi_log.gpu.applications_clocks.graphics_clock
nvidia_smi_log.gpu.applications_clocks.mem_clock
nvidia_smi_log.gpu.default_applications_clocks.graphics_clock
nvidia_smi_log.gpu.default_applications_clocks.mem_clock
nvidia_smi_log.gpu.max_clocks.graphics_clock
nvidia_smi_log.gpu.max_clocks.sm_clock
nvidia_smi_log.gpu.max_clocks.mem_clock
nvidia_smi_log.gpu.max_clocks.video_clock
nvidia_smi_log.gpu.max_customer_boost_clocks.graphics_clock
nvidia_smi_log.gpu.max_clocks.mem_clock
@@ -68,18 +68,14 @@
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true,
"jupyter": {
"outputs_hidden": true
},
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/opt/conda/envs/rapids/lib/python3.8/site-packages/cudf/io/json.py:51: UserWarning: Using CPU via Pandas to read JSON dataset, this may be GPU accelerated in the future\n",
"/home/efajardo/miniconda3/envs/morpheus/lib/python3.10/site-packages/cudf/io/json.py:121: UserWarning: Using CPU via Pandas to read JSON dataset, this may be GPU accelerated in the future\n",
" warnings.warn(\n"
]
}
@@ -104,37 +100,24 @@
{
"data": {
"text/plain": [
"['nvidia_smi_log.timestamp',\n",
" 'nvidia_smi_log.gpu.pci.tx_util',\n",
" 'nvidia_smi_log.gpu.pci.rx_util',\n",
" 'nvidia_smi_log.gpu.fb_memory_usage.used',\n",
"['nvidia_smi_log.gpu.fb_memory_usage.used',\n",
" 'nvidia_smi_log.gpu.fb_memory_usage.free',\n",
" 'nvidia_smi_log.gpu.bar1_memory_usage.total',\n",
" 'nvidia_smi_log.gpu.bar1_memory_usage.used',\n",
" 'nvidia_smi_log.gpu.bar1_memory_usage.free',\n",
" 'nvidia_smi_log.gpu.utilization.gpu_util',\n",
" 'nvidia_smi_log.gpu.utilization.memory_util',\n",
" 'nvidia_smi_log.gpu.temperature.gpu_temp',\n",
" 'nvidia_smi_log.gpu.temperature.gpu_temp_max_threshold',\n",
" 'nvidia_smi_log.gpu.temperature.gpu_temp_slow_threshold',\n",
" 'nvidia_smi_log.gpu.temperature.gpu_temp_max_gpu_threshold',\n",
" 'nvidia_smi_log.gpu.temperature.memory_temp',\n",
" 'nvidia_smi_log.gpu.temperature.gpu_temp_max_mem_threshold',\n",
" 'nvidia_smi_log.gpu.power_readings.power_draw',\n",
" 'nvidia_smi_log.gpu.clocks.graphics_clock',\n",
" 'nvidia_smi_log.gpu.clocks.sm_clock',\n",
" 'nvidia_smi_log.gpu.clocks.mem_clock',\n",
" 'nvidia_smi_log.gpu.clocks.video_clock',\n",
" 'nvidia_smi_log.gpu.applications_clocks.graphics_clock',\n",
" 'nvidia_smi_log.gpu.applications_clocks.mem_clock',\n",
" 'nvidia_smi_log.gpu.default_applications_clocks.graphics_clock',\n",
" 'nvidia_smi_log.gpu.default_applications_clocks.mem_clock',\n",
" 'nvidia_smi_log.gpu.max_clocks.graphics_clock',\n",
" 'nvidia_smi_log.gpu.max_clocks.sm_clock',\n",
" 'nvidia_smi_log.gpu.max_clocks.mem_clock',\n",
" 'nvidia_smi_log.gpu.max_clocks.video_clock',\n",
" 'nvidia_smi_log.gpu.max_customer_boost_clocks.graphics_clock',\n",
" 'label']"
" 'nvidia_smi_log.gpu.max_clocks.mem_clock']"
]
},
"execution_count": 3,
@@ -143,7 +126,9 @@
}
],
"source": [
"list(df)"
"with open(\"../../../morpheus/data/columns_fil.txt\", \"r\", encoding='UTF-8') as fh:\n",
" feat_cols = [x.strip() for x in fh.readlines()]\n",
"feat_cols"
]
},
{
@@ -207,7 +192,7 @@
"outputs": [],
"source": [
"# 80/20 dataset split\n",
"X_train, X_test, y_train, y_test= train_test_split(df.drop([\"label\",\"nvidia_smi_log.timestamp\"],axis=1), df['label'], train_size=0.8, random_state=1)"
"X_train, X_test, y_train, y_test= train_test_split(df[feat_cols], df['label'], train_size=0.8, random_state=1)"
]
},
{
@@ -286,6 +271,14 @@
"[3]\tvalidation-auc:1.00000\ttrain-auc:1.00000\n",
"[4]\tvalidation-auc:1.00000\ttrain-auc:1.00000\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/efajardo/miniconda3/envs/morpheus/lib/python3.10/site-packages/xgboost/core.py:617: FutureWarning: Pass `evals` as keyword args.\n",
" warnings.warn(msg, FutureWarning)\n"
]
}
],
"source": [
@@ -382,7 +375,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
@@ -396,7 +389,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
"version": "3.10.12"
}
},
"nbformat": 4,
@@ -14,9 +14,9 @@
# limitations under the License.
"""
Example Usage:
python abp-nvsmi-xgb-20210310.py \
python abp_nvsmi_xgb_training.py \
--trainingdata \
../../datasets/training-data/abp-sample-nvsmi-training-data.json \
../../datasets/training-data/abp-sample-nvsmi-training-data.json
"""

import argparse
@@ -33,33 +33,37 @@ def preprocess(trainingdata):

    df = cudf.read_json(trainingdata)

    # print list of columns
    with open("../../../morpheus/data/columns_fil.txt", "r", encoding='UTF-8') as fh:
        feat_cols = [x.strip() for x in fh.readlines()]

    print(list(df))
    feat_cols.append("label")
    df = df[feat_cols]

    # print labels
    # print list of columns
    print(feat_cols)

    # print labels
    print(df['label'].unique())

    return df


def train_val_split(df):

    (X_train, X_test, y_train, y_test) = \
        train_test_split(df.drop(['label', 'nvidia_smi_log.timestamp'],
    (x_train, x_test, y_train, y_test) = \
        train_test_split(df.drop(['label'],
                         axis=1), df['label'], train_size=0.8,
                         random_state=1)

    return (X_train, X_test, y_train, y_test)
    return (x_train, x_test, y_train, y_test)


def train(X_train, X_test, y_train, y_test):
def train(x_train, x_test, y_train, y_test):

    # move to Dmatrix

    dmatrix_train = xgb.DMatrix(X_train, label=y_train)
    dmatrix_validation = xgb.DMatrix(X_test, label=y_test)
    dmatrix_train = xgb.DMatrix(x_train, label=y_train)
    dmatrix_validation = xgb.DMatrix(x_test, label=y_test)

    # Set parameters

@@ -75,6 +79,7 @@ def train(X_train, X_test, y_train, y_test):

    # Train the model

    # pylint: disable=too-many-function-args
    bst = xgb.train(params, dmatrix_train, num_round, evallist)
    return bst

@@ -97,10 +102,10 @@ def save_model(model):

def main():
    print('Preprocessing...')
    (X_train, X_test, y_train, y_test) = \
    (x_train, x_test, y_train, y_test) = \
        train_val_split(preprocess(args.trainingdata))
    print('Model Training...')
    model = train(X_train, X_test, y_train, y_test)
    model = train(x_train, x_test, y_train, y_test)
    print('Saving Model')
    save_model(model)

2 changes: 1 addition & 1 deletion models/triton-model-repo/abp-nvsmi-xgb/config.pbtxt
@@ -7,7 +7,7 @@ input [
{
name: "input__0"
data_type: TYPE_FP32
dims: [ 29 ]
dims: [ 18 ]
}
]
output [
2 changes: 1 addition & 1 deletion morpheus/cli/commands.py
@@ -369,7 +369,7 @@ def pipeline_nlp(ctx: click.Context, **kwargs):
cls=PluginGroup,
pipeline_mode=PipelineModes.FIL)
@click.option('--model_fea_length',
default=29,
default=18,
type=click.IntRange(min=1),
help="Number of features trained in the model")
@click.option('--label',