Merge morpheus core spear phishing components. (nv-morpheus#1044)
Add Morpheus core spear phishing components.

Authors:
  - Devin Robison (https://github.com/drobison00)
  - Bhargav Suryadevara (https://github.com/bsuryadevara)

Approvers:
  - Michael Demoret (https://github.com/mdemoret-nv)

URL: nv-morpheus#1044
drobison00 authored Jul 13, 2023
1 parent 7c2db78 commit 0987dfb
Showing 52 changed files with 2,722 additions and 74 deletions.
3 changes: 3 additions & 0 deletions docker/conda/environments/cuda11.8_dev.yml
@@ -88,6 +88,9 @@ dependencies:
- rapidjson=1.1.0
- scikit-build=0.17.1
- scikit-learn=1.2.2
- sphinx
- sphinx_rtd_theme
- sqlalchemy<=1.9
- sysroot_linux-64=2.17
- tritonclient=2.26 # Required by NvTabular, force the version, so we get protobufs compatible with 4.21
- tqdm=4
88 changes: 88 additions & 0 deletions docs/source/loaders/core/sql_loader.md
@@ -0,0 +1,88 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

## SQL Loader

The [DataLoader](./../../modules/core/data_loader.md) module can be configured to use this loader function. The SQL
loader fetches data from a SQL database, stores it in a DataFrame, and returns the updated ControlMessage object with
a MessageMeta payload.

### Example Loader Configuration

```json
{
"loaders": [
{
"id": "SQLLoader"
}
]
}
```

**Note** : Loaders can receive configuration from the `load` task via ControlMessage during runtime.

### Task Configurable Parameters

The following parameters can be configured for this loader at the load task level:

| Parameter | Type | Description | Example Value | Default Value |
|--------------|------------|------------------------------------------|--------------------|---------------|
| `strategy` | string | Strategy for combining queries | "aggregate" | `aggregate` |
| `loader_id` | string | Unique identifier for the loader | "SQLLoader" | `[Required]` |
| `sql_config` | dictionary | Dictionary containing SQL queries to run | {"queries": [...]} | `See below` |

`sql_config`

| Parameter | Type | Description | Example Value | Default Value |
|-----------|------|---------------------------------------------------|--------------------------------------------|---------------|
| `queries` | list | List of dictionaries composing a query definition | "[query_dict_1, ..., query_dict_n]" | `See below` |

`queries`

| Parameter | Type | Description | Example Value | Default Value |
|---------------------|------------|--------------------------------------|-----------------------------------------------------------------|---------------|
| `connection_string` | string | Connection string for the SQL database | "postgresql://postgres:postgres@localhost:5432/postgres" | `[Required]` |
| `query` | string | SQL query to execute | "SELECT * FROM test_table WHERE id IN (?, ?, ?)" | `[Required]` |
| `params` | list or dictionary | Named or positional parameter values | "[foo, bar, baz]" | `-` |

### Example Load Task Configuration

The JSON configuration below shows how to pass additional configuration to the loader through a control message task
at runtime.

```json
{
"type": "load",
"properties": {
"loader_id": "SQLLoader",
"strategy": "aggregate",
"sql_config": {
"queries": [
{
"connection_string": "postgresql://postgres:postgres@localhost:5432/postgres",
"query": "SELECT * FROM test_table WHERE id IN (?, ?, ?)",
"params": [
"foo",
"bar",
"baz"
]
}
]
}
}
}
```
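
The way `query` and `params` pair up can be illustrated with a short Python sketch. It is not the loader's implementation: it uses the standard-library `sqlite3` module with an in-memory database instead of the PostgreSQL connection string above, and the helper names `run_query` and `aggregate` are hypothetical. It assumes that the `aggregate` strategy concatenates the rows returned by each query definition.

```python
import sqlite3


def run_query(conn, query_def):
    """Run one query definition and return its rows as a list of dicts."""
    # Positional "?" placeholders are bound from the "params" list, in order.
    cursor = conn.execute(query_def["query"], query_def.get("params", []))
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def aggregate(conn, sql_config):
    """Assumed meaning of "aggregate": concatenate the rows of every query."""
    rows = []
    for query_def in sql_config["queries"]:
        rows.extend(run_query(conn, query_def))
    return rows


# Stand-in database; a real deployment would use the configured connection string.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (id TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?)",
    [("foo", 1), ("bar", 2), ("qux", 3)],
)

sql_config = {
    "queries": [
        {
            "query": "SELECT * FROM test_table WHERE id IN (?, ?, ?) ORDER BY value",
            "params": ["foo", "bar", "baz"],
        }
    ]
}

result = aggregate(conn, sql_config)
```

Binding values through `params` rather than formatting them into the query string keeps the values out of the SQL text, which is the usual defense against SQL injection.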
5 changes: 4 additions & 1 deletion docs/source/loaders/index.md
@@ -17,7 +17,9 @@ limitations under the License.

# Loaders

Custom functions called "Loaders" can be utilized by the DataLoader Module to load data into the pipeline. The user can choose to register their own customized loader function and add it to a dataloader registry, which will then become accessible to the DataLoader module during module loading.
Custom functions called "Loaders" can be utilized by the DataLoader Module to load data into the pipeline. The user can
choose to register their own customized loader function and add it to a dataloader registry, which will then become
accessible to the DataLoader module during module loading.

**Note** : Loaders receive configuration from the `load` task via [control message](../../developer_guide/guides/9_control_messages.md) during runtime.

@@ -28,5 +30,6 @@ Custom functions called "Loaders" can be utilized by the DataLoader Module to lo
./core/file_to_df_loader.md
./core/fsspec_loader.md
./core/sql_loader.md
```
47 changes: 47 additions & 0 deletions docs/source/modules/core/payload_batcher.md
@@ -0,0 +1,47 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

## Batch Data Payload Module

This module splits the data payload of each incoming control message into smaller batches based on the specified configuration.

### Configurable Parameters

| Parameter | Type | Description | Example Value | Default Value |
|-----------------------------|------------|-----------------------------------|---------------------------------|---------------|
| `max_batch_size` | integer | The maximum size of each batch | 256 | `256` |
| `raise_on_failure` | boolean | Whether to raise an exception if a failure occurs during processing | false | `false` |
| `group_by_columns` | list | The column names to group by when batching | ["col1", "col2"] | `[]` |
| `disable_max_batch_size` | boolean | Whether to disable the `max_batch_size` and only batch by group | false | `false` |
| `timestamp_column_name` | string | The name of the timestamp column | None | `None` |
| `timestamp_pattern` | string | The pattern to parse the timestamp column | None | `None` |
| `period` | string | The period for grouping by timestamp | H | `D` |


### Example JSON Configuration

```json
{
"max_batch_size": 256,
"raise_on_failure": false,
"group_by_columns": [],
"disable_max_batch_size": false,
"timestamp_column_name": null,
"timestamp_pattern": null,
"period": "D"
}
```
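
How `max_batch_size`, `group_by_columns`, and `disable_max_batch_size` might interact can be sketched in plain Python. This is an illustration under stated assumptions, not the module's code: rows are modeled as a list of dicts rather than a DataFrame, the function name `batch_payload` is hypothetical, and the timestamp parameters (`timestamp_column_name`, `timestamp_pattern`, `period`) are left out.

```python
from collections import defaultdict


def batch_payload(rows, max_batch_size=256, group_by_columns=None,
                  disable_max_batch_size=False):
    """Yield batches of rows, grouped first and then capped in size."""
    group_by_columns = group_by_columns or []

    # With no group-by columns every row shares the empty key, giving one group.
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in group_by_columns)
        groups[key].append(row)

    for group_rows in groups.values():
        if disable_max_batch_size:
            # Assumed behavior: one batch per group, whatever its size.
            yield group_rows
            continue
        for start in range(0, len(group_rows), max_batch_size):
            yield group_rows[start:start + max_batch_size]


rows = [{"user": "a", "n": i} for i in range(5)]
rows += [{"user": "b", "n": i} for i in range(2)]

grouped = list(batch_payload(rows, max_batch_size=2, group_by_columns=["user"]))
ungrouped = list(batch_payload(rows, max_batch_size=3))
```

In this sketch, grouping by `user` with `max_batch_size=2` splits the five rows of user `a` into three batches and keeps the two rows of user `b` together, so no batch mixes groups.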
42 changes: 42 additions & 0 deletions docs/source/modules/examples/spear_phishing/sp_email_enrichment.md
@@ -0,0 +1,42 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

## Spear Phishing Email Enrichment Module

Module ID: email_enrichment
Module Namespace: morpheus_spear_phishing

This module performs spear phishing email enrichment.

### Configurable Parameters

| Parameter | Type | Description | Example Value | Default Value |
|--------------------------|------|---------------------------------------------------------------------|------------------------|---------------|
| `sender_sketches` | list | List of sender strings naming sender sketch inputs. | ["sender1", "sender2"] | `[]` |
| `intents` | list | List of intent strings naming computed intent inputs. | ["intent1", "intent2"] | `[]` |
| `raise_on_failure` | boolean | Indicate if we should treat processing errors as pipeline failures. | false | `false` |
| `token_length_threshold` | integer | Minimum token length to use when computing syntax similarity. | 5 | `None` |

### Example JSON Configuration

```json
{
"sender_sketches": ["sender1", "sender2"],
"intents": ["intent1", "intent2"],
"raise_on_failure": false,
"token_length_threshold": 5
}
```
54 changes: 54 additions & 0 deletions docs/source/modules/examples/spear_phishing/sp_inference_intent.md
@@ -0,0 +1,54 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

## Inference Intent

Module ID: infer_email_intent
Module Namespace: morpheus_spear_phishing

Infers an 'intent' for a given email body.

### Configurable Parameters

| Parameter | Type | Description | Example Value | Default Value |
|--------------------|------|-----------------------------------------|-----------------------|-------------------------|
| `intent` | string | The intent for the model | "classify" | `None` |
| `task` | string | The task for the model | "text-classification" | `"text-classification"` |
| `model_path` | string | The path to the model | "/path/to/model" | `None` |
| `truncation` | boolean | If true, truncates inputs to max_length | true | `true` |
| `max_length` | integer | Maximum length for model input | 512 | `512` |
| `batch_size` | integer | The size of batches for processing | 256 | `256` |
| `feature_col` | string | The feature column to use | "body" | `"body"` |
| `label_col` | string | The label column to use | "label" | `"label"` |
| `device` | integer | The device to run on | 0 | `0` |
| `raise_on_failure` | boolean | If true, raise exceptions on failures | false | `false` |

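The defaults in the table above mean that a caller only needs to supply the values that differ. A minimal Python sketch of that merge follows. The helper `resolve_config` is hypothetical, and rejecting unknown keys is an assumption made for illustration; the module itself may handle them differently.

```python
# Defaults copied from the parameter table above.
DEFAULTS = {
    "intent": None,
    "task": "text-classification",
    "model_path": None,
    "truncation": True,
    "max_length": 512,
    "batch_size": 256,
    "feature_col": "body",
    "label_col": "label",
    "device": 0,
    "raise_on_failure": False,
}


def resolve_config(user_config):
    """Overlay user-supplied values on the documented defaults."""
    unknown = set(user_config) - set(DEFAULTS)
    if unknown:
        # Assumption: treat misspelled or unsupported keys as an error.
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    return {**DEFAULTS, **user_config}


resolved = resolve_config(
    {"intent": "classify", "model_path": "/path/to/model", "batch_size": 64}
)
```

Here `resolved["batch_size"]` is the overridden value, while `max_length`, `feature_col`, and the other unspecified parameters keep their table defaults.
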
### Example JSON Configuration

```json
{
"intent": "classify",
"task": "text-classification",
"model_path": "/path/to/model",
"truncation": true,
"max_length": 512,
"batch_size": 256,
"feature_col": "body",
"label_col": "label",
"device": 0,
"raise_on_failure": false
}
```
@@ -0,0 +1,42 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

## Spear Phishing Inference Module

Module ID: inference
Module Namespace: morpheus_spear_phishing

This module defines a setup for spear-phishing inference.

### Configurable Parameters

| Parameter | Type | Description | Example Value | Default Value |
|------------------------|------|---------------------------------------|--------------------|---------------|
| `tracking_uri` | string | The tracking URI for the model | "/path/to/uri" | `None` |
| `registered_model` | string | The registered model for inference | "model_1" | `None` |
| `input_model_features` | list | The input features for the model | ["feat1", "feat2"] | `[]` |
| `raise_on_failure` | boolean | If true, raise exceptions on failures | false | `false` |

### Example JSON Configuration

```json
{
"tracking_uri": "/path/to/uri",
"registered_model": "model_1",
"input_model_features": ["feat1", "feat2"],
"raise_on_failure": false
}
```
38 changes: 38 additions & 0 deletions docs/source/modules/examples/spear_phishing/sp_label_and_score.md
@@ -0,0 +1,38 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

## Spear Phishing Email Scoring Module

Module ID: label_and_score
Module Namespace: morpheus_spear_phishing

This module defines a setup for spear-phishing email scoring.

### Configurable Parameters

| Parameter | Type | Description | Example Value | Default Value |
|--------------------|------|---------------------------------------|---------------------------|---------------|
| `scoring_config` | dictionary | The scoring configuration | {"method": "probability"} | `None` |
| `raise_on_failure` | boolean | If true, raise exceptions on failures | false | `false` |

### Example JSON Configuration

```json
{
"scoring_config": {"method": "probability"},
"raise_on_failure": false
}
```
38 changes: 38 additions & 0 deletions docs/source/modules/examples/spear_phishing/sp_preprocessing.md
@@ -0,0 +1,38 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2022-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

## Spear Phishing Inference Pipeline Preprocessing Module

Module ID: inference_pipeline_preproc
Module Namespace: morpheus_spear_phishing

This module defines a pre-processing setup for the spear phishing inference pipeline.

### Configurable Parameters

| Parameter | Type | Description | Example Value | Default Value |
|--------------------|------|---------------------------------------------------|---------------|---------------|
| `attach_uuid` | boolean | If true, attach a unique identifier to each input | true | `false` |
| `raise_on_failure` | boolean | If true, raise exceptions on failures | false | `false` |

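The effect of `attach_uuid` can be pictured with a small Python sketch. It is only an illustration: records are modeled as dicts, the function name `attach_ids` and the `uuid` field name are hypothetical, and random version 4 UUIDs are an assumed choice of identifier.

```python
import uuid


def attach_ids(records, attach_uuid=False):
    """Return copies of the records, optionally tagged with a unique id."""
    tagged = []
    for record in records:
        copy = dict(record)  # leave the caller's records untouched
        if attach_uuid:
            # Hypothetical field name; uuid4 produces a random identifier.
            copy["uuid"] = str(uuid.uuid4())
        tagged.append(copy)
    return tagged


emails = [{"body": "first message"}, {"body": "second message"}]
with_ids = attach_ids(emails, attach_uuid=True)
without_ids = attach_ids(emails)
```

A per-input identifier of this kind makes it possible to trace a single email through later pipeline stages.
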
### Example JSON Configuration

```json
{
"attach_uuid": false,
"raise_on_failure": false
}
```