Merge branch 'branch-23.01' into david-23.01-docs
dagardner-nv committed Dec 6, 2022
2 parents 9dedfb8 + 40b1c95 commit 530b40e
Showing 37 changed files with 53 additions and 53 deletions.
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -139,7 +139,7 @@ add_custom_target(copy_python_source ALL
DEPENDS ${OUTPUT_PYTHON_FILES}
)

- # Manually install the below files. install(DIRECTORY) doesnt work well and
+ # Manually install the below files. install(DIRECTORY) doesn't work well and
# makes it impossible to get these files and MORPHEUS_PYTHON_FILES in one command.
install(
FILES ${MORPHEUS_ROOT_PYTHON_FILES}
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -332,7 +332,7 @@ TensorRT :Skipped
Complete!
```

- This indicates that only 3 out of 314 rows did not match the validation dataset. If you see errors similar to `:/ ( %)` or very high percentages, then the workflow did not complete sucessfully.
+ This indicates that only 3 out of 314 rows did not match the validation dataset. If you see errors similar to `:/ ( %)` or very high percentages, then the workflow did not complete successfully.

### Troubleshooting the Build

@@ -342,7 +342,7 @@ Due to the large number of dependencies, it's common to run into build issues. T
- To avoid rebuilding every compilation unit for all dependencies after each change, a fair amount of the build is cached. By default, the cache is located at `${MORPHEUS_ROOT}/.cache`. The cache contains both compiled object files, source repositories, ccache files, clangd files and even the cuDF build.
- The entire cache folder can be deleted at any time and will be redownload/recreated on the next build
- Message indicating `git apply ...` failed
- - Many of the dependencies require small patches to make them work. These patches must be applied once and only once. If you see this error, try deleting the offending package from the `build/_deps/<offending_packag>` directory or from `.cache/cpm/<offending_package>`.
+ - Many of the dependencies require small patches to make them work. These patches must be applied once and only once. If you see this error, try deleting the offending package from the `build/_deps/<offending_package>` directory or from `.cache/cpm/<offending_package>`.
- If all else fails, delete the entire `build/` directory and `.cache/` directory.
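
For illustration, a minimal Python sketch of that purge step (the `purge_package` helper and its arguments are hypothetical, not part of the repo):

```python
# Sketch only: remove a dependency's cached copies so its patch re-applies cleanly.
import shutil
from pathlib import Path

def purge_package(morpheus_root: str, package: str) -> None:
    for candidate in (Path(morpheus_root, "build", "_deps", package),
                      Path(morpheus_root, ".cache", "cpm", package)):
        if candidate.exists():
            shutil.rmtree(candidate)  # next build re-downloads and re-patches it
```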
## Licensing
4 changes: 2 additions & 2 deletions README.md
@@ -240,7 +240,7 @@ Commands:
filter Filter message by a classification threshold
from-file Load messages from a file
from-kafka Load messages from a Kafka cluster
- gen-viz (Deprecated) Write out vizualization data frames
+ gen-viz (Deprecated) Write out visualization data frames
inf-identity Perform a no-op inference for testing
inf-pytorch Perform inference with PyTorch
inf-triton Perform inference with Triton
@@ -317,4 +317,4 @@ Commands:
Note: The available commands for different types of pipelines are not the same. This means that the same stage, when used in different pipelines, may have different options. Please check the CLI help for the most up-to-date information during development.
## Contributing
- Please see our [guide for contributing to Morpheus](./CONTRIBUTING.md).
+ Please see our [guide for contributing to Morpheus](./CONTRIBUTING.md).
2 changes: 1 addition & 1 deletion ci/scripts/cpp_checks.sh
@@ -58,7 +58,7 @@ if [[ -n "${MORPHEUS_MODIFIED_FILES}" ]]; then

CLANG_TIDY_DIFF=$(find_clang_tidy_diff)

- # Run using a clang-tidy wrapper to allow warnings-as-errors and to eliminate any output except errors (since clang-tidy-diff.py doesnt return the correct error codes)
+ # Run using a clang-tidy wrapper to allow warnings-as-errors and to eliminate any output except errors (since clang-tidy-diff.py doesn't return the correct error codes)
CLANG_TIDY_OUTPUT=`get_unified_diff ${CPP_FILE_REGEX} | ${CLANG_TIDY_DIFF} -j 0 -path ${BUILD_DIR} -p1 -quiet 2>&1`

if [[ -n "${CLANG_TIDY_OUTPUT}" && ${CLANG_TIDY_OUTPUT} != "No relevant changes found." ]]; then
4 changes: 2 additions & 2 deletions ci/scripts/python_checks.sh
@@ -41,13 +41,13 @@ if [[ -n "${MORPHEUS_MODIFIED_FILES}" ]]; then
done

if [[ "${SKIP_ISORT}" == "" ]]; then
- # Run using a clang-tidy wrapper to allow warnings-as-errors and to eliminate any output except errors (since clang-tidy-diff.py doesnt return the correct error codes)
+ # Run using a clang-tidy wrapper to allow warnings-as-errors and to eliminate any output except errors (since clang-tidy-diff.py doesn't return the correct error codes)
ISORT_OUTPUT=`python3 -m isort --settings-file ${PY_CFG} --filter-files --check-only ${MORPHEUS_MODIFIED_FILES[@]} 2>&1`
ISORT_RETVAL=$?
fi

if [[ "${SKIP_FLAKE}" == "" ]]; then
- # Run using a clang-tidy wrapper to allow warnings-as-errors and to eliminate any output except errors (since clang-tidy-diff.py doesnt return the correct error codes)
+ # Run using a clang-tidy wrapper to allow warnings-as-errors and to eliminate any output except errors (since clang-tidy-diff.py doesn't return the correct error codes)
FLAKE_OUTPUT=`python3 -m flake8 --config ${PY_CFG} ${MORPHEUS_MODIFIED_FILES[@]} 2>&1`
FLAKE_RETVAL=$?
fi
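
As a rough Python rendering of what these two check-only passes do (the `run_checks` helper, `py_cfg`, and `modified_files` are hypothetical; the flags mirror the script above):

```python
# Sketch only: run the same isort/flake8 check-only passes from Python.
import subprocess

def run_checks(py_cfg: str, modified_files: list) -> int:
    # isort in check-only mode, as in the hunk above
    isort = subprocess.run(["python3", "-m", "isort", "--settings-file", py_cfg,
                            "--filter-files", "--check-only", *modified_files])
    # flake8 with the shared config file
    flake = subprocess.run(["python3", "-m", "flake8", "--config", py_cfg,
                            *modified_files])
    return isort.returncode or flake.returncode
```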
4 changes: 2 additions & 2 deletions cmake/setup_cache.cmake
@@ -44,11 +44,11 @@ function(configure_ccache cache_dir_name)
# Set the ccache options we need
set(CCACHE_CONFIGPATH "${CCACHE_DIR}/ccache.conf")

- # Because CMake doesnt allow settings variables `CCACHE_COMPILERTYPE=gcc
+ # Because CMake doesn't allow settings variables `CCACHE_COMPILERTYPE=gcc
# ccache` in CMAKE_C_COMPILER_LAUNCHER, we need to put everything into a
# single script and use that for CMAKE_C_COMPILER_LAUNCHER. Also, since
# gxx_linux-64 sets the compiler to c++ instead of g++, we need to set the
- # value of CCACHE_COMPILERTYPE otherwise caching doesnt work correctly. So
+ # value of CCACHE_COMPILERTYPE otherwise caching doesn't work correctly. So
# we need to make separate runners for each language with specific ccache
# settings for each
if(NOT "${CMAKE_CXX_COMPILER_ID}" STREQUAL "Clang")
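
The comment above boils down to: generate one small launcher script per language and point `CMAKE_<LANG>_COMPILER_LAUNCHER` at it. A hypothetical Python sketch of that idea (the real implementation is CMake; names here are illustrative):

```python
# Sketch only: write a launcher that pins CCACHE_COMPILERTYPE before calling ccache.
from pathlib import Path

def write_ccache_launcher(path: str, compiler_type: str, config_path: str) -> None:
    script = "\n".join([
        "#!/bin/bash",
        f'export CCACHE_CONFIGPATH="{config_path}"',
        # gxx_linux-64 names the compiler `c++`, so the type must be pinned explicitly
        f'export CCACHE_COMPILERTYPE="{compiler_type}"',
        'exec ccache "$@"',
        "",
    ])
    launcher = Path(path)
    launcher.write_text(script)
    launcher.chmod(0o755)
```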
2 changes: 1 addition & 1 deletion docs/source/basics/examples.rst
@@ -72,7 +72,7 @@ Multi-Monitor Throughput
^^^^^^^^^^^^^^^^^^^^^^^^

This example will report the throughput for each stage independently. Keep in mind, ``buffer`` stages are necessary to
- decouple one stage from the next. Without the buffers, all montioring would show the same throughput.
+ decouple one stage from the next. Without the buffers, all monitoring would show the same throughput.

.. image:: img/multi_monitor_throughput.png

6 changes: 3 additions & 3 deletions docs/source/developer_guide/guides/2_real_world_phishing.md
@@ -402,7 +402,7 @@ In our previous examples, we didn't define a constructor for the Python classes

Note that it is a best practice to perform any necessary validation checks in the constructor. This allows us to fail early rather than after the pipeline has started.

- In our `RecipientFeaturesStage` example, we hard-coded the Bert separator token. Let's instead refactor the code to receive that as a constructor argument. This new constructor argument is documented following the [numpydoc](https://numpydoc.readthedocs.io/en/latest/format.html#parameters) formatting style allowing it to be documented propperly for both API and CLI users. Let's also take the opportunity to verify that the pipeline mode is set to `morpheus.config.PipelineModes.NLP`. Our refactored class definition now looks like:
+ In our `RecipientFeaturesStage` example, we hard-coded the Bert separator token. Let's instead refactor the code to receive that as a constructor argument. This new constructor argument is documented following the [numpydoc](https://numpydoc.readthedocs.io/en/latest/format.html#parameters) formatting style allowing it to be documented properly for both API and CLI users. Let's also take the opportunity to verify that the pipeline mode is set to `morpheus.config.PipelineModes.NLP`. Our refactored class definition now looks like:

```python
from morpheus.config import Config
@@ -417,7 +417,7 @@ class RecipientFeaturesStage(SinglePortStage):
config : morpheus.config.Config
Pipeline configuration instance.
sep_token : str
- Bert separator toeken.
+ Bert separator token.
"""

def __init__(self, config: Config, sep_token: str = '[SEP]'):
@@ -442,7 +442,7 @@ Usage: morpheus run pipeline-nlp recipient-features [OPTIONS]
Pre-processing stage which counts the number of recipients in an email's metadata.
Options:
- --sep_token TEXT Bert separator toeken. [default: [SEP]]
+ --sep_token TEXT Bert separator token. [default: [SEP]]
--help Show this message and exit.
```
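
To make the fail-early advice concrete, here is a minimal sketch of such a constructor (the exact checks are an assumption based on the guide's description, not a verbatim excerpt):

```python
from morpheus.config import Config, PipelineModes
from morpheus.pipeline.single_port_stage import SinglePortStage

class RecipientFeaturesStage(SinglePortStage):

    def __init__(self, config: Config, sep_token: str = '[SEP]'):
        super().__init__(config)
        # Fail early, before the pipeline starts, if misconfigured.
        if config.mode != PipelineModes.NLP:
            raise RuntimeError("RecipientFeaturesStage must be used in a pipeline configured for NLP")
        if len(sep_token) == 0:
            raise ValueError("sep_token cannot be an empty string")
        self._sep_token = sep_token
```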

2 changes: 1 addition & 1 deletion docs/source/developer_guide/guides/3_simple_cpp_stage.md
@@ -205,7 +205,7 @@ PassThruStage::PassThruStage() :
{}
```

- However, this doesnt illustrate well how to customize a stage. So we will be using the long form signature for our examples.
+ However, this doesn't illustrate well how to customize a stage. So we will be using the long form signature for our examples.

The `build_operator` method defines an observer who is subscribed to our input `rxcpp::observable`. The observer consists of three functions that are typically lambdas: `on_next`, `on_error`, and `on_completed`. Typically, these three functions call the associated methods on the output subscriber.
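
For intuition, a rough Python analogue of that three-callback observer (illustrative pseudocode, not the rxcpp or MRC API):

```python
# Sketch only: the observer shape described above, in Python form.
class PassThruObserver:
    def __init__(self, output):
        self._output = output  # the downstream subscriber

    def on_next(self, message):
        self._output.on_next(message)  # forward each message unchanged

    def on_error(self, error):
        self._output.on_error(error)  # propagate failures downstream

    def on_completed(self):
        self._output.on_completed()  # signal end of stream
```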

4 changes: 2 additions & 2 deletions docs/source/morpheus_quickstart_guide.md
@@ -249,7 +249,7 @@ pod/mlflow-6d98 1/1 Running 0 39s
```

### Model Deployment
- Attach to the MLfLow pod to publish models to the MLflow server and then deploy it onto Morpheus AI Engine:
+ Attach to the MLflow pod to publish models to the MLflow server and then deploy it onto Morpheus AI Engine:

```bash
kubectl -n $NAMESPACE exec -it deploy/mlflow -- bash
@@ -903,7 +903,7 @@ Commands:
filter Filter message by a classification threshold.
from-file Load messages from a file.
from-kafka Load messages from a Kafka cluster.
- gen-viz (Deprecated) Write out vizualization DataFrames.
+ gen-viz (Deprecated) Write out visualization DataFrames.
inf-identity Perform inference for testing that performs a no-op.
inf-pytorch Perform inference with PyTorch.
inf-triton Perform inference with Triton Inference Server.
4 changes: 2 additions & 2 deletions examples/abp_nvsmi_detection/README.md
@@ -44,7 +44,7 @@ $ nvidia-smi dmon
0 281 57 - 85 54 0 0 7000 1740
```

- Each line in the output represents the GPU metrics at a single point in time. As the tool progresses the GPU begins to be utilized and the SM% and Mem% values increase as memory is loaded into the GPU and computations are performed. The model we will be using can ingest this information and determine whether or not the GPU is mining cryptocurriences without needing additional information from the host machine.
+ Each line in the output represents the GPU metrics at a single point in time. As the tool progresses the GPU begins to be utilized and the SM% and Mem% values increase as memory is loaded into the GPU and computations are performed. The model we will be using can ingest this information and determine whether or not the GPU is mining cryptocurrencies without needing additional information from the host machine.

In this example we will be using the `examples/data/nvsmi.jsonlines` dataset that is known to contain mining behavior profiles. The dataset is in the `.jsonlines` format which means each new line represents a new JSON object. In order to parse this data, it must be ingested, split by lines into individual JSON objects, and parsed into cuDF dataframes. This will all be handled by Morpheus.
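
A hedged sketch of that ingestion step (Morpheus performs this internally; shown only for intuition):

```python
# Sketch only: parse the .jsonlines dataset, one JSON object per line, into cuDF.
import cudf

df = cudf.read_json("examples/data/nvsmi.jsonlines", lines=True)
print(df.head())  # GPU metric rows, ready for preprocessing
```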

@@ -221,4 +221,4 @@ The output file `detections.jsonlines` will contain a single boolean value for e
...
```

- We have stripped out the input data to make the detections easier to identify. Ommitting the argument `--include 'mining'` would show the input data in the detections file.
+ We have stripped out the input data to make the detections easier to identify. Omitting the argument `--include 'mining'` would show the input data in the detections file.
2 changes: 1 addition & 1 deletion examples/abp_pcap_detection/README.md
@@ -98,7 +98,7 @@ The pipeline will process the input `pcap_dump.jsonlines` sample data and write

### CLI Example
The above example is illustrative of using the Python API to build a custom Morpheus Pipeline.
- Alternately the Morpheus command line could have been used to accomplush the same goal by registering the `abp_pcap_preprocessing.py` module as a plugin.
+ Alternately the Morpheus command line could have been used to accomplish the same goal by registering the `abp_pcap_preprocessing.py` module as a plugin.

From the root of the Morpheus repo run:
```bash
@@ -35,7 +35,7 @@ class RecipientFeaturesStage(SinglePortStage):
config : morpheus.config.Config
Pipeline configuration instance.
sep_token : str
- Bert separator toeken.
+ Bert separator token.
"""

def __init__(self, config: Config, sep_token: str = '[SEP]'):
2 changes: 1 addition & 1 deletion examples/developer_guide/2_2_rabbitmq/README.md
@@ -16,7 +16,7 @@ limitations under the License.
-->

# Example RabbitMQ stages
- This example inclues two stages `RabbitMQSourceStage` and `WriteToRabbitMQStage`
+ This example includes two stages `RabbitMQSourceStage` and `WriteToRabbitMQStage`

## Testing with a RabbitMQ container
Testing can be performed locally with the RabbitMQ supplied docker image from the [RabbitMQ container registry](https://registry.hub.docker.com/_/rabbitmq/):
2 changes: 1 addition & 1 deletion examples/digital_fingerprinting/README.md
@@ -18,7 +18,7 @@

## Organization

- The DFP example workflows in Morpheus are designed to scale up to company wide workloads and handle several different log types which resulted in a large number of moving parts to handle the various services and configuration options. To simplify things, the DFP workflow is provided as two separate examples: a simple, "starter" pipeline for new users and a complex, "production" pipeline for full scale deployments. While these two examples both peform the same general tasks, they do so in very different ways. The following is a breakdown of the differences between the two examples.
+ The DFP example workflows in Morpheus are designed to scale up to company wide workloads and handle several different log types which resulted in a large number of moving parts to handle the various services and configuration options. To simplify things, the DFP workflow is provided as two separate examples: a simple, "starter" pipeline for new users and a complex, "production" pipeline for full scale deployments. While these two examples both perform the same general tasks, they do so in very different ways. The following is a breakdown of the differences between the two examples.

### The "Starter" Example

@@ -150,7 +150,7 @@ def on_data(self, message: MultiAEMessage):

experiment_name = self.user_id_to_experiment(user_id=user)

- # Creates a new experiment if it doesnt exist
+ # Creates a new experiment if it doesn't exist
experiment = mlflow.set_experiment(experiment_name)

with mlflow.start_run(run_name="Duo autoencoder model training run",
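
A standalone sketch of the experiment-per-user pattern in this hunk (the experiment name and logged parameter are hypothetical):

```python
# Sketch only: look up or create the user's experiment, then record a run.
import mlflow

experiment_name = "dfp/duo/training/example_user"  # hypothetical naming scheme
mlflow.set_experiment(experiment_name)  # creates the experiment if it doesn't exist

with mlflow.start_run(run_name="Duo autoencoder model training run"):
    mlflow.log_param("user_id", "example_user")
```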
@@ -84,7 +84,7 @@ def append_dataframe(self, incoming_df: pd.DataFrame) -> bool:
# Set the filtered index
filtered_df.index = range(self.total_count, self.total_count + len(filtered_df))

- # Save the row hash to make it easier to find later. Do this before the batch so it doesnt participate
+ # Save the row hash to make it easier to find later. Do this before the batch so it doesn't participate
filtered_df["_row_hash"] = pd.util.hash_pandas_object(filtered_df, index=False)

# Use batch id to distinguish groups in the same dataframe
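
A self-contained sketch of that row-hash bookkeeping (toy data; the real stage operates on filtered batches):

```python
# Sketch only: hash row contents before adding batch metadata, so the hash
# reflects just the original row values.
import pandas as pd

filtered_df = pd.DataFrame({"username": ["alice", "bob"], "event_count": [3, 7]})
filtered_df["_row_hash"] = pd.util.hash_pandas_object(filtered_df, index=False)
```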
@@ -271,7 +271,7 @@ def load_model_cache(self, client: MlflowClient, reg_model_name: str) -> ModelCa
latest_versions = client.get_latest_versions(reg_model_name)

if (len(latest_versions) == 0):
- # Databricks doesnt like the `get_latest_versions` method for some reason. Before failing, try
+ # Databricks doesn't like the `get_latest_versions` method for some reason. Before failing, try
# to just get the model and then use latest versions
reg_model_obj = client.get_registered_model(reg_model_name)
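
A hedged sketch of that fallback in isolation (mirrors the logic shown above; error handling omitted):

```python
# Sketch only: prefer get_latest_versions(), fall back to the model object.
from mlflow.tracking import MlflowClient

def latest_versions_with_fallback(client: MlflowClient, reg_model_name: str):
    latest_versions = client.get_latest_versions(reg_model_name)
    if len(latest_versions) == 0:
        # Some backends return nothing here; read versions off the model instead.
        reg_model_obj = client.get_registered_model(reg_model_name)
        latest_versions = reg_model_obj.latest_versions
    return latest_versions
```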

4 changes: 2 additions & 2 deletions examples/digital_fingerprinting/starter/README.md
@@ -80,7 +80,7 @@ Commands:
from-azure Source stage is used to load Azure Active Directory messages.
from-cloudtrail Load messages from a Cloudtrail directory
from-duo Source stage is used to load Duo Authentication messages.
- gen-viz (Deprecated) Write out vizualization data frames
+ gen-viz (Deprecated) Write out visualization data frames
inf-pytorch Perform inference with PyTorch
inf-triton Perform inference with Triton
monitor Display throughput numbers at a specific point in the
@@ -115,7 +115,7 @@ The following table shows mapping between the main Morpheus CLI commands and und

**Preprocessing stages**

- `TrainAEStage` can either train user models using data matching a provided `--train_data_glob` or load pre-trained models from file using `--pretrained_filename`. When using `--train_data_glob`, user models can be saved using the `--models_output_filename` option. The `--source_stage_class` must also be used with `--train_data_glob` so that the training stage knows how to read the training data. The autoencoder implementation from this [fork](https://github.com/efajardo-nv/dfencoder/tree/morpheus-22.08) is used for user model training. The following are the available CLI options for the `TrainAEStage` (train-ae):
+ `TrainAEStage` can either train user models using data matching a provided `--train_data_glob` or load pre-trained models from file using `--pretrained_filename`. When using `--train_data_glob`, user models can be saved using the `--models_output_filename` option. The `--source_stage_class` must also be used with `--train_data_glob` so that the training stage knows how to read the training data. The autoencoder implementation used for user model training can be found [here](https://github.com/nv-morpheus/dfencoder). The following are the available CLI options for the `TrainAEStage` (train-ae):

| Option | Description
| ----------------------| ---------------------------------------------------------
4 changes: 2 additions & 2 deletions examples/gnn_fraud_detection_pipeline/README.md
@@ -18,7 +18,7 @@ limitations under the License.

## Requirements

- Prior to running the gnn fruad detection pipeline, additional requirements must be installed in to your conda environment. A supplemental requirements file has been provided in this example directory.
+ Prior to running the gnn fraud detection pipeline, additional requirements must be installed in to your conda environment. A supplemental requirements file has been provided in this example directory.

```bash
mamba env update -n ${CONDA_DEFAULT_ENV} -f examples/gnn_fraud_detection_pipeline/requirements.yml
@@ -90,7 +90,7 @@ Serialize rate[Complete]: 265messages [00:01, 142.31messages/s]
```

### CLI Example
- The above example is illustrative of using the Python API to build a custom Morpheus Pipeline. Alternately the Morpheus command line could have been used to accomplush the same goal. To do this we must ensure that the `examples` directory is available in the `PYTHONPATH` and each of the custom stages are registered as plugins.
+ The above example is illustrative of using the Python API to build a custom Morpheus Pipeline. Alternately the Morpheus command line could have been used to accomplish the same goal. To do this we must ensure that the `examples` directory is available in the `PYTHONPATH` and each of the custom stages are registered as plugins.
Note: Since the `gnn_fraud_detection_pipeline` module is visible to Python we can specify the plugins by their module name rather than the more verbose file path.

From the root of the Morpheus repo run:
2 changes: 1 addition & 1 deletion examples/log_parsing/README.md
@@ -95,7 +95,7 @@ Options:
```

### CLI Example
- The above example is illustrative of using the Python API to build a custom Morpheus Pipeline. Alternately the Morpheus command line could have been used to accomplush the same goal. To do this we must ensure that the `examples`/log_parsing directory is available in the `PYTHONPATH` and each of the custom stages are registered as plugins.
+ The above example is illustrative of using the Python API to build a custom Morpheus Pipeline. Alternately the Morpheus command line could have been used to accomplish the same goal. To do this we must ensure that the `examples`/log_parsing directory is available in the `PYTHONPATH` and each of the custom stages are registered as plugins.

From the root of the Morpheus repo run:
```bash