Merge branch 'branch-23.01' into david-23.01-docs
dagardner-nv committed Dec 6, 2022
2 parents 9dedfb8 + 40b1c95 commit 530b40e
Showing 37 changed files with 53 additions and 53 deletions.
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -139,7 +139,7 @@ add_custom_target(copy_python_source ALL
DEPENDS ${OUTPUT_PYTHON_FILES}
)

- # Manually install the below files. install(DIRECTORY) doesnt work well and
+ # Manually install the below files. install(DIRECTORY) doesn't work well and
# makes it impossible to get these files and MORPHEUS_PYTHON_FILES in one command.
install(
FILES ${MORPHEUS_ROOT_PYTHON_FILES}
4 changes: 2 additions & 2 deletions CONTRIBUTING.md
@@ -332,7 +332,7 @@ TensorRT :Skipped
Complete!
```

- This indicates that only 3 out of 314 rows did not match the validation dataset. If you see errors similar to `:/ ( %)` or very high percentages, then the workflow did not complete sucessfully.
+ This indicates that only 3 out of 314 rows did not match the validation dataset. If you see errors similar to `:/ ( %)` or very high percentages, then the workflow did not complete successfully.

### Troubleshooting the Build

@@ -342,7 +342,7 @@ Due to the large number of dependencies, it's common to run into build issues. T
- To avoid rebuilding every compilation unit for all dependencies after each change, a fair amount of the build is cached. By default, the cache is located at `${MORPHEUS_ROOT}/.cache`. The cache contains both compiled object files, source repositories, ccache files, clangd files and even the cuDF build.
- The entire cache folder can be deleted at any time and will be redownload/recreated on the next build
- Message indicating `git apply ...` failed
- - Many of the dependencies require small patches to make them work. These patches must be applied once and only once. If you see this error, try deleting the offending package from the `build/_deps/<offending_packag>` directory or from `.cache/cpm/<offending_package>`.
+ - Many of the dependencies require small patches to make them work. These patches must be applied once and only once. If you see this error, try deleting the offending package from the `build/_deps/<offending_package>` directory or from `.cache/cpm/<offending_package>`.
- If all else fails, delete the entire `build/` directory and `.cache/` directory.
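
For illustration, a minimal Python sketch of that purge step (the `purge_package` helper and its arguments are hypothetical, not part of the repo):

```python
# Sketch only: remove a dependency's cached copies so its patch re-applies cleanly.
import shutil
from pathlib import Path

def purge_package(morpheus_root: str, package: str) -> None:
    for candidate in (Path(morpheus_root, "build", "_deps", package),
                      Path(morpheus_root, ".cache", "cpm", package)):
        if candidate.exists():
            shutil.rmtree(candidate)  # next build re-downloads and re-patches it
```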
## Licensing
4 changes: 2 additions & 2 deletions README.md
@@ -240,7 +240,7 @@ Commands:
filter Filter message by a classification threshold
from-file Load messages from a file
from-kafka Load messages from a Kafka cluster
- gen-viz (Deprecated) Write out vizualization data frames
+ gen-viz (Deprecated) Write out visualization data frames
inf-identity Perform a no-op inference for testing
inf-pytorch Perform inference with PyTorch
inf-triton Perform inference with Triton
@@ -317,4 +317,4 @@ Commands:
Note: The available commands for different types of pipelines are not the same. This means that the same stage, when used in different pipelines, may have different options. Please check the CLI help for the most up-to-date information during development.
## Contributing
- Please see our [guide for contributing to Morpheus](./CONTRIBUTING.md).
+ Please see our [guide for contributing to Morpheus](./CONTRIBUTING.md).
2 changes: 1 addition & 1 deletion ci/scripts/cpp_checks.sh
@@ -58,7 +58,7 @@ if [[ -n "${MORPHEUS_MODIFIED_FILES}" ]]; then

CLANG_TIDY_DIFF=$(find_clang_tidy_diff)

- # Run using a clang-tidy wrapper to allow warnings-as-errors and to eliminate any output except errors (since clang-tidy-diff.py doesnt return the correct error codes)
+ # Run using a clang-tidy wrapper to allow warnings-as-errors and to eliminate any output except errors (since clang-tidy-diff.py doesn't return the correct error codes)
CLANG_TIDY_OUTPUT=`get_unified_diff ${CPP_FILE_REGEX} | ${CLANG_TIDY_DIFF} -j 0 -path ${BUILD_DIR} -p1 -quiet 2>&1`

if [[ -n "${CLANG_TIDY_OUTPUT}" && ${CLANG_TIDY_OUTPUT} != "No relevant changes found." ]]; then
4 changes: 2 additions & 2 deletions ci/scripts/python_checks.sh
@@ -41,13 +41,13 @@ if [[ -n "${MORPHEUS_MODIFIED_FILES}" ]]; then
done

if [[ "${SKIP_ISORT}" == "" ]]; then
- # Run using a clang-tidy wrapper to allow warnings-as-errors and to eliminate any output except errors (since clang-tidy-diff.py doesnt return the correct error codes)
+ # Run using a clang-tidy wrapper to allow warnings-as-errors and to eliminate any output except errors (since clang-tidy-diff.py doesn't return the correct error codes)
ISORT_OUTPUT=`python3 -m isort --settings-file ${PY_CFG} --filter-files --check-only ${MORPHEUS_MODIFIED_FILES[@]} 2>&1`
ISORT_RETVAL=$?
fi

if [[ "${SKIP_FLAKE}" == "" ]]; then
- # Run using a clang-tidy wrapper to allow warnings-as-errors and to eliminate any output except errors (since clang-tidy-diff.py doesnt return the correct error codes)
+ # Run using a clang-tidy wrapper to allow warnings-as-errors and to eliminate any output except errors (since clang-tidy-diff.py doesn't return the correct error codes)
FLAKE_OUTPUT=`python3 -m flake8 --config ${PY_CFG} ${MORPHEUS_MODIFIED_FILES[@]} 2>&1`
FLAKE_RETVAL=$?
fi
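
As a rough Python rendering of what these two check-only passes do (the `run_checks` helper, `py_cfg`, and `modified_files` are hypothetical; the flags mirror the script above):

```python
# Sketch only: run the same isort/flake8 check-only passes from Python.
import subprocess

def run_checks(py_cfg: str, modified_files: list) -> int:
    # isort in check-only mode, as in the hunk above
    isort = subprocess.run(["python3", "-m", "isort", "--settings-file", py_cfg,
                            "--filter-files", "--check-only", *modified_files])
    # flake8 with the shared config file
    flake = subprocess.run(["python3", "-m", "flake8", "--config", py_cfg,
                            *modified_files])
    return isort.returncode or flake.returncode
```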
4 changes: 2 additions & 2 deletions cmake/setup_cache.cmake
@@ -44,11 +44,11 @@ function(configure_ccache cache_dir_name)
# Set the ccache options we need
set(CCACHE_CONFIGPATH "${CCACHE_DIR}/ccache.conf")

- # Because CMake doesnt allow settings variables `CCACHE_COMPILERTYPE=gcc
+ # Because CMake doesn't allow settings variables `CCACHE_COMPILERTYPE=gcc
# ccache` in CMAKE_C_COMPILER_LAUNCHER, we need to put everything into a
# single script and use that for CMAKE_C_COMPILER_LAUNCHER. Also, since
# gxx_linux-64 sets the compiler to c++ instead of g++, we need to set the
- # value of CCACHE_COMPILERTYPE otherwise caching doesnt work correctly. So
+ # value of CCACHE_COMPILERTYPE otherwise caching doesn't work correctly. So
# we need to make separate runners for each language with specific ccache
# settings for each
if(NOT "${CMAKE_CXX_COMPILER_ID}" STREQUAL "Clang")
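
The comment above boils down to: generate one small launcher script per language and point `CMAKE_<LANG>_COMPILER_LAUNCHER` at it. A hypothetical Python sketch of that idea (the real implementation is CMake; names here are illustrative):

```python
# Sketch only: write a launcher that pins CCACHE_COMPILERTYPE before calling ccache.
from pathlib import Path

def write_ccache_launcher(path: str, compiler_type: str, config_path: str) -> None:
    script = "\n".join([
        "#!/bin/bash",
        f'export CCACHE_CONFIGPATH="{config_path}"',
        # gxx_linux-64 names the compiler `c++`, so the type must be pinned explicitly
        f'export CCACHE_COMPILERTYPE="{compiler_type}"',
        'exec ccache "$@"',
        "",
    ])
    launcher = Path(path)
    launcher.write_text(script)
    launcher.chmod(0o755)
```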
2 changes: 1 addition & 1 deletion docs/source/basics/examples.rst
@@ -72,7 +72,7 @@ Multi-Monitor Throughput
^^^^^^^^^^^^^^^^^^^^^^^^

This example will report the throughput for each stage independently. Keep in mind, ``buffer`` stages are necessary to
- decouple one stage from the next. Without the buffers, all montioring would show the same throughput.
+ decouple one stage from the next. Without the buffers, all monitoring would show the same throughput.

.. image:: img/multi_monitor_throughput.png

6 changes: 3 additions & 3 deletions docs/source/developer_guide/guides/2_real_world_phishing.md
@@ -402,7 +402,7 @@ In our previous examples, we didn't define a constructor for the Python classes

Note that it is a best practice to perform any necessary validation checks in the constructor. This allows us to fail early rather than after the pipeline has started.

- In our `RecipientFeaturesStage` example, we hard-coded the Bert separator token. Let's instead refactor the code to receive that as a constructor argument. This new constructor argument is documented following the [numpydoc](https://numpydoc.readthedocs.io/en/latest/format.html#parameters) formatting style allowing it to be documented propperly for both API and CLI users. Let's also take the opportunity to verify that the pipeline mode is set to `morpheus.config.PipelineModes.NLP`. Our refactored class definition now looks like:
+ In our `RecipientFeaturesStage` example, we hard-coded the Bert separator token. Let's instead refactor the code to receive that as a constructor argument. This new constructor argument is documented following the [numpydoc](https://numpydoc.readthedocs.io/en/latest/format.html#parameters) formatting style allowing it to be documented properly for both API and CLI users. Let's also take the opportunity to verify that the pipeline mode is set to `morpheus.config.PipelineModes.NLP`. Our refactored class definition now looks like:

```python
from morpheus.config import Config
@@ -417,7 +417,7 @@ class RecipientFeaturesStage(SinglePortStage):
config : morpheus.config.Config
Pipeline configuration instance.
sep_token : str
- Bert separator toeken.
+ Bert separator token.
"""

def __init__(self, config: Config, sep_token: str = '[SEP]'):
@@ -442,7 +442,7 @@ Usage: morpheus run pipeline-nlp recipient-features [OPTIONS]
Pre-processing stage which counts the number of recipients in an email's metadata.
Options:
- --sep_token TEXT Bert separator toeken. [default: [SEP]]
+ --sep_token TEXT Bert separator token. [default: [SEP]]
--help Show this message and exit.
```
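
To make the fail-early advice concrete, here is a minimal sketch of such a constructor (the exact checks are an assumption based on the guide's description, not a verbatim excerpt):

```python
from morpheus.config import Config, PipelineModes
from morpheus.pipeline.single_port_stage import SinglePortStage

class RecipientFeaturesStage(SinglePortStage):

    def __init__(self, config: Config, sep_token: str = '[SEP]'):
        super().__init__(config)
        # Fail early, before the pipeline starts, if misconfigured.
        if config.mode != PipelineModes.NLP:
            raise RuntimeError("RecipientFeaturesStage must be used in a pipeline configured for NLP")
        if len(sep_token) == 0:
            raise ValueError("sep_token cannot be an empty string")
        self._sep_token = sep_token
```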

2 changes: 1 addition & 1 deletion docs/source/developer_guide/guides/3_simple_cpp_stage.md
@@ -205,7 +205,7 @@ PassThruStage::PassThruStage() :
{}
```

- However, this doesnt illustrate well how to customize a stage. So we will be using the long form signature for our examples.
+ However, this doesn't illustrate well how to customize a stage. So we will be using the long form signature for our examples.

The `build_operator` method defines an observer who is subscribed to our input `rxcpp::observable`. The observer consists of three functions that are typically lambdas: `on_next`, `on_error`, and `on_completed`. Typically, these three functions call the associated methods on the output subscriber.
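
For intuition, a rough Python analogue of that three-callback observer (illustrative pseudocode, not the rxcpp or MRC API):

```python
# Sketch only: the observer shape described above, in Python form.
class PassThruObserver:
    def __init__(self, output):
        self._output = output  # the downstream subscriber

    def on_next(self, message):
        self._output.on_next(message)  # forward each message unchanged

    def on_error(self, error):
        self._output.on_error(error)  # propagate failures downstream

    def on_completed(self):
        self._output.on_completed()  # signal end of stream
```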

4 changes: 2 additions & 2 deletions docs/source/morpheus_quickstart_guide.md
@@ -249,7 +249,7 @@ pod/mlflow-6d98 1/1 Running 0 39s
```

### Model Deployment
- Attach to the MLfLow pod to publish models to the MLflow server and then deploy it onto Morpheus AI Engine:
+ Attach to the MLflow pod to publish models to the MLflow server and then deploy it onto Morpheus AI Engine:

```bash
kubectl -n $NAMESPACE exec -it deploy/mlflow -- bash
@@ -903,7 +903,7 @@ Commands:
filter Filter message by a classification threshold.
from-file Load messages from a file.
from-kafka Load messages from a Kafka cluster.
- gen-viz (Deprecated) Write out vizualization DataFrames.
+ gen-viz (Deprecated) Write out visualization DataFrames.
inf-identity Perform inference for testing that performs a no-op.
inf-pytorch Perform inference with PyTorch.
inf-triton Perform inference with Triton Inference Server.
4 changes: 2 additions & 2 deletions examples/abp_nvsmi_detection/README.md
@@ -44,7 +44,7 @@ $ nvidia-smi dmon
0 281 57 - 85 54 0 0 7000 1740
```

- Each line in the output represents the GPU metrics at a single point in time. As the tool progresses the GPU begins to be utilized and the SM% and Mem% values increase as memory is loaded into the GPU and computations are performed. The model we will be using can ingest this information and determine whether or not the GPU is mining cryptocurriences without needing additional information from the host machine.
+ Each line in the output represents the GPU metrics at a single point in time. As the tool progresses the GPU begins to be utilized and the SM% and Mem% values increase as memory is loaded into the GPU and computations are performed. The model we will be using can ingest this information and determine whether or not the GPU is mining cryptocurrencies without needing additional information from the host machine.

In this example we will be using the `examples/data/nvsmi.jsonlines` dataset that is known to contain mining behavior profiles. The dataset is in the `.jsonlines` format which means each new line represents a new JSON object. In order to parse this data, it must be ingested, split by lines into individual JSON objects, and parsed into cuDF dataframes. This will all be handled by Morpheus.
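
A hedged sketch of that ingestion step (Morpheus performs this internally; shown only for intuition):

```python
# Sketch only: parse the .jsonlines dataset, one JSON object per line, into cuDF.
import cudf

df = cudf.read_json("examples/data/nvsmi.jsonlines", lines=True)
print(df.head())  # GPU metric rows, ready for preprocessing
```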

@@ -221,4 +221,4 @@ The output file `detections.jsonlines` will contain a single boolean value for e
...
```

- We have stripped out the input data to make the detections easier to identify. Ommitting the argument `--include 'mining'` would show the input data in the detections file.
+ We have stripped out the input data to make the detections easier to identify. Omitting the argument `--include 'mining'` would show the input data in the detections file.
2 changes: 1 addition & 1 deletion examples/abp_pcap_detection/README.md
@@ -98,7 +98,7 @@ The pipeline will process the input `pcap_dump.jsonlines` sample data and write

### CLI Example
The above example is illustrative of using the Python API to build a custom Morpheus Pipeline.
- Alternately the Morpheus command line could have been used to accomplush the same goal by registering the `abp_pcap_preprocessing.py` module as a plugin.
+ Alternately the Morpheus command line could have been used to accomplish the same goal by registering the `abp_pcap_preprocessing.py` module as a plugin.

From the root of the Morpheus repo run:
```bash
@@ -35,7 +35,7 @@ class RecipientFeaturesStage(SinglePortStage):
config : morpheus.config.Config
Pipeline configuration instance.
sep_token : str
- Bert separator toeken.
+ Bert separator token.
"""

def __init__(self, config: Config, sep_token: str = '[SEP]'):
2 changes: 1 addition & 1 deletion examples/developer_guide/2_2_rabbitmq/README.md
@@ -16,7 +16,7 @@ limitations under the License.
-->

# Example RabbitMQ stages
- This example inclues two stages `RabbitMQSourceStage` and `WriteToRabbitMQStage`
+ This example includes two stages `RabbitMQSourceStage` and `WriteToRabbitMQStage`

## Testing with a RabbitMQ container
Testing can be performed locally with the RabbitMQ supplied docker image from the [RabbitMQ container registry](https://registry.hub.docker.com/_/rabbitmq/):
2 changes: 1 addition & 1 deletion examples/digital_fingerprinting/README.md
@@ -18,7 +18,7 @@

## Organization

- The DFP example workflows in Morpheus are designed to scale up to company wide workloads and handle several different log types which resulted in a large number of moving parts to handle the various services and configuration options. To simplify things, the DFP workflow is provided as two separate examples: a simple, "starter" pipeline for new users and a complex, "production" pipeline for full scale deployments. While these two examples both peform the same general tasks, they do so in very different ways. The following is a breakdown of the differences between the two examples.
+ The DFP example workflows in Morpheus are designed to scale up to company wide workloads and handle several different log types which resulted in a large number of moving parts to handle the various services and configuration options. To simplify things, the DFP workflow is provided as two separate examples: a simple, "starter" pipeline for new users and a complex, "production" pipeline for full scale deployments. While these two examples both perform the same general tasks, they do so in very different ways. The following is a breakdown of the differences between the two examples.

### The "Starter" Example

@@ -150,7 +150,7 @@ def on_data(self, message: MultiAEMessage):

experiment_name = self.user_id_to_experiment(user_id=user)

- # Creates a new experiment if it doesnt exist
+ # Creates a new experiment if it doesn't exist
experiment = mlflow.set_experiment(experiment_name)

with mlflow.start_run(run_name="Duo autoencoder model training run",
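
A standalone sketch of the experiment-per-user pattern in this hunk (the experiment name and logged parameter are hypothetical):

```python
# Sketch only: look up or create the user's experiment, then record a run.
import mlflow

experiment_name = "dfp/duo/training/example_user"  # hypothetical naming scheme
mlflow.set_experiment(experiment_name)  # creates the experiment if it doesn't exist

with mlflow.start_run(run_name="Duo autoencoder model training run"):
    mlflow.log_param("user_id", "example_user")
```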
@@ -84,7 +84,7 @@ def append_dataframe(self, incoming_df: pd.DataFrame) -> bool:
# Set the filtered index
filtered_df.index = range(self.total_count, self.total_count + len(filtered_df))

- # Save the row hash to make it easier to find later. Do this before the batch so it doesnt participate
+ # Save the row hash to make it easier to find later. Do this before the batch so it doesn't participate
filtered_df["_row_hash"] = pd.util.hash_pandas_object(filtered_df, index=False)

# Use batch id to distinguish groups in the same dataframe
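
A self-contained sketch of that row-hash bookkeeping (toy data; the real stage operates on filtered batches):

```python
# Sketch only: hash row contents before adding batch metadata, so the hash
# reflects just the original row values.
import pandas as pd

filtered_df = pd.DataFrame({"username": ["alice", "bob"], "event_count": [3, 7]})
filtered_df["_row_hash"] = pd.util.hash_pandas_object(filtered_df, index=False)
```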
@@ -271,7 +271,7 @@ def load_model_cache(self, client: MlflowClient, reg_model_name: str) -> ModelCa
latest_versions = client.get_latest_versions(reg_model_name)

if (len(latest_versions) == 0):
- # Databricks doesnt like the `get_latest_versions` method for some reason. Before failing, try
+ # Databricks doesn't like the `get_latest_versions` method for some reason. Before failing, try
# to just get the model and then use latest versions
reg_model_obj = client.get_registered_model(reg_model_name)
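
A hedged sketch of that fallback in isolation (mirrors the logic shown above; error handling omitted):

```python
# Sketch only: prefer get_latest_versions(), fall back to the model object.
from mlflow.tracking import MlflowClient

def latest_versions_with_fallback(client: MlflowClient, reg_model_name: str):
    latest_versions = client.get_latest_versions(reg_model_name)
    if len(latest_versions) == 0:
        # Some backends return nothing here; read versions off the model instead.
        reg_model_obj = client.get_registered_model(reg_model_name)
        latest_versions = reg_model_obj.latest_versions
    return latest_versions
```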

4 changes: 2 additions & 2 deletions examples/digital_fingerprinting/starter/README.md
@@ -80,7 +80,7 @@ Commands:
from-azure Source stage is used to load Azure Active Directory messages.
from-cloudtrail Load messages from a Cloudtrail directory
from-duo Source stage is used to load Duo Authentication messages.
- gen-viz (Deprecated) Write out vizualization data frames
+ gen-viz (Deprecated) Write out visualization data frames
inf-pytorch Perform inference with PyTorch
inf-triton Perform inference with Triton
monitor Display throughput numbers at a specific point in the
@@ -115,7 +115,7 @@ The following table shows mapping between the main Morpheus CLI commands and und

**Preprocessing stages**

- `TrainAEStage` can either train user models using data matching a provided `--train_data_glob` or load pre-trained models from file using `--pretrained_filename`. When using `--train_data_glob`, user models can be saved using the `--models_output_filename` option. The `--source_stage_class` must also be used with `--train_data_glob` so that the training stage knows how to read the training data. The autoencoder implementation from this [fork](https://github.com/efajardo-nv/dfencoder/tree/morpheus-22.08) is used for user model training. The following are the available CLI options for the `TrainAEStage` (train-ae):
+ `TrainAEStage` can either train user models using data matching a provided `--train_data_glob` or load pre-trained models from file using `--pretrained_filename`. When using `--train_data_glob`, user models can be saved using the `--models_output_filename` option. The `--source_stage_class` must also be used with `--train_data_glob` so that the training stage knows how to read the training data. The autoencoder implementation used for user model training can be found [here](https://github.com/nv-morpheus/dfencoder). The following are the available CLI options for the `TrainAEStage` (train-ae):

| Option | Description
| ----------------------| ---------------------------------------------------------
4 changes: 2 additions & 2 deletions examples/gnn_fraud_detection_pipeline/README.md
@@ -18,7 +18,7 @@ limitations under the License.

## Requirements

- Prior to running the gnn fruad detection pipeline, additional requirements must be installed in to your conda environment. A supplemental requirements file has been provided in this example directory.
+ Prior to running the gnn fraud detection pipeline, additional requirements must be installed in to your conda environment. A supplemental requirements file has been provided in this example directory.

```bash
mamba env update -n ${CONDA_DEFAULT_ENV} -f examples/gnn_fraud_detection_pipeline/requirements.yml
@@ -90,7 +90,7 @@ Serialize rate[Complete]: 265messages [00:01, 142.31messages/s]
```

### CLI Example
- The above example is illustrative of using the Python API to build a custom Morpheus Pipeline. Alternately the Morpheus command line could have been used to accomplush the same goal. To do this we must ensure that the `examples` directory is available in the `PYTHONPATH` and each of the custom stages are registered as plugins.
+ The above example is illustrative of using the Python API to build a custom Morpheus Pipeline. Alternately the Morpheus command line could have been used to accomplish the same goal. To do this we must ensure that the `examples` directory is available in the `PYTHONPATH` and each of the custom stages are registered as plugins.
Note: Since the `gnn_fraud_detection_pipeline` module is visible to Python we can specify the plugins by their module name rather than the more verbose file path.

From the root of the Morpheus repo run:
2 changes: 1 addition & 1 deletion examples/log_parsing/README.md
@@ -95,7 +95,7 @@ Options:
```

### CLI Example
- The above example is illustrative of using the Python API to build a custom Morpheus Pipeline. Alternately the Morpheus command line could have been used to accomplush the same goal. To do this we must ensure that the `examples`/log_parsing directory is available in the `PYTHONPATH` and each of the custom stages are registered as plugins.
+ The above example is illustrative of using the Python API to build a custom Morpheus Pipeline. Alternately the Morpheus command line could have been used to accomplish the same goal. To do this we must ensure that the `examples`/log_parsing directory is available in the `PYTHONPATH` and each of the custom stages are registered as plugins.

From the root of the Morpheus repo run:
```bash