From bb058c529bae0434fa6d04ab483cdba7db7b5ed2 Mon Sep 17 00:00:00 2001
From: David Gardner <96306125+dagardner-nv@users.noreply.github.com>
Date: Fri, 6 Jan 2023 15:08:59 -0800
Subject: [PATCH] Ensure Kafka & Triton deps are documented when used (#598)

* Replace `from-kafka` with `from-file` for the first example in the `Building a Pipeline` doc
* `from-kafka` is introduced later as a more advanced example.
* Link to the `Quick Launch Kafka Cluster` doc in the Kafka example
* Merge the `Basic Usage Examples` doc with the `Building a Pipeline` doc. These two docs were similar in nature, and the symlinks from the docs dir into the examples dir broke some links.
* Ensure the `.tmp` dir exists, and add `examples/data/dfp` to `.gitignore`

fixes #591

Authors:
  - David Gardner (https://github.com/dagardner-nv)

Approvers:
  - Bhargav Suryadevara (https://github.com/bsuryadevara)
  - Pete MacKinnon (https://github.com/pdmack)

URL: https://github.com/nv-morpheus/Morpheus/pull/598
---
 .gitignore                                   |   1 +
 .tmp/.gitkeep                                |   2 +
 docs/source/basics/building_a_pipeline.md    | 256 ++++++++++++++++++
 docs/source/basics/building_a_pipeline.rst   | 104 -------
 docs/source/examples.md                      |   1 -
 docs/source/examples/basic_usage/README.md   |   1 -
 docs/source/examples/basic_usage/img         |   1 -
 docs/source/examples/index.rst               |   1 -
 .../source}/img/monitor_throughput.png       |   0
 .../source}/img/multi_monitor_throughput.png |   0
 .../source}/img/nlp_kitchen_sink.png         |   0
 .../img/remove_fields_from_json_objects.png  |   0
 docs/source/img/simple_identity.png          |   3 +
 examples/basic_usage/README.md               | 127 ---------
 examples/basic_usage/img/simple_identity.png |   3 -
 scripts/validation/kafka_testing.md          |   2 +-
 16 files changed, 263 insertions(+), 239 deletions(-)
 create mode 100644 .tmp/.gitkeep
 create mode 100644 docs/source/basics/building_a_pipeline.md
 delete mode 100644 docs/source/basics/building_a_pipeline.rst
 delete mode 120000 docs/source/examples/basic_usage/README.md
 delete mode 120000 docs/source/examples/basic_usage/img
 rename {examples/basic_usage => docs/source}/img/monitor_throughput.png (100%)
 rename {examples/basic_usage => docs/source}/img/multi_monitor_throughput.png (100%)
 rename {examples/basic_usage => docs/source}/img/nlp_kitchen_sink.png (100%)
 rename {examples/basic_usage => docs/source}/img/remove_fields_from_json_objects.png (100%)
 create mode 100644 docs/source/img/simple_identity.png
 delete mode 100644 examples/basic_usage/README.md
 delete mode 100644 examples/basic_usage/img/simple_identity.png

diff --git a/.gitignore b/.gitignore
index 22c36327a8..613f5af976 100755
--- a/.gitignore
+++ b/.gitignore
@@ -182,6 +182,7 @@ tags
 # End of https://www.gitignore.io/api/vim,c++,cmake,python,synology
 .tmp
+examples/data/dfp
 viz_frames*
 notebooks/output
 dask-worker-space
diff --git a/.tmp/.gitkeep b/.tmp/.gitkeep
new file mode 100644
index 0000000000..cd19074234
--- /dev/null
+++ b/.tmp/.gitkeep
@@ -0,0 +1,2 @@
+Ensure that the .tmp dir exists, as many scripts and examples depend on this dir.
+Everything else in this dir is ignored by .gitignore.
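
Since `.tmp` itself is matched by `.gitignore`, a file under it is only tracked if it is force-added. A minimal sketch of the pattern (the exact commands used are an assumption, not recorded in the patch):

```bash
mkdir -p .tmp                  # create the dir that scripts and examples expect
echo "placeholder" > .tmp/.gitkeep
git add -f .tmp/.gitkeep       # -f overrides the .gitignore rule for this one file
```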
diff --git a/docs/source/basics/building_a_pipeline.md b/docs/source/basics/building_a_pipeline.md
new file mode 100644
index 0000000000..6f782f8615
--- /dev/null
+++ b/docs/source/basics/building_a_pipeline.md
@@ -0,0 +1,256 @@
+
+
+# Building a Pipeline
+> **Prerequisites**
+> The following examples assume that the example dataset has been fetched; from the root of the Morpheus repo, run:
+>```bash
+>./scripts/fetch_data.py fetch examples
+>```
+
+To build a pipeline via the CLI, users must first specify the type of pipeline and a source object, followed by a sequential list of stages. For each stage, options can be specified to configure that particular stage. Since stages are listed sequentially, the output of one stage becomes the input to the next. Unless heavily customized, pipelines will start with either:
+
+```bash
+# For NLP Pipelines
+morpheus run pipeline-nlp ...
+# For FIL Pipelines
+morpheus run pipeline-fil ...
+```
+
+While each stage has its own configuration options, there are also options that apply to the pipeline as a whole. Check ``morpheus run --help``, ``morpheus run pipeline-nlp --help`` and ``morpheus run pipeline-fil --help`` for these global pipeline options.
+
+## Source Stages
+
+All pipelines configured with the CLI need to start with a source object. Two commonly used source stages included with Morpheus are:
+
+* `from-file`
+  - Reads from a local file into the Pipeline
+  - Supports JSON lines format
+  - All lines are read at the start and queued into the pipeline at one time. Useful for performance testing.
+  - Refer to `morpheus.stages.input.file_source_stage.FileSourceStage` for more information
+* `from-kafka`
+  - Pulls messages from a Kafka cluster into the Pipeline
+  - Kafka cluster can be running on the localhost or remotely
+  - Refer to `morpheus.stages.input.kafka_source_stage.KafkaSourceStage` for more information
+
+## Stages
+
+From this point on, any number of stages can be sequentially added to the command line from start to finish. For example, to build a trivial pipeline that reads from a file, deserializes messages, serializes them, and then writes to a file, use the following:
+```bash
+morpheus --log_level=DEBUG run pipeline-nlp --viz_file=.tmp/simple_identity.png \
+  from-file --filename=examples/data/pcap_dump.jsonlines \
+  deserialize \
+  serialize \
+  to-file --overwrite --filename .tmp/temp_out.json
+```
+![../img/simple_identity.png](../img/simple_identity.png)
+
+The output should be similar to:
+```console
+Configuring Pipeline via CLI
+Parameter, 'labels_file', with relative path, 'data/labels_nlp.txt', does not exist. Using package relative location: '/home/dagardner/work/morpheus/morpheus/data/labels_nlp.txt'
+Loaded labels file. Current labels: [['address', 'bank_acct', 'credit_card', 'email', 'govt_id', 'name', 'password', 'phone_num', 'secret_keys', 'user']]
+Starting pipeline via CLI... Ctrl+C to Quit
+Config:
+{
+  "ae": null,
+  "class_labels": [
+    "address",
+    "bank_acct",
+    "credit_card",
+    "email",
+    "govt_id",
+    "name",
+    "password",
+    "phone_num",
+    "secret_keys",
+    "user"
+  ],
+  "debug": false,
+  "edge_buffer_size": 128,
+  "feature_length": 256,
+  "fil": null,
+  "log_config_file": null,
+  "log_level": 10,
+  "mode": "NLP",
+  "model_max_batch_size": 8,
+  "num_threads": 64,
+  "pipeline_batch_size": 256,
+  "plugins": []
+}
+CPP Enabled: True
+====Registering Pipeline====
+====Building Pipeline====
+====Building Segment: linear_segment_0====
+====Building Segment Complete!====
+====Building Pipeline Complete!====
+Starting! Time: 1672959248.7163541
+====Registering Pipeline Complete!====
+====Starting Pipeline====
+====Pipeline Started====
+Added source:
+  └─> morpheus.MessageMeta
+Added stage:
+  └─ morpheus.MessageMeta -> morpheus.MultiMessage
+Added stage:
+  └─ morpheus.MultiMessage -> morpheus.MessageMeta
+Added stage:
+  └─ morpheus.MessageMeta -> morpheus.MessageMeta
+====Pipeline Complete====
+Pipeline visualization saved to .tmp/simple_identity.png
+```
+
+### Pipeline Build Checks
+After the `====Building Pipeline====` message, if logging is `INFO` or greater, the CLI will print a list of all stages and the type transformations of each stage. To be a valid Pipeline, the output type of one stage must match the input type of the next. Many stages are flexible and will determine their type at runtime, but some stages require a specific input type. If your Pipeline is configured incorrectly, Morpheus will report the error. For example, if we run the same command as above but forget the `serialize` stage:
+```bash
+morpheus --log_level=DEBUG run pipeline-nlp \
+  from-file --filename=examples/data/pcap_dump.jsonlines \
+  deserialize \
+  to-file --overwrite --filename .tmp/temp_out.json
+```
+
+Then the following error will be displayed:
+```
+RuntimeError: The to-file stage cannot handle input of <class 'morpheus.pipeline.messages.MultiMessage'>. Accepted input types: (<class 'morpheus.messages.message_meta.MessageMeta'>,)
+```
+
+This indicates that the ``to-file`` stage cannot accept the input type of `morpheus.pipeline.messages.MultiMessage`. This is because the ``to-file`` stage has no idea how to write that class to a file; it only knows how to write instances of `morpheus.messages.message_meta.MessageMeta`. To ensure you have a valid pipeline, examine the `Accepted input types: (<class 'morpheus.messages.message_meta.MessageMeta'>,)` portion of the message. This indicates you need a stage that converts from the output type of the `deserialize` stage, `MultiMessage`, to `MessageMeta`, which is exactly what the `serialize` stage does.
+
+### Kafka Source Example
+The above example essentially just copies a file. However, it is important to note that most Morpheus pipelines are similar in structure, in that they begin with a source stage (`from-file`) followed by a `deserialize` stage, and end with a `serialize` stage followed by a sink stage (`to-file`), with the actual training or inference logic occurring in between.
+
+We could also easily swap out the source or sink stages in the above example without any impact on the pipeline as a whole. For example, to read from a Kafka topic, simply replace the `from-file` stage with `from-kafka`:
+
+> **Note**: This assumes a Kafka broker running on the localhost, listening on port 9092. For testing Morpheus with Kafka, follow steps 1-8 in the [Quick Launch Kafka Cluster](../developer_guide/contributing.md#quick-launch-kafka-cluster) section of [contributing.md](../developer_guide/contributing.md), creating a topic named `test_pcap`, and replace port `9092` with the port your Kafka instance is listening on.
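+
+Before launching the pipeline below, you can sanity-check the broker and feed the topic using the console tools shipped with Kafka. A sketch, assuming a broker on `localhost:9092` and the Kafka `bin/` scripts on your `PATH` (flag names vary slightly between Kafka versions):
+```bash
+# Confirm the broker is reachable and the test_pcap topic exists
+kafka-topics.sh --list --bootstrap-server localhost:9092
+
+# Publish a few sample records for the pipeline to consume
+head -n 10 examples/data/pcap_dump.jsonlines | \
+  kafka-console-producer.sh --topic test_pcap --bootstrap-server localhost:9092
+```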
+
+```bash
+morpheus --log_level=DEBUG run pipeline-nlp \
+  from-kafka --input_topic test_pcap --bootstrap_servers localhost:9092 \
+  deserialize \
+  serialize \
+  to-file --filename .tmp/temp_out.json
+```
+
+## Available Stages
+For a complete list of available stages, use the CLI help commands: ``morpheus run pipeline-nlp --help`` or ``morpheus run pipeline-fil --help``.
+
+## Basic Usage Examples
+
+### Remove Fields from JSON Objects
+This example will copy only the fields 'timestamp', 'src_ip' and 'dest_ip' from `examples/data/pcap_dump.jsonlines` to
+`out.jsonlines`.
+
+![../img/remove_fields_from_json_objects.png](../img/remove_fields_from_json_objects.png)
+
+```bash
+morpheus run pipeline-nlp --viz_file=.tmp/remove_fields_from_json_objects.png \
+  from-file --filename examples/data/pcap_dump.jsonlines \
+  deserialize \
+  serialize --include 'timestamp' --include 'src_ip' --include 'dest_ip' \
+  to-file --overwrite --filename out.jsonlines
+```
+
+### Monitor Throughput
+
+This example will report the throughput on the command line.
+
+![../img/monitor_throughput.png](../img/monitor_throughput.png)
+
+```bash
+morpheus run pipeline-nlp --viz_file=.tmp/monitor_throughput.png \
+  from-file --filename examples/data/pcap_dump.jsonlines \
+  deserialize \
+  monitor --description "Lines Throughput" --smoothing 0.1 --unit "lines" \
+  serialize \
+  to-file --overwrite --filename out.jsonlines
+```
+
+Output:
+```console
+Configuring Pipeline via CLI
+Starting pipeline via CLI... Ctrl+C to Quit
+Lines Throughput[Complete]: 93085 lines [00:03, 29446.18 lines/s]
+Pipeline visualization saved to .tmp/monitor_throughput.png
+```
+
+### Multi-Monitor Throughput
+
+This example will report the throughput for each stage independently.
+
+![../img/multi_monitor_throughput.png](../img/multi_monitor_throughput.png)
+
+```bash
+morpheus run pipeline-nlp --viz_file=.tmp/multi_monitor_throughput.png \
+  from-file --filename examples/data/pcap_dump.jsonlines \
+  monitor --description "From File Throughput" \
+  deserialize \
+  monitor --description "Deserialize Throughput" \
+  serialize \
+  monitor --description "Serialize Throughput" \
+  to-file --filename out.jsonlines --overwrite
+```
+
+Output:
+```console
+Configuring Pipeline via CLI
+Starting pipeline via CLI... Ctrl+C to Quit
+From File Throughput[Complete]: 93085 messages [00:00, 168118.35 messages/s]
+Deserialize Throughput[Complete]: 93085 messages [00:04, 22584.37 messages/s]
+Serialize Throughput[Complete]: 93085 messages [00:06, 14095.36 messages/s]
+Pipeline visualization saved to .tmp/multi_monitor_throughput.png
+```
+
+### NLP Kitchen Sink
+This example shows an NLP Pipeline which uses several stages available in Morpheus. This example utilizes the Triton Inference Server to perform inference, and writes the output to a Kafka topic named `inference_output`; both Triton and Kafka need to be started prior to launching Morpheus.
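+
+Once Triton is up (see the launch command below), you can confirm the model loaded before starting the pipeline. A minimal check, assuming the port mappings used in the launch command below and Triton's standard HTTP health endpoints:
+```bash
+# -f makes curl fail on a non-2xx response, so the echo only runs once both
+# the server and the model report ready
+curl -sf localhost:8000/v2/health/ready && \
+  curl -sf localhost:8000/v2/models/sid-minibert-onnx/ready && \
+  echo "Triton is ready"
+```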
+
+#### Launching Triton
+From the Morpheus repo root directory, run the following to launch Triton and load the `sid-minibert` model:
+```bash
+docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 -v $PWD/models:/models nvcr.io/nvidia/tritonserver:22.08-py3 tritonserver --model-repository=/models/triton-model-repo --exit-on-error=false --model-control-mode=explicit --load-model sid-minibert-onnx
+```
+
+#### Launching Kafka
+Follow steps 1-8 in the [Quick Launch Kafka Cluster](../developer_guide/contributing.md#quick-launch-kafka-cluster) section of [contributing.md](../developer_guide/contributing.md), creating a topic named `inference_output`, and replace port `9092` with the port your Kafka instance is listening on.
+
+![../img/nlp_kitchen_sink.png](../img/nlp_kitchen_sink.png)
+
+```bash
+morpheus run --num_threads=8 --pipeline_batch_size=1024 --model_max_batch_size=32 \
+  pipeline-nlp --viz_file=.tmp/nlp_kitchen_sink.png \
+  from-file --filename examples/data/pcap_dump.jsonlines \
+  deserialize \
+  preprocess \
+  inf-triton --model_name=sid-minibert-onnx --server_url=localhost:8001 \
+  monitor --description "Inference Rate" --smoothing=0.001 --unit "inf" \
+  add-class \
+  filter --threshold=0.8 \
+  serialize --include 'timestamp' --exclude '^_ts_' \
+  to-kafka --bootstrap_servers localhost:9092 --output_topic "inference_output" \
+  monitor --description "ToKafka Rate" --smoothing=0.001 --unit "msg"
+```
+
+Output:
+```console
+Configuring Pipeline via CLI
+Starting pipeline via CLI... Ctrl+C to Quit
+Inference Rate[Complete]: 93085 inf [00:07, 12334.49 inf/s]
+ToKafka Rate[Complete]: 93085 msg [00:07, 13297.85 msg/s]
+Pipeline visualization saved to .tmp/nlp_kitchen_sink.png
+```
diff --git a/docs/source/basics/building_a_pipeline.rst b/docs/source/basics/building_a_pipeline.rst
deleted file mode 100644
index c3fa9d909e..0000000000
--- a/docs/source/basics/building_a_pipeline.rst
+++ /dev/null
@@ -1,104 +0,0 @@
-..
-   SPDX-FileCopyrightText: Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
-   SPDX-License-Identifier: Apache-2.0
-
-   Licensed under the Apache License, Version 2.0 (the "License");
-   you may not use this file except in compliance with the License.
-   You may obtain a copy of the License at
-
-       http://www.apache.org/licenses/LICENSE-2.0
-
-   Unless required by applicable law or agreed to in writing, software
-   distributed under the License is distributed on an "AS IS" BASIS,
-   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-   See the License for the specific language governing permissions and
-   limitations under the License.
-
-Building a Pipeline
-===================
-
-To build a pipeline via the CLI, users must first specify the type of pipeline, a source object, followed by a sequential list of stages. For each stage, options can be specified to configure the particular stage. Since stages are listed sequentially the output of one stage becomes the input to the next. Unless heavily customized, pipelines will start with either:
-
-.. code-block:: bash
-
-   # For NLP Pipelines
-   morpheus run pipeline-nlp ...
-   # For FIL Pipelines
-   morpheus run pipeline-fil ...
-
-While each stage will have configuration options, there are options that apply to the pipeline as a whole as well. Check
-``morpheus run --help``, ``morpheus run pipeline-nlp --help`` and ``morpheus run pipeline-fil --help`` for these global
-Pipeline options.
-
-Source Stages
-^^^^^^^^^^^^^
-
-All pipelines configured with the CLI need to start with a source object. Currently Morpheus supports two source types:
-
- * ``from-kafka``
-   - Pulls messages from a Kafka cluster into the Pipeline
-   - Kafka cluster can be remote or local
-   - Refer to :py:obj:`~morpheus.pipeline.input.from_kafka.KafkaSourceStage` for more information
- * ``from-file``
-   - Reads from a local file into the Pipeline
-   - Supports JSON lines format
-   - All lines are read at the start and queued into the pipeline at one time. Useful for performance testing.
-   - Refer to :py:obj:`~morpheus.pipeline.input.from_file.FileSourceStage` for more information
-
-Stages
-^^^^^^
-
-From this point on, any number of stages can be sequentially added to the command line from start to finish. For example, to build a simple pipeline that reads from kafka, deserializes messages, serializes them, and then writes to a file, use the following:
-
-.. code-block:: console
-
-   $ morpheus --log_level=DEBUG run pipeline-nlp \
-       from-kafka --input_topic test_pcap \
-       deserialize \
-       serialize \
-       to-file --filename .tmp/temp_out.json
-   ...
-   ====Building Pipeline====
-   Added source: from-kafka-0
-     └─> cudf.DataFrame
-   Added stage: deserialize-1
-     └─ cudf.DataFrame -> morpheus.MultiMessage
-   Added stage: serialize-2
-     └─ morpheus.MultiMessage -> List[str]
-   Added stage: to-file-3
-     └─ List[str] -> List[str]
-   ====Building Pipeline Complete!====
-   ...
-
-After the ``====Building Pipeline====`` message, if logging is ``INFO`` or greater, the CLI will print a list of all
-stages and the type transformations of each stage. To be a valid Pipeline, the output type of one stage must match the
-input type of the next. Many stages are flexible and will determine their type at runtime but some stages require a
-specific input type. If your Pipeline is configured incorrectly, Morpheus will report the error. For example, if we run
-the same command as above but forget the ``serialize`` stage, the following will be displayed:
-
-.. code-block:: console
-
-   $ morpheus --log_level=DEBUG run pipeline-nlp \
-       from-kafka --input_topic test_pcap \
-       deserialize \
-       to-file --filename .tmp/temp_out.json
-   ...
-
-   ====Building Pipeline====
-   Added source: from-file-0
-     └─> cudf.DataFrame
-   Added stage: buffer-1
-     └─ cudf.DataFrame -> cudf.DataFrame
-   Error occurred during Pipeline.build(). Exiting.
-   RuntimeError: The preprocess-nlp stage cannot handle input of <class 'cudf.core.dataframe.DataFrame'>. Accepted input types: (<class 'morpheus.pipeline.messages.MultiMessage'>, typing.StreamFuture[morpheus.pipeline.messages.MultiMessage])
-
-This indicates that the ``to-file`` stage cannot accept the input type of `morpheus.pipeline.messages.MultiMessage`.
-This is because the ``to-file`` stage has no idea how to write that class to a file, it only knows how to write strings.
-To ensure you have a valid pipeline, examine the ``Accepted input types: (typing.List[str],)`` portion of the message.
-This indicates you need a stage that converts from the output type of the ``deserialize`` stage,
-`morpheus.pipeline.messages.MultiMessage`, to `typing.List[str]`, which is exactly what the ``serialize`` stage does.
-
-Available Stages
-^^^^^^^^^^^^^^^^
-
-For a complete list of available stages, use the CLI help commands. The available stages can also be queried from the CLI using ``morpheus run pipeline-nlp --help`` or ``morpheus run pipeline-fil --help``.
diff --git a/docs/source/examples.md b/docs/source/examples.md
index 5134d7d474..422fac1738 100644
--- a/docs/source/examples.md
+++ b/docs/source/examples.md
@@ -16,7 +16,6 @@ limitations under the License.
 -->
 # Examples
-* [Basic CLI Usage](./examples/basic_usage/README.md)
 * [Anomalous Behavior Profiling with Forest Inference Library (FIL) Example](./examples/abp_nvsmi_detection/README.md)
 * [ABP Detection Example Using Morpheus](./examples/abp_pcap_detection/README.md)
 * [GNN Fraud Detection Pipeline](./examples/gnn_fraud_detection_pipeline/README.md)
diff --git a/docs/source/examples/basic_usage/README.md b/docs/source/examples/basic_usage/README.md
deleted file mode 120000
index 1d5c3fb20c..0000000000
--- a/docs/source/examples/basic_usage/README.md
+++ /dev/null
@@ -1 +0,0 @@
-../../../../examples/basic_usage/README.md
\ No newline at end of file
diff --git a/docs/source/examples/basic_usage/img b/docs/source/examples/basic_usage/img
deleted file mode 120000
index 976ee33c16..0000000000
--- a/docs/source/examples/basic_usage/img
+++ /dev/null
@@ -1 +0,0 @@
-../../../../examples/basic_usage/img
\ No newline at end of file
diff --git a/docs/source/examples/index.rst b/docs/source/examples/index.rst
index 7929e214a9..d6bf58eaf8 100644
--- a/docs/source/examples/index.rst
+++ b/docs/source/examples/index.rst
@@ -5,7 +5,6 @@ Examples
 .. toctree::
    :maxdepth: 20

-   basic_usage/README.md
    abp_nvsmi_detection/README.md
    abp_pcap_detection/README.md
    gnn_fraud_detection_pipeline/README.md
diff --git a/examples/basic_usage/img/monitor_throughput.png b/docs/source/img/monitor_throughput.png
similarity index 100%
rename from examples/basic_usage/img/monitor_throughput.png
rename to docs/source/img/monitor_throughput.png
diff --git a/examples/basic_usage/img/multi_monitor_throughput.png b/docs/source/img/multi_monitor_throughput.png
similarity index 100%
rename from examples/basic_usage/img/multi_monitor_throughput.png
rename to docs/source/img/multi_monitor_throughput.png
diff --git a/examples/basic_usage/img/nlp_kitchen_sink.png b/docs/source/img/nlp_kitchen_sink.png
similarity index 100%
rename from examples/basic_usage/img/nlp_kitchen_sink.png
rename to docs/source/img/nlp_kitchen_sink.png
diff --git a/examples/basic_usage/img/remove_fields_from_json_objects.png b/docs/source/img/remove_fields_from_json_objects.png
similarity index 100%
rename from examples/basic_usage/img/remove_fields_from_json_objects.png
rename to docs/source/img/remove_fields_from_json_objects.png
diff --git a/docs/source/img/simple_identity.png b/docs/source/img/simple_identity.png
new file mode 100644
index 0000000000..f0dc6fc7ce
--- /dev/null
+++ b/docs/source/img/simple_identity.png
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d6e5f5f50533b3d8426a91be82d94dc9d86a301e56a337f1373f572a36aee0fc
+size 15381
diff --git a/examples/basic_usage/README.md b/examples/basic_usage/README.md
deleted file mode 100644
index d491c45a76..0000000000
--- a/examples/basic_usage/README.md
+++ /dev/null
@@ -1,127 +0,0 @@
-
-
-# Basic Usage Examples
-
-## Simple Identity
-
-This example will copy the values from Kafka into `out.jsonlines`.
-
-![img/simple_identity.png](img/simple_identity.png)
-
-```bash
-morpheus run pipeline-nlp --viz_file=basic_usage_img/simple_identity.png \
-  from-kafka --bootstrap_servers localhost:9092 --input_topic test_pcap \
-  deserialize \
-  serialize \
-  to-file --overwrite --filename out.jsonlines
-```
-
-## Remove Fields from JSON Objects
-
-This example will only copy the fields 'timestamp', 'src_ip' and 'dest_ip' from `examples/data/pcap_dump.jsonlines` to
-`out.jsonlines`.
-
-![img/remove_fields_from_json_objects.png](img/remove_fields_from_json_objects.png)
-
-```bash
-morpheus run pipeline-nlp --viz_file=basic_usage_img/remove_fields_from_json_objects.png \
-  from-file --filename examples/data/pcap_dump.jsonlines \
-  deserialize \
-  serialize --include 'timestamp' --include 'src_ip' --include 'dest_ip' \
-  to-file --overwrite --filename out.jsonlines
-```
-
-## Monitor Throughput
-
-This example will report the throughput on the command line.
-
-![img/monitor_throughput.png](img/monitor_throughput.png)
-
-```bash
-morpheus run pipeline-nlp --viz_file=basic_usage_img/monitor_throughput.png \
-  from-file --filename examples/data/pcap_dump.jsonlines \
-  deserialize \
-  monitor --description "Lines Throughput" --smoothing 0.1 --unit "lines" \
-  serialize \
-  to-file --overwrite --filename out.jsonlines
-```
-
-Output:
-```console
-Configuring Pipeline via CLI
-Starting pipeline via CLI... Ctrl+C to Quit
-Lines Throughput[Complete]: 93085 lines [00:04, 19261.06 lines/s]
-Pipeline visualization saved to basic_usage_img/monitor_throughput.png
-```
-
-## Multi-Monitor Throughput
-
-This example will report the throughput for each stage independently.
-
-![img/multi_monitor_throughput.png](img/multi_monitor_throughput.png)
-
-```bash
-morpheus run pipeline-nlp --viz_file=basic_usage_img/multi_monitor_throughput.png \
-  from-file --filename examples/data/pcap_dump.jsonlines \
-  monitor --description "From File Throughput" \
-  deserialize \
-  monitor --description "Deserialize Throughput" \
-  serialize \
-  monitor --description "Serialize Throughput" \
-  to-file --filename out.jsonlines --overwrite
-```
-
-Output:
-```console
-Configuring Pipeline via CLI
-Starting pipeline via CLI... Ctrl+C to Quit
-From File Throughput[Complete]: 93085 messages [00:00, 93852.05 messages/s]
-Deserialize Throughput[Complete]: 93085 messages [00:05, 16898.32 messages/s]
-Serialize Throughput[Complete]: 93085 messages [00:08, 11110.10 messages/s]
-Pipeline visualization saved to basic_usage_img/multi_monitor_throughput.png
-```
-
-## NLP Kitchen Sink
-
-This example shows an NLP Pipeline which uses most stages available in Morpheus.
-
-![img/nlp_kitchen_sink.png](img/nlp_kitchen_sink.png)
-
-```bash
-morpheus run --num_threads=8 --pipeline_batch_size=1024 --model_max_batch_size=32 \
-  pipeline-nlp --viz_file=basic_usage_img/nlp_kitchen_sink.png \
-  from-file --filename examples/data/pcap_dump.jsonlines \
-  deserialize \
-  preprocess \
-  inf-triton --model_name=sid-minibert-onnx --server_url=localhost:8001 \
-  monitor --description "Inference Rate" --smoothing=0.001 --unit "inf" \
-  add-class \
-  filter --threshold=0.8 \
-  serialize --include 'timestamp' --exclude '^_ts_' \
-  to-kafka --bootstrap_servers localhost:9092 --output_topic "inference_output" \
-  monitor --description "ToKafka Rate" --smoothing=0.001 --unit "msg"
-```
-
-Output:
-```console
-Configuring Pipeline via CLI
-Starting pipeline via CLI... Ctrl+C to Quit
-Inference Rate[Complete]: 93085 inf [00:07, 12334.49 inf/s]
-ToKafka Rate[Complete]: 93085 msg [00:07, 13297.85 msg/s]
-Pipeline visualization saved to basic_usage_img/nlp_kitchen_sink.png
-```
diff --git a/examples/basic_usage/img/simple_identity.png b/examples/basic_usage/img/simple_identity.png
deleted file mode 100644
index ac8fbfd4c3..0000000000
--- a/examples/basic_usage/img/simple_identity.png
+++ /dev/null
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:f3c5391a701ab615a389d73ec2837a9b1ed839d222f8bfe71514e037fb47d7f1
-size 15645
diff --git a/scripts/validation/kafka_testing.md b/scripts/validation/kafka_testing.md
index 123f97a5b4..0a368155f9 100644
--- a/scripts/validation/kafka_testing.md
+++ b/scripts/validation/kafka_testing.md
@@ -23,7 +23,7 @@ pytest --run_slow --run_kafka
    ```bash
    mamba install -c conda-forge jq
    ```
-1. Launch Kafka using instructions from the [Quick Launch Kafka Cluster](../../CONTRIBUTING.md#quick-launch-kafka-cluster) section of [CONTRIBUTING.md](../../CONTRIBUTING.md) following steps 1-6.
+1. Launch Kafka using the instructions in the [Quick Launch Kafka Cluster](../../docs/source/developer_guide/contributing.md#quick-launch-kafka-cluster) section of [contributing.md](../../docs/source/developer_guide/contributing.md), following steps 1-6.
 1. The testing steps below will require two separate terminal windows. Each will need to have the `KAFKA_ADVERTISED_HOST_NAME`, `BROKER_LIST` and `MORPHEUS_ROOT` environment variables set. In the example below, both the morpheus and kafka-docker repositories have been checked out into the `~work` directory; replace these paths with the location of your checkouts.
    ```bash
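   # A hypothetical sketch of the environment setup described above; the exact
   # values depend on your checkouts and on how the Kafka cluster was launched.
   export MORPHEUS_ROOT=~/work/morpheus      # assumed checkout location; adjust to yours
   export KAFKA_ADVERTISED_HOST_NAME=$(hostname -i)   # the host's routable IP
   export BROKER_LIST=localhost:9092         # replace with your broker's advertised address
   ```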