
23.07 doc fixes #1071

Merged Jul 20, 2023 (53 commits)

Commits
- `8de52f5` Update minimum driver version to match rapids 23.06 (dagardner-nv, Jul 18, 2023)
- `22453d1` Update example Kafka output (dagardner-nv, Jul 18, 2023)
- `69db452` Update package name for MultiMessage (dagardner-nv, Jul 18, 2023)
- `6b1e656` Update package name for MultiMessage (dagardner-nv, Jul 18, 2023)
- `ec6ce8e` Update docstring so that it makes sense for both CLI & API users (dagardner-nv, Jul 18, 2023)
- `651988b` Since we use cuda-11.8 our actual minimum driver version is 520.61.05… (dagardner-nv, Jul 19, 2023)
- `fb2a69c` doca docstring (cwharris, Jul 19, 2023)
- `9da618a` update DocaSourceStage docstring (cwharris, Jul 19, 2023)
- `fc8156d` Update CLI output for nlp and fil pipelines (dagardner-nv, Jul 19, 2023)
- `8744f44` Update CLI output for ae pipeline (dagardner-nv, Jul 19, 2023)
- `25292e4` Prevent property methods from being displayed side-by-side (dagardner-nv, Jul 19, 2023)
- `ba869e9` Cleanup console output to not wrap (dagardner-nv, Jul 19, 2023)
- `cd687c7` Add parquet to list of supported extensions (dagardner-nv, Jul 19, 2023)
- `2e3f332` npl and fil are no longer our only two pipeline modes. (dagardner-nv, Jul 19, 2023)
- `b17de1d` Use pipeline-other for examples not using a model, set log level to i… (dagardner-nv, Jul 19, 2023)
- `1abbf21` Remove unused ignore (dagardner-nv, Jul 19, 2023)
- `be77b1d` Fix minimum driver version and fix indentation of closing back-ticks (dagardner-nv, Jul 19, 2023)
- `a247e68` Display urls as links, put them in a bullet list (dagardner-nv, Jul 19, 2023)
- `574ad11` Add parquet format, and sort filetypes (dagardner-nv, Jul 19, 2023)
- `083e705` Change syntax so that it isn't incorrectly highlighting text as python (dagardner-nv, Jul 19, 2023)
- `3061e95` Update to match formatting of example (dagardner-nv, Jul 19, 2023)
- `86b41be` Refer users to the 3.10 version of the docs for the GIL as the GIL ma… (dagardner-nv, Jul 19, 2023)
- `edb9936` Add a test to verify snippet in dev doc ex 3, need to move this somew… (dagardner-nv, Jul 19, 2023)
- `d6a35ee` The SerializeStage was refactored and no longer represents the code s… (dagardner-nv, Jul 19, 2023)
- `195985e` The SerializeStage was refactored and no longer represents the code s… (dagardner-nv, Jul 19, 2023)
- `2204f82` Merge branch 'branch-23.07' into david-docs-23.07p2 (dagardner-nv, Jul 19, 2023)
- `8f5ec22` IWYU fixes (dagardner-nv, Jul 19, 2023)
- `b7da00a` Formatting fix (dagardner-nv, Jul 19, 2023)
- `83dd64b` Fix link (dagardner-nv, Jul 19, 2023)
- `ea74a60` Document imports (dagardner-nv, Jul 19, 2023)
- `6dd07da` Add imports (dagardner-nv, Jul 19, 2023)
- `bd216b8` Remove duplicated section (dagardner-nv, Jul 19, 2023)
- `9ad67a5` Formatting and fix comparison in code snippet (dagardner-nv, Jul 19, 2023)
- `d2f446b` Fix path to example code (dagardner-nv, Jul 19, 2023)
- `a6b95bc` Update code snippet to match actual code in example (dagardner-nv, Jul 19, 2023)
- `bedb090` Fix typeo in path to source file (dagardner-nv, Jul 19, 2023)
- `79483f9` wip (dagardner-nv, Jul 19, 2023)
- `dbdb20f` Fixing the documentation build (mdemoret-nv, Jul 20, 2023)
- `e153206` Fix paths, add back-ticks around module names (dagardner-nv, Jul 20, 2023)
- `84f78b3` Fix heading, update parameters (dagardner-nv, Jul 20, 2023)
- `a694157` wip (dagardner-nv, Jul 20, 2023)
- `dee8512` Rename serializer.md to match name of module (dagardner-nv, Jul 20, 2023)
- `6712b22` wip (dagardner-nv, Jul 20, 2023)
- `d17f1e6` Merge branch 'branch-23.07' into david-docs-23.07p2 (dagardner-nv, Jul 20, 2023)
- `92b8cf3` Cleanup (dagardner-nv, Jul 20, 2023)
- `a8fc7b4` Remove irrelevent argument from docstring (dagardner-nv, Jul 20, 2023)
- `822d7d2` Update docstrings (dagardner-nv, Jul 20, 2023)
- `76e1e8a` Update docstrings and other pylint suggestions (dagardner-nv, Jul 20, 2023)
- `ba6857e` Update docstrings and other pylint suggestions (dagardner-nv, Jul 20, 2023)
- `d2b4e5a` Formatting fix (dagardner-nv, Jul 20, 2023)
- `66deda7` Formatting fix (dagardner-nv, Jul 20, 2023)
- `b3a860c` Remove redundant method (dagardner-nv, Jul 20, 2023)
- `25fb026` parquet is not supported by the WriteToFileStage (dagardner-nv, Jul 20, 2023)
6 changes: 6 additions & 0 deletions docs/source/_static/py_properties.css
@@ -0,0 +1,6 @@
/* Fix property methods to not be displayed inline */
.property {
display: unset !important;
padding-right: unset !important;
max-width: unset !important;
}
70 changes: 30 additions & 40 deletions docs/source/basics/building_a_pipeline.md
@@ -22,13 +22,10 @@ limitations under the License.
>./scripts/fetch_data.py fetch examples
>```

To build a pipeline via the CLI, users must first specify the type of pipeline, a source object, followed by a sequential list of stages. For each stage, options can be specified to configure the particular stage. Since stages are listed sequentially the output of one stage becomes the input to the next. Unless heavily customized, pipelines start with either:
To build a pipeline via the CLI, users must first specify the type of pipeline, a source object, followed by a sequential list of stages. For each stage, options can be specified to configure the particular stage. Since stages are listed sequentially the output of one stage becomes the input to the next. Unless heavily customized, pipelines typically start with `morpheus run` followed by the pipeline mode such as `pipeline-nlp` or `pipeline-fil`. For example, to run the NLP pipeline, use:

```bash
# For NLP Pipelines
morpheus run pipeline-nlp ...
# For FIL Pipelines
morpheus run pipeline-fil ...
```

While each stage has configuration options, there are options that apply to the pipeline as a whole as well. Check
@@ -41,7 +38,7 @@ All pipelines configured with the CLI need to start with a source object. Two co

* `from-file`
- Reads from a local file into the Pipeline
- Supports JSON lines format
- Supports CSV, JSON, JSON lines and Parquet formats
- All lines are read at the start and queued into the pipeline at one time. Useful for performance testing.
- Refer to `morpheus.stages.input.file_source_stage.FileSourceStage` for more information
* `from-kafka`
@@ -53,7 +50,7 @@ All pipelines configured with the CLI need to start with a source object. Two co

From this point on, any number of stages can be sequentially added to the command line from start to finish. For example, we could build a trivial pipeline that reads from a file, deserializes messages, serializes them, and then writes to a file, using the following:
```bash
morpheus --log_level=DEBUG run pipeline-nlp --viz_file=.tmp/simple_identity.png \
morpheus --log_level=DEBUG run pipeline-other --viz_file=.tmp/simple_identity.png \
from-file --filename=examples/data/pcap_dump.jsonlines \
deserialize \
serialize \
@@ -64,31 +61,20 @@ morpheus --log_level=DEBUG run pipeline-nlp --viz_file=.tmp/simple_identity.png
The output should be similar to:
```console
Configuring Pipeline via CLI
Parameter, 'labels_file', with relative path, 'data/labels_nlp.txt', does not exist. Using package relative location: '/home/dagardner/work/morpheus/morpheus/data/labels_nlp.txt'
Loaded labels file. Current labels: [['address', 'bank_acct', 'credit_card', 'email', 'govt_id', 'name', 'password', 'phone_num', 'secret_keys', 'user']]
Starting pipeline via CLI... Ctrl+C to Quit
Config:
{
"ae": null,
"class_labels": [
"address",
"bank_acct",
"credit_card",
"email",
"govt_id",
"name",
"password",
"phone_num",
"secret_keys",
"user"
],
"class_labels": [],
"debug": false,
"edge_buffer_size": 128,
"feature_length": 256,
"fil": null,
"feature_length": 1,
"fil": {
"feature_columns": null
},
"log_config_file": null,
"log_level": 10,
"mode": "NLP",
"mode": "OTHER",
"model_max_batch_size": 8,
"num_threads": 64,
"pipeline_batch_size": 256,
@@ -97,29 +83,29 @@ Config:
CPP Enabled: True
====Registering Pipeline====
====Building Pipeline====
====Building Segment: linear_segment_0====
====Building Segment Complete!====
====Building Pipeline Complete!====
Starting! Time: 1672959248.7163541
Starting! Time: 1689786614.4988477
====Registering Pipeline Complete!====
====Starting Pipeline====
====Pipeline Started====
Added source: <from-file-0; FileSourceStage(filename=examples/data/pcap_dump.jsonlines, iterative=False, file_type=FileTypes.Auto, repeat=1, filter_null=True, cudf_kwargs=None)>
====Building Segment: linear_segment_0====
Added source: <from-file-0; FileSourceStage(filename=examples/data/pcap_dump.jsonlines, iterative=False, file_type=FileTypes.Auto, repeat=1, filter_null=True)>
└─> morpheus.MessageMeta
Added stage: <deserialize-1; DeserializeStage()>
Added stage: <deserialize-1; DeserializeStage(ensure_sliceable_index=True)>
└─ morpheus.MessageMeta -> morpheus.MultiMessage
Added stage: <serialize-2; SerializeStage(include=(), exclude=('^ID$', '^_ts_'), fixed_columns=True)>
└─ morpheus.MultiMessage -> morpheus.MessageMeta
Added stage: <to-file-3; WriteToFileStage(filename=.tmp/temp_out.json, overwrite=True, file_type=FileTypes.Auto, include_index_col=True)>
Added stage: <to-file-3; WriteToFileStage(filename=.tmp/temp_out.json, overwrite=True, file_type=FileTypes.Auto, include_index_col=True, flush=False)>
└─ morpheus.MessageMeta -> morpheus.MessageMeta
====Building Segment Complete!====
====Pipeline Started====
====Pipeline Complete====
Pipeline visualization saved to .tmp/simple_identity.png
```
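The `SerializeStage(include=(), exclude=('^ID$', '^_ts_'), fixed_columns=True)` line in the output above shows the default column-selection parameters. As an illustration only (not Morpheus's actual implementation, whose details may differ), regex-based include/exclude column filtering can be sketched like this:

```python
import re

def filter_columns(columns, include=(), exclude=("^ID$", "^_ts_")):
    """Keep columns matching any include pattern (all columns if no include
    patterns are given), then drop columns matching any exclude pattern.
    The default excludes mirror the SerializeStage defaults shown above."""
    if include:
        columns = [c for c in columns if any(re.search(p, c) for p in include)]
    return [c for c in columns if not any(re.search(p, c) for p in exclude)]

# "ID" and "_ts_created" are dropped by the default exclude patterns.
print(filter_columns(["ID", "_ts_created", "timestamp", "src_ip", "dest_ip"]))
```

Whether the real stage uses `re.search` or anchored matching is an assumption here; the point is that excludes are regular expressions, not literal column names.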

### Pipeline Build Checks
After the `====Building Pipeline====` message, if logging is `INFO` or greater, the CLI prints a list of all stages and the type transformations of each stage. To be a valid Pipeline, the output type of one stage must match the input type of the next. Many stages are flexible and determines their type at runtime but some stages require a specific input type. If your Pipeline is configured incorrectly, Morpheus reports the error. For example, if we run the same command as above but forget the `serialize` stage:
After the `====Building Pipeline====` message, if logging is `INFO` or greater, the CLI prints a list of all stages and the type transformations of each stage. To be a valid Pipeline, the output type of one stage must match the input type of the next. Many stages are flexible and determine their type at runtime but some stages require a specific input type. If your Pipeline is configured incorrectly, Morpheus reports the error. For example, if we run the same command as above but forget the `serialize` stage:
```bash
morpheus --log_level=DEBUG run pipeline-nlp \
morpheus --log_level=DEBUG run pipeline-other \
from-file --filename=examples/data/pcap_dump.jsonlines \
deserialize \
to-file --overwrite --filename .tmp/temp_out.json
@@ -130,8 +116,7 @@ Then the following error displays:
RuntimeError: The to-file stage cannot handle input of <class 'morpheus.messages.multi_message.MultiMessage'>. Accepted input types: (<class 'morpheus.messages.message_meta.MessageMeta'>,)
```

This indicates that the ``to-file`` stage cannot accept the input type of `morpheus.pipeline.messages.MultiMessage`.
This is because the ``to-file`` stage has no idea how to write that class to a file; it only knows how to write instances of `morpheus.messages.message_meta.MessageMeta`. To ensure you have a valid pipeline, examine the `Accepted input types: (<class 'morpheus.messages.message_meta.MessageMeta'>,)` portion of the message. This indicates you need a stage that converts from the output type of the `deserialize` stage, `MultiMessage`, to `MessageMeta`, which is exactly what the `serialize` stage does.
This indicates that the ``to-file`` stage cannot accept the input type of `morpheus.messages.multi_message.MultiMessage`. This is because the ``to-file`` stage has no idea how to write that class to a file; it only knows how to write instances of `morpheus.messages.message_meta.MessageMeta`. To ensure you have a valid pipeline, examine the `Accepted input types: (<class 'morpheus.messages.message_meta.MessageMeta'>,)` portion of the message. This indicates you need a stage that converts from the output type of the `deserialize` stage, `MultiMessage`, to `MessageMeta`, which is exactly what the `serialize` stage does.
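The build check described above can be pictured as a walk over adjacent stage pairs, comparing each stage's output type against the next stage's accepted input types. The following is a hypothetical sketch of that idea; the class names are stand-ins, not Morpheus's actual implementation:

```python
# Minimal sketch of a build-time type check between pipeline stages.
# MessageMeta/MultiMessage here are placeholders for the Morpheus classes.

class MessageMeta:
    """Stand-in for morpheus.messages.message_meta.MessageMeta."""

class MultiMessage:
    """Stand-in for morpheus.messages.multi_message.MultiMessage."""

class Stage:
    def __init__(self, name, accepted_types, output_type):
        self.name = name
        self.accepted_types = accepted_types  # tuple of acceptable input classes
        self.output_type = output_type

def build_pipeline(stages):
    """Verify that each stage's output type feeds the next stage's input."""
    for upstream, downstream in zip(stages, stages[1:]):
        if not issubclass(upstream.output_type, downstream.accepted_types):
            raise RuntimeError(
                f"The {downstream.name} stage cannot handle input of "
                f"{upstream.output_type}. Accepted input types: "
                f"{downstream.accepted_types}")
    return True

deserialize = Stage("deserialize", (MessageMeta,), MultiMessage)
serialize = Stage("serialize", (MultiMessage,), MessageMeta)
to_file = Stage("to-file", (MessageMeta,), MessageMeta)

build_pipeline([deserialize, serialize, to_file])  # valid pipeline

try:
    build_pipeline([deserialize, to_file])  # serialize omitted, as in the error above
except RuntimeError as err:
    print(err)
```

Dropping `serialize` raises the same kind of mismatch the CLI reports: `deserialize` emits `MultiMessage`, which `to-file` does not accept.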

### Kafka Source Example
The above example essentially just copies a file. However, it is important to note that most Morpheus pipelines share this structure: they begin with a source stage (`from-file`) followed by a `deserialize` stage, end with a `serialize` stage followed by a sink stage (`to-file`), and place the actual training or inference logic in between.
@@ -141,15 +126,18 @@ We could also easily swap out the source or sink stages in the above example wit
> **Note**: This assumes a Kafka broker running on the localhost listening to port 9092. For testing Morpheus with Kafka follow steps 1-8 in [Quick Launch Kafka Cluster](../developer_guide/contributing.md#quick-launch-kafka-cluster) section of [contributing.md](../developer_guide/contributing.md), creating a topic named `test_pcap` then replace port `9092` with the port your Kafka instance is listening on.

```bash
morpheus --log_level=DEBUG run pipeline-nlp \
morpheus --log_level=DEBUG run pipeline-other \
from-kafka --input_topic test_pcap --bootstrap_servers localhost:9092 \
deserialize \
serialize \
to-file --filename .tmp/temp_out.json
```

## Available Stages
For a complete list of available stages, use the CLI help commands. The available stages can also be queried from the CLI using ``morpheus run pipeline-nlp --help`` or ``morpheus run pipeline-fil --help``.
For a complete list of available stages for a particular pipeline mode, use the CLI help commands. First `morpheus run --help` can be used to list the available pipeline modes. Then `morpheus run <mode> --help` can be used to list the available stages for that mode. For example, to list the available stages for the `pipeline-nlp` mode:
```bash
morpheus run pipeline-nlp --help
```

## Basic Usage Examples

@@ -160,7 +148,7 @@ This example only copies the fields 'timestamp', 'src_ip' and 'dest_ip' from `ex
![../img/remove_fields_from_json_objects.png](../img/remove_fields_from_json_objects.png)

```bash
morpheus run pipeline-nlp --viz_file=.tmp/remove_fields_from_json_objects.png \
morpheus run pipeline-other --viz_file=.tmp/remove_fields_from_json_objects.png \
from-file --filename examples/data/pcap_dump.jsonlines \
deserialize \
serialize --include 'timestamp' --include 'src_ip' --include 'dest_ip' \
@@ -174,7 +162,7 @@ This example reports the throughput on the command line.
![../img/monitor_throughput.png](../img/monitor_throughput.png)

```bash
morpheus run pipeline-nlp --viz_file=.tmp/monitor_throughput.png \
morpheus --log_level=INFO run pipeline-other --viz_file=.tmp/monitor_throughput.png \
from-file --filename examples/data/pcap_dump.jsonlines \
deserialize \
monitor --description "Lines Throughput" --smoothing 0.1 --unit "lines" \
@@ -190,14 +178,16 @@ Lines Throughput[Complete]: 93085 lines [00:03, 29446.18 lines/s]
Pipeline visualization saved to .tmp/monitor_throughput.png
```

> **Note**: By default the monitor stage will omit itself from the pipeline if the `log_level` is set to `WARNING` or below.
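The `--smoothing 0.1` option above controls how the displayed rate is averaged. Conceptually this is an exponential moving average over instantaneous rate samples, the convention used by tqdm-style progress meters; the sketch below illustrates the idea only and is not Morpheus code, so treat the exact weighting as an assumption:

```python
def smoothed_rate(samples, smoothing=0.1):
    """Exponential moving average of instantaneous rate samples (lines/s).

    Higher smoothing reacts faster to the latest sample; lower smoothing
    yields a steadier long-run estimate.
    """
    avg = samples[0]
    for s in samples[1:]:
        avg = smoothing * s + (1 - smoothing) * avg
    return avg

# A low smoothing value keeps the reported rate close to the running average
# even when individual samples jump around.
print(smoothed_rate([30000, 29000, 29500, 28000], smoothing=0.1))
```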

### Multi-Monitor Throughput

This example reports the throughput for each stage independently.

![../img/multi_monitor_throughput.png](../img/multi_monitor_throughput.png)

```bash
morpheus run pipeline-nlp --viz_file=.tmp/multi_monitor_throughput.png \
morpheus --log_level=INFO run pipeline-nlp --viz_file=.tmp/multi_monitor_throughput.png \
from-file --filename examples/data/pcap_dump.jsonlines \
monitor --description "From File Throughput" \
deserialize \
@@ -232,7 +222,7 @@ Follow steps 1-8 in [Quick Launch Kafka Cluster](../developer_guide/contributing
![../img/nlp_kitchen_sink.png](../img/nlp_kitchen_sink.png)

```bash
morpheus run --num_threads=8 --pipeline_batch_size=1024 --model_max_batch_size=32 \
morpheus --log_level=INFO run --num_threads=8 --pipeline_batch_size=1024 --model_max_batch_size=32 \
pipeline-nlp --viz_file=.tmp/nlp_kitchen_sink.png \
from-file --filename examples/data/pcap_dump.jsonlines \
deserialize \
25 changes: 14 additions & 11 deletions docs/source/basics/overview.rst
@@ -75,13 +75,15 @@ Similar to the run command, we can get help on the tools:
$ morpheus tools --help
Usage: morpheus tools [OPTIONS] COMMAND [ARGS]...

Tools subcommand

Options:
--help Show this message and exit. [default: False]
--help Show this message and exit.

Commands:
autocomplete Utility for installing/updating/removing shell completion for
Morpheus
onnx-to-trt Converts an ONNX model to a TRT engine
autocomplete Utility for installing/updating/removing shell completion for Morpheus
onnx-to-trt Converts an ONNX model to a TRT engine


The help text will show arguments, options and all possible sub-commands. Help for each of these sub-commands can be
queried in the same manner:
@@ -91,14 +93,15 @@ queried in the same manner:
$ morpheus tools onnx-to-trt --help
Usage: morpheus tools onnx-to-trt [OPTIONS]

Converts an ONNX model to a TRT engine

Options:
--input_model PATH [required]
--output_model PATH [required]
--batches <INTEGER INTEGER>... [required]
--seq_length INTEGER [required]
--max_workspace_size INTEGER [default: 16000]
--help Show this message and exit. [default:
False]
--input_model PATH [required]
--output_model PATH [required]
--batches <INTEGER INTEGER>... [required]
--seq_length INTEGER [required]
--max_workspace_size INTEGER [default: 16000]
--help Show this message and exit.

AutoComplete
------------
1 change: 1 addition & 0 deletions docs/source/conf.py
@@ -315,6 +315,7 @@ def setup(app):
app.add_css_file('infoboxes.css')
app.add_css_file('params.css')
app.add_css_file('references.css')
app.add_css_file('py_properties.css')


# The following is used by sphinx.ext.linkcode to provide links to github
9 changes: 5 additions & 4 deletions docs/source/developer_guide/contributing.md
@@ -174,7 +174,7 @@ Note: These instructions assume the user is using `mamba` instead of `conda` sin
#### Prerequisites

- Pascal architecture GPU or better
- NVIDIA driver `450.80.02` or higher
- NVIDIA driver `520.61.05` or higher
- [CUDA 11.8](https://developer.nvidia.com/cuda-11-8-0-download-archive)
- `conda` and `mamba`
- Refer to the [Getting Started Guide](https://conda.io/projects/conda/en/latest/user-guide/install/index.html) if `conda` is not already installed
@@ -297,7 +297,7 @@ Launching a full production Kafka cluster is outside the scope of this project;
```bash
$ echo $KAFKA_ADVERTISED_HOST_NAME
"172.17.0.1"
```
```
6. Launch kafka with 3 instances:

```bash
@@ -447,5 +447,6 @@ Ex:
---

## Attribution
Portions adopted from https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md \
Portions adopted from https://github.com/dask/dask/blob/master/docs/source/develop.rst
Portions adopted from
* [https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md](https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md)
* [https://github.com/dask/dask/blob/master/docs/source/develop.rst](https://github.com/dask/dask/blob/master/docs/source/develop.rst)