Restructure the getting started guide #536

Merged
72 commits
0409bbf
removed the $ symbol from all bash commands
bsuryadevara Oct 11, 2022
a608298
Remove exclamation, and morpheus owns the docs
dagardner-nv Dec 2, 2022
74648f9
Include readme's from examples in the source
dagardner-nv Dec 2, 2022
fcb6e9c
Non-owning
dagardner-nv Dec 2, 2022
296e5cd
Replace usage of words like 'see' and 'look' in documentation
dagardner-nv Dec 6, 2022
254126b
Remove explcit calls to pipeline.build, this is no longer needed
dagardner-nv Dec 6, 2022
6fbeda7
Replace unseen with previously undetected
dagardner-nv Dec 6, 2022
9dedfb8
The logs are anonymized
dagardner-nv Dec 6, 2022
530b40e
Merge branch 'branch-23.01' into david-23.01-docs
dagardner-nv Dec 6, 2022
94da28d
Update the github url for Morpheus
dagardner-nv Dec 6, 2022
20b8651
Use main branch
dagardner-nv Dec 6, 2022
5b4080b
Add a toc link to the code of conduct
dagardner-nv Dec 7, 2022
12f2008
Shorten the title
dagardner-nv Dec 7, 2022
2363ee0
Merge branch 'branch-23.01' into david-23.01-docs-reorg
dagardner-nv Dec 7, 2022
8619f53
Merge branch 'branch-23.01' into david-23.01-docs
dagardner-nv Dec 7, 2022
9b40ad2
Move most of the contents of README to a new file under docs/source
dagardner-nv Dec 7, 2022
1c9cc13
Add getting_started & contributing guides
dagardner-nv Dec 7, 2022
c76a083
Link to developer guides from readme, guides doc converted from rst t…
dagardner-nv Dec 7, 2022
73507f6
Merge branch 'branch-23.01' into david-23.01-docs
dagardner-nv Dec 7, 2022
e4f7ddc
Merge branch 'david-23.01-docs' of github.com:dagardner-nv/Morpheus i…
dagardner-nv Dec 7, 2022
9781e58
Merge branch 'david-23.01-docs' into david-23.01-docs-reorg
dagardner-nv Dec 7, 2022
56502e7
Replace NGC intro with a list of getting started docs
dagardner-nv Dec 7, 2022
d5b9310
wip
dagardner-nv Dec 7, 2022
f37ee41
Move the contributing symlink
dagardner-nv Dec 7, 2022
9b0a9d9
wip
dagardner-nv Dec 7, 2022
4cd561d
wip
dagardner-nv Dec 7, 2022
e1602a7
Replace smart quotes
dagardner-nv Dec 7, 2022
c96429a
Sphinx wants the section to be named 'Examples' and it wants the inte…
dagardner-nv Dec 8, 2022
dac60f7
The myst parser doesn't like 'yml' as an alias for 'yaml'
dagardner-nv Dec 8, 2022
e653d64
Fix links
dagardner-nv Dec 8, 2022
c5ac203
TODO was causing a parse error in sphinx
dagardner-nv Dec 8, 2022
764e01f
Fix formatting for docstrings
dagardner-nv Dec 8, 2022
2d66cd5
Fix what appears to be a cut-off docstring
dagardner-nv Dec 8, 2022
3266860
Merge branch 'branch-23.01' into david-23.01-docs-reorg
dagardner-nv Dec 8, 2022
0f52edf
Misc docstring fixes
dagardner-nv Dec 8, 2022
19b2ef0
Restructure symlinks for examples, preventing name conflicts and addi…
dagardner-nv Dec 8, 2022
62ecd75
Removed usage of buffer
bsuryadevara Dec 8, 2022
4df69fe
Merge branch 'branch-23.01' into 524-doc-remove-usage-of-buffer-stage…
bsuryadevara Dec 8, 2022
aa8d452
Removed usage of buffer
bsuryadevara Dec 8, 2022
8a48e62
Removed usage of buffer
bsuryadevara Dec 8, 2022
3f1e456
Update link to options reference
dagardner-nv Dec 8, 2022
38b56b6
Update docs list
dagardner-nv Dec 8, 2022
a1189ab
updated monitor throughput image
bsuryadevara Dec 8, 2022
b147b77
Morpheus is the name of the project, and Morpheus is an SDK so "Morph…
dagardner-nv Dec 8, 2022
fe3b3f8
Morpheus is just Morpheus, removing SDK and SDK client
dagardner-nv Dec 8, 2022
33ac925
Update our elevator pitch paragraph to the version in the README.md file
dagardner-nv Dec 9, 2022
6af4992
Add license to md files
dagardner-nv Dec 9, 2022
8a63ca6
wip
dagardner-nv Dec 9, 2022
6a46175
wip
dagardner-nv Dec 9, 2022
298dd0b
Markdown doesn't support a table-of-contents that is usable in the na…
dagardner-nv Dec 9, 2022
46e14c5
wip
dagardner-nv Dec 9, 2022
87d950c
Remove usage of 'see'
dagardner-nv Dec 9, 2022
39a7984
Merge branch 'branch-23.01' into david-docs-23.01-getting-started
dagardner-nv Dec 9, 2022
2e6a785
replace referneces to the quickstart guide
dagardner-nv Dec 9, 2022
9b744b1
resolved conflicts
bsuryadevara Dec 9, 2022
7040acb
wip
dagardner-nv Dec 9, 2022
14245ed
resolved conflicts
bsuryadevara Dec 9, 2022
cb4b7f6
Merge branch '524-doc-remove-usage-of-buffer-stage-from-examplesrst' …
dagardner-nv Dec 9, 2022
3572442
Move the basics/examples.rst under the main examples to avoid having …
dagardner-nv Dec 10, 2022
978d2a2
symlink to include basic usage
dagardner-nv Dec 10, 2022
a2bdcc4
Add link to images
dagardner-nv Dec 10, 2022
4ba6c27
Merge branch 'branch-23.01' into david-docs-23.01-getting-started
dagardner-nv Dec 10, 2022
43c6737
Moving to lfs
dagardner-nv Dec 10, 2022
9031598
Update path to basic usage lfs images
dagardner-nv Dec 10, 2022
870dd4c
Moving to lfs
dagardner-nv Dec 10, 2022
437d50f
A few more tweaks
dagardner-nv Dec 10, 2022
14a8e55
Move links to the examples, feature them as one of the top-level links
dagardner-nv Dec 10, 2022
a9d9393
Merge branch 'branch-23.01' into david-23.01-docs-reorg
dagardner-nv Dec 13, 2022
7af783e
Merge branch 'david-23.01-docs-reorg' into david-23.01-docs-sdk
dagardner-nv Dec 13, 2022
14d97b6
Merge branch 'david-23.01-docs-sdk' into david-docs-23.01-getting-sta…
dagardner-nv Dec 13, 2022
df9cbe3
Merge branch 'branch-23.01' into david-docs-23.01-getting-started
dagardner-nv Dec 13, 2022
dfb0135
Having a custom stage no longer requires the python api
dagardner-nv Dec 13, 2022
Move most of the contents of README to a new file under docs/source
dagardner-nv committed Dec 7, 2022
commit 9b40ad239ee84d51b00113598a1dce24912b2450
316 changes: 4 additions & 312 deletions README.md
@@ -5,316 +5,8 @@
NVIDIA Morpheus is an open AI application framework that provides cybersecurity developers with a highly optimized AI framework and pre-trained AI capabilities that allow them to instantaneously inspect all IP traffic across their data center fabric. The Morpheus developer framework allows teams to build their own optimized pipelines that address cybersecurity and information security use cases. Bringing a new level of security to data centers, Morpheus provides development capabilities around dynamic protection, real-time telemetry, adaptive policies, and cyber defenses for detecting and remediating cybersecurity threats.

## Documentation
Full documentation (including a quick start guide, a developer/user guide, and API documentation) is available online at [https://docs.nvidia.com/morpheus/](https://docs.nvidia.com/morpheus/).
* [Getting Started with Morpheus](./docs/source/getting_started.md)
* [Contributing to Morpheus](./CONTRIBUTING.md)
* [Developer Guides] TODO

## Getting Started with Morpheus
There are three ways to get started with Morpheus:
- Using pre-built Docker containers
- Building the Morpheus Docker container
- Building Morpheus from source

The pre-built Docker containers are the easiest way to get started with the latest release of Morpheus. Instructions on how to download and run these containers, including the necessary data and models, can be found on NGC [here](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/morpheus/collections/morpheus_).

More advanced users, or those who are interested in using the latest pre-release features, will need to build the Morpheus container or build from source. Step-by-step instructions for these users can be found in the following section.

### Prerequisites
Complete the steps in the following sections before building the Morpheus container or building Morpheus from source.

#### Requirements
- Pascal architecture GPU or better
- NVIDIA driver `450.80.02` or higher
- [Docker](https://docs.docker.com/get-docker/)
- [The NVIDIA container toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker)
- [NVIDIA Triton Inference Server](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/tritonserver) `22.06` or higher
- [Git LFS](https://git-lfs.github.com/)


#### Clone the Repository

```bash
MORPHEUS_ROOT=$(pwd)/morpheus
git clone https://github.com/nv-morpheus/Morpheus.git $MORPHEUS_ROOT
cd $MORPHEUS_ROOT
```

#### Git LFS

The large model and data files in this repo are stored using [Git Large File Storage (LFS)](https://git-lfs.github.com/). Only those files which are strictly needed to run Morpheus are downloaded by default when the repository is cloned.

The `scripts/fetch_data.py` script can be used to fetch the Morpheus pre-trained models, and other files required for running the training/validation scripts and example pipelines.

Usage of the script is as follows:
```bash
scripts/fetch_data.py fetch <dataset> [<dataset>...]
```

At time of writing the defined datasets are:
* all - Metaset includes all others
* examples - Data needed by scripts in the `examples` subdir
* models - Morpheus models (largest dataset)
* tests - Data used by unittests
* validation - Subset of the models dataset needed by some unittests

To download just the examples and models:
```bash
scripts/fetch_data.py fetch examples models
```

To download the data needed for unittests:
```bash
scripts/fetch_data.py fetch tests validation
```

If `Git LFS` is not installed before cloning the repository, the `scripts/fetch_data.py` script will fail. If this is the case, follow the instructions for installing `Git LFS` from [here](https://git-lfs.github.com/), and then run the following command:
```bash
git lfs install
```
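The check described above can be done up front. The following is a minimal sketch, not part of the Morpheus scripts; the `have_cmd` helper is a hypothetical name introduced here, and it relies on the fact that Git LFS installs a `git-lfs` executable on the `PATH`.

```bash
# have_cmd is a hypothetical helper (not part of Morpheus).
# It returns 0 when the named executable can be found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Git LFS ships as the `git-lfs` executable.
if have_cmd git-lfs; then
  echo "git-lfs found"
else
  echo "git-lfs missing: install it, then run 'git lfs install'"
fi
```

Running this before `scripts/fetch_data.py` makes a missing Git LFS installation visible before the fetch script fails.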

### Build Morpheus Container

To assist in building the Morpheus container, several scripts have been provided in the `./docker` directory. To build the "release" container, run the following:

```bash
./docker/build_container_release.sh
```

This will create an image named `nvcr.io/nvidia/morpheus/morpheus:${MORPHEUS_VERSION}-runtime` where `$MORPHEUS_VERSION` is replaced by the output of `git describe --tags --abbrev=0`.
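As a rough illustration of how that image name is composed, the sketch below rebuilds it from a version string. The `morpheus_image_name` function and the `unknown` fallback are assumptions made for this example; the build script itself may derive the name differently.

```bash
# morpheus_image_name is a hypothetical helper for illustration only.
# It composes the image name described above from a version string.
morpheus_image_name() {
  echo "nvcr.io/nvidia/morpheus/morpheus:${1}-runtime"
}

# Inside a Morpheus checkout the version is the most recent tag.
# The "unknown" fallback is an assumption for runs outside a tagged repository.
MORPHEUS_VERSION=$(git describe --tags --abbrev=0 2>/dev/null || echo "unknown")
morpheus_image_name "$MORPHEUS_VERSION"
```

The printed name can be compared against the output of `docker images` to confirm that the build produced the expected tag.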

To run the built "release" container, use the following:

```bash
./docker/run_container_release.sh
```

You can specify different Docker images and tags by passing the script the `DOCKER_IMAGE_NAME` and `DOCKER_IMAGE_TAG` variables respectively. For example, to run version `v22.11.00a` use the following:

```bash
DOCKER_IMAGE_TAG="v22.11.00a-runtime" ./docker/run_container_release.sh
```

### Build from Source

It's possible to build from source outside of a container. However, due to the large number of dependencies, this can be complex and is only necessary for developers. Instructions for developers and contributors can be found in [CONTRIBUTING.md](./CONTRIBUTING.md).

## Launching Triton Server

Many of the validation tests and example workflows require a Triton server to function.
Use the following command to launch a Docker container for Triton loading all of the included pre-trained models:

```bash
docker run --rm -ti --gpus=all -p8000:8000 -p8001:8001 -p8002:8002 \
-v $PWD/models:/models \
nvcr.io/nvidia/tritonserver:22.08-py3 \
tritonserver --model-repository=/models/triton-model-repo \
--exit-on-error=false \
--log-info=true \
--strict-readiness=false
```
This will launch Triton using the default network ports (8000 for HTTP, 8001 for GRPC, and 8002 for metrics).
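Before starting a pipeline it can help to confirm that Triton is answering on the HTTP port. The sketch below polls Triton's HTTP readiness endpoint (`/v2/health/ready`) with `curl`. The function names, the default of 30 attempts, and the one-second delay are assumptions chosen for this example.

```bash
# Both helpers are hypothetical and shown for illustration only.

# Build the readiness URL for a Triton HTTP endpoint.
triton_ready_url() {
  echo "http://${1:-localhost}:${2:-8000}/v2/health/ready"
}

# Poll the readiness URL once per second until it responds with success
# or the attempts run out. Arguments: host, port, attempts.
wait_for_triton() {
  url=$(triton_ready_url "$1" "$2")
  attempts=${3:-30}
  while [ "$attempts" -gt 0 ]; do
    if curl -sf "$url" >/dev/null 2>&1; then
      return 0
    fi
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1
}

triton_ready_url localhost 8000   # prints http://localhost:8000/v2/health/ready
```

For example, `wait_for_triton localhost 8000 30 && echo "Triton is ready"` waits up to roughly 30 seconds. Because the server above is started with `--strict-readiness=false`, a ready response does not guarantee that every model loaded successfully.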

## Running Morpheus

To run Morpheus, users will need to choose between the Morpheus Command Line Interface (CLI) and the Python interface. Which interface to use depends on the user's needs, amount of customization, and operating environment. More information on each interface can be found below.

### Morpheus Python Interface

The Morpheus Python interface allows users to configure their pipelines using a Python script file. This is ideal for users who are working in a Jupyter notebook, users who need complex initialization logic, or users who have customized stages. Documentation on using the Morpheus Python interface can be found at [`docs/source/developer_guide/guides.rst`](./docs/source/developer_guide/guides.rst).

For full example pipelines using the Python interface, see the `./examples` directory.

### Morpheus Command Line Interface (CLI)

The CLI allows users to completely configure a Morpheus pipeline directly from a terminal. This is ideal for users who do not need customized stages and for users configuring a pipeline in Kubernetes. The Morpheus CLI can be invoked using the `morpheus` command and is capable of running linear pipelines as well as additional tools. Instructions for using the CLI can be queried directly in the terminal using `morpheus --help`:

```bash
$ morpheus
Usage: morpheus [OPTIONS] COMMAND [ARGS]...

Options:
--debug / --no-debug [default: no-debug]
--log_level [CRITICAL|FATAL|ERROR|WARN|WARNING|INFO|DEBUG]
Specify the logging level to use. [default:
WARNING]
--log_config_file FILE Config file to use to configure logging. Use
only for advanced situations. Can accept
both JSON and ini style configurations
--plugin TEXT Adds a Morpheus CLI plugin. Can either be a
module name or path to a python module
--version Show the version and exit.
--help Show this message and exit.

Commands:
run Run one of the available pipelines
tools Run a utility tool
```

Each command in the CLI has its own help information. Use `morpheus [command] [...sub-command] --help` to get instructions for each command and sub command. For example:

```bash
$ morpheus run pipeline-nlp inf-triton --help
Configuring Pipeline via CLI
Usage: morpheus run pipeline-nlp inf-triton [OPTIONS]

Options:
--model_name TEXT Model name in Triton to send messages to
[required]
--server_url TEXT Triton server URL (IP:Port) [required]
--force_convert_inputs BOOLEAN Instructs this stage to forcibly convert all
input types to match what Triton is
expecting. Even if this is set to `False`,
automatic conversion will be done only if
there would be no data loss (i.e. int32 ->
int64). [default: False]
--use_shared_memory BOOLEAN Whether or not to use CUDA Shared IPC Memory
for transferring data to Triton. Using CUDA
IPC reduces network transfer time but
requires that Morpheus and Triton are
located on the same machine [default:
False]
--help Show this message and exit. [default:
False]
```

Several examples of using the Morpheus CLI can be found at [`docs/source/basics/examples.rst`](./docs/source/basics/examples.rst).

#### CLI Stage Configuration

When configuring a pipeline via the CLI, you start with the command `morpheus run pipeline` and then list the stages in order from start to finish. The order that the commands are placed in will be the order that data flows from start to end. The output of each stage will be linked to the input of the next. For example, to build a simple pipeline that reads from Kafka, deserializes messages, serializes them, and then writes to a file, use the following:

```bash
morpheus run pipeline-nlp from-kafka --input_topic test_pcap deserialize serialize to-file --filename .tmp/temp_out.json
```

You should see some output similar to:

```log
====Building Pipeline====
Added source: <from-kafka-0; KafkaSourceStage(bootstrap_servers=localhost:9092, input_topic=test_pcap, group_id=custreamz, poll_interval=10millis)>
└─> morpheus.MessageMeta
Added stage: <deserialize-1; DeserializeStage()>
└─ morpheus.MessageMeta -> morpheus.MultiMessage
Added stage: <serialize-2; SerializeStage(include=[], exclude=['^ID$', '^_ts_'], output_type=pandas)>
└─ morpheus.MultiMessage -> pandas.DataFrame
Added stage: <to-file-3; WriteToFileStage(filename=.tmp/temp_out.json, overwrite=False, file_type=auto)>
└─ pandas.DataFrame -> pandas.DataFrame
====Building Pipeline Complete!====
```

This output is important because it shows the order of the stages and the output type of each one. Since some stages cannot accept all types of inputs, Morpheus will report an error if the pipeline is configured incorrectly. For example, running the same command as above without the `serialize` stage produces the following:

```bash
$ morpheus run pipeline-nlp from-kafka --input_topic test_pcap deserialize to-file --filename .tmp/temp_out.json --overwrite

====Building Pipeline====
Added source: from-kafka -> <class 'cudf.core.dataframe.DataFrame'>
Added stage: deserialize -> <class 'morpheus.pipeline.messages.MultiMessage'>

Traceback (most recent call last):
File "morpheus/pipeline/pipeline.py", line 228, in build_and_start
current_stream_and_type = await s.build(current_stream_and_type)
File "morpheus/pipeline/pipeline.py", line 108, in build
raise RuntimeError("The {} stage cannot handle input of {}. Accepted input types: {}".format(
RuntimeError: The to-file stage cannot handle input of <class 'morpheus.pipeline.messages.MultiMessage'>. Accepted input types: (typing.List[str],)
```

This indicates that the `to-file` stage cannot accept the input type of `morpheus.pipeline.messages.MultiMessage`. This is because the `to-file` stage has no idea how to write that class to a file; it only knows how to write strings. To ensure you have a valid pipeline, look at the `Accepted input types: (typing.List[str],)` portion of the message. This indicates you need a stage that converts from the output type of the `deserialize` stage, `morpheus.pipeline.messages.MultiMessage`, to `typing.List[str]`, which is exactly what the `serialize` stage does.
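The reasoning above can be sketched as a small stand-alone check that walks the stage list and compares each stage's output type with the next stage's accepted input type. This is not how Morpheus implements the check. The type tables below are simplified assumptions based on the error message, and the input type for `deserialize` is assumed to match the `from-kafka` output shown in the log.

```bash
# Illustrative only: simplified, assumed type tables for four stages.

# Output type produced by each stage.
stage_output() {
  case "$1" in
    from-kafka)  echo "cudf.DataFrame" ;;
    deserialize) echo "MultiMessage" ;;
    serialize)   echo "List[str]" ;;
    *)           echo "unknown" ;;
  esac
}

# Input type accepted by each stage.
stage_accepts() {
  case "$1" in
    deserialize) echo "cudf.DataFrame" ;;
    serialize)   echo "MultiMessage" ;;
    to-file)     echo "List[str]" ;;
    *)           echo "unknown" ;;
  esac
}

# Check every adjacent pair of stages; stop at the first mismatch.
check_pipeline() {
  prev=""
  for stage in "$@"; do
    if [ -n "$prev" ]; then
      out=$(stage_output "$prev")
      want=$(stage_accepts "$stage")
      if [ "$out" != "$want" ]; then
        echo "The $stage stage cannot handle input of $out. Accepted input types: $want"
        return 1
      fi
    fi
    prev=$stage
  done
  echo "ok"
}

check_pipeline from-kafka deserialize serialize to-file   # prints ok
check_pipeline from-kafka deserialize to-file || true     # reports the to-file mismatch
```

The second call fails for the same reason as the traceback: nothing between `deserialize` and `to-file` converts `MultiMessage` into the type that `to-file` accepts.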

## Pipeline Stages

A complete list of the pipeline stages will be added in the future. For now, you can query the available stages for each pipeline type via:

```bash
$ morpheus run pipeline-nlp --help
Usage: morpheus run pipeline-nlp [OPTIONS] COMMAND1 [ARGS]... [COMMAND2
[ARGS]...]...

<Help Paragraph Omitted>

Commands:
add-class Add detected classifications to each message
add-scores Add probability scores to each message
buffer (Deprecated) Buffer results
delay (Deprecated) Delay results for a certain duration
deserialize Deserialize source data from JSON.
dropna Drop null data entries from a DataFrame
filter Filter message by a classification threshold
from-file Load messages from a file
from-kafka Load messages from a Kafka cluster
gen-viz (Deprecated) Write out visualization data frames
inf-identity Perform a no-op inference for testing
inf-pytorch Perform inference with PyTorch
inf-triton Perform inference with Triton
mlflow-drift Report model drift statistics to ML Flow
monitor Display throughput numbers at a specific point in the pipeline
preprocess Convert messages to tokens
serialize Serializes messages into a text format
to-file Write all messages to a file
to-kafka Write all messages to a Kafka cluster
validate Validates pipeline output against an expected output
```

And for the FIL pipeline:

```bash
$ morpheus run pipeline-fil --help
Usage: morpheus run pipeline-fil [OPTIONS] COMMAND1 [ARGS]... [COMMAND2
[ARGS]...]...

<Help Paragraph Omitted>

Commands:
add-class Add detected classifications to each message
add-scores Add probability scores to each message
buffer (Deprecated) Buffer results
delay (Deprecated) Delay results for a certain duration
deserialize Deserialize source data from JSON.
dropna Drop null data entries from a DataFrame
filter Filter message by a classification threshold
from-file Load messages from a file
from-kafka Load messages from a Kafka cluster
inf-identity Perform a no-op inference for testing
inf-pytorch Perform inference with PyTorch
inf-triton Perform inference with Triton
mlflow-drift Report model drift statistics to ML Flow
monitor Display throughput numbers at a specific point in the pipeline
preprocess Convert messages to tokens
serialize Serializes messages into a text format
to-file Write all messages to a file
to-kafka Write all messages to a Kafka cluster
validate Validates pipeline output against an expected output
```

And for the AE pipeline:

```bash
$ morpheus run pipeline-ae --help
Usage: morpheus run pipeline-ae [OPTIONS] COMMAND1 [ARGS]... [COMMAND2
[ARGS]...]...

<Help Paragraph Omitted>

Commands:
add-class Add detected classifications to each message.
add-scores Add probability scores to each message.
buffer (Deprecated) Buffer results.
delay (Deprecated) Delay results for a certain duration.
filter Filter message by a classification threshold.
from-azure Source stage is used to load Azure Active Directory messages.
from-cloudtrail Load messages from a Cloudtrail directory.
from-duo Source stage is used to load Duo Authentication messages.
inf-pytorch Perform inference with PyTorch.
inf-triton Perform inference with Triton Inference Server.
monitor Display throughput numbers at a specific point in the pipeline.
preprocess Prepare Autoencoder input DataFrames for inference.
serialize Include & exclude columns from messages.
timeseries Perform time series anomaly detection and add prediction.
to-file Write all messages to a file.
to-kafka Write all messages to a Kafka cluster.
train-ae Train an Autoencoder model on incoming data.
trigger Buffer data until previous stage has completed.
validate Validate pipeline output for testing.
```
Note: The available commands for different types of pipelines are not the same. This means that the same stage, when used in different pipelines, may have different options. Please check the CLI help for the most up-to-date information during development.

## Contributing
Please see our [guide for contributing to Morpheus](./CONTRIBUTING.md).
Full documentation for the latest official release is available at [https://docs.nvidia.com/morpheus/](https://docs.nvidia.com/morpheus/).
1 change: 1 addition & 0 deletions docs/source/contributing.md