Update DFP E2E Benchmarks README to use dev container (nv-morpheus#1125)
+ Add dev container setup instructions to README
+ Allows common container/environment setup for both core and DFP benchmarks

Closes nv-morpheus#1124

Authors:
  - Eli Fajardo (https://github.com/efajardo-nv)

Approvers:
  - Michael Demoret (https://github.com/mdemoret-nv)

URL: nv-morpheus#1125
efajardo-nv authored Aug 30, 2023
1 parent 4d4a7ee commit 1704643
Showing 3 changed files with 62 additions and 9 deletions.
@@ -16,17 +16,58 @@

# Running DFP E2E Benchmarks

### Set up Morpheus Dev Container

If you don't already have the Morpheus Dev container, run the following to build it:
```
./docker/build_container_dev.sh
```

Now run the container:
```
./docker/run_container_dev.sh
```

Note that Morpheus containers are tagged by date. By default, `run_container_dev.sh` will try to use the current date as the tag. Therefore, if you are trying to run a container that was not built on the current date, you must set the `DOCKER_IMAGE_TAG` environment variable. For example:
```
DOCKER_IMAGE_TAG=dev-221003 ./docker/run_container_dev.sh
```
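
If you're unsure which tags exist locally, listing the images is a quick way to find one (this assumes the locally built dev image repository is named `morpheus`; adjust the name to your setup):
```
docker images morpheus --format "{{.Repository}}:{{.Tag}}"
```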

In the `/workspace` directory of the container, run the following to compile Morpheus:
```
./scripts/compile.sh
```

Now install Morpheus:
```
pip install -e /workspace
```
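
To sanity-check the editable install (an optional step, not part of the original instructions), confirm the package resolves to the workspace checkout:
```
python -c "import morpheus; print(morpheus.__file__)"
```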

Install additional required dependencies:
```
export CUDA_VER=11.8
mamba env update -n morpheus --file docker/conda/environments/cuda${CUDA_VER}_examples.yml
```

Fetch input data for benchmarks:
```
./examples/digital_fingerprinting/fetch_example_data.py all
```
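
As a quick sanity check that the download succeeded, inspect the data directory (assuming the fetch script's default output location under `examples/data/dfp`):
```
ls examples/data/dfp
```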

### Start MLflow

MLflow is used as the model repository where trained DFP models are published and then used for inference by the pipelines. Run the following in a host terminal window (not inside the container) to start MLflow:

```
# from root of Morpheus repo
cd examples/digital_fingerprinting/production
```

```
docker compose up mlflow
```
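
Once compose reports the service as up, MLflow should be listening on port 5000 (matching the `tracking_uri` in the benchmark config below). A quick liveness check from the host:
```
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:5000
```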


### Run E2E Benchmarks

Benchmarks are run using `pytest-benchmark`. By default, there are five rounds of measurement. For each round, there will be one iteration of each workflow. Measurements are taken for each round. Final results such as `min`, `max` and `mean` times will be based on these measurements.
@@ -55,10 +96,10 @@ When using the MRC SegmentModule in a pipeline, it will also require a module co
To ensure the [file_to_df_loader.py](../../../../../morpheus/loaders/file_to_df_loader.py) utilizes the same type of downloading mechanism, set the `MORPHEUS_FILE_DOWNLOAD_TYPE` environment variable to any one of the given choices (`multiprocess`, `dask`, `dask thread`, `single thread`).

```
export MORPHEUS_FILE_DOWNLOAD_TYPE=dask
```
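
For example, to use the thread-based Dask variant instead, quote the value, since it contains a space:
```
export MORPHEUS_FILE_DOWNLOAD_TYPE="dask thread"
```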

Benchmarks for an individual workflow can be run in your dev container.

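A minimal sketch of such an invocation, assuming the benchmark tests live in `test_bench_e2e_dfp_pipeline.py` and reusing a test name from the config excerpt below (the file name and flags here are illustrative, not the exact command from this README):
```
pytest -s --benchmark-enable --benchmark-autosave test_bench_e2e_dfp_pipeline.py::test_dfp_modules_azure_payload_inference_e2e
```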
@@ -1,5 +1,5 @@
{
"tracking_uri": "http://mlflow:5000",
"tracking_uri": "http://localhost:5000",
"test_dfp_modules_azure_payload_inference_e2e": {
"message_path": "./resource/control_messages/azure_payload_inference.json",
"num_threads": 12,
12 changes: 12 additions & 0 deletions tests/benchmarks/README.md
@@ -180,3 +180,15 @@ Additional benchmark stats for each workflow:
- max_throughput_bytes
- mean_throughput_bytes
- median_throughput_bytes


### Production DFP E2E Benchmarks

Note that the `test_cloudtrail_ae_e2e` benchmarks measure the performance of a pipeline built from [Starter DFP](../../examples/digital_fingerprinting/starter/README.md) stages. Separate benchmark tests are also provided to measure the performance of the example [Production DFP](../../examples/digital_fingerprinting/production/README.md) pipelines. More information about running those benchmarks can be found [here](../../examples/digital_fingerprinting/production/morpheus/benchmarks/README.md).

You can use the same Dev container created here to run the Production DFP benchmarks; you just need to install additional dependencies as follows:

```
export CUDA_VER=11.8
mamba env update -n morpheus --file docker/conda/environments/cuda${CUDA_VER}_examples.yml
```
