Update DFP E2E Benchmarks README to use dev container (nv-morpheus#1125)
+ Add dev container setup instructions to README
+ Allows common container/environment setup for both core and DFP benchmarks

Closes nv-morpheus#1124

Authors:
  - Eli Fajardo (https://github.com/efajardo-nv)

Approvers:
  - Michael Demoret (https://github.com/mdemoret-nv)

URL: nv-morpheus#1125
efajardo-nv authored Aug 30, 2023
1 parent 4d4a7ee commit 1704643
Showing 3 changed files with 62 additions and 9 deletions.
@@ -16,17 +16,58 @@

# Running DFP E2E Benchmarks

### Set up Morpheus Dev Container

If you don't already have the Morpheus Dev container, run the following to build it:
```
./docker/build_container_dev.sh
```

Now run the container:
```
./docker/run_container_dev.sh
```

Note that Morpheus containers are tagged by date. By default, `run_container_dev.sh` will try to use the current date as the tag. Therefore, if you are trying to run a container that was not built on the current date, you must set the `DOCKER_IMAGE_TAG` environment variable. For example:
```
DOCKER_IMAGE_TAG=dev-221003 ./docker/run_container_dev.sh
```
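
If you're unsure which tags exist locally, listing the images is a quick way to find one (this assumes the locally built dev image repository is named `morpheus`; adjust the name to your setup):
```
docker images morpheus --format "{{.Repository}}:{{.Tag}}"
```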

In the `/workspace` directory of the container, run the following to compile Morpheus:
```
./scripts/compile.sh
```

Now install Morpheus:
```
pip install -e /workspace
```
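
To sanity-check the editable install (an optional step, not part of the original instructions), confirm the package resolves to the workspace checkout:
```
python -c "import morpheus; print(morpheus.__file__)"
```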

Install additional required dependencies:
```
export CUDA_VER=11.8
mamba env update -n morpheus --file docker/conda/environments/cuda${CUDA_VER}_examples.yml
```

Fetch input data for benchmarks:
```
./examples/digital_fingerprinting/fetch_example_data.py all
```
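
As a quick sanity check that the download succeeded, inspect the data directory (assuming the fetch script's default output location under `examples/data/dfp`):
```
ls examples/data/dfp
```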

### Start MLflow

MLflow is used as the model repository where trained DFP models are published and then used for inference by the pipelines. Run the following in a host terminal window (not inside the container) to start MLflow:

```
# from root of Morpheus repo
cd examples/digital_fingerprinting/production
```

```
docker compose up mlflow
```
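
Once compose reports the service as up, MLflow should be listening on port 5000 (matching the `tracking_uri` in the benchmark config below). A quick liveness check from the host:
```
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:5000
```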


### Run E2E Benchmarks

Benchmarks are run using `pytest-benchmark`. By default, there are five rounds of measurement. For each round, there will be one iteration of each workflow. Measurements are taken for each round. Final results such as `min`, `max` and `mean` times will be based on these measurements.
@@ -55,10 +96,10 @@ When using the MRC SegmentModule in a pipeline, it will also require a module co
To ensure the [file_to_df_loader.py](../../../../../morpheus/loaders/file_to_df_loader.py) utilizes the same type of downloading mechanism, set the `MORPHEUS_FILE_DOWNLOAD_TYPE` environment variable to any one of the given choices (`multiprocess`, `dask`, `dask thread`, `single thread`).

```
export MORPHEUS_FILE_DOWNLOAD_TYPE=dask
```
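
For example, to use the thread-based Dask variant instead, quote the value, since it contains a space:
```
export MORPHEUS_FILE_DOWNLOAD_TYPE="dask thread"
```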

Benchmarks for an individual workflow can be run in your dev container.

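A minimal sketch of such an invocation, assuming the benchmark tests live in `test_bench_e2e_dfp_pipeline.py` and reusing a test name from the config excerpt below (the file name and flags here are illustrative, not the exact command from this README):
```
pytest -s --benchmark-enable --benchmark-autosave test_bench_e2e_dfp_pipeline.py::test_dfp_modules_azure_payload_inference_e2e
```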
@@ -1,5 +1,5 @@
{
"tracking_uri": "http://mlflow:5000",
"tracking_uri": "http://localhost:5000",
"test_dfp_modules_azure_payload_inference_e2e": {
"message_path": "./resource/control_messages/azure_payload_inference.json",
"num_threads": 12,
12 changes: 12 additions & 0 deletions tests/benchmarks/README.md
@@ -180,3 +180,15 @@ Additional benchmark stats for each workflow:
- max_throughput_bytes
- mean_throughput_bytes
- median_throughput_bytes


### Production DFP E2E Benchmarks

Note that the `test_cloudtrail_ae_e2e` benchmarks measure the performance of a pipeline built from [Starter DFP](../../examples/digital_fingerprinting/starter/README.md) stages. Separate benchmark tests are also provided to measure the performance of the example [Production DFP](../../examples/digital_fingerprinting/production/README.md) pipelines. More information about running those benchmarks can be found [here](../../examples/digital_fingerprinting/production/morpheus/benchmarks/README.md).

You can use the same Dev container created here to run the Production DFP benchmarks; you just need to install additional dependencies as follows:

```
export CUDA_VER=11.8
mamba env update -n morpheus --file docker/conda/environments/cuda${CUDA_VER}_examples.yml
```
