Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manual testing of Morpheus with Kafka & Validation improvements #290

Merged
37 commits merged into from
Aug 8, 2022

Conversation

dagardner-nv
Copy link
Contributor

@dagardner-nv dagardner-nv commented Jul 19, 2022

  • instructions for manually testing of Morpheus using Kafka. Adds a Kafka version for each of the four validation scripts in scripts/validation
  • csv & json serializers now support an include_index_col flag to control exporting the Dataframe's index column. Note due to a limitation of cudf & pandas this has no impact on JSON:
  • morpheus.utils.logging renamed to morpheus.utils.logger so that other modules in morpheus.utils can import the standard lib logging module.
  • Comparison logic in the ValidationStage has been moved to it's own module morpheus.utils.compare_df so that the functionality can be used outside of the stage.

fixes #265

@dagardner-nv dagardner-nv added non-breaking Non-breaking change doc Improvements or additions to documentation 2 - In Progress labels Jul 19, 2022
@dagardner-nv dagardner-nv requested review from a team as code owners July 19, 2022 23:53
Copy link
Contributor

@mdemoret-nv mdemoret-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some questions

scripts/validation/kafka_testing.md Outdated Show resolved Hide resolved
scripts/validation/kafka_testing.md Outdated Show resolved Hide resolved
scripts/validation/kafka_testing.md Outdated Show resolved Hide resolved
scripts/validation/strip_first_csv_col.py Outdated Show resolved Hide resolved
@pdmack
Copy link
Contributor

pdmack commented Jul 26, 2022

LGTM per our discussion @dagardner-nv

@dagardner-nv dagardner-nv requested a review from pdmack July 26, 2022 20:13
@dagardner-nv dagardner-nv changed the title Instructions for manual testing of Morpheus with Kafka Manual testing of Morpheus with Kafka & Validation improvements Jul 26, 2022
Copy link
Contributor

@mdemoret-nv mdemoret-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move the main() from compare_df into the scripts directory to separate the script and arg parsing from the module.

@dagardner-nv
Copy link
Contributor Author

Move the main() from compare_df into the scripts directory to separate the script and arg parsing from the module.

Done

@dagardner-nv
Copy link
Contributor Author

rerun tests

Copy link
Contributor

@pdmack pdmack left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
If I'm being honest there is a LOT of rote details here that are somewhat simpler in Kubernetes. But it's nice to have the comprehensive detail for anyone looking for it.

I do think there are sometimes confusing points between the spectrum of CONTRIBUTING.md and the new kafka_testing.md as to whether commands are issued inside a container or in the conda env. Potentially a Good First Issue for an external contributor to weave in a bit more clarity.

It may be worth adding a comment about a low fd ulimit (1024) impacting morpheus/librdkafka.

Copy link
Contributor

@mdemoret-nv mdemoret-nv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once this PR has been updated with the latest upstream changes, we can merge.

@mdemoret-nv
Copy link
Contributor

@gpucibot merge

@ghost ghost merged commit bb200ab into nv-morpheus:branch-22.08 Aug 8, 2022
ghost pushed a commit that referenced this pull request Sep 7, 2022
Breaking changes (#296):
* Default `group_id` for `KafkaSourceStage` changes from "custreamz" to "morpheus"
* Remove default topic value for `KafkaSourceStage`.

Other changes:
* Adds Kafka integration tests for ABP, Hammah, Phishing & Sid validation pipelines.
* Uses pytest_kafka to add zookeeper & kafka pytest fixtures
* Configures pytest to only run kafka tests when `--run_kafka` command line flag is set. 
* Adds a `stop_after` argument to `KafkaSourceStage` when defined the source will stop once `stop_after` records have been read from the source.
* Fixes bug in C++ impl preventing KafkaSourceStage from respecting max_batch_size (#309)

Note: Many of the kafka tests are also slower integration tests in effect requiring both the `--run_kafka` & `--run_slow` flags being passed to `pytest`

requires PR #290 
fixes #258
fixes #296
fixes #309

Authors:
  - David Gardner (https://github.com/dagardner-nv)

Approvers:
  - Michael Demoret (https://github.com/mdemoret-nv)

URL: #308
ghost pushed a commit that referenced this pull request Nov 14, 2022
…#422)

* Address feedback from #290 
* Add note about automated Kafka tests, which didn't exist at time the document was first written
* Fix spelling errors

Fixes #322

Authors:
  - David Gardner (https://github.com/dagardner-nv)
  - Michael Demoret (https://github.com/mdemoret-nv)
  - Pete MacKinnon (https://github.com/pdmack)

Approvers:
  - Michael Demoret (https://github.com/mdemoret-nv)
  - Pete MacKinnon (https://github.com/pdmack)

URL: #422
@dagardner-nv dagardner-nv deleted the david-test-kafka branch February 12, 2024 23:21
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc Improvements or additions to documentation non-breaking Non-breaking change
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

[FEA] Create Kafka Testing Procedure
3 participants