-
Notifications
You must be signed in to change notification settings - Fork 152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Manual testing of Morpheus with Kafka & Validation improvements #290
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some questions
…SV output, due to a known issue in cudf & pandas (rapidsai/cudf#11317 & pandas-dev/pandas#37600) this option has no effect on JSON output
… modules in morpheus.utils to import the std logger
…s() will return non-None but diff_rows=0
LGTM per our discussion @dagardner-nv |
…, and add cleanup instructions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move the main()
from compare_df
into the scripts
directory to separate the script and arg parsing from the module.
Done |
rerun tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
If I'm being honest there is a LOT of rote details here that are somewhat simpler in Kubernetes. But it's nice to have the comprehensive detail for anyone looking for it.
I do think there are sometimes confusing points between the spectrum of CONTRIBUTING.md and the new kafka_testing.md as to whether commands are issued inside a container or in the conda env. Potentially a Good First Issue for an external contributor to weave in a bit more clarity.
It may be worth adding a comment about a low fd ulimit (1024) impacting morpheus/librdkafka.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once this PR has been updated with the latest upstream changes, we can merge.
@gpucibot merge |
Breaking changes (#296): * Default `group_id` for `KafkaSourceStage` changes from "custreamz" to "morpheus" * Remove default topic value for `KafkaSourceStage`. Other changes: * Adds Kafka integration tests for ABP, Hammah, Phishing & Sid validation pipelines. * Uses pytest_kafka to add zookeeper & kafka pytest fixtures * Configures pytest to only run kafka tests when `--run_kafka` command line flag is set. * Adds a `stop_after` argument to `KafkaSourceStage` when defined the source will stop once `stop_after` records have been read from the source. * Fixes bug in C++ impl preventing KafkaSourceStage from respecting max_batch_size (#309) Note: Many of the kafka tests are also slower integration tests in effect requiring both the `--run_kafka` & `--run_slow` flags being passed to `pytest` requires PR #290 fixes #258 fixes #296 fixes #309 Authors: - David Gardner (https://github.com/dagardner-nv) Approvers: - Michael Demoret (https://github.com/mdemoret-nv) URL: #308
…#422) * Address feedback from #290 * Add note about automated Kafka tests, which didn't exist at time the document was first written * Fix spelling errors Fixes #322 Authors: - David Gardner (https://github.com/dagardner-nv) - Michael Demoret (https://github.com/mdemoret-nv) - Pete MacKinnon (https://github.com/pdmack) Approvers: - Michael Demoret (https://github.com/mdemoret-nv) - Pete MacKinnon (https://github.com/pdmack) URL: #422
scripts/validation
include_index_col
flag to control exporting the Dataframe's index column. Note due to a limitation of cudf & pandas this has no impact on JSON:morpheus.utils.logging
renamed tomorpheus.utils.logger
so that other modules inmorpheus.utils
can import the standard lib logging module.ValidationStage
has been moved to it's own modulemorpheus.utils.compare_df
so that the functionality can be used outside of the stage.fixes #265