Move the Advisor Service into the Inventory Service code#108
Add settings, imports, and helper functions needed for merging service/service.py into advisor_inventory_service.py.

- Add THREAD_POOL_SIZE, SERVICE_APP_NAME, SERVICE_GROUP_ID settings
- Add DISABLE_PROMETHEUS setting
- Add FILTER_OUT_NON_RHEL and FILTER_OUT_RHEL6 settings
- Import BoundedExecutor, thread_storage, payload_tracker, utils, reports
- Import additional Django models (Ack, CurrentReport, etc.)
- Add make_async_handler() wrapper to submit handlers to the thread pool
- Add on_thread_done() callback for exception handling

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
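The commit above names make_async_handler() and on_thread_done() but the diff itself is not shown here. A minimal sketch of what such a wrapper typically looks like, using a stdlib ThreadPoolExecutor as a stand-in for BoundedExecutor (all names and the example topic are assumptions, not the actual PR code):

```python
from concurrent.futures import ThreadPoolExecutor

def on_thread_done(future):
    # Surface exceptions from worker threads instead of silently dropping them
    exc = future.exception()
    if exc is not None:
        print(f"handler raised: {exc!r}")

def make_async_handler(executor, handler):
    # Wrap a (topic, message) handler so each call is submitted to the
    # thread pool rather than running on the Kafka consumer thread
    def async_handler(topic, message):
        future = executor.submit(handler, topic, message)
        future.add_done_callback(on_thread_done)
        return future
    return async_handler

results = []
executor = ThreadPoolExecutor(max_workers=4)  # stand-in for BoundedExecutor
wrapped = make_async_handler(executor, lambda topic, msg: results.append((topic, msg)))
wrapped("platform.engine.results", {"request_id": "abc"}).result()
executor.shutdown()
```

The consumer loop stays responsive because it only enqueues work; the done-callback ensures a failed handler is at least logged rather than vanishing inside the pool.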
Copy create_db_reports from service/service.py for creating and updating reports, uploads, and hosts from engine results and rule hits.

- Creates/updates Host records with satellite and branch information
- Handles autoack logic for new accounts
- Creates/updates Upload records with source tracking
- Filters reports by RHEL version (FILTER_OUT_NON_RHEL, FILTER_OUT_RHEL6)
- Creates/updates CurrentReport records with impacted dates
- Triggers webhooks and remediations via report_hooks
- Uses atomic transactions with select_for_update() locks
- Tracks performance metrics via thread_storage and Prometheus

Database retry logic removed - Django handles connection retry automatically.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copy handle_engine_results from service/service.py for processing Insights Engine rule results from the platform.engine.results topic.

- Validates required keys in the engine results message
- Extracts system data, platform metadata, and engine reports
- Sets thread_storage values for payload tracking and metrics
- Validates that the system type exists in the database
- Calls create_db_reports() to create Upload and CurrentReport records
- Handles satellite_managed, satellite_id, and branch_id fields
- Tracks timing metrics for performance monitoring

Updated to use the KafkaDispatcher signature (topic, message). Removed manual database retry logic - Django handles it automatically.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copy handle_rule_hits from service/service.py for processing third-party rule hits from the platform.insights.rule-hits topic.

- Validates required keys: org_id, source, host_product, host_role, inventory_id, hits
- Normalizes host_role and host_product to lowercase
- Validates that the system type exists in the database
- Sets thread_storage values for payload tracking
- Calls create_db_reports() to create reports from rule hits
- Tracks timing metrics for performance monitoring

Updated to use the KafkaDispatcher signature (topic, message). Removed manual database retry logic - Django handles it automatically.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add the INSIGHTS_ADVISOR_SERVICE_INVENTORY_EVENTS_ELAPSED timer decorator to the existing handle_inventory_event() handler, for consistency with the other handlers and for performance tracking.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace the Command.handle() method with comprehensive service initialization that integrates the BoundedExecutor thread pool with KafkaDispatcher.

Changes:
- Initialize BoundedExecutor with THREAD_POOL_SIZE workers
- Start the Prometheus server in a background thread if not disabled
- Set up Prometheus metrics (INSIGHTS_ADVISOR_STATUS, INSIGHTS_ADVISOR_UP)
- Register three handlers with async wrappers:
  * ENGINE_RESULTS_TOPIC -> handle_engine_results
  * RULE_HITS_TOPIC -> handle_rule_hits
  * INVENTORY_TOPIC -> handle_inventory_event
- All handlers submitted to the thread pool via the make_async_handler wrapper
- Set up SIGTERM/SIGINT signal handlers for graceful shutdown
- Update Prometheus state through the service lifecycle
- Shut down the executor cleanly on exit
- Update the help text to reflect the expanded scope

This completes the merge of service/service.py functionality into advisor_inventory_service.py with concurrent message processing.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
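The graceful-shutdown part of the commit above can be sketched with stdlib signal handling. This is an illustrative pattern only (the event and function names are assumptions, not the PR's code): the handler merely sets an event, and the main loop checks it so in-flight thread-pool tasks can finish before the executor shuts down.

```python
import signal
import threading

# Event the main consume loop checks to know when to stop
shutdown_requested = threading.Event()

def handle_shutdown(signum, frame):
    # Invoked on SIGTERM/SIGINT; request a clean stop rather than
    # exiting abruptly, so in-flight tasks can complete
    shutdown_requested.set()

def install_signal_handlers():
    # signal.signal() is only valid in the main thread
    if threading.current_thread() is threading.main_thread():
        signal.signal(signal.SIGTERM, handle_shutdown)
        signal.signal(signal.SIGINT, handle_shutdown)

install_signal_handlers()
handle_shutdown(signal.SIGTERM, None)  # simulate signal delivery
```

A main loop built on this would poll Kafka while `not shutdown_requested.is_set()`, then call `executor.shutdown(wait=True)` afterwards.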
Copy sample JSON test data files from service/tests/ to api/advisor/api/tests/:

- sample_engine_results.json: engine results from the Insights Engine (262KB)
- sample_satellite_engine_results.json: Satellite-managed system results
- sample_rhel6_engine_results.json: RHEL 6 system results for filtering tests
- sample_rule_hits.json: third-party rule hits from aiops/RHV
- sample_report.json: sample report data for webhook tests

These files contain test fixtures for validating engine results processing, rule hits handling, RHEL version filtering, and webhook event generation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Port tests from service/tests/test_advisor_service.py to Django TestCase format in test_advisor_service_engine_rule_hits.py.

Tests ported:
- test_similar_uploads: verify upload record updates for repeated uploads
- test_impacted_date: test impacted_date preservation and reset logic
- test_autoacks_for_new_account: validate autoack creation for new accounts
- test_handle_engine_results: basic engine results processing
- test_handle_engine_results_two_sources: multiple sources for the same system
- test_satellite_handle_engine_results: Satellite-managed systems
- test_handle_engine_results_bad_keys: invalid message validation
- test_handle_rule_hits: third-party rule hits processing
- test_handle_rule_hits_missing_keys: invalid rule hits rejection
- test_non_rhel_system_filtering: FILTER_OUT_NON_RHEL setting
- test_rhel6_system_filtering: FILTER_OUT_RHEL6 setting

Uses the responses library for HTTP mocking instead of patch, uses override_settings for configuration changes, and loads sample data from JSON files.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Port webhook tests from service/tests/test_advisor_service.py to Django TestCase format in test_advisor_service_webhooks.py.

Tests ported:
- test_generate_webhook_msgs_new_report: validate webhook messages for new reports, including account_id, org_id, context, the events payload with rule details (rule_id, reboot_required, has_incident), and remediations events
- test_generate_webhook_msgs_resolved_report: validate webhook messages for resolved reports when active rules are no longer present

Uses DummyProducer from kafka_utils to capture the Kafka messages sent by reports.trigger_report_hooks() instead of mocking functions. This allows inspection of the actual message content sent to Kafka topics.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Port database error handling tests from service/tests/test_advisor_service.py to Django TestCase format in test_advisor_service_database.py.

Tests ported:
- test_db_reports_bad_upload: verify exception handling in upload operations
- test_db_reports_bad_upload_source: test None upload source handling
- test_db_reports_upload_source_exception: test get_or_create exceptions

Note: the database retry tests (test_db_one_failure, test_db_repeated_failure) were not ported, as Django's built-in connection management handles database retries automatically. Manual retry logic was removed from create_db_reports per project requirements.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Remove unused imports:
- SystemType from api.models (not used in tests)
- constants from api.tests (not used in tests)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copied the following modules from service/ to api/advisor/:
- bounded_executor.py
- thread_storage.py
- payload_tracker.py
- utils.py
- reports.py
- build_info.py
- prometheus.py

Updated all copied modules to use Django settings instead of service settings:
- Changed "import settings" to "from django.conf import settings"
- Removed Django setup code from reports.py (already handled by Django)

Use DummyProducer in tests to avoid Kafka connection hangs:
- reports.py and payload_tracker.py now import DummyProducer when settings.TESTING is True
- Renamed the producer variable from 'p' to '_producer' for clarity
- Updated test_advisor_service_webhooks.py to patch '_producer'

This allows the Django test suite to import these modules without blocking on Kafka connections.
Since reports._producer is already initialized with DummyProducer when settings.TESTING is True, we don't need to patch it in tests. Access reports._producer directly instead.

Changes:
- Removed patch.object() calls - access reports._producer directly
- Removed unused imports (patch, DummyProducer)
- Added a setUp() method that calls reports._producer.reset_calls() between tests
The previous code checked settings.TESTING at module import time, which caused Django configuration errors when importing outside of tests. Changed to check settings.TESTING at runtime when creating the producer instance, not at import time. This allows the modules to be imported before Django settings are configured.
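The import-time-vs-runtime distinction above is the key point: anything that reads `django.conf.settings` at module import will raise ImproperlyConfigured if settings aren't loaded yet. A minimal sketch of the lazy-creation pattern (the class names and `get_producer` helper are illustrative stand-ins, not the PR's actual code):

```python
class DummyProducer:
    """Test stand-in that records calls instead of talking to Kafka."""
    def __init__(self):
        self.calls = []

class RealProducer:
    """Stand-in for the real Kafka producer."""

_producer = None

def get_producer(settings):
    # Create the producer on first use, not at import time, so the
    # module can be imported before (Django) settings are configured
    global _producer
    if _producer is None:
        if getattr(settings, "TESTING", False):
            _producer = DummyProducer()
        else:
            _producer = RealProducer()
    return _producer

class FakeSettings:
    TESTING = True

producer = get_producer(FakeSettings())
```

Because the settings lookup happens inside the function, importing the module has no side effects, and repeated calls return the same cached instance.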
The send_remediations_event() function calls produce() with key= and value= parameters, but DummyProducer only accepted a message parameter.

Updated DummyProducer.produce() to accept both calling styles:
- produce(topic, message, callback) - original style
- produce(topic, key=..., value=..., callback) - remediations style

Used Optional[type] for parameters that can be None.
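A sketch of a produce() signature that accepts both calling styles described above (a hedged reconstruction - the recording format and example topics are assumptions, not the repository's actual DummyProducer):

```python
from typing import Any, Callable, Optional

class DummyProducer:
    """Records produce() calls so tests can inspect outgoing messages."""
    def __init__(self):
        self.calls = []

    def produce(self, topic: str, message: Optional[bytes] = None,
                callback: Optional[Callable[..., Any]] = None, *,
                key: Optional[bytes] = None, value: Optional[bytes] = None):
        # Accept both produce(topic, message, callback) and the
        # remediations style produce(topic, key=..., value=..., callback=...)
        payload = message if message is not None else value
        self.calls.append({"topic": topic, "key": key, "value": payload})
        if callback is not None:
            callback(None, payload)

p = DummyProducer()
p.produce("hooks.outbox", b"webhook event")
p.produce("remediations.updates", key=b"host-1", value=b"rule hits")
```

Making key/value keyword-only keeps the original positional call unambiguous while still matching the confluent-kafka-style keyword invocation.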
Set defaults when CLOWDER is not enabled:
- WEBHOOKS_TOPIC defaults to 'hooks.outbox'
- PAYLOAD_TRACKER_TOPIC defaults to 'platform.payload-status'

This ensures the Kafka producers are initialized in test environments.
When both webhook and remediation events are sent, there are 2 poll and 2 flush calls (one from each function). Updated the first test to expect 2 calls instead of 1. The second test only sends webhook events (no remediation), so 1 call is correct.
The original test had a Python bug: in `assert x.count(), 4` the 4 is the assertion *message*, not a comparison, so the statement only checks that x.count() is truthy and can never fail on a wrong count. The correct assertion checks count() == expected_value.

Updated to match the actual sample data:
- insights-client upload: 1 report (sample_engine_results has 1 report)
- aiops upload: 2 reports (sample_rule_hits has 2 hits)
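The bug mentioned above is the classic two-element assert misuse; a tiny demonstration (the `count()` helper is a stand-in for a Django queryset's `.count()`):

```python
def count():
    # Stand-in for a queryset's .count(); returns the number of reports
    return 2

# Buggy form from the original test: the 4 is treated as the assertion
# *message*, so this only checks that count() is truthy - it passes
# even though count() != 4
assert count(), 4

# Correct form: an explicit equality check that can actually fail
assert count() == 2
```

Linters flag this as "assert on a tuple/constant message" (e.g. flake8's F631-style warnings), which is how such bugs are usually caught.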
Added Counter metrics for inventory host lifecycle events:
- INVENTORY_HOST_CREATED
- INVENTORY_HOST_UPDATED
- INVENTORY_HOST_DELETED

These are used by advisor_inventory_service.py to track inventory event processing.
Removed from git tracking but kept on the filesystem:
- *.db files (Prometheus counter/histogram databases)
- test_reports/ directory (XML test output files)

These files are generated at runtime and should not be tracked in git.
Ignore patterns for:
- Test coverage reports (.coverage, htmlcov/)
- Test reports (test_reports/, XML files)
- Prometheus database files (*.db)
- Generated API schema files (YAML, JSON)
- Temporary content files
- Service account credentials
- Patch files
- Type stubs
- Standard Python/IDE/OS artifacts

This prevents generated and temporary files from being committed to the repository.
Removed from git but kept on the filesystem:
- Coverage files (.coverage, api/.coverage)
- Test reports (junit-service.xml)
- API schemas (*.json, *.yaml)
- Content files (playbook_content.yaml, rule_content.yaml)
- Service account credentials (tasks_pd_runs_service_account.json)
- Patch files (tasks_service_async_claude.patch)
- Type stubs (type_stubs/, api/advisor/type_stubs.pyi)
- Build artifacts (tekton_catalog_uses)
- Lock file backup (Pipfile.lock.old)

Total: 14 files + 27 type stub files removed from tracking.

These files are now properly ignored per .gitignore and won't be accidentally committed in the future.
Signed-off-by: Paul Wayper <paulway@redhat.com>
Moved thread-local storage cleanup from the individual handler functions into the make_async_handler wrapper, where it executes automatically before each task. This is cleaner and ensures cleanup happens consistently for all wrapped handlers.

Changes:
- Updated make_async_handler() to inline thread_storage cleanup before calling the handler function
- Removed the utils.clean_threading_cruft() calls from handle_engine_results() and handle_rule_hits()
- Removed the utils.clean_threading_cruft() call from prometheus.start_prometheus() (it doesn't use thread_storage)
- Removed the clean_threading_cruft() function from utils.py along with its outdated Python 3.7 TODO comment
- Removed unused imports: thread_storage_object from utils.py, utils from prometheus.py

The cleanup is still necessary (thread pools reuse threads, causing data leakage between tasks), but it is now handled automatically by the wrapper pattern rather than requiring manual calls in each handler.

All 16 advisor service tests pass.
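The leakage the commit above guards against is easy to show: pool threads are reused, so `threading.local` data set by one task is visible to the next unless cleared. A sketch of the wrapper-does-cleanup pattern (names assumed from the commit text, not the actual diff):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

thread_storage = threading.local()

def make_async_handler(executor, handler):
    # Pool threads are reused, so thread-local data from a previous task
    # would leak into the next one; clear it before every handler call
    def task(topic, message):
        thread_storage.__dict__.clear()
        return handler(topic, message)
    return lambda topic, message: executor.submit(task, topic, message)

seen = []
def handler(topic, message):
    # Record whatever thread-local value is visible, then set one
    seen.append(getattr(thread_storage, "org_id", None))
    thread_storage.org_id = message

executor = ThreadPoolExecutor(max_workers=1)  # one thread, guaranteed reuse
wrapped = make_async_handler(executor, handler)
wrapped("topic", "org-1").result()
wrapped("topic", "org-2").result()
executor.shutdown()
```

With the cleanup in place, the second task sees no leftover `org_id` from the first, even though both ran on the same pool thread.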
Consolidates the repetitive thread_storage setting patterns throughout the advisor service into three reusable helper functions:
- start_operation_tracking(): sets up the tracking context and start time
- track_operation_failure(): records a failure with timing and an error message
- track_operation_success(): records successful completion with timing

Benefits:
- Reduces code duplication across handlers
- Makes the tracking pattern more consistent and maintainable
- Simplifies handler logic by encapsulating tracking details
- Context fields are passed as keyword arguments for clarity

Updated handlers:
- handle_engine_results(): now uses the helpers for all tracking
- handle_rule_hits(): now uses the helpers for all tracking
- create_db_reports(): now uses the helpers for all tracking

All existing tests pass.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-written by Paul Wayper and Claude Sonnet 4.5

Signed-off-by: Paul Wayper <paulway@redhat.com>
Sorry @PaulWay, your pull request is larger than the review limit of 150000 diff characters
Consolidates the duplicate SystemType lookup and error handling from handle_engine_results() and handle_rule_hits() into a single helper function.

Changes:
- New get_system_type_or_fail() helper function
- Uses .get() instead of .filter().first(), since (role, product_code) is unique
- Handles the DoesNotExist exception with consistent error logging
- Automatically tracks the operation failure and sends payload status
- Reduces code duplication by ~25 lines

Benefits:
- Single source of truth for SystemType validation
- More efficient database query using .get()
- Consistent error handling across handlers
- Easier to maintain and test

All existing tests pass.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
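The shape of such a helper, sketched without Django (a dict stands in for the SystemType table, and the exception class and failure-tracking list are illustrative stand-ins for the real model's DoesNotExist and the payload-status tracking):

```python
class SystemTypeDoesNotExist(Exception):
    """Stand-in for Django's SystemType.DoesNotExist."""

# Stand-in for the SystemType table; (role, product_code) is unique,
# which is why a direct .get()-style lookup suffices
SYSTEM_TYPES = {("host", "rhel"): 105}

def get_system_type_or_fail(role, product_code, failures):
    # Single place for the lookup and its error handling, instead of
    # duplicating the same filter-and-log block in every handler
    try:
        return SYSTEM_TYPES[(role.lower(), product_code.lower())]
    except KeyError:
        failures.append(f"Unknown system type {role}/{product_code}")
        raise SystemTypeDoesNotExist()

failures = []
system_type_id = get_system_type_or_fail("Host", "RHEL", failures)
try:
    get_system_type_or_fail("widget", "acme", failures)
except SystemTypeDoesNotExist:
    pass
```

In the Django version, the body would be `SystemType.objects.get(role=..., product_code=...)` in a try/except on `SystemType.DoesNotExist`, with the except branch doing the logging, failure tracking, and payload status send.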
Signed-off-by: Paul Wayper <paulway@redhat.com>
…ail failure Signed-off-by: Paul Wayper <paulway@redhat.com>
Signed-off-by: Paul Wayper <paulway@redhat.com>
Hey @PaulWay. This is getting too big to review (252 files changed). There are some JSON test results committed that shouldn't be there. I strongly advise restarting this effort with one concern at a time. There is always something that could be done iteratively with smaller changes and merged, without impacting the overall service. It would be easier to review with smaller sets of changes. Thank you.
Much of this work has been done by Claude Sonnet 4.5.
The purpose of this work is two-fold.
Firstly, to move the service code fully into using Django, like the rest of the code base. This allows us to have one test suite that works across the whole code base.
Secondly, to integrate the service code into the existing Django management command to process Inventory event replication.
A tertiary goal is to significantly reduce the size and complexity of the service code, by using Django's own database connection retry logic rather than manual intervention, and to reduce duplication by refactoring common practices into simpler functions.