Move the Advisor Service into the Inventory Service code#108

Open
PaulWay wants to merge 34 commits into main from service_into_advisor_claude
Conversation

@PaulWay PaulWay (Collaborator) commented Feb 16, 2026

Much of this work has been done by Claude Sonnet 4.5.

The purpose of this work is two-fold.

Firstly, to move the service code fully into using Django, like the rest of the code base. This allows us to have one test suite that works across the whole code base.

Secondly, to integrate the service code into the existing Django management command to process Inventory event replication.

A tertiary goal is to significantly reduce the size and complexity of the service code, by using Django's own database connection retry logic rather than manual intervention, and to reduce duplication by refactoring common practices into simpler functions.

PaulWay and others added 30 commits February 13, 2026 17:45
Add settings, imports, and helper functions needed for merging
service/service.py into advisor_inventory_service.py.

- Add THREAD_POOL_SIZE, SERVICE_APP_NAME, SERVICE_GROUP_ID settings
- Add DISABLE_PROMETHEUS setting
- Add FILTER_OUT_NON_RHEL and FILTER_OUT_RHEL6 settings
- Import BoundedExecutor, thread_storage, payload_tracker, utils, reports
- Import additional Django models (Ack, CurrentReport, etc.)
- Add make_async_handler() wrapper to submit handlers to thread pool
- Add on_thread_done() callback for exception handling

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
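The make_async_handler() / on_thread_done() pattern described above can be sketched as follows. This is a minimal illustration, not the actual code: the real service uses its own BoundedExecutor, which is stood in for here by the standard library's ThreadPoolExecutor, and the (topic, message) handler signature is taken from the KafkaDispatcher convention mentioned later in this PR.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger(__name__)

def on_thread_done(future):
    # Completion callback: surface exceptions that the thread pool
    # would otherwise swallow silently.
    exc = future.exception()
    if exc is not None:
        logger.error("Handler raised: %s", exc)

def make_async_handler(executor, handler):
    # Wrap a synchronous (topic, message) handler so each message is
    # processed on the thread pool rather than the dispatch thread.
    def async_handler(topic, message):
        future = executor.submit(handler, topic, message)
        future.add_done_callback(on_thread_done)
        return future
    return async_handler

# Usage sketch with a trivial handler.
executor = ThreadPoolExecutor(max_workers=4)
results = []
wrapped = make_async_handler(executor, lambda topic, msg: results.append((topic, msg)))
wrapped("platform.engine.results", {"id": 1}).result()
executor.shutdown(wait=True)
```

The dispatcher thread only pays the cost of a submit() call per message; the callback keeps handler failures visible in the logs.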
Copy create_db_reports from service/service.py for creating and updating
reports, uploads, and hosts from engine results and rule hits.

- Creates/updates Host records with satellite and branch information
- Handles autoack logic for new accounts
- Creates/updates Upload records with source tracking
- Filters reports by RHEL version (FILTER_OUT_NON_RHEL, FILTER_OUT_RHEL6)
- Creates/updates CurrentReport records with impacted dates
- Triggers webhooks and remediations via report_hooks
- Uses atomic transactions with select_for_update() locks
- Tracks performance metrics via thread_storage and prometheus

Database retry logic removed - Django handles connection retry automatically.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copy handle_engine_results from service/service.py for processing
Insights Engine rule processing results from platform.engine.results topic.

- Validates required keys in engine results message
- Extracts system data, platform metadata, and engine reports
- Sets thread_storage values for payload tracking and metrics
- Validates system type exists in database
- Calls create_db_reports() to create Upload and CurrentReport records
- Handles satellite_managed, satellite_id, and branch_id fields
- Tracks timing metrics for performance monitoring

Updated to use KafkaDispatcher signature (topic, message).
Removed manual database retry logic - Django handles automatically.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copy handle_rule_hits from service/service.py for processing
third-party rule hits from platform.insights.rule-hits topic.

- Validates required keys: org_id, source, host_product, host_role,
  inventory_id, hits
- Normalizes host_role and host_product to lowercase
- Validates system type exists in database
- Sets thread_storage values for payload tracking
- Calls create_db_reports() to create reports from rule hits
- Tracks timing metrics for performance monitoring

Updated to use KafkaDispatcher signature (topic, message).
Removed manual database retry logic - Django handles automatically.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add INSIGHTS_ADVISOR_SERVICE_INVENTORY_EVENTS_ELAPSED timer decorator
to the existing handle_inventory_event() handler for consistency with
other handlers and performance tracking.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace Command.handle() method with comprehensive service initialization
that integrates BoundedExecutor thread pool with KafkaDispatcher.

Changes:
- Initialize BoundedExecutor with THREAD_POOL_SIZE workers
- Start Prometheus server in background thread if not disabled
- Setup Prometheus metrics (INSIGHTS_ADVISOR_STATUS, INSIGHTS_ADVISOR_UP)
- Register three handlers with async wrappers:
  * ENGINE_RESULTS_TOPIC -> handle_engine_results
  * RULE_HITS_TOPIC -> handle_rule_hits
  * INVENTORY_TOPIC -> handle_inventory_event
- All handlers submitted to thread pool via make_async_handler wrapper
- Setup SIGTERM/SIGINT signal handlers for graceful shutdown
- Update Prometheus state through service lifecycle
- Shutdown executor cleanly on exit
- Update help text to reflect expanded scope

This completes the merge of service/service.py functionality into
advisor_inventory_service.py with concurrent message processing.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copy sample JSON test data files from service/tests/ to api/advisor/api/tests/:
- sample_engine_results.json: Engine results from Insights Engine (262KB)
- sample_satellite_engine_results.json: Satellite-managed system results
- sample_rhel6_engine_results.json: RHEL 6 system results for filtering tests
- sample_rule_hits.json: Third-party rule hits from aiops/RHV
- sample_report.json: Sample report data for webhook tests

These files contain test fixtures for validating engine results processing,
rule hits handling, RHEL version filtering, and webhook event generation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Port tests from service/tests/test_advisor_service.py to Django TestCase
format in test_advisor_service_engine_rule_hits.py.

Tests ported:
- test_similar_uploads: Verify upload record updates for repeated uploads
- test_impacted_date: Test impacted_date preservation and reset logic
- test_autoacks_for_new_account: Validate autoack creation for new accounts
- test_handle_engine_results: Basic engine results processing
- test_handle_engine_results_two_sources: Multiple sources for same system
- test_satellite_handle_engine_results: Satellite-managed systems
- test_handle_engine_results_bad_keys: Invalid message validation
- test_handle_rule_hits: Third-party rule hits processing
- test_handle_rule_hits_missing_keys: Invalid rule hits rejection
- test_non_rhel_system_filtering: FILTER_OUT_NON_RHEL setting
- test_rhel6_system_filtering: FILTER_OUT_RHEL6 setting

Uses responses library for HTTP mocking instead of patch, override_settings
for configuration changes, and loads sample data from JSON files.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Port webhook tests from service/tests/test_advisor_service.py to Django
TestCase format in test_advisor_service_webhooks.py.

Tests ported:
- test_generate_webhook_msgs_new_report: Validate webhook messages for new
  reports, including account_id, org_id, context, events payload with rule
  details (rule_id, reboot_required, has_incident), and remediations events
- test_generate_webhook_msgs_resolved_report: Validate webhook messages for
  resolved reports when active rules are no longer present

Uses DummyProducer from kafka_utils to capture Kafka messages sent by
reports.trigger_report_hooks() instead of mocking functions. This allows
inspection of actual message content sent to Kafka topics.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Port database error handling tests from service/tests/test_advisor_service.py
to Django TestCase format in test_advisor_service_database.py.

Tests ported:
- test_db_reports_bad_upload: Verify exception handling in upload operations
- test_db_reports_bad_upload_source: Test None upload source handling
- test_db_reports_upload_source_exception: Test get_or_create exceptions

Note: Database retry tests (test_db_one_failure, test_db_repeated_failure)
were not ported as Django's built-in connection management handles database
retries automatically. Manual retry logic was removed from create_db_reports
per project requirements.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Remove unused imports:
- SystemType from api.models (not used in tests)
- constants from api.tests (not used in tests)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copied the following modules from service/ to api/advisor/:
- bounded_executor.py
- thread_storage.py
- payload_tracker.py
- utils.py
- reports.py
- build_info.py
- prometheus.py

Updated all copied modules to use Django settings instead of service
settings:
- Changed "import settings" to "from django.conf import settings"
- Removed Django setup code from reports.py (already handled by Django)

Use DummyProducer in tests to avoid Kafka connection hangs:
- reports.py and payload_tracker.py now import DummyProducer when
  settings.TESTING is True
- Renamed producer variable from 'p' to '_producer' for clarity
- Updated test_advisor_service_webhooks.py to patch '_producer'

This allows the Django test suite to import these modules without
blocking on Kafka connections.
Since reports._producer is already initialized with DummyProducer when
settings.TESTING is True, we don't need to patch it in tests. Access
reports._producer directly instead.

Changes:
- Removed patch.object() calls - access reports._producer directly
- Removed unused imports (patch, DummyProducer)
- Added setUp() method to call reports._producer.reset_calls() between tests
The previous code checked settings.TESTING at module import time, which
caused Django configuration errors when importing outside of tests.

Changed to check settings.TESTING at runtime when creating the producer
instance, not at import time. This allows the modules to be imported
before Django settings are configured.
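The import-time-versus-runtime distinction above can be illustrated with a small sketch. The DummyProducer shape and the get_producer name here are illustrative assumptions, and a plain class stands in for django.conf.settings; the point is that the producer choice is deferred to the first call rather than made when the module is imported.

```python
class DummyProducer:
    # Stand-in test double (assumed shape) that records calls.
    def __init__(self):
        self.calls = []

    def produce(self, topic, message=None, **kwargs):
        self.calls.append((topic, message, kwargs))

class _Settings:
    # Stand-in for django.conf.settings in this sketch.
    TESTING = True

settings = _Settings()
_producer = None

def get_producer():
    # Lazy initialization: the real-vs-dummy decision happens on the
    # first call, not at module import, so the module can be imported
    # before settings are configured.
    global _producer
    if _producer is None:
        if settings.TESTING:
            _producer = DummyProducer()
        else:
            raise NotImplementedError("real Kafka producer elided from this sketch")
    return _producer
```

Repeated calls return the same cached instance, so tests can inspect one shared producer object.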
The send_remediations_event() function calls produce() with key= and
value= parameters, but DummyProducer only accepted message parameter.

Updated DummyProducer.produce() to accept both calling styles:
- produce(topic, message, callback) - original style
- produce(topic, key=..., value=..., callback) - remediations style

Used Optional[type] for parameters that can be None.
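A dual-signature produce() along the lines described might look like this. The exact fields recorded are assumptions for illustration; what matters is that one method accepts both the positional message style and the key=/value= style.

```python
from typing import Callable, Optional

class DummyProducer:
    # Test double that records produce() calls instead of talking to Kafka.
    def __init__(self):
        self.produce_calls = []

    def produce(self, topic: str, message: Optional[bytes] = None,
                callback: Optional[Callable] = None, *,
                key: Optional[bytes] = None, value: Optional[bytes] = None):
        # Accept both calling styles: (topic, message, callback) and
        # (topic, key=..., value=..., callback=...).
        payload = message if message is not None else value
        self.produce_calls.append((topic, key, payload))
        if callback is not None:
            callback(None, payload)

p = DummyProducer()
p.produce("hooks.outbox", b'{"event": "new-report"}')       # original style
p.produce("remediations", key=b"host-1", value=b'{}')       # remediations style
```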
Set defaults when CLOWDER is not enabled:
- WEBHOOKS_TOPIC defaults to 'hooks.outbox'
- PAYLOAD_TRACKER_TOPIC defaults to 'platform.payload-status'

This ensures the Kafka producers are initialized in test environments.
When both webhook and remediation events are sent, there are 2 poll
and 2 flush calls (one from each function). Updated the first test to
expect 2 calls instead of 1.

The second test only sends webhook events (no remediation), so 1
call is correct.
The original test had a Python bug: 'assert x.count(), 4' doesn't check equality — Python parses it as 'assert x.count()' with 4 as the assertion message, so it passes whenever count() returns any truthy value. The correct assertion should check count() == expected_value.

Updated to match actual sample data:
- insights-client upload: 1 report (sample_engine_results has 1 report)
- aiops upload: 2 reports (sample_rule_hits has 2 hits)
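The bug is easy to reproduce in isolation (FakeQuerySet here is a hypothetical stand-in for a Django queryset):

```python
class FakeQuerySet:
    # Stand-in for a Django queryset with a count() method.
    def __init__(self, n):
        self._n = n

    def count(self):
        return self._n

uploads = FakeQuerySet(1)

# Buggy form: Python parses this as `assert uploads.count()` with 4 as
# the assertion *message*, so it passes whenever count() is truthy,
# even though 1 != 4.
assert uploads.count(), 4

# Correct form: compare against the expected value explicitly.
assert uploads.count() == 1
```

Linters such as flake8 and pyflakes flag this pattern ("assertion is always true"), which is one way to catch it mechanically.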
Added Counter metrics for inventory host lifecycle events:
- INVENTORY_HOST_CREATED
- INVENTORY_HOST_UPDATED
- INVENTORY_HOST_DELETED

These are used by advisor_inventory_service.py to track inventory
event processing.
Removed from git tracking but kept on filesystem:
- *.db files (Prometheus counter/histogram databases)
- test_reports/ directory (XML test output files)

These files are generated at runtime and should not be tracked in git.
Ignore patterns for:
- Test coverage reports (.coverage, htmlcov/)
- Test reports (test_reports/, XML files)
- Prometheus database files (*.db)
- Generated API schema files (YAML, JSON)
- Temporary content files
- Service account credentials
- Patch files
- Type stubs
- Standard Python/IDE/OS artifacts

This prevents generated and temporary files from being committed to the
repository.
Removed from git but kept on filesystem:
- Coverage files (.coverage, api/.coverage)
- Test reports (junit-service.xml)
- API schemas (*.json, *.yaml)
- Content files (playbook_content.yaml, rule_content.yaml)
- Service account credentials (tasks_pd_runs_service_account.json)
- Patch files (tasks_service_async_claude.patch)
- Type stubs (type_stubs/, api/advisor/type_stubs.pyi)
- Build artifacts (tekton_catalog_uses)
- Lock file backup (Pipfile.lock.old)

Total: 14 files + 27 type stub files removed from tracking

These files are now properly ignored per .gitignore and won't be
accidentally committed in the future.
Signed-off-by: Paul Wayper <paulway@redhat.com>
Moved thread-local storage cleanup from individual handler functions into
the make_async_handler wrapper, where it executes automatically before
each task. This is cleaner and ensures cleanup happens consistently for
all wrapped handlers.

Changes:
- Updated make_async_handler() to inline thread_storage cleanup before
  calling the handler function
- Removed utils.clean_threading_cruft() calls from handle_engine_results()
  and handle_rule_hits()
- Removed utils.clean_threading_cruft() call from prometheus.start_prometheus()
  (it doesn't use thread_storage)
- Removed clean_threading_cruft() function from utils.py along with its
  outdated Python 3.7 TODO comment
- Removed unused imports: thread_storage_object from utils.py, utils from
  prometheus.py

The cleanup is still necessary (thread pools reuse threads, causing data
leakage between tasks), but now it's handled automatically by the wrapper
pattern rather than requiring manual calls in each handler.

All 16 advisor service tests pass.
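The leak this cleanup prevents, and the wrapper-based fix, can be demonstrated with a small sketch. The thread_storage name mirrors the service's module, but the handlers here are hypothetical; a single-worker pool guarantees both tasks run on the same reused thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the service's thread_storage module.
thread_storage = threading.local()

def handler(topic, message):
    # Returns whatever value a previous task left on this thread,
    # then records its own value.
    leftover = getattr(thread_storage, "request_id", None)
    thread_storage.request_id = message["id"]
    return leftover

def wrapped(topic, message):
    # Cleanup inlined in the wrapper: reset all thread-local attributes
    # before every task, since pool threads are reused.
    thread_storage.__dict__.clear()
    return handler(topic, message)

# Without the wrapper, the second task sees the first task's value.
with ThreadPoolExecutor(max_workers=1) as ex:
    ex.submit(handler, "t", {"id": 1}).result()
    leaked = ex.submit(handler, "t", {"id": 2}).result()

# With the wrapper, each task starts from clean thread-local state.
with ThreadPoolExecutor(max_workers=1) as ex:
    ex.submit(wrapped, "t", {"id": 1}).result()
    clean = ex.submit(wrapped, "t", {"id": 2}).result()
```

Here `leaked` comes back as 1 (the previous task's value) while `clean` is None, which is exactly the data leakage the wrapper eliminates.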
Consolidates repetitive thread_storage setting patterns throughout
the advisor service into three reusable helper functions:

- start_operation_tracking(): Sets up tracking context and start time
- track_operation_failure(): Records failure with timing and error message
- track_operation_success(): Records successful completion with timing

Benefits:
- Reduces code duplication across handlers
- Makes tracking pattern more consistent and maintainable
- Simplifies handler logic by encapsulating tracking details
- Context fields passed as keyword arguments for clarity

Updated handlers:
- handle_engine_results(): Now uses helpers for all tracking
- handle_rule_hits(): Now uses helpers for all tracking
- create_db_reports(): Now uses helpers for all tracking

All existing tests pass.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
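The three helper functions might look roughly like this. The function names come from the commit message; the storage and return shapes are illustrative assumptions (the real service records into thread_storage and Prometheus rather than returning dicts).

```python
import time

# Hypothetical module-level context standing in for per-thread storage;
# the real service keeps this state in thread_storage.
_context = {}

def start_operation_tracking(operation, **context):
    # Record the tracking context fields plus a start timestamp.
    _context.clear()
    _context.update(context, operation=operation, start=time.monotonic())

def track_operation_failure(error):
    # Record failure with elapsed time and the error message.
    return {"status": "error", "operation": _context["operation"],
            "elapsed": time.monotonic() - _context["start"],
            "error": str(error)}

def track_operation_success():
    # Record successful completion with elapsed time.
    return {"status": "success", "operation": _context["operation"],
            "elapsed": time.monotonic() - _context["start"]}

start_operation_tracking("handle_rule_hits", org_id="12345", source="aiops")
outcome = track_operation_success()
```

Each handler then reduces to one call at entry and one call on each exit path, instead of repeating the timing and context bookkeeping inline.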
Co-written by Paul Wayper and Claude Sonnet 4.5

Signed-off-by: Paul Wayper <paulway@redhat.com>
@PaulWay PaulWay self-assigned this Feb 16, 2026
@PaulWay PaulWay added the "Advisor" (The 'Advisor' component of the code) and "Service" (The long running service components of Advisor and Tasks) labels on Feb 16, 2026
@sourcery-ai sourcery-ai bot left a comment
Sorry @PaulWay, your pull request is larger than the review limit of 150000 diff characters

@PaulWay PaulWay added the "AI" (Flags work where AI code generation is a significant contribution) label on Feb 16, 2026
PaulWay and others added 4 commits February 17, 2026 09:35
Consolidates duplicate SystemType lookup and error handling from
handle_engine_results() and handle_rule_hits() into a single helper
function.

Changes:
- New get_system_type_or_fail() helper function
- Uses .get() instead of .filter().first() since (role, product_code)
  is unique
- Handles DoesNotExist exception with consistent error logging
- Automatically tracks operation failure and sends payload status
- Reduces code duplication by ~25 lines

Benefits:
- Single source of truth for SystemType validation
- More efficient database query using .get()
- Consistent error handling across handlers
- Easier to maintain and test

All existing tests pass.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
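The .get()-versus-.filter().first() distinction is the heart of this commit: because (role, product_code) is unique, a lookup should either return the single row or raise, never silently yield None. A minimal sketch of that contract, with a plain dict standing in for the Django SystemType model and a local exception mirroring Model.DoesNotExist:

```python
class DoesNotExist(Exception):
    # Mirrors Django's Model.DoesNotExist for this sketch.
    pass

# Stand-in table keyed on the unique (role, product_code) pair; the
# real helper queries the SystemType model with .get(). The mapping
# below is illustrative, not real data.
SYSTEM_TYPES = {("host", "rhel"): 105}

def get_system_type_or_fail(role, product_code):
    # Either return the single matching row or raise, exactly as
    # QuerySet.get() does; .filter().first() would instead return None
    # and push the error handling onto every caller.
    try:
        return SYSTEM_TYPES[(role, product_code)]
    except KeyError:
        raise DoesNotExist(f"No system type for {role}/{product_code}")
```

In the real helper, the except branch is also where the consistent error logging, failure tracking, and payload-status reporting described above would live, giving both handlers a single source of truth.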
Signed-off-by: Paul Wayper <paulway@redhat.com>
…ail failure

Signed-off-by: Paul Wayper <paulway@redhat.com>
Signed-off-by: Paul Wayper <paulway@redhat.com>
@vkrizan vkrizan (Collaborator) commented Mar 2, 2026

Hey @PaulWay. This is getting too big to review (252 files changed). There are some JSON test results committed that shouldn't be there. I strongly advise restarting this effort with one concern at a time. There is always something that can be done iteratively with smaller changes and merged without impacting the overall service. Smaller sets of changes would be easier to review. Thank you.

