Move the Advisor Service into the Inventory Service code#108

Open
PaulWay wants to merge 34 commits into main from service_into_advisor_claude
Conversation

@PaulWay PaulWay (Collaborator) commented Feb 16, 2026

Much of this work has been done by Claude Sonnet 4.5.

The purpose of this work is two-fold.

Firstly, to move the service code fully into using Django, like the rest of the code base. This allows us to have one test suite that works across the whole code base.

Secondly, to integrate the service code into the existing Django management command to process Inventory event replication.

A tertiary goal is to significantly reduce the size and complexity of the service code, by using Django's own database connection retry logic rather than manual intervention, and to reduce duplication by refactoring common practices into simpler functions.

PaulWay and others added 30 commits February 13, 2026 17:45
Add settings, imports, and helper functions needed for merging
service/service.py into advisor_inventory_service.py.

- Add THREAD_POOL_SIZE, SERVICE_APP_NAME, SERVICE_GROUP_ID settings
- Add DISABLE_PROMETHEUS setting
- Add FILTER_OUT_NON_RHEL and FILTER_OUT_RHEL6 settings
- Import BoundedExecutor, thread_storage, payload_tracker, utils, reports
- Import additional Django models (Ack, CurrentReport, etc.)
- Add make_async_handler() wrapper to submit handlers to thread pool
- Add on_thread_done() callback for exception handling

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
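The make_async_handler() / on_thread_done() pattern described above can be sketched as follows. This is a minimal illustration, not the actual code: the real service uses its own BoundedExecutor, which is stood in for here by the standard library's ThreadPoolExecutor, and the (topic, message) handler signature is taken from the KafkaDispatcher convention mentioned later in this PR.

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger(__name__)

def on_thread_done(future):
    # Completion callback: surface exceptions that the thread pool
    # would otherwise swallow silently.
    exc = future.exception()
    if exc is not None:
        logger.error("Handler raised: %s", exc)

def make_async_handler(executor, handler):
    # Wrap a synchronous (topic, message) handler so each message is
    # processed on the thread pool rather than the dispatch thread.
    def async_handler(topic, message):
        future = executor.submit(handler, topic, message)
        future.add_done_callback(on_thread_done)
        return future
    return async_handler

# Usage sketch with a trivial handler.
executor = ThreadPoolExecutor(max_workers=4)
results = []
wrapped = make_async_handler(executor, lambda topic, msg: results.append((topic, msg)))
wrapped("platform.engine.results", {"id": 1}).result()
executor.shutdown(wait=True)
```

The dispatcher thread only pays the cost of a submit() call per message; the callback keeps handler failures visible in the logs.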
Copy create_db_reports from service/service.py for creating and updating
reports, uploads, and hosts from engine results and rule hits.

- Creates/updates Host records with satellite and branch information
- Handles autoack logic for new accounts
- Creates/updates Upload records with source tracking
- Filters reports by RHEL version (FILTER_OUT_NON_RHEL, FILTER_OUT_RHEL6)
- Creates/updates CurrentReport records with impacted dates
- Triggers webhooks and remediations via report_hooks
- Uses atomic transactions with select_for_update() locks
- Tracks performance metrics via thread_storage and prometheus

Database retry logic removed - Django handles connection retry automatically.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copy handle_engine_results from service/service.py for processing
Insights Engine rule processing results from platform.engine.results topic.

- Validates required keys in engine results message
- Extracts system data, platform metadata, and engine reports
- Sets thread_storage values for payload tracking and metrics
- Validates system type exists in database
- Calls create_db_reports() to create Upload and CurrentReport records
- Handles satellite_managed, satellite_id, and branch_id fields
- Tracks timing metrics for performance monitoring

Updated to use KafkaDispatcher signature (topic, message).
Removed manual database retry logic - Django handles automatically.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copy handle_rule_hits from service/service.py for processing
third-party rule hits from platform.insights.rule-hits topic.

- Validates required keys: org_id, source, host_product, host_role,
  inventory_id, hits
- Normalizes host_role and host_product to lowercase
- Validates system type exists in database
- Sets thread_storage values for payload tracking
- Calls create_db_reports() to create reports from rule hits
- Tracks timing metrics for performance monitoring

Updated to use KafkaDispatcher signature (topic, message).
Removed manual database retry logic - Django handles automatically.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add INSIGHTS_ADVISOR_SERVICE_INVENTORY_EVENTS_ELAPSED timer decorator
to the existing handle_inventory_event() handler for consistency with
other handlers and performance tracking.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Replace Command.handle() method with comprehensive service initialization
that integrates BoundedExecutor thread pool with KafkaDispatcher.

Changes:
- Initialize BoundedExecutor with THREAD_POOL_SIZE workers
- Start Prometheus server in background thread if not disabled
- Setup Prometheus metrics (INSIGHTS_ADVISOR_STATUS, INSIGHTS_ADVISOR_UP)
- Register three handlers with async wrappers:
  * ENGINE_RESULTS_TOPIC -> handle_engine_results
  * RULE_HITS_TOPIC -> handle_rule_hits
  * INVENTORY_TOPIC -> handle_inventory_event
- All handlers submitted to thread pool via make_async_handler wrapper
- Setup SIGTERM/SIGINT signal handlers for graceful shutdown
- Update Prometheus state through service lifecycle
- Shutdown executor cleanly on exit
- Update help text to reflect expanded scope

This completes the merge of service/service.py functionality into
advisor_inventory_service.py with concurrent message processing.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copy sample JSON test data files from service/tests/ to api/advisor/api/tests/:
- sample_engine_results.json: Engine results from Insights Engine (262KB)
- sample_satellite_engine_results.json: Satellite-managed system results
- sample_rhel6_engine_results.json: RHEL 6 system results for filtering tests
- sample_rule_hits.json: Third-party rule hits from aiops/RHV
- sample_report.json: Sample report data for webhook tests

These files contain test fixtures for validating engine results processing,
rule hits handling, RHEL version filtering, and webhook event generation.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Port tests from service/tests/test_advisor_service.py to Django TestCase
format in test_advisor_service_engine_rule_hits.py.

Tests ported:
- test_similar_uploads: Verify upload record updates for repeated uploads
- test_impacted_date: Test impacted_date preservation and reset logic
- test_autoacks_for_new_account: Validate autoack creation for new accounts
- test_handle_engine_results: Basic engine results processing
- test_handle_engine_results_two_sources: Multiple sources for same system
- test_satellite_handle_engine_results: Satellite-managed systems
- test_handle_engine_results_bad_keys: Invalid message validation
- test_handle_rule_hits: Third-party rule hits processing
- test_handle_rule_hits_missing_keys: Invalid rule hits rejection
- test_non_rhel_system_filtering: FILTER_OUT_NON_RHEL setting
- test_rhel6_system_filtering: FILTER_OUT_RHEL6 setting

Uses responses library for HTTP mocking instead of patch, override_settings
for configuration changes, and loads sample data from JSON files.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Port webhook tests from service/tests/test_advisor_service.py to Django
TestCase format in test_advisor_service_webhooks.py.

Tests ported:
- test_generate_webhook_msgs_new_report: Validate webhook messages for new
  reports, including account_id, org_id, context, events payload with rule
  details (rule_id, reboot_required, has_incident), and remediations events
- test_generate_webhook_msgs_resolved_report: Validate webhook messages for
  resolved reports when active rules are no longer present

Uses DummyProducer from kafka_utils to capture Kafka messages sent by
reports.trigger_report_hooks() instead of mocking functions. This allows
inspection of actual message content sent to Kafka topics.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Port database error handling tests from service/tests/test_advisor_service.py
to Django TestCase format in test_advisor_service_database.py.

Tests ported:
- test_db_reports_bad_upload: Verify exception handling in upload operations
- test_db_reports_bad_upload_source: Test None upload source handling
- test_db_reports_upload_source_exception: Test get_or_create exceptions

Note: Database retry tests (test_db_one_failure, test_db_repeated_failure)
were not ported as Django's built-in connection management handles database
retries automatically. Manual retry logic was removed from create_db_reports
per project requirements.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Remove unused imports:
- SystemType from api.models (not used in tests)
- constants from api.tests (not used in tests)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Copied the following modules from service/ to api/advisor/:
- bounded_executor.py
- thread_storage.py
- payload_tracker.py
- utils.py
- reports.py
- build_info.py
- prometheus.py

Updated all copied modules to use Django settings instead of service
settings:
- Changed "import settings" to "from django.conf import settings"
- Removed Django setup code from reports.py (already handled by Django)

Use DummyProducer in tests to avoid Kafka connection hangs:
- reports.py and payload_tracker.py now import DummyProducer when
  settings.TESTING is True
- Renamed producer variable from 'p' to '_producer' for clarity
- Updated test_advisor_service_webhooks.py to patch '_producer'

This allows the Django test suite to import these modules without
blocking on Kafka connections.
Since reports._producer is already initialized with DummyProducer when
settings.TESTING is True, we don't need to patch it in tests. Access
reports._producer directly instead.

Changes:
- Removed patch.object() calls - access reports._producer directly
- Removed unused imports (patch, DummyProducer)
- Added setUp() method to call reports._producer.reset_calls() between tests
The previous code checked settings.TESTING at module import time, which
caused Django configuration errors when importing outside of tests.

Changed to check settings.TESTING at runtime when creating the producer
instance, not at import time. This allows the modules to be imported
before Django settings are configured.
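The import-time-versus-runtime distinction above can be illustrated with a small sketch. The DummyProducer shape and the get_producer name here are illustrative assumptions, and a plain class stands in for django.conf.settings; the point is that the producer choice is deferred to the first call rather than made when the module is imported.

```python
class DummyProducer:
    # Stand-in test double (assumed shape) that records calls.
    def __init__(self):
        self.calls = []

    def produce(self, topic, message=None, **kwargs):
        self.calls.append((topic, message, kwargs))

class _Settings:
    # Stand-in for django.conf.settings in this sketch.
    TESTING = True

settings = _Settings()
_producer = None

def get_producer():
    # Lazy initialization: the real-vs-dummy decision happens on the
    # first call, not at module import, so the module can be imported
    # before settings are configured.
    global _producer
    if _producer is None:
        if settings.TESTING:
            _producer = DummyProducer()
        else:
            raise NotImplementedError("real Kafka producer elided from this sketch")
    return _producer
```

Repeated calls return the same cached instance, so tests can inspect one shared producer object.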
The send_remediations_event() function calls produce() with key= and
value= parameters, but DummyProducer only accepted message parameter.

Updated DummyProducer.produce() to accept both calling styles:
- produce(topic, message, callback) - original style
- produce(topic, key=..., value=..., callback) - remediations style

Used Optional[type] for parameters that can be None.
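A dual-signature produce() along the lines described might look like this. The exact fields recorded are assumptions for illustration; what matters is that one method accepts both the positional message style and the key=/value= style.

```python
from typing import Callable, Optional

class DummyProducer:
    # Test double that records produce() calls instead of talking to Kafka.
    def __init__(self):
        self.produce_calls = []

    def produce(self, topic: str, message: Optional[bytes] = None,
                callback: Optional[Callable] = None, *,
                key: Optional[bytes] = None, value: Optional[bytes] = None):
        # Accept both calling styles: (topic, message, callback) and
        # (topic, key=..., value=..., callback=...).
        payload = message if message is not None else value
        self.produce_calls.append((topic, key, payload))
        if callback is not None:
            callback(None, payload)

p = DummyProducer()
p.produce("hooks.outbox", b'{"event": "new-report"}')       # original style
p.produce("remediations", key=b"host-1", value=b'{}')       # remediations style
```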
Set defaults when CLOWDER is not enabled:
- WEBHOOKS_TOPIC defaults to 'hooks.outbox'
- PAYLOAD_TRACKER_TOPIC defaults to 'platform.payload-status'

This ensures the Kafka producers are initialized in test environments.
When both webhook and remediation events are sent, there are 2 poll
and 2 flush calls (one from each function). Updated the first test to
expect 2 calls instead of 1.

The second test only sends webhook events (no remediation), so 1
call is correct.
The original test had a Python bug: 'assert x.count(), 4' doesn't check equality — Python parses it as 'assert x.count()' with 4 as the assertion message, so it passes whenever count() returns any truthy value. The correct assertion should check count() == expected_value.

Updated to match actual sample data:
- insights-client upload: 1 report (sample_engine_results has 1 report)
- aiops upload: 2 reports (sample_rule_hits has 2 hits)
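The bug is easy to reproduce in isolation (FakeQuerySet here is a hypothetical stand-in for a Django queryset):

```python
class FakeQuerySet:
    # Stand-in for a Django queryset with a count() method.
    def __init__(self, n):
        self._n = n

    def count(self):
        return self._n

uploads = FakeQuerySet(1)

# Buggy form: Python parses this as `assert uploads.count()` with 4 as
# the assertion *message*, so it passes whenever count() is truthy,
# even though 1 != 4.
assert uploads.count(), 4

# Correct form: compare against the expected value explicitly.
assert uploads.count() == 1
```

Linters such as flake8 and pyflakes flag this pattern ("assertion is always true"), which is one way to catch it mechanically.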
Added Counter metrics for inventory host lifecycle events:
- INVENTORY_HOST_CREATED
- INVENTORY_HOST_UPDATED
- INVENTORY_HOST_DELETED

These are used by advisor_inventory_service.py to track inventory
event processing.
Removed from git tracking but kept on filesystem:
- *.db files (Prometheus counter/histogram databases)
- test_reports/ directory (XML test output files)

These files are generated at runtime and should not be tracked in git.
Ignore patterns for:
- Test coverage reports (.coverage, htmlcov/)
- Test reports (test_reports/, XML files)
- Prometheus database files (*.db)
- Generated API schema files (YAML, JSON)
- Temporary content files
- Service account credentials
- Patch files
- Type stubs
- Standard Python/IDE/OS artifacts

This prevents generated and temporary files from being committed to the
repository.
Removed from git but kept on filesystem:
- Coverage files (.coverage, api/.coverage)
- Test reports (junit-service.xml)
- API schemas (*.json, *.yaml)
- Content files (playbook_content.yaml, rule_content.yaml)
- Service account credentials (tasks_pd_runs_service_account.json)
- Patch files (tasks_service_async_claude.patch)
- Type stubs (type_stubs/, api/advisor/type_stubs.pyi)
- Build artifacts (tekton_catalog_uses)
- Lock file backup (Pipfile.lock.old)

Total: 14 files + 27 type stub files removed from tracking

These files are now properly ignored per .gitignore and won't be
accidentally committed in the future.
Signed-off-by: Paul Wayper <paulway@redhat.com>
Moved thread-local storage cleanup from individual handler functions into
the make_async_handler wrapper, where it executes automatically before
each task. This is cleaner and ensures cleanup happens consistently for
all wrapped handlers.

Changes:
- Updated make_async_handler() to inline thread_storage cleanup before
  calling the handler function
- Removed utils.clean_threading_cruft() calls from handle_engine_results()
  and handle_rule_hits()
- Removed utils.clean_threading_cruft() call from prometheus.start_prometheus()
  (it doesn't use thread_storage)
- Removed clean_threading_cruft() function from utils.py along with its
  outdated Python 3.7 TODO comment
- Removed unused imports: thread_storage_object from utils.py, utils from
  prometheus.py

The cleanup is still necessary (thread pools reuse threads, causing data
leakage between tasks), but now it's handled automatically by the wrapper
pattern rather than requiring manual calls in each handler.

All 16 advisor service tests pass.
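The leak this cleanup prevents, and the wrapper-based fix, can be demonstrated with a small sketch. The thread_storage name mirrors the service's module, but the handlers here are hypothetical; a single-worker pool guarantees both tasks run on the same reused thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the service's thread_storage module.
thread_storage = threading.local()

def handler(topic, message):
    # Returns whatever value a previous task left on this thread,
    # then records its own value.
    leftover = getattr(thread_storage, "request_id", None)
    thread_storage.request_id = message["id"]
    return leftover

def wrapped(topic, message):
    # Cleanup inlined in the wrapper: reset all thread-local attributes
    # before every task, since pool threads are reused.
    thread_storage.__dict__.clear()
    return handler(topic, message)

# Without the wrapper, the second task sees the first task's value.
with ThreadPoolExecutor(max_workers=1) as ex:
    ex.submit(handler, "t", {"id": 1}).result()
    leaked = ex.submit(handler, "t", {"id": 2}).result()

# With the wrapper, each task starts from clean thread-local state.
with ThreadPoolExecutor(max_workers=1) as ex:
    ex.submit(wrapped, "t", {"id": 1}).result()
    clean = ex.submit(wrapped, "t", {"id": 2}).result()
```

Here `leaked` comes back as 1 (the previous task's value) while `clean` is None, which is exactly the data leakage the wrapper eliminates.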
Consolidates repetitive thread_storage setting patterns throughout
the advisor service into three reusable helper functions:

- start_operation_tracking(): Sets up tracking context and start time
- track_operation_failure(): Records failure with timing and error message
- track_operation_success(): Records successful completion with timing

Benefits:
- Reduces code duplication across handlers
- Makes tracking pattern more consistent and maintainable
- Simplifies handler logic by encapsulating tracking details
- Context fields passed as keyword arguments for clarity

Updated handlers:
- handle_engine_results(): Now uses helpers for all tracking
- handle_rule_hits(): Now uses helpers for all tracking
- create_db_reports(): Now uses helpers for all tracking

All existing tests pass.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
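The three helper functions might look roughly like this. The function names come from the commit message; the storage and return shapes are illustrative assumptions (the real service records into thread_storage and Prometheus rather than returning dicts).

```python
import time

# Hypothetical module-level context standing in for per-thread storage;
# the real service keeps this state in thread_storage.
_context = {}

def start_operation_tracking(operation, **context):
    # Record the tracking context fields plus a start timestamp.
    _context.clear()
    _context.update(context, operation=operation, start=time.monotonic())

def track_operation_failure(error):
    # Record failure with elapsed time and the error message.
    return {"status": "error", "operation": _context["operation"],
            "elapsed": time.monotonic() - _context["start"],
            "error": str(error)}

def track_operation_success():
    # Record successful completion with elapsed time.
    return {"status": "success", "operation": _context["operation"],
            "elapsed": time.monotonic() - _context["start"]}

start_operation_tracking("handle_rule_hits", org_id="12345", source="aiops")
outcome = track_operation_success()
```

Each handler then reduces to one call at entry and one call on each exit path, instead of repeating the timing and context bookkeeping inline.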
Co-written by Paul Wayper and Claude Sonnet 4.5

Signed-off-by: Paul Wayper <paulway@redhat.com>
@PaulWay PaulWay self-assigned this Feb 16, 2026
@PaulWay PaulWay added the "Advisor" (The 'Advisor' component of the code) and "Service" (The long running service components of Advisor and Tasks) labels on Feb 16, 2026
@sourcery-ai sourcery-ai bot left a comment
Sorry @PaulWay, your pull request is larger than the review limit of 150000 diff characters

@PaulWay PaulWay added the "AI" (Flags work where AI code generation is a significant contribution) label on Feb 16, 2026
PaulWay and others added 4 commits February 17, 2026 09:35
Consolidates duplicate SystemType lookup and error handling from
handle_engine_results() and handle_rule_hits() into a single helper
function.

Changes:
- New get_system_type_or_fail() helper function
- Uses .get() instead of .filter().first() since (role, product_code)
  is unique
- Handles DoesNotExist exception with consistent error logging
- Automatically tracks operation failure and sends payload status
- Reduces code duplication by ~25 lines

Benefits:
- Single source of truth for SystemType validation
- More efficient database query using .get()
- Consistent error handling across handlers
- Easier to maintain and test

All existing tests pass.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
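The .get()-versus-.filter().first() distinction is the heart of this commit: because (role, product_code) is unique, a lookup should either return the single row or raise, never silently yield None. A minimal sketch of that contract, with a plain dict standing in for the Django SystemType model and a local exception mirroring Model.DoesNotExist:

```python
class DoesNotExist(Exception):
    # Mirrors Django's Model.DoesNotExist for this sketch.
    pass

# Stand-in table keyed on the unique (role, product_code) pair; the
# real helper queries the SystemType model with .get(). The mapping
# below is illustrative, not real data.
SYSTEM_TYPES = {("host", "rhel"): 105}

def get_system_type_or_fail(role, product_code):
    # Either return the single matching row or raise, exactly as
    # QuerySet.get() does; .filter().first() would instead return None
    # and push the error handling onto every caller.
    try:
        return SYSTEM_TYPES[(role, product_code)]
    except KeyError:
        raise DoesNotExist(f"No system type for {role}/{product_code}")
```

In the real helper, the except branch is also where the consistent error logging, failure tracking, and payload-status reporting described above would live, giving both handlers a single source of truth.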
Signed-off-by: Paul Wayper <paulway@redhat.com>
…ail failure

Signed-off-by: Paul Wayper <paulway@redhat.com>
Signed-off-by: Paul Wayper <paulway@redhat.com>
@vkrizan vkrizan (Collaborator) commented Mar 2, 2026

Hey @PaulWay. This is getting too big to review (252 files changed). There are some JSON test results committed that shouldn't be there. I strongly advise restarting this effort with one concern at a time. There is always something that can be done iteratively with smaller changes and merged without impacting the overall service. Smaller sets of changes would be easier to review. Thank you.

