Collaborator

@MarcusSorealheis MarcusSorealheis commented May 23, 2025

with validation and recovery.

Description

Add comprehensive validation and self-healing capabilities to the memory awaited action database so it can detect and repair cases where its internal data structures fall out of sync, and provide an API plus instrumentation for managing them.

  1. Enhanced error handling with diagnostic logging and recovery context
  2. Added validate_consistency() method to detect data structure corruption
  3. Implemented attempt_recovery() to rebuild hash key mappings from authoritative sources
  4. Added public validate_and_recover() API for external monitoring
  5. Integrated OpenTelemetry instrumentation for observability
  6. Changed error-level logs to warnings with recovery guidance
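
To make the validate-then-rebuild flow in items 2 to 4 concrete, here is a minimal sketch on a simplified model. It is not the code in this PR: the struct layout, the `Inconsistency` enum, the key types, and the return types are all assumptions made for illustration. Only the three method names come from the list above. Logging and OpenTelemetry instrumentation are left out.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for the real key types (illustration only).
pub type OperationId = u64;
pub type ActionHashKey = String;

/// Simplified model: an authoritative operation map plus a derived
/// hash-key index that can drift out of sync with it.
#[derive(Default)]
pub struct AwaitedActionDb {
    /// Authoritative source: operation -> hash key (`None` = not cacheable).
    pub operations: HashMap<OperationId, Option<ActionHashKey>>,
    /// Derived index: hash key -> operation.
    pub hash_key_to_operation: HashMap<ActionHashKey, OperationId>,
}

/// Kinds of drift this sketch detects (assumed, not taken from the PR).
#[derive(Debug, PartialEq, Eq)]
pub enum Inconsistency {
    /// The index points at an operation that no longer exists.
    DanglingIndexEntry(ActionHashKey),
    /// The index points at an operation stored under a different key.
    MismatchedIndexEntry(ActionHashKey),
    /// A cacheable operation's key is absent from the index.
    MissingIndexEntry(OperationId),
}

impl AwaitedActionDb {
    /// Read-only check of both directions of the mapping.
    pub fn validate_consistency(&self) -> Vec<Inconsistency> {
        let mut problems = Vec::new();
        for (key, op_id) in &self.hash_key_to_operation {
            match self.operations.get(op_id) {
                None => problems.push(Inconsistency::DanglingIndexEntry(key.clone())),
                Some(stored) if stored.as_ref() != Some(key) => {
                    problems.push(Inconsistency::MismatchedIndexEntry(key.clone()));
                }
                Some(_) => {}
            }
        }
        for (op_id, key) in &self.operations {
            if let Some(key) = key {
                if !self.hash_key_to_operation.contains_key(key) {
                    problems.push(Inconsistency::MissingIndexEntry(*op_id));
                }
            }
        }
        problems
    }

    /// Discard the index and rebuild it from the authoritative map.
    /// If several operations share a key, the lowest id wins (an arbitrary
    /// but deterministic choice made for this sketch).
    pub fn attempt_recovery(&mut self) {
        self.hash_key_to_operation.clear();
        for (op_id, key) in &self.operations {
            if let Some(key) = key {
                self.hash_key_to_operation
                    .entry(key.clone())
                    .and_modify(|existing| *existing = (*existing).min(*op_id))
                    .or_insert(*op_id);
            }
        }
    }

    /// Public entry point: returns how many problems were repaired, or the
    /// problems that remain if recovery did not restore consistency.
    pub fn validate_and_recover(&mut self) -> Result<usize, Vec<Inconsistency>> {
        let before = self.validate_consistency();
        if before.is_empty() {
            return Ok(0);
        }
        self.attempt_recovery();
        let after = self.validate_consistency();
        if after.is_empty() {
            Ok(before.len())
        } else {
            Err(after)
        }
    }
}
```

In this model the index is purely derived state, so rebuilding it from the operation map always restores consistency; the `Err` branch is defensive. The real database likely has more structures to reconcile, which is why the result type here should be read as a placeholder.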

Fixes #1637 #1638

Type of change

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Added a comprehensive test suite covering these validation scenarios: (1) correct behavior in steady state, (2) some actions cannot be cached, (3) duplicate actions, and (4) state after evictions.

Checklist

  • Updated documentation if needed
  • Tests added/amended
  • bazel test //... passes locally
  • PR is contained in a single commit, using git amend (see some docs)


@MarcusSorealheis
Collaborator Author

There is one thing I did not test because I got sleepy. I couldn't remember whether this first surfaced after a big migration for a user, though that did not make sense to me. If it did, I think we need to incorporate migration testing sometime in the next month.

We should also hold off on merging this until @rejuvenile can test this change.

@MarcusSorealheis
Collaborator Author

MarcusSorealheis commented May 23, 2025

There are also other things I want to do here, so it's not quite ready for review. I want to chat with the team and the affected user about how they feel it is best to use the API.

@MarcusSorealheis MarcusSorealheis marked this pull request as draft May 23, 2025 09:13
@MarcusSorealheis MarcusSorealheis marked this pull request as ready for review May 23, 2025 09:18
@MarcusSorealheis MarcusSorealheis force-pushed the bugfix_action_info_hash_key_corrupted branch from 883d620 to 3c8bec8 Compare May 23, 2025 09:19
@MarcusSorealheis MarcusSorealheis changed the title Fix action_info_hash_key_to_awaited_action sync issues (in progress) Fix action_info_hash_key_to_awaited_action sync issues [DO NOT MERGE] May 23, 2025
Development

Successfully merging this pull request may close these issues.

action_info_hash_key_to_awaited_action out of sync