Skip to content

Feat!: Ensure metadata snapshots with modified audits are still audited #4341

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

VaggelisD
Copy link
Contributor

Fixes #3970

@VaggelisD VaggelisD force-pushed the vaggelisd/audit_ux branch from 7c40682 to eebae76 Compare May 8, 2025 18:19
@VaggelisD VaggelisD force-pushed the vaggelisd/audit_ux branch from eebae76 to b48005a Compare May 8, 2025 18:46
@@ -1069,6 +1069,34 @@ def _data_hash_values(self) -> t.List[str]:

return data # type: ignore

def _audit_metadata(self) -> t.List[str]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_audit_metadata_hash_values to be consistent with _data_hash_values

start, end = interval

try:
scheduler._audit_snapshot(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I like this API. Can we somehow reuse the entire scheduler machinery as is by supporting audit_only mode in run API? With audit_only we'll bypass evaluate and interval recording but do everything else. We can also use the restatements argument to remove relevant intervals so that they can be reprocessed again.

Copy link
Contributor Author

@VaggelisD VaggelisD May 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that was my initial approach I didn't find it better, we would end up hacking run() to switch off all lines except of the audit. This will get messy with if not audit_only everywhere imo, but happy to give it a try.

We can also use the restatements argument to remove relevant intervals so that they can be reprocessed again.

Sorry if I'm not understanding, but won't this add perf regressions to this PR? Currently, we retrieve the intervals from the latest snapshot so we can run the audits without having to reevaluate or compute anything.

Afaict, with restatements we can simply pass in [model_start, today] for each snapshot and get this "computation" for free, but in exchange we have to remove + recompute + restore the intervals all over again only to get the same interval set back, right?

previous_snapshot_id
]

new_audits = snapshot.model._audit_metadata()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest having a proper top level method for this audit_metadata_hash

Copy link
Contributor Author

@VaggelisD VaggelisD May 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you'd like a top-level method in the model definition that returns both the list and it's hash, right?

# core.model.definition

def audit_metadata_hash(self):
  audits = self._audit_metadata()
  return audits, hash_data(audits)

So then the highlighted section would be:

# core.plan.evaluator

_, previous_audits_hash = previous_snapshot.model.audit_metadata_hash()
new_audits, new_audits_hash = snapshot.model.audit_metadata_hash()

if (previous_audits_hash != new_audits_hash) and new_audits:
  ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adding an audit is a metadata change and it doesn't run on plan application
2 participants