Feat!: Ensure metadata snapshots with modified audits are still audited #4341

VaggelisD · 2025-05-08T18:18:43Z

izeigerman · 2025-05-08T19:35:18Z

sqlmesh/core/model/definition.py

@@ -1069,6 +1069,34 @@ def _data_hash_values(self) -> t.List[str]:

        return data  # type: ignore

+    def _audit_metadata(self) -> t.List[str]:


Suggest naming this `_audit_metadata_hash_values`, to be consistent with `_data_hash_values`.

izeigerman · 2025-05-08T20:57:33Z

sqlmesh/core/plan/evaluator.py

+                start, end = interval
+
+                try:
+                    scheduler._audit_snapshot(


I'm not sure I like this API. Can we reuse the entire scheduler machinery as is by supporting an `audit_only` mode in the `run` API? With `audit_only`, we'd bypass evaluation and interval recording but do everything else. We can also use the `restatements` argument to remove the relevant intervals so that they can be reprocessed.
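A minimal sketch of the `audit_only` idea, assuming a heavily simplified scheduler: `Scheduler`, `run`, `evaluate`, `audit`, and `record_interval` below are hypothetical stand-ins, not SQLMesh's scheduler API, and audits are plain callables that return pass/fail for an interval. With `audit_only=True`, the loop skips evaluation and interval recording but still audits every interval.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Interval = Tuple[int, int]  # (start, end) as plain integers for simplicity
AuditFn = Callable[[Interval], bool]  # returns True if the audit passes


@dataclass
class Scheduler:
    # Hypothetical stand-in; not SQLMesh's scheduler.
    audits: Dict[str, List[AuditFn]]
    recorded: Dict[str, List[Interval]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def evaluate(self, name: str, interval: Interval) -> None:
        # Placeholder for running the model query over the interval.
        self.log.append(f"evaluate {name} {interval}")

    def record_interval(self, name: str, interval: Interval) -> None:
        self.recorded.setdefault(name, []).append(interval)

    def audit(self, name: str, interval: Interval) -> List[str]:
        # Run every audit for this model and collect failure messages.
        self.log.append(f"audit {name} {interval}")
        return [
            f"{name} audit #{i} failed for {interval}"
            for i, check in enumerate(self.audits.get(name, []))
            if not check(interval)
        ]

    def run(self, intervals: Dict[str, List[Interval]], audit_only: bool = False) -> List[str]:
        errors: List[str] = []
        for name, name_intervals in intervals.items():
            for interval in name_intervals:
                if not audit_only:
                    # Normal path: compute the data, then mark the interval done.
                    self.evaluate(name, interval)
                    self.record_interval(name, interval)
                # Audits run in both modes.
                errors.extend(self.audit(name, interval))
        return errors


scheduler = Scheduler(audits={"db.orders": [lambda iv: iv[1] <= 10]})
print(scheduler.run({"db.orders": [(0, 10), (10, 20)]}, audit_only=True))
# ['db.orders audit #0 failed for (10, 20)']
print(scheduler.recorded)  # {}
```

In this toy version the mode needs a single `if not audit_only` branch; in a real scheduler, evaluation, interval recording, and related bookkeeping are spread over more steps, each of which would need the same guard. That is the messiness concern raised in response.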

Yeah, that was my initial approach, but I didn't find it better: we would end up hacking `run()` to switch off everything except the audit. IMO this will get messy with `if not audit_only` checks everywhere, but I'm happy to give it a try.

> We can also use the `restatements` argument to remove relevant intervals so that they can be reprocessed again.

Sorry if I'm misunderstanding, but won't this introduce a performance regression? Currently, we retrieve the intervals from the latest snapshot, so we can run the audits without having to re-evaluate or recompute anything.

AFAICT, with restatements we could simply pass `[model_start, today]` for each snapshot and get this "computation" for free, but in exchange we'd have to remove, recompute, and restore all the intervals only to end up with the same interval set, right?
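To make the trade-off concrete, a toy sketch under simplified assumptions: intervals are integer `(start, end)` pairs, and `audit_existing`, `audit_via_restatement`, and the `recompute` callback are hypothetical names, not SQLMesh functions. Both paths audit the same intervals; the restatement path additionally recomputes every one of them first.

```python
from typing import Callable, List, Tuple

Interval = Tuple[int, int]
AuditFn = Callable[[Interval], bool]  # returns True if the audit passes


def audit_existing(intervals: List[Interval], audit: AuditFn) -> Tuple[List[Interval], int]:
    # Reuse the intervals already recorded for the snapshot: audit only.
    failed = [iv for iv in intervals if not audit(iv)]
    return failed, 0  # number of intervals recomputed


def audit_via_restatement(
    intervals: List[Interval], audit: AuditFn, recompute: Callable[[Interval], None]
) -> Tuple[List[Interval], int]:
    # Restating [model_start, today] clears every interval, so each one must be
    # recomputed and recorded again before it can be audited.
    restored: List[Interval] = []
    for iv in intervals:
        recompute(iv)
        restored.append(iv)
    failed = [iv for iv in restored if not audit(iv)]
    return failed, len(restored)


intervals = [(0, 10), (10, 20), (20, 30)]


def check(iv: Interval) -> bool:
    # Pretend only (10, 20) violates the audit.
    return iv != (10, 20)


print(audit_existing(intervals, check))                         # ([(10, 20)], 0)
print(audit_via_restatement(intervals, check, lambda iv: None))  # ([(10, 20)], 3)
```

Both calls report the same failure and end with the same interval set; the only difference is the three recomputations, which in the real system would be model re-evaluations.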

izeigerman · 2025-05-08T20:59:05Z

sqlmesh/core/plan/evaluator.py

+                previous_snapshot_id
+            ]
+
+            new_audits = snapshot.model._audit_metadata()


I suggest having a proper top-level method for this: `audit_metadata_hash`.

So you'd like a top-level method in the model definition that returns both the list and its hash, right?

```python
# core.model.definition
def audit_metadata_hash(self):
    audits = self._audit_metadata()
    return audits, hash_data(audits)
```

So then the highlighted section would be:

```python
# core.plan.evaluator
_, previous_audits_hash = previous_snapshot.model.audit_metadata_hash()
new_audits, new_audits_hash = snapshot.model.audit_metadata_hash()
if (previous_audits_hash != new_audits_hash) and new_audits:
    ...
```
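A self-contained sketch of the proposed shape, assuming a toy `Model` whose audits are a mapping from audit name to rendered arguments. `hash_data` below is a local SHA-256 stand-in rather than SQLMesh's helper, and the string format produced by `_audit_metadata` and the `should_audit` function are hypothetical. Returning the list together with its hash lets the evaluator compare hashes and check whether any audits remain from a single call.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


def hash_data(values: List[str]) -> str:
    # Local stand-in for SQLMesh's hashing helper: SHA-256 over the values,
    # NUL-separated so ["ab", "c"] and ["a", "bc"] hash differently.
    digest = hashlib.sha256()
    for value in values:
        digest.update(value.encode("utf-8") + b"\x00")
    return digest.hexdigest()


@dataclass
class Model:
    # Hypothetical minimal model: audit name -> rendered audit arguments.
    audits: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def _audit_metadata(self) -> List[str]:
        # Sort names and arguments so the result does not depend on definition order.
        values: List[str] = []
        for name in sorted(self.audits):
            args = ", ".join(f"{k}={v}" for k, v in sorted(self.audits[name].items()))
            values.append(f"{name}({args})")
        return values

    def audit_metadata_hash(self) -> Tuple[List[str], str]:
        audits = self._audit_metadata()
        return audits, hash_data(audits)


def should_audit(previous: Model, new: Model) -> bool:
    # Mirrors the evaluator check: re-run audits only if the audit metadata
    # changed and the new model still has at least one audit.
    _, previous_audits_hash = previous.audit_metadata_hash()
    new_audits, new_audits_hash = new.audit_metadata_hash()
    return previous_audits_hash != new_audits_hash and bool(new_audits)


no_audits = Model()
not_null_id = Model(audits={"not_null": {"columns": "[id]"}})
print(should_audit(no_audits, not_null_id))  # True: an audit was added
print(should_audit(not_null_id, no_audits))  # False: audits removed, nothing to run
```

Removing every audit changes the hash but leaves nothing to run, which is why the check also requires `new_audits` to be non-empty.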

VaggelisD force-pushed the vaggelisd/audit_ux branch from 7c40682 to eebae76 on May 8, 2025 18:19

Feat: Ensure audits run even if adding them is a metadata change

b48005a

VaggelisD force-pushed the vaggelisd/audit_ux branch from eebae76 to b48005a on May 8, 2025 18:46

izeigerman reviewed May 8, 2025

