fix(audit): defer row fetch in audit logs list query to avoid full-row scan by yan-3005 · Pull Request #28851 · open-metadata/OpenMetadata

yan-3005 · 2026-06-09T06:45:08Z

Problem

GET /api/v1/audit/logs is extremely slow on large audit_log_event tables (millions of rows). A single page (limit=25) over a 24-hour window can take 90+ seconds, ~99% of it in the database — even though the request passes a bounded limit and the event_ts index exists and is used.

Root cause

The list query selects all columns, including the large event_json (LONGTEXT, ~16 KB/row), with ORDER BY event_ts DESC, id DESC LIMIT n. To satisfy ORDER BY + LIMIT, the engine does a non-covering index range scan — it reads the full row (including event_json) for every row in the time window, then trims to limit. For a ~112k-row window that is ~112k full-row reads per page, which is the entire cost.

EXPLAIN ANALYZE over a ~112k-row window:

| Query | Full rows read | Time |
| --- | --- | --- |
| Current list query | ~111,747 | ~153,000 ms |
| `COUNT` over the same rows (index-only) | 0 | ~21 ms |
| Deferred-join rewrite | 26 | ~180 ms |

Fix

Deferred join / late row lookup — resolve the page of ids from the index first (index-only, no event_json read), then join back for the full columns of only the final page:

SELECT a.<cols>
FROM audit_log_event a
JOIN (SELECT id FROM audit_log_event <condition> <orderClause> LIMIT :limit) k
  ON a.id = k.id
<orderClauseQualified>

Inner subquery keeps the existing <condition> and all @Bind params unchanged; it is an index-only scan over idx_audit_log_event_ts that picks the top-N ids.
Outer ORDER BY must be qualified (a.event_ts, a.id) because id is ambiguous after the join — added ORDER_DESC_QUALIFIED / ORDER_ASC_QUALIFIED alongside the existing inner-scope ORDER_DESC / ORDER_ASC. Both directions are needed (backward pagination sorts ASC then reverses in Java).
Single shared @SqlQuery, valid on both MySQL and PostgreSQL (standard SQL).
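
As a rough illustration of how the placeholders above fit together, the sketch below fills the template by plain string substitution. The constant names come from this PR, but their string values, the `:startTs` / `:endTs` bind names, the abbreviated column list, and the `render` helper are assumptions made for the example; in the real DAO the substitution is done by JDBI's `@Define`.

```java
import java.util.Map;

public class DeferredJoinTemplate {
    // Assumed values: the PR names these constants but does not show their contents.
    static final String ORDER_DESC = "ORDER BY event_ts DESC, id DESC";
    static final String ORDER_DESC_QUALIFIED = "ORDER BY a.event_ts DESC, a.id DESC";

    // Template shape taken from the PR description; the column list is an illustrative subset.
    static final String TEMPLATE =
        "SELECT a.id, a.event_ts, a.event_json "
            + "FROM audit_log_event a "
            + "JOIN (SELECT id FROM audit_log_event <condition> <orderClause> LIMIT :limit) k "
            + "ON a.id = k.id "
            + "<orderClauseQualified>";

    // Hypothetical stand-in for the placeholder substitution that @Define performs.
    static String render(String template, Map<String, String> defines) {
        String sql = template;
        for (Map.Entry<String, String> e : defines.entrySet()) {
            sql = sql.replace("<" + e.getKey() + ">", e.getValue());
        }
        return sql;
    }

    public static void main(String[] args) {
        String sql = render(TEMPLATE, Map.of(
            "condition", "WHERE event_ts >= :startTs AND event_ts < :endTs",
            "orderClause", ORDER_DESC,
            "orderClauseQualified", ORDER_DESC_QUALIFIED));
        System.out.println(sql);
    }
}
```

The inner clause stays unqualified because the subquery reads a single table, while the outer clause needs the `a.` prefix because both `a` and `k` expose an `id` column.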

Why results are identical (not just faster)

id is the primary key (unique, non-null) → the join is strictly 1:1: no rows dropped, none duplicated.
The inner subquery applies the same WHERE / ORDER BY / LIMIT as the old query, and (event_ts, id) is a total order, so the top-N is deterministic and identical to the old query's.
A JOIN does not preserve subquery order, so the outer re-sorts by the same keys — reproducing the original order exactly.

This is purely a performance change; the result set and ordering are unchanged. The only load-bearing assumption is id being a unique PK, which the schema guarantees.
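
The equivalence argument can be sketched in memory, with no database involved. The `Row` record, the method names, and the sample data below are hypothetical; the sketch only models the three steps (pick top-N ids, look rows up by unique id, re-sort) and compares the result with sorting full rows directly.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class DeferredJoinEquivalence {
    // Minimal stand-in for an audit_log_event row; eventJson plays the large column.
    record Row(long id, long eventTs, String eventJson) {}

    // (event_ts DESC, id DESC): a total order because id is unique.
    static final Comparator<Row> ORDER_DESC =
        Comparator.<Row>comparingLong(Row::eventTs).thenComparingLong(Row::id).reversed();

    // Old shape: sort full rows, then trim to the limit.
    static List<Row> directPage(List<Row> table, int limit) {
        return table.stream().sorted(ORDER_DESC).limit(limit).collect(Collectors.toList());
    }

    // Deferred shape: pick the top-N ids, fetch only those rows, then re-sort.
    static List<Row> deferredPage(List<Row> table, int limit) {
        Set<Long> pageIds = table.stream()
            .sorted(ORDER_DESC)
            .limit(limit)
            .map(Row::id)
            .collect(Collectors.toSet());
        // The "join": rows come back in table order, not page order.
        List<Row> joined = new ArrayList<>();
        for (Row r : table) {
            if (pageIds.contains(r.id())) {
                joined.add(r);
            }
        }
        // The outer ORDER BY restores the original ordering.
        joined.sort(ORDER_DESC);
        return joined;
    }

    public static void main(String[] args) {
        List<Row> table = List.of(
            new Row(1, 100, "{}"), new Row(2, 100, "{}"), new Row(3, 200, "{}"),
            new Row(4, 200, "{}"), new Row(5, 300, "{}"), new Row(6, 100, "{}"));
        System.out.println(directPage(table, 4).equals(deferredPage(table, 4)));
    }
}
```

If `id` were not unique, the lookup step could return more rows than the limit, which is why the unique primary key is the one assumption the rewrite depends on.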

Changes

CollectionDAO.AuditLogDAO.list — deferred-join SQL + new @Define("orderClauseQualified").
AuditLogRepository — added ORDER_DESC_QUALIFIED / ORDER_ASC_QUALIFIED and passed them at both list(...) call sites (forward + backward pagination). exportInBatches delegates to list, so it is fixed automatically.
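
The backward path is why both sort directions exist. The sketch below is an in-memory model only: it assumes a `before` cursor means "rows strictly newer than this (eventTs, id) key", and the `Key` record and `pageBefore` method are hypothetical names, not the repository code. Sorting ASC picks the rows nearest the cursor; reversing afterwards gives the caller the usual DESC order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class BackwardPageSketch {
    // Sort key of an audit event; the cursor is assumed to carry the same pair.
    record Key(long eventTs, long id) {}

    static final Comparator<Key> ASC =
        Comparator.<Key>comparingLong(Key::eventTs).thenComparingLong(Key::id);

    // Rows strictly newer than the cursor, nearest first (ASC), trimmed to the limit,
    // then reversed so the page reads (eventTs DESC, id DESC).
    static List<Key> pageBefore(List<Key> table, Key cursor, int limit) {
        List<Key> page = table.stream()
            .filter(k -> ASC.compare(k, cursor) > 0)
            .sorted(ASC)
            .limit(limit)
            .collect(Collectors.toCollection(ArrayList::new));
        Collections.reverse(page);
        return page;
    }

    public static void main(String[] args) {
        List<Key> table = List.of(
            new Key(100, 1), new Key(100, 2), new Key(200, 3), new Key(200, 4), new Key(300, 5));
        System.out.println(pageBefore(table, new Key(100, 2), 2));
    }
}
```

Sorting DESC with the same limit would return the newest rows in the window rather than the ones adjacent to the cursor, so the ASC sort followed by a reversal is needed.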

Tests

Added two integration tests in AuditLogResourceIT that exercise the deferred join with real multi-page data:

test_listAuditLogs_deferredJoin_forwardPaginationOrderingIsConsistent — seeds an audit-event burst, pages forward through a fixed-endTs window, and asserts strict (eventTs DESC, id DESC) ordering and no duplicate ids across pages (covers ORDER_DESC_QUALIFIED + the join).
test_listAuditLogs_deferredJoin_backwardPaginationMatchesForward — pages forward to obtain a before cursor, then backward, and asserts the backward page reproduces the forward page in identical order (covers ORDER_ASC_QUALIFIED).
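
The invariants asserted by the forward-pagination test can be expressed as a small helper along these lines. This is not the code in AuditLogResourceIT; the `Event` record, the numeric `id` type, and the method name are assumptions for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PageOrderingCheck {
    // Hypothetical projection of an audit log entry: just the two sort keys.
    record Event(long eventTs, long id) {}

    // True when the concatenated pages are strictly ordered by (eventTs DESC, id DESC)
    // and no id appears more than once across pages.
    static boolean isStrictlyDescendingWithoutDuplicates(List<List<Event>> pages) {
        Set<Long> seen = new HashSet<>();
        Event prev = null;
        for (List<Event> page : pages) {
            for (Event e : page) {
                if (!seen.add(e.id())) {
                    return false; // duplicate id across or within pages
                }
                if (prev != null) {
                    boolean strictlyBefore = prev.eventTs() > e.eventTs()
                        || (prev.eventTs() == e.eventTs() && prev.id() > e.id());
                    if (!strictlyBefore) {
                        return false; // ordering violated at a row or page boundary
                    }
                }
                prev = e;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<List<Event>> pages = List.of(
            List.of(new Event(300, 5), new Event(200, 4)),
            List.of(new Event(200, 3), new Event(100, 6)));
        System.out.println(isStrictlyDescendingWithoutDuplicates(pages));
    }
}
```

Carrying `prev` across page boundaries matters: a page can be sorted internally and still overlap or interleave with its neighbor.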

These are correctness/regression guards. A "fails-without-the-fix" test is intentionally not included: the old query is correct, just slow, and the slowness only reproduces at multi-million-row scale that CI does not have — so a timing assertion would pass on the un-optimized query too and would be meaningless. The tests instead guard the real risk introduced by the rewrite (dropped/duplicated/reordered rows, ambiguous-id SQL), on both MySQL and PostgreSQL.

Follow-ups (not in this PR)

Unindexed filters (entity_type, event_type) still scan within the window — add (entity_type, event_ts) / (event_type, event_ts) composite indexes if used at scale.
Per-request total COUNT could be made opt-in for cursor pagination.
Table size / disk reclamation is a separate retention concern (DataRetention + OPTIMIZE TABLE).

github-actions · 2026-06-09T06:45:19Z

✅ PR checks passed

The linked issue has a description and all required Shipping project fields set. Thanks!

github-actions · 2026-06-09T06:45:30Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

Copilot

Pull request overview

This PR optimizes the GET /api/v1/audit/logs list query to avoid full-row scans of audit_log_event (notably the large event_json) when paging over large time windows, by switching to a deferred-join (late row lookup) pattern that first selects the top-N ids and then joins back to fetch full columns only for those rows.

Changes:

Rewrote CollectionDAO.AuditLogDAO.list SQL to JOIN against a limited inner subquery of ids, deferring full-row reads until after LIMIT.
Updated AuditLogRepository to pass a qualified outer ORDER BY clause (a.event_ts, a.id) to avoid ambiguity after the join for both forward and backward pagination paths.
Added integration tests ensuring forward/backward cursor pagination preserves strict (eventTs DESC, id DESC) ordering and page consistency with the deferred-join query.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/CollectionDAO.java | Rewrites audit log list SQL to a deferred-join query and adds a new `@Define` for the qualified outer `ORDER BY`. |
| openmetadata-service/src/main/java/org/openmetadata/service/audit/AuditLogRepository.java | Introduces qualified order constants and wires them into DAO calls for both forward and backward pagination. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/AuditLogResourceIT.java | Adds regression tests validating ordering and cursor pagination equivalence under the deferred-join rewrite. |

github-actions · 2026-06-09T09:16:49Z

🟡 Playwright Results — all passed (12 flaky)

✅ 4272 passed · ❌ 0 failed · 🟡 12 flaky · ⏭️ 88 skipped

| Shard | Passed | Flaky | Skipped |
| --- | --- | --- | --- |
| 🟡 Shard 1 | 300 | 1 | 4 |
| 🟡 Shard 2 | 802 | 4 | 9 |
| ✅ Shard 3 | 808 | 0 | 8 |
| ✅ Shard 4 | 843 | 0 | 12 |
| 🟡 Shard 5 | 719 | 2 | 47 |
| 🟡 Shard 6 | 800 | 5 | 8 |

🟡 12 flaky test(s) (passed on retry)

Features/DataAssetRulesDisabled.spec.ts › Verify the Chart entity item action after rules disabled (shard 1, 1 retry)
Features/BulkImport.spec.ts › Table (shard 2, 1 retry)
Features/DataQuality/ColumnLevelTests.spec.ts › Column Values To Be Not In Set (shard 2, 1 retry)
Features/DataQuality/TestCaseImportExportE2eFlow.spec.ts › Admin: Complete export-import-validate flow (shard 2, 1 retry)
Features/DataQuality/TestCaseResultPermissions.spec.ts › User with only VIEW cannot PATCH results (shard 2, 1 retry)
Pages/Entity.spec.ts › Announcement create, edit & delete (shard 5, 1 retry)
Pages/EntityDataSteward.spec.ts › Tier Add, Update and Remove (shard 5, 1 retry)
Pages/Glossary.spec.ts › Column dropdown drag-and-drop functionality for Glossary Terms table (shard 6, 1 retry)
Pages/Lineage/LineageFilters.spec.ts › Verify lineage service type filter selection (shard 6, 1 retry)
Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
Pages/Lineage/PlatformLineage.spec.ts › Verify domain platform view (shard 6, 1 retry)

📦 Download artifacts

How to debug locally

# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

harshach · 2026-06-09T15:10:58Z

-            + "actor_type, impersonated_by, service_name, "
-            + "entity_type, entity_id, entity_fqn, entity_fqn_hash, event_json, search_text, created_at "
-            + "FROM audit_log_event <condition> <orderClause> LIMIT :limit")
+        "SELECT a.id, a.change_event_id, a.event_ts, a.event_type, a.user_name, "


Why do we need this join, @yan-3005?

gitar-bot · 2026-06-10T10:22:25Z

Code Review ✅ Approved

Optimizes the audit log list query using a deferred join to bypass full-row scans, reducing response time from 90+ seconds to milliseconds. No issues found.

sonarqubecloud · 2026-06-10T11:33:07Z

Quality Gate passed for 'open-metadata-ingestion'

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

fix(audit): defer row fetch in audit logs list query to avoid full-row scan (#28850)

8315256

Copilot AI review requested due to automatic review settings June 9, 2026 06:45

yan-3005 added the bug and governance labels Jun 9, 2026

yan-3005 self-assigned this Jun 9, 2026

Copilot started reviewing on behalf of yan-3005 June 9, 2026 06:45 View session

yan-3005 added the safe to test and To release labels Jun 9, 2026

Copilot AI reviewed Jun 9, 2026

yan-3005 temporarily deployed to test June 9, 2026 06:58 — with GitHub Actions Inactive

yan-3005 removed the governance label Jun 9, 2026

harshach approved these changes Jun 9, 2026

harshach requested changes Jun 9, 2026

Merge branch 'main' into audit-logs-handoff-doc

e0c20e6

yan-3005 temporarily deployed to test June 10, 2026 10:34 — with GitHub Actions Inactive

yan-3005 had a problem deploying to test June 10, 2026 10:34 — with GitHub Actions Failure

yan-3005 temporarily deployed to test June 10, 2026 10:34 — with GitHub Actions Inactive

yan-3005 temporarily deployed to test June 10, 2026 15:04 — with GitHub Actions Inactive

yan-3005 wants to merge 2 commits into main from audit-logs-handoff-doc
