
[bug] zone-stats Parametr-chart test flakes when concurrent workers record hits — SVG response and HTML table read different DB snapshots #454

@hubertgajewski

Description

User Story

As a tester, I want tests/zone-stats.spec.ts › every Parametr chart matches the data table and is distinct on /zone/stats/ to pass deterministically while other workers are concurrently firing tracking hits, so that CI runs no longer depend on retries to absorb chart-vs-table drift such as the 0.16-percentage-point gap seen below.

Context

In CI run 25204245336 (PR #452, attempt 1, Mobile Safari 2/2 shard), the test failed on two rows of the "Przeglądarki i inne aplikacje WWW" (browsers) Parametr and passed on retry #1:

  • row 1: chart [43.59%] vs table 43.75% → Δ=16 hundredths (tolerance 5)
  • row 2: chart [37.29%] vs table 37.41% → Δ=12 hundredths (tolerance 5)
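Each Δ above is a difference in hundredths of a percentage point. A minimal sketch of that comparison, assuming the chart and table values have already been parsed to numbers. The helper names are hypothetical, the local constant only mirrors the value 5 quoted for CHART_TABLE_TOLERANCE_HUNDREDTHS, and treating a delta equal to the tolerance as a pass is an assumption about the real check:

```typescript
// Mirrors the value of CHART_TABLE_TOLERANCE_HUNDREDTHS quoted in this issue (5).
// Hypothetical local constant, not an import from svg-chart.util.ts.
const TOLERANCE_HUNDREDTHS = 5;

// Convert a percentage such as 43.59 to integer hundredths (4359) so the
// subtraction runs on integers and avoids floating-point residue.
function toHundredths(percent: number): number {
  return Math.round(percent * 100);
}

// Absolute chart-vs-table difference in hundredths of a percentage point.
function deltaHundredths(chartPercent: number, tablePercent: number): number {
  return Math.abs(toHundredths(chartPercent) - toHundredths(tablePercent));
}

// Assumption: a delta equal to the tolerance still passes.
function withinTolerance(chartPercent: number, tablePercent: number): boolean {
  return deltaHundredths(chartPercent, tablePercent) <= TOLERANCE_HUNDREDTHS;
}

// The two rows reported in CI run 25204245336:
console.log(deltaHundredths(43.59, 43.75)); // 16
console.log(deltaHundredths(37.29, 37.41)); // 12
```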

Mechanism — playwright/typescript/utils/svg-chart.util.ts:149-192:

```typescript
// Register the SVG response wait before clicking so the response cannot be missed.
const [response] = await Promise.all([
  page.waitForResponse((r) => r.url().includes(ctx.svgChartUrlFragment), { timeout: 60_000 }),
  ctx.submit.click(),
]);
// ...
const pairs = await svgChartPairs(page, await response.text()); // SVG response body
const tableTop = await dataTableTopRows(page, pairs.length);    // page DOM (HTML response)
```

The single submit.click() triggers two independent backend reads:

  1. The form-POST navigation response carries the data-table HTML (computed during that request).
  2. The resulting page contains <object type="image/svg+xml" data="…">, whose data URL fires a second request that re-queries the DB to render the SVG chart.
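To make the two-read race concrete, here is a small in-memory simulation. It is not the product code: `HitStore`, the browser labels, and the counts are all hypothetical. One "request" computes the table, a few tracking hits land, and a second "request" computes the chart. Values taken from a single snapshot would agree by construction; two reads separated by a few hits do not.

```typescript
// Hypothetical in-memory stand-in for the per-browser hit counts in the DB.
class HitStore {
  private counts = new Map<string, number>();

  record(label: string, n = 1): void {
    this.counts.set(label, (this.counts.get(label) ?? 0) + n);
  }

  // Copy the current counts, like one backend request querying the DB.
  snapshot(): Map<string, number> {
    return new Map(this.counts);
  }
}

// Share of one label in hundredths of a percent, rounded like a 2-decimal display.
function shareHundredths(counts: Map<string, number>, label: string): number {
  let total = 0;
  counts.forEach((n) => {
    total += n;
  });
  return total === 0 ? 0 : Math.round(((counts.get(label) ?? 0) / total) * 10_000);
}

const store = new HitStore();
store.record("Chrome", 436); // hypothetical counts
store.record("Safari", 373);
store.record("Other", 191);

// Request 1 (form POST): the HTML data table is computed from this read.
const tableRead = store.snapshot();
// Other workers fire tracking hits between the two requests.
store.record("Other", 3);
// Request 2 (<object> data URL): the SVG chart re-queries the DB.
const chartRead = store.snapshot();

const twoReadDrift = Math.abs(
  shareHundredths(tableRead, "Chrome") - shareHundredths(chartRead, "Chrome"),
);
console.log(twoReadDrift); // 13
```

Three stray hits out of a hypothetical 1000 are already enough to move the "Chrome" row from 43.60% to 43.47%, well past a 5-hundredth tolerance.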

Parallel Playwright workers continuously call fireTrackingHit (playwright/typescript/utils/track-hit.util.ts:31), which posts to /scripts/. Hits that arrive between the two reads shift per-dimension proportions by hundredths of a percentage point (16 and 12 in the run above), enough to exceed CHART_TABLE_TOLERANCE_HUNDREDTHS (5).
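How few hits it takes can be worked out directly. If a row holds c of N hits and k hits land on other rows between the two reads, its share drops from c/N to c/(N+k), a difference of c·k / (N·(N+k)), or 10 000·c·k / (N·(N+k)) hundredths of a percentage point. A sketch with hypothetical helper names; it ignores the 2-decimal display rounding and hits that land on the same row:

```typescript
// Drop in a row's share, in hundredths of a percentage point, when k new hits
// land on *other* rows between the table read and the chart read.
function driftHundredths(rowHits: number, totalHits: number, k: number): number {
  return (10_000 * rowHits * k) / (totalHits * (totalHits + k));
}

// Smallest number of foreign hits that pushes the drift past the tolerance.
// As k grows the drift approaches 10 000 * c / N from below, so if that limit
// is within the tolerance, no number of hits can exceed it.
function hitsToExceed(rowHits: number, totalHits: number, toleranceHundredths: number): number {
  if ((10_000 * rowHits) / totalHits <= toleranceHundredths) return Infinity;
  let k = 1;
  while (driftHundredths(rowHits, totalHits, k) <= toleranceHundredths) k++;
  return k;
}

// Hypothetical row holding 43.6% of the hits, at two traffic volumes:
console.log(hitsToExceed(436, 1_000, 5));   // 2
console.log(hitsToExceed(436, 100_000, 5)); // 12954
```

The busier the populated stage account, the more concurrent hits it takes to breach the tolerance, which is consistent with the flake appearing only intermittently.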

The marker-filtered zone-hits tests do not suffer from the same race because they filter results by a unique runMarker = randomUUID(), so concurrent hits from other workers cannot pollute their counts. The race only affects tests that compare two independently fetched aggregates of the same query.
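A minimal sketch of why marker filtering is immune, assuming a hypothetical `Hit` row shape (the real zone-hits rows differ). Only `randomUUID` comes from the issue; it is Node's `node:crypto` export:

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical shape of a recorded hit; illustrative only.
interface Hit {
  marker: string;
  browser: string;
}

// Count only the hits tagged with this test's own marker.
function countForMarker(hits: readonly Hit[], marker: string): number {
  return hits.filter((h) => h.marker === marker).length;
}

const runMarker = randomUUID();
const hits: Hit[] = [
  { marker: runMarker, browser: "Safari" },
  { marker: runMarker, browser: "Safari" },
];
const before = countForMarker(hits, runMarker);

// Another worker records a hit under its own marker; the filter ignores it.
hits.push({ marker: randomUUID(), browser: "Chrome" });
const after = countForMarker(hits, runMarker);

console.log(before, after); // 2 2
```

The Parametr chart and table aggregate over all hits with no such filter, so a foreign hit changes both denominators, just at different moments.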

Acceptance Criteria

Scenario 1 — concurrent hit-write does not flake the test

  • Given the populated account on stage and Playwright running with full parallelism (all browser projects + shards)
  • When tests/zone-stats.spec.ts › every Parametr chart matches the data table and is distinct on /zone/stats/ runs three times back-to-back
  • Then every run passes on attempt 0 (no retry rescue) for every Parametr option

Scenario 2 — genuine product mismatch still fails

  • Given the SVG chart and the data table disagree by more than the chosen tolerance for a reason other than mid-test hit drift (e.g. a real product bug that puts a different label on the chart than in the table)
  • When the test runs
  • Then the assertion fails with a clear message naming the option, row, chart value, and table value
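One way to satisfy the "clear message" requirement is to build the message from every value the scenario names. A sketch with a hypothetical `RowComparison` shape and helper name, formatted like the CI lines quoted in Context:

```typescript
// Hypothetical shape of one compared row; field names are illustrative.
interface RowComparison {
  option: string;       // Parametr option label
  row: number;          // 1-based row index in the data table
  chartPercent: number; // value parsed from the SVG chart
  tablePercent: number; // value parsed from the HTML data table
}

// Message naming the option, row, chart value and table value, plus the delta
// in hundredths of a percentage point and the tolerance it was checked against.
function mismatchMessage(c: RowComparison, toleranceHundredths: number): string {
  const delta = Math.abs(Math.round(c.chartPercent * 100) - Math.round(c.tablePercent * 100));
  return (
    `${c.option}: row ${c.row}: chart ${c.chartPercent.toFixed(2)}% vs table ` +
    `${c.tablePercent.toFixed(2)}% → Δ=${delta} hundredths (tolerance ${toleranceHundredths})`
  );
}

console.log(
  mismatchMessage(
    { option: "Przeglądarki i inne aplikacje WWW", row: 1, chartPercent: 43.59, tablePercent: 43.75 },
    5,
  ),
);
```

In a Playwright spec the string can be passed as the custom-message argument, `expect(value, message)`, so it appears verbatim in the failure report.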

Implementation Hint

Pick one of these mitigations and document the choice in the PR body:

  1. Single-snapshot read — capture the data-table HTML from the same response that triggers the SVG fetch, before the SVG request returns. After submit.click() resolves, take a page.content() snapshot, parse the table from that string, and only then await the SVG response. This pins the table values to the form-POST response. Note, however, that both DB reads happen server-side (the table's while the form POST is served, the chart's while the <object> request is served), so the order in which the test reads the two results does not by itself shrink the window during which hits can land between them; verify the effect under full parallelism before relying on this option.
  2. Empirically-widened tolerance — bump CHART_TABLE_TOLERANCE_HUNDREDTHS to the maximum drift observed across ≥3 full-parallel CI runs, plus a margin. Cheap, but loses sensitivity for catching real product drift.
  3. Quiesced fixture — out of scope for this stage env; mention as rejected in the PR body.
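For option 2, the "maximum observed drift plus a margin" rule can be made explicit so the PR body can show how the new tolerance was derived. A sketch with a hypothetical helper name; the per-run deltas must come from real full-parallel CI runs, and the margin is a judgment call:

```typescript
// Derive a widened tolerance from observed chart-vs-table deltas.
// `runs` holds one array per full-parallel CI run, each listing the deltas
// (hundredths of a percentage point) seen in that run.
function widenedTolerance(
  runs: readonly (readonly number[])[],
  marginHundredths: number,
): number {
  if (runs.length < 3) {
    throw new Error(`need at least 3 full-parallel runs, got ${runs.length}`);
  }
  // Largest delta across all runs (0 if every run was clean).
  const observedMax = Math.max(0, ...runs.map((r) => Math.max(0, ...r)));
  return observedMax + marginHundredths;
}
```

Run 25204245336 alone (16 and 12 hundredths) is not enough input; the guard refuses fewer than three runs. Every hundredth added to the tolerance is a hundredth of real product drift the test can no longer catch, which is the sensitivity cost named above.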

Edit site: playwright/typescript/utils/svg-chart.util.ts:149-192.

Definition of Done

  • Mitigation chosen, implemented, and rationale documented in the PR body.
  • tests/zone-stats.spec.ts › every Parametr chart matches the data table and is distinct on /zone/stats/ runs ≥3× under full parallelism on stage with zero failures and zero retries — link the three CI run URLs in the PR.
  • PR merged.
  • Estimate = 2 set in Project #1.
  • Actual hours recorded in Project #1.

Risks / Scope notes

  • The race cannot be eliminated at the product level without changing how /zone/stats/ serves chart+table; the test layer must absorb it.
  • The other failures in run 25204245336 (page.goto timeouts in /zone/hits/, /zone/scripts/, and local file:// tracking fixtures) are unrelated — they are navigation timeouts, not data races, and should be tracked separately if they recur.

Metadata

Labels

  • bug: Something isn't working
  • flakiness: Flaky test risk
  • found-by-test: Bug caught by automated tests
  • test-quality: Test code quality improvements

Projects

No projects

Relationships

None yet

Development

No branches or pull requests