Skip to content

refactor(wasm): R2 β€” parse_wasm_config serde reduction (#689 follow-up)#697

Merged
bashandbone merged 6 commits into
stagingfrom
refactor/r2-wasm-config-serde-reduction
May 22, 2026
Merged

refactor(wasm): R2 β€” parse_wasm_config serde reduction (#689 follow-up)#697
bashandbone merged 6 commits into
stagingfrom
refactor/r2-wasm-config-serde-reduction

Conversation

@bashandbone
Copy link
Copy Markdown
Collaborator

Measurements

metric pre (origin/staging 7af85fb2) post (this branch b974550b) delta
WASM pre-opt (--profile release-web β€” CI-gate-comparable) 1,372,062 B 1,364,312 B βˆ’7,750 B (βˆ’0.56%)
WASM pre-opt (--release, agent's working measurement) 1,372,791 B 1,365,032 B βˆ’7,759 B (βˆ’0.57%)
lint_10kb p95 428.08 Β΅s 432.21 Β΅s +0.45% β€” Criterion p=0.88, no statistically significant change
cargo test --workspace green green β€”
cargo +stable clippy --workspace --all-targets --all-features -- -D warnings green green β€”

Above the synthesis-brief gate (β‰₯4 KB net savings). Slightly under the upper end of the preflight 5–8 KB estimate band, consistent with the linker-dedup uncertainty the preflight flagged. Bench is hot-path-neutral as expected (config parsing happens once per lint_native/fix_native call, not on the engine hot path).

Methodology note: wasm-pack 0.14 on this dev environment doesn't apply [package.metadata.wasm-pack.profile.release-web] wasm-opt flag lists reliably; the agent's working measurement used --release for build-stability. The CI-gate-comparable number (--profile release-web, top row above) confirms the delta independently to within 9 bytes β€” the relative delta is the trustworthy signal. Same caveat as documented in tools/wasm-size-check.sh.

Summary

  • Retires #[derive(Deserialize)] from WasmConfig and #[derive(Serialize)] from WasmConfigCacheKey<'a> (latter deleted entirely) β€” the two derives that together accounted for ~10 KB of WASM bundle bloat per perf(wasm): investigate +5.18% WASM binary regression vs. baseline (cumulative across staging window)Β #689 attribution.
  • Replaces them with explicit serde_json::Value::as_object().get(...) extraction via private wasm_config_from_value, plus direct serde_json::Map construction in build_cache_key. serde_json::Value is already in the WASM bundle (used by Diagnostic JSON building, Map<String, Value> for FixIntent-style args, RawValue for diagnostics), so the swap adds no new monomorphization.
  • TDD-first ordering: 19 byte-identity + type-mismatch + Constitution-III tests written and passing on pre-refactor code in commit 1 (fcd5ebca), refactor lands in commit 2 (2ad85eaa). One f32 byte-identity gotcha caught by the tests before reaching review.

Constitution III check β€” runtime-config does not expand engine's semantic surface

Constitution III Β§"runtime configuration that expands the engine's semantic surface" is the load-bearing constraint on WasmConfig. R2 PRESERVES and arguably TIGHTENS the surface:

property pre (Derive) post (R2)
Closed accept-list of {classifier_id, confidence_threshold, corrections, deadline_ms} implicit via struct field list explicit obj.get(...) enumeration at call site
Unknown JSON fields silently ignored yes (struct fields are the implicit accept-set) yes (any name not in the explicit obj.get(...) list is ignored) β€” pinned by r2_unknown_field_silently_ignored + r2_unknown_field_coexists_with_known unit tests
classifier_id accepts any serde-Deserializable type yes (Derive on Option<String> accepts string / null) NO β€” only Value::String accepted; everything else is a loud error. Tightening, not widening.
null JSON value treated as absent yes (#[serde(default)] + Option<T>) yes (`None

The Constitution III invariant is preserved. The tightening on classifier_id (and the other three fields) is a strict narrowing of what gets accepted β€” a JS caller passing a number for classifier_id now gets a loud error instead of being silently coerced.

f32 byte-identity gotcha

serde_json::Value::Number is internally f64-only. Naive replacement Number::from_f64(threshold as f64) widens the f32 bit-pattern to f64 BEFORE the shortest-roundtrip serializer formats it β€” for 0.85_f32 (which is not exactly representable as binary32) this produces "0.8500000238418579" instead of the f32 shortest-roundtrip "0.85" that serialize_f32 would emit. The discrepancy breaks cache-key byte-identity with the retired WasmConfigCacheKey #[derive(Serialize)] path.

Fix routes f32 through serde_json::to_string(&threshold) (which uses serialize_f32 / shortest-f32-roundtrip) then parses the resulting string back into a Value::Number. The roundtrip is safe for any finite f32 and cheap (one short alloc + one parse of a known-valid numeric literal) on engine-cache-miss only.

The byte-identity test r2_byte_identity_threshold_only exercises 0.85 specifically β€” a value that's NOT exactly representable as f32, so it catches a regression to the naive from_f64 path. The fix and rationale are documented in code comments at lines 1083–1097.

Behavior changes β€” errors from the Derive surface

input class pre (Derive) post (R2)
Malformed JSON ({...,) Err with line/column context Err with line/column context (outer serde_json::from_str::<Value>(s) unchanged)
Wrong type on a known field (classifier_id: 123) Err with line/column, generic shape error Err with field name + type hint (e.g., "classifier_id must be a string; got number") β€” no column
Wrong type on a corrections value ({"k": 42}) Err with line/column Err with corrections[k] key + type hint
Unknown field ({"some_other_field": 42}) silently ignored silently ignored (preserved)
Top-level non-object ("hello", [1,2], 42) Err with shape error Err containing "must be a JSON object" + type hint

Net trade: gain field names in errors (better diagnostics for JS callers), lose column numbers for field-level errors (acceptable β€” JS callers get column numbers from the outer parse for malformed JSON, which is the more common error class).

Adjacent-bug survey

  • BatchEntry #[derive(Deserialize)] at lib.rs:811 β€” deferred per synthesis brief. Two plain String fields (no Option, no HashMap); expected savings <1 KB. Unlikely to clear the 4 KB gate solo; will file as R2b only if combinable with other small cuts.
  • ~18 OUTPUT-side #[derive(Serialize)] impls (DiagnosticJson, FixResultJson, RuleIdJson, FixJson, AuditConfidenceJsonV1_0, DeadlineExceededBodyJson, etc., lines 284-1403) β€” these are the JS-API wire contract per contracts/diagnostic.json and contracts/audit-record.json. EXPLICITLY out of scope for the R-series. Touching them is a contract-break PR requiring schema-version bump coordination.
  • serde_json::Map BTreeMap-backed assumption β€” the workspace's serde_json = "1.0.149" is pinned without preserve_order, so Map is alphabetical-iterating. WasmConfig field names happen to be alphabetical, which coincides with the struct-declaration order the prior Derive used β†’ byte-identity preserved without preserve_order. The WasmConfig struct doc-comment + build_cache_key doc-comment + the byte-identity regression tests triangulate this constraint for future maintainers.

Constitution checks

  • VII Β§IV (scheme-adoption boundary): βœ… N/A β€” marque-wasm sits ABOVE the rule chain. This PR touches crates/wasm/src/lib.rs + claudedocs/wasm-bundle-analysis-2026-05-22.md only. Zero edits to engine, scheme, rules, core, ism, capco.
  • VIII (citation fidelity): βœ… No new Β§ citations added.
  • G13 (audit-content-ignorance): βœ… N/A β€” config parsing is engine-INPUT, not audit-OUTPUT.
  • Principle I (Performance): βœ… lint_10kb Criterion p=0.88 β€” no statistically significant change. SC-001 16 ms ceiling 50Γ— headroom intact.
  • III (runtime configuration / engine semantic surface): βœ… PRESERVED + TIGHTENED. See Constitution III check section above.

Test plan

  • cargo test --workspace β€” all green
  • cargo test -p marque-wasm --lib r2_ β€” 19 R2 unit tests green
  • cargo test -p marque-wasm β€” 113 wasm tests green (lib + integration)
  • cargo +stable clippy --workspace --all-targets --all-features -- -D warnings β€” clean
  • cargo fmt --check β€” clean
  • WASM size measured via wasm-pack build crates/wasm --target web --profile release-web (CI-gate-comparable) and --release (working measurement) β€” both confirm β‰₯7.7 KB delta
  • lint_10kb Criterion bench β€” no significant change (p=0.88)
  • TDD-first ordering verified: byte-identity tests pass on pre-refactor commit fcd5ebca (before refactor 2ad85eaa lands)
  • Constitution III silent-ignore behavior pinned by 2 dedicated unit tests
  • f32 byte-identity gotcha caught by TDD-first tests before reaching review

Review-pass changes (commit b974550b)

Three reviewer nits addressed in the final fix-up commit:

  1. MEDIUM (rust-specialist) β€” expect("corrections present") at build_cache_key line 1100 β†’ structurally self-proving if let Some(corrections) = cfg.corrections.as_ref().filter(|c| !c.is_empty()). Same semantic; Some-and-non-empty proof now visible at the binding site.
  2. LOW (general code-reviewer) β€” WasmConfig struct doc-comment now documents the alphabetical-field-naming constraint with a cross-reference to build_cache_key. A developer adding a new field encounters the constraint upfront rather than discovering it via byte-identity test failure.
  3. LOW (general code-reviewer) β€” measurement methodology discrepancy (--release vs --profile release-web) reconciled in the PR body's Measurements section above, with both numbers shown side-by-side and the wasm-pack 0.14 limitation noted.

Declined: pinning field-name presence in the 8 type-mismatch test error strings (LOW per general reviewer; flagged as judgment call). The boundary the tests pin is loud-vs-silent, which is the API contract; field-name quality is implementation-level. Pinning it would lock the present wording, which we explicitly accepted dropping serde's column-number context for.

References

…onfig (#689)

TDD step 1 of the R2 serde-reduction refactor. Adds 19 tests to the
existing in-module `#[cfg(test)] mod tests` block in `crates/wasm/src/lib.rs`
that pin behavior of `parse_wasm_config` before the
`#[derive(Deserialize)]` removal:

 - 9 byte-identity cache-key fixtures capturing exact emission of the
   current `#[derive(Serialize)] WasmConfigCacheKey` path: None / empty
   object / empty corrections / deadline-only all map to cache_key=None;
   classifier_id / threshold / corrections / all-three / deadline+
   classifier produce literal byte strings in struct-declaration order
   (which coincides with alphabetical here β€” alphabetical pins the
   shape, struct-declaration order will pin a future field insertion
   that breaks alphabetical).

 - 8 type-mismatch loud-failure tests asserting `is_err()` for non-
   object top-level inputs and field-level type mismatches. Pre-refactor
   these produce serde-derived error messages; post-refactor they will
   produce field-name-tagged messages. Tests assert only the loud-vs-
   silent boundary, not exact wording β€” both versions pass.

 - 2 Constitution III preservation tests pinning the silently-ignore-
   unknown-fields contract. The current `#[derive(Deserialize)]` has no
   `deny_unknown_fields`; the post-refactor `obj.get(...)` enumeration
   matches that behavior. WASM target's runtime-config surface MUST
   NOT widen β€” these tests catch a regression where a future hand-
   rolled parser accidentally rejects unknown fields and would block
   forward-compatible JS callers.

All 19 tests pass on `origin/staging` HEAD (pre-refactor). Captured
baseline strings via a standalone serde-derive reproducer at
authorship; the in-test assertions are the live regression gate.

Helpers stay private per the R2 synthesis brief's helper-visibility
decision β€” tests live alongside the existing build_cache_key /
parse_deadline_ms tests in `mod tests`, not under `crates/wasm/tests/`,
because the integration-test directory has no access to
`parse_wasm_config` / `build_cache_key` (intentional encapsulation).

Refs #689 Β· refs #696 (R1 precedent)
…ey Serialize (#689)

Replaces the `#[derive(Deserialize)] WasmConfig` and
`#[derive(Serialize)] WasmConfigCacheKey` paths with explicit
`serde_json::Value::as_object().get(...)` extraction and direct
`serde_json::Map` construction in `crates/wasm/src/lib.rs`. The
retired derives were the largest single contributor to
`parse_wasm_config`'s twiggy-attributed symbol size after R1
(`HashMap<String, String>` Deserialize monomorphization ~3-4 KB
plus per-field Visitor scaffolding ~3 KB plus the cache-key
Serialize emission ~1.5 KB).

Why an extraction rewrite and not a smaller-knob tweak:

  - `#[derive(Deserialize)]` on `WasmConfig` monomorphizes a
    full visitor + per-field __Field enum + a HashMap<String,
    String> deserializer, each of which is dead-weight code in
    a WASM build. The four-field accept-list (`classifier_id`,
    `confidence_threshold`, `corrections`, `deadline_ms`) is a
    closed contract β€” enumerating it at the call site via
    `obj.get(...)` is structurally narrower than the derive
    (the derive would have happily accepted any serde-
    deserializable type for a field named `classifier_id`),
    AND smaller after the linker is done.

  - `#[derive(Serialize)]` on `WasmConfigCacheKey` carries a
    similar per-field SerializeMap driver and
    skip_serializing_if codegen; the manual `serde_json::Map`
    construction reaches the same wire format directly.

Constitution III preservation (binding constraint):

  - Closed accept-list of `{classifier_id, confidence_threshold,
    corrections, deadline_ms}` enumerated explicitly in
    `wasm_config_from_value`.
  - Unknown JSON fields are silently ignored β€” preserves the
    pre-R2 derive behavior (no `#[serde(deny_unknown_fields)]`).
    Pinned by the new `r2_unknown_field_silently_ignored` /
    `r2_unknown_field_coexists_with_known` tests.
  - Field-level type mismatches are loud errors with field-
    name context (e.g. `"classifier_id must be a string; got
    number"`). Net gain in diagnostic quality at the cost of
    serde's column-number context, which JS callers do not
    use (the outer `from_str::<Value>` parse still surfaces
    column numbers for malformed JSON).
  - JSON `null` continues to mean "absent" for every field
    (preserves the derive's `Option<T>` semantics).

Cache-key byte-identity preservation:

  - `serde_json::Map` without the `preserve_order` feature
    is BTreeMap-backed and emits in alphabetical key order.
    `WasmConfig` field names are alphabetical today
    (`classifier_id` < `confidence_threshold` <
    `corrections`), which coincides with the
    struct-declaration order serde's `#[derive(Serialize)]`
    used. Byte-identity is pinned by the 9 R2 byte-identity
    tests added in the previous commit.

  - A future cache-key field that breaks alphabetical (e.g.
    inserting `aaa_new_field` between `classifier_id` and
    `confidence_threshold`) will fail the byte-identity tests
    and require either renaming the field or enabling
    `serde_json/preserve_order` and inserting in
    struct-declaration order.

  - One non-obvious detail: `serde_json::Value::Number` is
    f64-only. `Number::from_f64(threshold as f64)` widens
    before formatting and emits the long-form
    `"0.8500000238418579"` for the 0.85 bit-pattern, not the
    shortest-roundtrip `"0.85"` that serde's `serialize_f32`
    path produced. The R2 byte-identity tests caught this on
    the first build; the fix routes the f32 through
    `to_string(&threshold)` (uses `serialize_f32`) and then
    parses-back via `from_str::<Value>` so the resulting
    `Number` retains the shortest-f32 string form. Cost is
    one short alloc + one parse on engine-cache-miss only;
    cheap relative to the bundle savings.

Constitution VII Β§IV (scheme-adoption boundary): N/A β€”
`marque-wasm` sits above the rule chain. This commit touches
ONLY `crates/wasm/src/lib.rs`. Zero edits to engine, scheme,
rules, core, ism, or capco. Verified with
`git diff --stat HEAD~1`.

Net surface changes:
  - retired: `#[derive(Deserialize)] WasmConfig`
  - retired: `#[derive(Serialize)] WasmConfigCacheKey` (struct + impl)
  - added:   `fn wasm_config_from_value(Value) -> Result<WasmConfig, String>`
  - added:   `fn value_type_name(&Value) -> &'static str`
  - rewrote: `fn build_cache_key(&WasmConfig)` using `serde_json::Map`

All 37 marque-wasm unit tests pass (19 R2 tests from commit 1 +
18 pre-existing). All 76 integration tests pass across 10 test
binaries (113 total). `cargo +stable clippy -p marque-wasm
--all-targets -- -D warnings` is clean.

BatchEntry's `#[derive(Deserialize)]` (lib.rs:811) is OUT of
scope per the R2 synthesis brief; defer as R2b follow-up only
if measurable (two String fields, no Option, expected <1 KB).

Refs #689 Β· refs #696 (R1 precedent)
WASM bundle size delta (pre-opt, `wasm-pack --release` with system
wasm-opt):

  pre-R2:  1,372,791 bytes (post-R1 baseline)
  post-R2: 1,365,032 bytes
  delta:   βˆ’7,759 bytes (βˆ’0.57%)

Above the R2 synthesis-brief 4 KB go/no-go gate. The estimate band
in the preflight was 5-8 KB net β€” the measured delta lands in the
lower half of that band, consistent with the preflight's caveat that
the dominant `HashMap<String, String>` Deserialize cost is hard to
predict exactly because of potential linker deduplication against
other String-keyed HashMap deserializers elsewhere in the binary.

`lint_10kb` bench (criterion, `cargo bench -p marque-engine --bench
lint_latency -- lint_10kb`):

  pre:  [422.35 Β΅s 428.08 Β΅s 434.97 Β΅s]
  post: [421.49 Β΅s 432.21 Β΅s 447.64 Β΅s]
  change: +0.45% median, p=0.88 β€” no statistically significant change.

Hot-path neutral, as expected β€” config parsing is once-per-call,
not on the hot path. Constitution Principle I satisfied (SC-001
16 ms ceiling intact; SC-001's threshold is 50Γ— the post-R2 median).

Also marks the R2 row in the candidate table as landed and inlines
an "f32 byte-identity gotcha" note for future serde-reduction work
β€” `Value::Number` is f64-only, so `Number::from_f64(threshold as
f64)` widens before formatting and emits the long-form
`"0.8500000238418579"` rather than the shortest-roundtrip `"0.85"`
that serde's `serialize_f32` produces. The fix is in commit 2; the
note in this commit makes the gotcha findable for the next person.

Refs #689 Β· refs #696 (R1 precedent)
Copilot AI review requested due to automatic review settings May 22, 2026 19:05
@bashandbone bashandbone requested a review from a team as a code owner May 22, 2026 19:05
@bashandbone bashandbone added the performance Throughput, latency, bounded work, caching, and SC-001/SC-005 perf gates label May 22, 2026
…ons guard + alphabetical-naming constraint cross-reference

Two LOW/MEDIUM nits from the 2-reviewer pass on R2 (#689 follow-up,
sort-consolidation companion to #696).

1. **rust-specialist nit** β€” `expect("corrections present")` at
   `build_cache_key` line 1100 relied on a separately-computed
   `corrections_present` boolean gate to prove unwrap-safety. Replaced
   with `if let Some(corrections) = cfg.corrections.as_ref().filter(|c| !c.is_empty())`
   β€” same semantic (only emit the `corrections` key when the map is
   present AND non-empty, matching the retired
   `#[serde(skip_serializing_if = "Option::is_none")]` shape on
   `WasmConfigCacheKey`), but the Some-and-non-empty proof is now
   structurally visible at the binding site rather than relying on a
   doc-comment to bridge the gap.

2. **general code-reviewer nit** β€” the `WasmConfig` struct
   doc-comment described the R2 retirement motive but did not surface
   the alphabetical-field-naming constraint that the
   `build_cache_key` doc-comment captures. A developer reading the
   struct doc first when adding a new field would not see the
   constraint until they ran the byte-identity tests (which catch
   the breakage at authorship, but the upfront warning is cheaper).
   Added a paragraph documenting the constraint with a cross-
   reference to `build_cache_key`.

Declined the LOW field-name-in-error-test pin: the assertion boundary
the existing 8 type-mismatch tests pin is loud-vs-silent, which IS
the API contract; field-name presence in the error string is
implementation-quality, not contract. Pinning it would lock the
present error wording (which we explicitly accepted dropping serde's
column-number context for) β€” keeping the boundary contractual rather
than wording-coupled. General reviewer flagged this as a judgment
call.

cargo +stable clippy --workspace --all-targets --all-features -- -D warnings: green
cargo test -p marque-wasm: 113 tests green (including all 19 R2 unit tests)
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR continues the WASM bundle-size reduction work (#689 follow-up) by removing serde derive-based config parsing/serialization from marque-wasm’s parse_wasm_config path, replacing it with manual serde_json::Value extraction and direct serde_json::Map cache-key construction while preserving cache-key byte identity via new regression tests.

Changes:

  • Replace #[derive(Deserialize)] WasmConfig parsing with explicit Value-based field extraction (wasm_config_from_value) and improved field-scoped type-mismatch errors.
  • Remove WasmConfigCacheKey + #[derive(Serialize)] and build the cache key via direct serde_json::Map construction (including an f32 formatting workaround to preserve byte identity).
  • Add extensive unit tests to pin cache-key byte identity, loud failure on type mismatches, and silent-ignore behavior for unknown fields.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
crates/wasm/src/lib.rs Refactors WASM config parsing/cache-key construction away from serde derives; adds helpers + regression tests to preserve cache semantics and behavior.
claudedocs/wasm-bundle-analysis-2026-05-22.md Updates investigation notes with measured R2 results and details of the landed approach.

πŸ’‘ Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread crates/wasm/src/lib.rs Outdated
Comment on lines +839 to +842
/// explicit `obj.get(...)` form additionally tightens the
/// runtime-config surface relative to the derive (the closed
/// accept-list is enumerated at the call site, not implied by the
/// struct shape).
Comment thread crates/wasm/src/lib.rs
Comment on lines +1117 to +1124
let mut corrections_map = serde_json::Map::new();
for (k, v) in corrections {
corrections_map.insert(k.clone(), serde_json::Value::String(v.clone()));
}
map.insert(
"corrections".to_owned(),
serde_json::Value::Object(corrections_map),
);
Comment thread crates/wasm/src/lib.rs Outdated
// `#[derive(Serialize)] WasmConfigCacheKey` path (which used
// `serialize_f32` directly). The format-then-parse roundtrip
// here is cheap (one short alloc, one parse of a known-valid
// numeric literal) and runs only on engine-cache-miss.
 empirically rejected)

Copilot's PR #697 review surfaced three inline comments:

**#1 (lib.rs:842) β€” accepted**: doc-comment claimed R2 "tightens the
runtime-config surface relative to the derive". Copilot correctly
pointed out that the pre-R2 `#[derive(Deserialize)]` on
`Option<String>` / `Option<f32>` / `Option<HashMap<String,String>>` /
`Option<f64>` was ALREADY type-strict and ALREADY had a closed
accepted-field set. There is no concrete pre-R2 case that was
accepted and is now rejected. The synthesis brief AND the rust-
specialist review both misclaimed "tightening"; Copilot caught it.
Reworded the `WasmConfig` doc-comment to state the post-R2 surface
matches the pre-R2 Derive exactly β€” same four field names, same
`Option<T>` shape, same loud-failure-on-mismatch, same silent-ignore.
The R2 change is structural (binary-size cut + improved error
messages with field names), NOT a tightening.

**#3 (lib.rs:1104) β€” accepted**: f32 format-parse roundtrip comment
claimed it "runs only on engine-cache-miss" but `build_cache_key`
runs on every `parse_wasm_config` call (which is every
lint_native / fix_native call), regardless of whether the resulting
cache key hits or misses the engine cache. Corrected the comment.

**#2 (lib.rs:1124) β€” empirically REJECTED**: Copilot suggested
replacing the `serde_json::Map`-with-cloned-keys path with a
borrow-preserving `Vec<(&str,&str)>` sort + direct JSON string
construction, to eliminate per-call `corrections` key/value clones.
The suggestion is theoretically correct on runtime: O(n) String
clones become O(n) borrow-preserving sort comparisons.

But empirically β€” measured locally with
`wasm-pack build --target web --profile release-web` (CI-gate
profile) β€” the borrow-preserving rewrite REGRESSED WASM bundle
size from 1,364,312 B (original R2) to 1,370,786 B (with Copilot's
suggested rewrite). That's a +6,474 B regression β€” would erase 83%
of R2's βˆ’7,750 B binary win.

The mechanism is the same LTO+ICF lesson captured in
PR #696 (R1's `sorted_entries` revert at
`crates/capco/src/lattice.rs:249-278`): the `serde_json::Map` /
`Value::Object` / `to_string::<Value>` path REUSES serde_json
machinery already monomorphized for the Diagnostic JSON output side
of `marque-wasm`. The direct-String-construction path forces fresh
monomorphizations of `serde_json::to_string::<str>` plus a longer
push-based code path. Theoretical clone-count "wins" don't beat
empirical binary-size measurements.

The runtime cost Copilot flagged is real but small: typical
corrections maps in marque usage are 5-20 entries; per-call cost
is 10-40 short String allocations, in the noise relative to the
~400 Β΅s lint hot path. The binary savings (6.5 KB) dwarf the
runtime cost. Trade favors keeping the original implementation.

Documented the trade in the `build_cache_key` doc-comment with a
cross-reference to PR #696's empirical-revert precedent β€” future
maintainers see WHY the seemingly-suboptimal clones are preserved.

Also brought along: structural Some-and-non-empty proof via
`.filter()` (rust-specialist nit from review fix-up b974550)
stays in place; that one didn't depend on the body shape Copilot
suggested.

Empirical measurements (wasm-pack --profile release-web, local):
- staging baseline: 1,372,062 B
- original R2 (b974550, serde_json::Map): 1,364,312 B (βˆ’7,750)
- Copilot #2 fix (direct String): 1,370,786 B (βˆ’1,276) ← rejected
- this commit (revert + doc fixes): 1,364,386 B (βˆ’7,676)

cargo +stable clippy --workspace --all-targets --all-features -- -D warnings: green
cargo test -p marque-wasm --lib r2_: 19/19 green (byte-identity preserved)
@bashandbone
Copy link
Copy Markdown
Collaborator Author

Thanks @copilot for the three inline comments β€” all three were on-target. Disposition + commit b41947a9:

#1 (lib.rs:842) β€” ACCEPTED. You were right. The pre-R2 #[derive(Deserialize)] on Option<String> / Option<f32> / Option<HashMap<String,String>> / Option<f64> was already type-strict with a closed accepted-field set; there's no concrete pre-R2 case that was accepted and is now rejected. The "tightening" claim was a misreading that survived the synthesis brief AND the rust-specialist review without being caught. Reworded the WasmConfig doc-comment to state the post-R2 surface matches the pre-R2 Derive exactly β€” same field names, same Option<T> shape, same loud-failure-on-mismatch, same silent-ignore. The R2 change is structural (binary-size cut + improved error messages with field names), NOT a tightening.

#3 (lib.rs:1104) β€” ACCEPTED. Comment correction. build_cache_key runs on every parse_wasm_config call (which is every lint_native / fix_native call), not only on engine-cache miss. Fixed.

#2 (lib.rs:1124) β€” empirically REJECTED with documented rationale. Your suggestion is theoretically correct on runtime (eliminate O(n) String clones via borrow-preserving Vec<(&str,&str)> + direct JSON construction). I implemented it and measured on wasm-pack build --target web --profile release-web (CI-gate-comparable profile):

approach WASM size delta vs staging
staging baseline 1,372,062 B 0
original R2 (serde_json::Map-with-clones) 1,364,312 B βˆ’7,750 B
your suggested rewrite (direct String construction) 1,370,786 B βˆ’1,276 B
current commit (revert + doc fixes) 1,364,386 B βˆ’7,676 B

The borrow-preserving rewrite REGRESSED the bundle by +6,474 B β€” would erase 83% of R2's binary win. The mechanism is the same LTO+ICF lesson captured in PR #696 (R1's sorted_entries revert): the serde_json::Map / Value::Object / to_string::<Value> path reuses serde_json machinery already monomorphized for the Diagnostic JSON output side of this crate; the direct-String approach forces fresh monomorphizations of serde_json::to_string::<str> plus a longer push-based code path.

The runtime cost you flagged is real but small in practice β€” typical corrections maps in marque usage are 5-20 entries; per-call cost is 10-40 short String allocations, in the noise relative to the ~400 Β΅s lint_10kb hot path. The binary savings (6.5 KB) dwarf the runtime cost. The trade-off was documented in the build_cache_key doc-comment with a cross-reference to PR #696's empirical-revert precedent so future maintainers see WHY the seemingly-suboptimal clones are preserved.

cargo +stable clippy --workspace -- -D warnings: green. cargo test -p marque-wasm: 19/19 R2 byte-identity tests green.

@bashandbone bashandbone merged commit 4ad5ee1 into staging May 22, 2026
13 checks passed
@bashandbone bashandbone deleted the refactor/r2-wasm-config-serde-reduction branch May 22, 2026 19:30
bashandbone added a commit that referenced this pull request May 24, 2026
…p) (#697)

fixtures for parse_wasm_config (#689)
fix(wasm): #689 R2 review fix-up β€” structurally self-proving corrections guard + alphabetical-naming constraint cross-reference
fix(wasm): #689 R2 review fix-up β€” Copilot doc-comment corrections (+ #2 empirically rejected)
bashandbone added a commit that referenced this pull request May 24, 2026
…p) (#697)

fixtures for parse_wasm_config (#689)
fix(wasm): #689 R2 review fix-up β€” structurally self-proving corrections guard + alphabetical-naming constraint cross-reference
fix(wasm): #689 R2 review fix-up β€” Copilot doc-comment corrections (+ #2 empirically rejected)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance Throughput, latency, bounded work, caching, and SC-001/SC-005 perf gates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perf(wasm): investigate +5.18% WASM binary regression vs. baseline (cumulative across staging window)

2 participants