x.json2: decode[T] is ~30% slower than cJSON json.decode on small nested structs (567 ms vs 437 ms / 1M iters); 2× speedup patch ready #26911

@enghitalo

Description

Describe the bug

x.json2.decode for a small nested struct is ~30% slower than json.decode (the legacy cJSON-based decoder) on master. Same shape: one int field, one string field, and one inner struct with 2 fields. 1,000,000 iterations on the same machine, with -prod -cc gcc.

SPENT  567.940 ms  json2.decode[Stru](json_data)!         ← x.json2 on master
SPENT  437.685 ms  old_json.decode(Stru, json_data)!      ← cJSON
                   ratio: x.json2 ≈ 1.30× cJSON

The x.json2 README states the goal is "as fast as or faster than cJSON". Today the gap is largest on the most common shape (small nested structs), which is the dominant workload for any service decoding tens of thousands of small JSON payloads per second.

The benchmark script bench.v already ships under x.json2's test folder; the numbers above are reproducible with ./v -prod -cc gcc crun <path to bench.v> after a ./v wipe-cache.

V code (bench.v excerpt, the same file shipped under x.json2 tests):

import x.json2
import json as old_json
import benchmark

const max_iterations = 1_000_000

pub struct Stru {
	val  int
	val2 string
	val3 Stru2
}

pub struct Stru2 {
	a         int
	churrasco string
}

fn main() {
	json_data := '{"val": 1, "val2": "lala", "val3": {"a": 2, "churrasco": "leleu"}}'
	mut b := benchmark.start()
	for _ in 0 .. max_iterations {
		_ := json2.decode[Stru](json_data)!
	}
	b.measure('json2.decode[Stru]')
	for _ in 0 .. max_iterations {
		_ := old_json.decode(Stru, json_data)!
	}
	b.measure('old_json.decode(Stru)')
}

C backend result (root cause):

// The decoder keeps a heap-allocated singly-linked list (LinkedList[ValueInfo])
// of every value in the JSON. Each Node is a small heap allocation, freed at end
// of decode. For a 4-key payload that is 4 mallocs+frees per call.
// On top of that, the inline non-embed struct decoder builds a fresh
// LinkedList[StructFieldInfo] *per decode call*, with one heap-allocated node
// per struct field, then walks it with pointer chasing. For Stru that is
// another 4 mallocs + 4 frees + 1 list free per call.
// cJSON parses into an arena and lets the wrapper pull pointers directly, so
// it pays a single arena alloc per call instead of 8 small ones.

Reproduction Steps

./v wipe-cache
./v -prod -cc gcc crun vlib/x/json2/tests/bench.v

Expected Behavior

SPENT  ~440 ms  json2.decode[Stru](json_data)!     # parity with cJSON or better
SPENT  ~440 ms  old_json.decode(Stru, json_data)!

Current Behavior (master)

SPENT  567.940 ms  json2.decode[Stru](json_data)!     # ~30% slower than cJSON
SPENT  437.685 ms  old_json.decode(Stru, json_data)!
SPENT  607.027 ms  json2.decode[SumTypes](json_data)! # ~28% slower than cJSON
SPENT  475.774 ms  old_json.decode(SumTypes, json_data)!

(Other shapes — top-level array of int, map[string]string, StructTypeOption[string] — are at parity or already faster than cJSON.)

Possible Solution

The hot path for nested structs is dominated by per-node allocation in LinkedList[ValueInfo] and by per-call construction of LinkedList[StructFieldInfo] (one heap node per struct field on every decode). Two orthogonal changes compound; a third idea was tried and rejected:

  1. Per-T cached StructFieldInfo + array iteration. The non-embed inline struct decoder currently (a) allocates a LinkedList[StructFieldInfo] node per struct field on every decode call, (b) walks it with pointer chasing, and (c) frees it. Build the list once per type via a cached_struct_field_infos[T]() static — the same pattern as the cached_field_infos already used in the encoder — and iterate by index over a contiguous array. The mutable is_decoded flag can be extracted into a per-call u64 bitmask (no allocation; up to 64 fields, overflowing to a []bool for the rare wider struct). For the Stru benchmark this removes 4 mallocs + 4 frees + 1 list free per call; over 1M iterations that is 9M GC operations eliminated.

  2. decode_string no-escape fast path. When the JSON string body contains no backslash, return decoder.json[pos + 1..pos + length - 1] directly — a string-header slice, with no body copy. The presence of an escape can be detected with a single C.memchr call over the body.

  3. (Tried but not worth it.) A bump-allocator arena for Node[ValueInfo] (single malloc(N * sizeof(Node)) instead of N small &Node{}) regressed by ~10% in local testing, because Boehm GC's small-object freelist serves GC_MALLOC(32) faster than one GC_MALLOC(1600), and the per-call setup/teardown of the arena (plus the predicted-not-taken branch in the push) adds more cost than it saves. Lesson: under Boehm GC, arena allocators do not win for small short-lived objects — keep the lazy small-allocations.

I have changes (1) and (2) ready as a patch; happy to send a PR. Local measurement of the patched build:

| Benchmark                | master | patched | Δ vs master  | vs cJSON     |
|:-------------------------|-------:|--------:|:-------------|:-------------|
| Stru (nested struct)     | 567 ms | 274 ms  | 2.07× faster | 1.57× faster |
| SumTypes (nested)        | 607 ms | 323 ms  | 1.88× faster | 1.41× faster |
| StructType[string]       | 111 ms | 87 ms   | 1.28× faster | 1.30× faster |
| StructTypeOption[string] | 135 ms | 104 ms  | 1.30× faster | 1.40× faster |
| StructType[int]          | 132 ms | 113 ms  | 1.17× faster | 1.21× faster |
| map[string]string        | 180 ms | 159 ms  | 1.13× faster | 1.40× faster |
| string (single value)    | 71 ms  | 48 ms   | 1.48× faster | n/a          |
| StringAlias              | 71 ms  | 47 ms   | 1.51× faster | n/a          |

All 57 x.json2 tests still pass on the patched build.

Additional Information/Context

x.json2's bench.v already exercises the worst case at 1M iterations. The two slow cases dominate the budget for any service that decodes tens of thousands of small JSON payloads per second; they should be the first targets.

V version

V 0.5.1 1b3385cc34ff783e793d1a26a8ec5be587c80fe0.40b3711

Environment details (OS name and version, etc.)

|V full version      |V 0.5.1 1b3385cc34ff783e793d1a26a8ec5be587c80fe0.40b3711
|:-------------------|:-------------------
|OS                  |linux, Ubuntu 24.04 LTS
|Processor           |16 cpus, 64bit, little endian, AMD Ryzen 7 5800H with Radeon Graphics
|Memory              |8.17GB/30.7GB
|                    |
|V executable        |/home/hitalo/Documents/v/v
|V last modified time|2026-04-18 09:18:00
|                    |
|V home dir          |OK, value: /home/hitalo/Documents/v
|VMODULES            |OK, value: /home/hitalo/.vmodules
|VTMP                |OK, value: /tmp/v_1000
|Current working dir |OK, value: /home/hitalo/Documents/v
|                    |
|Git version         |git version 2.43.0
|V git status        |0.5.1-1006-g40b3711b-dirty
|.git/config present |true
|                    |
|cc version          |cc (GCC) 14.2.0
|gcc version         |gcc (GCC) 14.2.0
|clang version       |Ubuntu clang version 18.1.3 (1)
|tcc version         |tcc version 0.9.28rc 2025-02-13 HEAD@f8bd136d (x86_64 Linux)
|tcc git status      |thirdparty-linux-amd64 696c1d84
|emcc version        |emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.6 ()
|glibc version       |ldd (Ubuntu GLIBC 2.39-0ubuntu8.3) 2.39

