Skip to content

jokruger/kavun-benchmark

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Kavun Benchmark

Comparison of embeddable scripting engines for Go, focused on the repeated-execution use case (game NPC logic, smart contracts, rule engines) where the host application compiles a script once and then runs it many times.

Compilation cost is intentionally excluded from measurements.

Engines

Note on Kavun vs Kavun0

Kavun is optimized for short-lived scripts with a small number of variables and assignments. Internally it uses an arena allocator with pre-allocated, typed value buffers (decimals, times, strings, arrays, dicts, iterators, compiled/builtin functions, etc.). For workloads that fit inside those pre-allocated capacities, value construction is effectively zero-allocation — the arena hands out slots from a pool that gets reset between runs rather than asking the Go heap.

The two Kavun benchmarks isolate the impact of that pool:

  • Kavun uses core.NewArena(nil), i.e. core.DefaultArenaOptions() — the default pre-allocated capacities.
  • Kavun0 sets every per-type capacity in core.ArenaOptions to 0 (decimals, times, bytes, runes, arrays, builtin/compiled functions, error/string/runes/bytes/array/dict values, int-range values, and all iterator pools). With zero capacity the arena cannot satisfy allocations from its pool, so every value falls through to the Go heap.

Comparing the two columns shows how much of Kavun's throughput on each task comes from the arena pool versus the core VM and bytecode design.

Tasks

Tasks are declared once in task/task.go (name + expected value + description). Each engine implements whichever subset it can; missing tasks are penalized in the summary.

Task Stresses
fib Recursion, function-call overhead (exponential call tree)
fib_tail Function-call overhead at linear depth (tail-recursion shape)
loop_sum Loops, integer arithmetic
sum_pow Loops + integer multiplication on the hot path
closure_counter Closure creation + repeated invocation of one closure
closures_iife Repeated closure creation-and-invoke per iteration
string_concat String allocation via repeated += concatenation
string_repeat String building via the language's idiomatic repeat builtin
str_contains Repeated multi-character substring search
array_dot Array creation, indexed access

Running

make all                       # bench + report
make bench COUNT=5             # 5 runs per (task, engine), into results/raw.txt
make report PENALTY=2.0        # parse results/raw.txt -> results/REPORT.md

Methodology

  • go test -bench with -benchmem. Sub-bench naming: Benchmark<Engine>/<task> — the report parses the engine name from the top-level benchmark identifier and the task from the sub-bench label.
  • Each (engine, task) pair is validated for correctness once before the timed loop starts (via task.Validate).
  • Compilation happens before b.ResetTimer(); the hot loop is the per-iteration script execution.

Aggregation

Per-task tables are sorted by ns/op ascending, with a vs best ratio relative to the fastest engine on that task.

The summary uses two complementary metrics, since tasks are not equally expensive and a plain average across tasks would be dominated by the slowest:

  • CPU geomean — geometric mean of per-task ns/op ratios. Lower is better; 1.00× means "fastest on every task".
  • Avg rank — mean rank across tasks. Robust but discards magnitude.

We also report worst-case ratio, win count (rank-1), and a memory geomean.

Missing-task penalty

If an engine doesn't implement a task, the report still includes the engine in that task's table but marks it _missing_ and substitutes a penalty for both the geomean and the avg-rank computation:

  • penalty ratio = (worst observed ratio on this task) × -penalty (default 2.0)
  • penalty rank = (number of engines that ran this task) + 1

This means skipping an easy task (where everyone is close to the best) costs less than skipping a hard one (where the worst engine is already far from the best) — proportional to the difficulty of the task.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors