feat(ci): Initial benchmark support for original C++ implementation#54

Merged
PyDataBlog merged 7 commits into main from feat/add-cplus-to-benches on Nov 10, 2025
@PyDataBlog
Owner


name: Pull Request
about: Propose a change to the project
title: "feat(benchmarks): integrate upstream SimString CLI"
labels: "testing"
assignees: ""


Description

Add the original SimString C++ CLI to the benchmark suite. The new Python harness clones/builds chokkan/simstring under benches/.simstring_cpp, benchmarks insert/search workloads over the existing corpus, and integrates with benches/run_benches.py. CI now installs autoconf, automake, and libtool so this build works.
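As a rough illustration of how a harness can time an external CLI and report the mean, stddev, and iterations columns used in the benchmark tables, here is a minimal sketch. It is not the code in benches/bench_simstring_cpp.py: the helper name `time_command` is hypothetical, the command being timed is a placeholder (the Python interpreter itself) rather than the real SimString binary, and milliseconds are an arbitrary choice for the sketch, since the tables do not state their units.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, iterations=5, stdin_text=""):
    """Run `cmd` repeatedly and return (mean_ms, stddev_ms, iterations).

    Hypothetical helper for illustration only. Each run feeds `stdin_text`
    to the process and discards its output, so only wall-clock time of the
    whole subprocess call is measured (process startup included).
    """
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            input=stdin_text,
            text=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=True,  # fail loudly if the command exits non-zero
        )
        samples.append((time.perf_counter() - start) * 1000.0)
    # Sample standard deviation needs at least two measurements.
    stddev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return statistics.mean(samples), stddev, len(samples)


if __name__ == "__main__":
    # Placeholder command so the sketch runs anywhere; a real harness would
    # pass the path of the built CLI and its arguments instead.
    mean_ms, stddev_ms, n = time_command([sys.executable, "-c", "pass"], iterations=3)
    print(f"mean={mean_ms:.2f} ms stddev={stddev_ms:.2f} ms iterations={n}")
```

Timing a subprocess this way includes process startup and I/O, which is one reason a CLI row and an in-process binding row are not measuring exactly the same thing.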

Fixes (#52)

Type of change

  • feat(benchmarks): Enabled benchmark support for original C++ implementation.

How Has This Been Tested?

  • python benches/bench_simstring_cpp.py

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

@PyDataBlog PyDataBlog added the testing label ("Covers all testing related changes") Nov 9, 2025
@codecov-commenter

codecov-commenter commented Nov 9, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 41.97%. Comparing base (005b961) to head (dcc39d2).

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #54   +/-   ##
=======================================
  Coverage   41.97%   41.97%           
=======================================
  Files          15       15           
  Lines        2375     2375           
=======================================
  Hits          997      997           
  Misses       1378     1378           


@github-actions

github-actions bot commented Nov 9, 2025

Benchmark Results

This file is automatically generated by the CI. Do not edit manually.

System Specifications

  • OS: Linux 6.11.0-1018-azure
  • Architecture: x86_64
  • CPU Model: x86_64
  • CPU Cores: 4 logical, 2 physical
  • Memory: 15.62 GB

Insert Benchmark

| language | backend | ngram_size | mean | stddev | iterations |
|---|---|---|---|---|---|
| c++ | simstring (C++ CLI) | 2 | 71.5249 | 0.91662 | 100 |
| c++ | simstring (C++ CLI) | 3 | 97.1021 | 2.97785 | 100 |
| c++ | simstring (C++ CLI) | 4 | 115.707 | 3.68022 | 100 |
| julia | SimString.jl | 2 | 89.6099 | 29.5446 | 100 |
| julia | SimString.jl | 3 | 104.74 | 28.3815 | 100 |
| julia | SimString.jl | 4 | 114.751 | 32.0284 | 100 |
| python | simstring (C++ python bindings) | 2 | 64.394 | 1.43102 | 100 |
| python | simstring (C++ python bindings) | 3 | 86.4413 | 3.6484 | 100 |
| python | simstring (C++ python bindings) | 4 | 98.8328 | 3.81764 | 100 |
| python | simstring-fast | 2 | 86.2836 | 2.4614 | 100 |
| python | simstring-fast | 3 | 102.974 | 3.22975 | 100 |
| python | simstring-fast | 4 | 112.715 | 3.36102 | 100 |
| python | simstring-rust (python bindings) | 2 | 36.4171 | 0.561802 | 100 |
| python | simstring-rust (python bindings) | 3 | 43.5453 | 0.816044 | 100 |
| python | simstring-rust (python bindings) | 4 | 45.9507 | 1.36244 | 100 |
| ruby | simstring-pure | 2 | 738.777 | 14.8764 | 28 |
| ruby | simstring-pure | 3 | 814.451 | 23.6484 | 25 |
| ruby | simstring-pure | 4 | 939.394 | 36.4496 | 22 |
| rust | simstring-rust (native) | 2 | 34.4076 | 0.678646 | 100 |
| rust | simstring-rust (native) | 3 | 40.8376 | 1.57575 | 100 |
| rust | simstring-rust (native) | 4 | 42.6333 | 2.11043 | 100 |

Search Benchmark

| language | backend | ngram_size | threshold | mean | stddev | iterations |
|---|---|---|---|---|---|---|
| c++ | simstring (C++ CLI) | 2 | 0.6 | 16.1748 | 0.223858 | 100 |
| c++ | simstring (C++ CLI) | 2 | 0.7 | 10.7561 | 0.103319 | 100 |
| c++ | simstring (C++ CLI) | 2 | 0.8 | 6.71582 | 0.0935537 | 100 |
| c++ | simstring (C++ CLI) | 2 | 0.9 | 4.20622 | 0.0673734 | 100 |
| c++ | simstring (C++ CLI) | 3 | 0.6 | 11.1191 | 0.484448 | 100 |
| c++ | simstring (C++ CLI) | 3 | 0.7 | 8.24548 | 0.14097 | 100 |
| c++ | simstring (C++ CLI) | 3 | 0.8 | 5.94437 | 0.0796804 | 100 |
| c++ | simstring (C++ CLI) | 3 | 0.9 | 4.32262 | 0.122641 | 100 |
| c++ | simstring (C++ CLI) | 4 | 0.6 | 10.2589 | 0.331635 | 100 |
| c++ | simstring (C++ CLI) | 4 | 0.7 | 8.00865 | 0.104476 | 100 |
| c++ | simstring (C++ CLI) | 4 | 0.8 | 6.15461 | 0.221608 | 100 |
| c++ | simstring (C++ CLI) | 4 | 0.9 | 4.54396 | 0.168994 | 100 |
| julia | SimString.jl | 2 | 0.6 | 349.378 | 16.7423 | 58 |
| julia | SimString.jl | 2 | 0.7 | 218.825 | 6.75047 | 92 |
| julia | SimString.jl | 2 | 0.8 | 112.939 | 4.51566 | 100 |
| julia | SimString.jl | 3 | 0.6 | 279.041 | 7.57592 | 72 |
| julia | SimString.jl | 3 | 0.7 | 179.348 | 4.96259 | 100 |
| julia | SimString.jl | 3 | 0.8 | 103.782 | 5.06361 | 100 |
| julia | SimString.jl | 4 | 0.6 | 250.428 | 6.19861 | 80 |
| julia | SimString.jl | 4 | 0.7 | 168.893 | 4.20297 | 100 |
| julia | SimString.jl | 4 | 0.8 | 98.6219 | 4.78113 | 100 |
| python | simstring (C++ python bindings) | 2 | 0.6 | 14.4109 | 0.176258 | 100 |
| python | simstring (C++ python bindings) | 2 | 0.7 | 9.09799 | 0.134132 | 100 |
| python | simstring (C++ python bindings) | 2 | 0.8 | 4.92918 | 0.0727495 | 100 |
| python | simstring (C++ python bindings) | 2 | 0.9 | 2.48574 | 0.0819627 | 100 |
| python | simstring (C++ python bindings) | 3 | 0.6 | 9.76464 | 0.366049 | 100 |
| python | simstring (C++ python bindings) | 3 | 0.7 | 6.81005 | 0.123567 | 100 |
| python | simstring (C++ python bindings) | 3 | 0.8 | 4.3008 | 0.117984 | 100 |
| python | simstring (C++ python bindings) | 3 | 0.9 | 2.64878 | 0.0894635 | 100 |
| python | simstring (C++ python bindings) | 4 | 0.6 | 8.50096 | 0.441493 | 100 |
| python | simstring (C++ python bindings) | 4 | 0.7 | 6.30265 | 0.0742264 | 100 |
| python | simstring (C++ python bindings) | 4 | 0.8 | 4.36665 | 0.0894134 | 100 |
| python | simstring (C++ python bindings) | 4 | 0.9 | 2.76477 | 0.0821726 | 100 |
| python | simstring-fast | 2 | 0.6 | 53.5003 | 2.38209 | 100 |
| python | simstring-fast | 2 | 0.7 | 32.1195 | 0.997588 | 100 |
| python | simstring-fast | 2 | 0.8 | 18.6819 | 0.510077 | 100 |
| python | simstring-fast | 2 | 0.9 | 7.00006 | 0.299571 | 100 |
| python | simstring-fast | 3 | 0.6 | 45.254 | 3.65018 | 100 |
| python | simstring-fast | 3 | 0.7 | 25.5861 | 1.15617 | 100 |
| python | simstring-fast | 3 | 0.8 | 14.2903 | 0.816197 | 100 |
| python | simstring-fast | 3 | 0.9 | 6.59539 | 0.174773 | 100 |
| python | simstring-fast | 4 | 0.6 | 42.2937 | 4.02068 | 100 |
| python | simstring-fast | 4 | 0.7 | 25.2046 | 0.833793 | 100 |
| python | simstring-fast | 4 | 0.8 | 14.0646 | 0.613123 | 100 |
| python | simstring-fast | 4 | 0.9 | 6.54571 | 0.0614608 | 100 |
| python | simstring-rust (python bindings) | 2 | 0.6 | 11.7293 | 0.665155 | 100 |
| python | simstring-rust (python bindings) | 2 | 0.7 | 7.57101 | 0.519667 | 100 |
| python | simstring-rust (python bindings) | 2 | 0.8 | 4.54977 | 0.420666 | 100 |
| python | simstring-rust (python bindings) | 2 | 0.9 | 2.97602 | 0.39078 | 100 |
| python | simstring-rust (python bindings) | 3 | 0.6 | 9.15444 | 0.443932 | 100 |
| python | simstring-rust (python bindings) | 3 | 0.7 | 6.058 | 0.565477 | 100 |
| python | simstring-rust (python bindings) | 3 | 0.8 | 3.77045 | 0.213606 | 100 |
| python | simstring-rust (python bindings) | 3 | 0.9 | 2.87338 | 0.0618079 | 100 |
| python | simstring-rust (python bindings) | 4 | 0.6 | 7.55724 | 0.501171 | 100 |
| python | simstring-rust (python bindings) | 4 | 0.7 | 5.32703 | 0.442301 | 100 |
| python | simstring-rust (python bindings) | 4 | 0.8 | 3.55854 | 0.0867106 | 100 |
| python | simstring-rust (python bindings) | 4 | 0.9 | 2.96202 | 0.092401 | 100 |
| ruby | simstring-pure | 2 | 0.6 | 533.806 | 5.15734 | 38 |
| ruby | simstring-pure | 2 | 0.7 | 325.868 | 7.4789 | 62 |
| ruby | simstring-pure | 2 | 0.8 | 169.787 | 4.64717 | 100 |
| ruby | simstring-pure | 3 | 0.6 | 438.891 | 9.71955 | 46 |
| ruby | simstring-pure | 3 | 0.7 | 271.505 | 2.90389 | 74 |
| ruby | simstring-pure | 3 | 0.8 | 155.317 | 1.9267 | 100 |
| ruby | simstring-pure | 4 | 0.6 | 441.887 | 5.15309 | 46 |
| ruby | simstring-pure | 4 | 0.7 | 282.016 | 2.95381 | 71 |
| ruby | simstring-pure | 4 | 0.8 | 163.262 | 2.65504 | 100 |
| rust | simstring-rust (native) | 2 | 0.6 | 11.6398 | 0.454905 | 100 |
| rust | simstring-rust (native) | 2 | 0.7 | 7.2812 | 0.161805 | 100 |
| rust | simstring-rust (native) | 2 | 0.8 | 4.34275 | 0.119543 | 100 |
| rust | simstring-rust (native) | 2 | 0.9 | 2.53474 | 0.0395065 | 100 |
| rust | simstring-rust (native) | 3 | 0.6 | 8.81237 | 0.232543 | 100 |
| rust | simstring-rust (native) | 3 | 0.7 | 5.50207 | 0.128497 | 100 |
| rust | simstring-rust (native) | 3 | 0.8 | 3.47275 | 0.112816 | 100 |
| rust | simstring-rust (native) | 3 | 0.9 | 2.40889 | 0.0783958 | 100 |
| rust | simstring-rust (native) | 4 | 0.6 | 6.70294 | 0.166849 | 100 |
| rust | simstring-rust (native) | 4 | 0.7 | 4.53143 | 0.0871673 | 100 |
| rust | simstring-rust (native) | 4 | 0.8 | 3.08084 | 0.0382106 | 100 |
| rust | simstring-rust (native) | 4 | 0.9 | 2.44295 | 0.0817026 | 100 |

@PyDataBlog PyDataBlog self-assigned this Nov 10, 2025
Contributor

@icfly2 icfly2 left a comment

Very cool! Way to go.

@PyDataBlog
Owner Author

Very cool! Way to go.

@icfly2 Hmm, AFAIR the original one doesn't do ranked searches, right? If not, then the benchmark punishes the other implementations, as they are judged on ranked searches.

@PyDataBlog
Owner Author

Very cool! Way to go.

@icfly2 Hmm, AFAIR the original one doesn't do ranked searches, right? If not, then the benchmark punishes the other implementations, as they are judged on ranked searches.

OK, it looks like the C++ implementation doesn't do ranked search, so I will switch all search benchmarks to the generic (unranked) search for a fairer comparison: https://www.chokkan.org/software/simstring/
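To make the fairness concern concrete, here is a minimal sketch of the difference between an unranked threshold search and a ranked one. It is a naive linear scan over n-gram sets with cosine similarity, not the indexed algorithm any of the benchmarked libraries use, and the function names (`ngrams`, `cosine`, `search`, `ranked_search`) are hypothetical. The point is only that a ranked search returns the same matches but additionally has to keep the scores and sort them.

```python
import math


def ngrams(text, n=2):
    """Return the set of character n-grams of `text`, padded with spaces."""
    padded = " " * (n - 1) + text + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def cosine(a, b):
    """Cosine similarity between two n-gram sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def search(corpus, query, threshold, n=2):
    """Unranked search: every entry at or above the threshold, in corpus order."""
    q = ngrams(query, n)
    return [s for s in corpus if cosine(ngrams(s, n), q) >= threshold]


def ranked_search(corpus, query, threshold, n=2):
    """Ranked search: the same matches, but scored and sorted best-first."""
    q = ngrams(query, n)
    scored = [(cosine(ngrams(s, n), q), s) for s in corpus]
    hits = [(score, s) for score, s in scored if score >= threshold]
    # The extra sort (and carrying scores around) is the additional work
    # that an unranked search does not pay for.
    return sorted(hits, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    corpus = ["apply", "maple", "apple"]
    print(search(corpus, "apple", 0.5))
    print(ranked_search(corpus, "apple", 0.5))
```

Both functions return the same set of strings for a given threshold; only the ordering and the attached scores differ, which is why timing one implementation on ranked search and another on unranked search would not be a like-for-like comparison.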

@PyDataBlog PyDataBlog merged commit a3e32fe into main Nov 10, 2025
38 checks passed
@PyDataBlog PyDataBlog deleted the feat/add-cplus-to-benches branch November 10, 2025 21:33

Labels

testing Covers all testing related changes

3 participants