feat(perf): better search performance by avoiding allocations #40

Merged
PyDataBlog merged 9 commits into main from feat/optimise-similarity-search
Aug 6, 2025

PyDataBlog commented Aug 6, 2025


name: Pull Request
about: Propose a change to the project
title: "feat(perf): Initial optimization round for faster searching"
labels: ""
assignees: ""


Description

Initial refactor aimed at reducing and/or eliminating unnecessary allocations during searches.

Fixes (#35)
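
As a rough illustration of the kind of change described above, the sketch below reuses one scratch buffer across queries instead of building a fresh collection of n-grams for every search. It is not the code from this PR: `Scratch` and `for_each_ngram` are hypothetical names chosen for the example, and the real extractor may pad or mark n-grams differently.

```rust
/// Hypothetical scratch space that is allocated once and reused per query.
struct Scratch {
    /// Byte offsets of every character boundary in the current text.
    bounds: Vec<usize>,
}

impl Scratch {
    fn new() -> Self {
        Scratch { bounds: Vec::new() }
    }

    /// Calls `f` with each character n-gram of `text` as a borrowed slice.
    /// No `String` is created per n-gram; `bounds` keeps its capacity
    /// between calls, so repeated searches stop allocating once it has grown.
    fn for_each_ngram<F: FnMut(&str)>(&mut self, text: &str, n: usize, mut f: F) {
        self.bounds.clear(); // drops the old offsets but keeps the capacity
        if n == 0 {
            return;
        }
        self.bounds.extend(text.char_indices().map(|(i, _)| i));
        self.bounds.push(text.len());
        // Each window of n + 1 boundaries spans exactly n characters.
        for w in self.bounds.windows(n + 1) {
            f(&text[w[0]..w[n]]);
        }
    }
}
```

A caller would create one `Scratch` up front and pass a closure per query, for example `scratch.for_each_ngram(query, 2, |gram| { /* look up gram */ })`. Working on character boundaries rather than raw bytes keeps the slices valid for non-ASCII input.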

Type of change

  • feat(perf): Initial optimization round for faster searching.

How Has This Been Tested?

  • Added a test for the Python integration

Checklist:

  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • Any dependent changes have been merged and published in downstream modules

PyDataBlog self-assigned this Aug 6, 2025

codecov-commenter commented Aug 6, 2025

Codecov Report

❌ Patch coverage is 96.94656% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 40.06%. Comparing base (5e3efe2) to head (78e34aa).
⚠️ Report is 3 commits behind head on main.

Files with missing lines              Patch %   Lines
src/search.rs                         95.00%    2 Missing ⚠️
src/extractors/character_ngrams.rs    94.73%    1 Missing ⚠️
src/python/mod.rs                     92.30%    1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main      #40       +/-   ##
===========================================
- Coverage   60.75%   40.06%   -20.69%     
===========================================
  Files          11       14        +3     
  Lines         372     2291     +1919     
===========================================
+ Hits          226      918      +692     
- Misses        146     1373     +1227     
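
The percentages in the diff above follow from the Hits and Lines rows. The sketch below reproduces them with integer arithmetic. It assumes the report truncates to two decimals rather than rounding, which is an inference from the numbers shown here (918 of 2291 is about 40.0698%, reported as 40.06%), not documented Codecov behavior. Both function names are made up for the example.

```rust
/// Coverage in basis points (hundredths of a percent), truncated.
/// Returns None when there are no lines to cover.
fn coverage_basis_points(hits: u64, lines: u64) -> Option<u64> {
    if lines == 0 {
        return None;
    }
    // Integer division drops the remainder, which gives the truncation
    // this example assumes.
    Some(hits * 10_000 / lines)
}

/// Formats basis points as a percentage string with two decimals.
fn format_percent(bp: u64) -> String {
    format!("{}.{:02}%", bp / 100, bp % 100)
}
```

With the values from the table, `coverage_basis_points(226, 372)` yields 6075 (60.75%) for the base and `coverage_basis_points(918, 2291)` yields 4006 (40.06%) for the head. The difference of 2069 basis points matches the reported -20.69%. The drop comes from the line count growing from 372 to 2291 while hits grew only from 226 to 918.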

github-actions bot commented Aug 6, 2025

Benchmark Results

This file is automatically generated by the CI. Do not edit manually.

System Specifications

  • OS: Linux 6.11.0-1018-azure
  • Architecture: x86_64
  • CPU Model: x86_64
  • CPU Cores: 4 logical, 2 physical
  • Memory: 15.62 GB

Insert Benchmark

language backend ngram_size mean stddev iterations
julia SimString.jl 2 76.9386 19.6718 100
julia SimString.jl 3 90.8907 22.3992 100
julia SimString.jl 4 107.732 29.2409 100
python simstring-fast 2 85.3774 2.27358 100
python simstring-fast 3 98.7552 2.37727 100
python simstring-fast 4 108.241 2.60492 100
python simstring-rust (python bindings) 2 35.9795 0.405548 100
python simstring-rust (python bindings) 3 42.291 0.33357 100
python simstring-rust (python bindings) 4 43.2698 0.367686 100
ruby simstring-pure 2 631.042 7.7718 32
ruby simstring-pure 3 687.796 6.2886 30
ruby simstring-pure 4 785.584 11.1856 26
rust simstring-rust (native) 2 33.3637 0.381579 100
rust simstring-rust (native) 3 38.4524 0.868538 100
rust simstring-rust (native) 4 39.7485 0.370196 100

Search Benchmark

language backend ngram_size threshold mean stddev iterations
julia SimString.jl 2 0.6 345.144 2.99869 58
julia SimString.jl 2 0.7 218.76 2.28454 92
julia SimString.jl 2 0.8 120.483 1.90664 100
julia SimString.jl 3 0.6 282.851 2.68809 71
julia SimString.jl 3 0.7 187.53 3.04389 100
julia SimString.jl 3 0.8 110.457 2.24817 100
julia SimString.jl 4 0.6 260.031 3.11879 77
julia SimString.jl 4 0.7 177.2 2.26256 100
julia SimString.jl 4 0.8 104.996 2.3391 100
python simstring-fast 2 0.6 99.5301 1.83743 100
python simstring-fast 2 0.7 45.5961 0.219347 100
python simstring-fast 2 0.8 19.8198 0.078477 100
python simstring-fast 2 0.9 8.68455 0.0711747 100
python simstring-fast 3 0.6 74.9283 3.09744 100
python simstring-fast 3 0.7 33.7705 0.23569 100
python simstring-fast 3 0.8 16.7784 0.0608738 100
python simstring-fast 3 0.9 8.85469 0.0617427 100
python simstring-fast 4 0.6 62.987 4.13839 100
python simstring-fast 4 0.7 31.3399 0.608233 100
python simstring-fast 4 0.8 16.3525 0.125039 100
python simstring-fast 4 0.9 9.15024 0.0397023 100
python simstring-rust (python bindings) 2 0.6 14.3291 0.921432 100
python simstring-rust (python bindings) 2 0.7 9.01846 0.499885 100
python simstring-rust (python bindings) 2 0.8 5.40684 0.322296 100
python simstring-rust (python bindings) 2 0.9 3.46745 0.182385 100
python simstring-rust (python bindings) 3 0.6 12.4499 0.762032 100
python simstring-rust (python bindings) 3 0.7 8.43156 0.821714 100
python simstring-rust (python bindings) 3 0.8 5.09983 0.16104 100
python simstring-rust (python bindings) 3 0.9 3.53079 0.0855533 100
python simstring-rust (python bindings) 4 0.6 12.2162 0.679269 100
python simstring-rust (python bindings) 4 0.7 8.69231 0.595878 100
python simstring-rust (python bindings) 4 0.8 5.5803 0.118498 100
python simstring-rust (python bindings) 4 0.9 3.83781 0.109046 100
ruby simstring-pure 2 0.6 790.537 8.83367 26
ruby simstring-pure 2 0.7 385.36 2.08661 52
ruby simstring-pure 2 0.8 177.719 2.9818 100
ruby simstring-pure 3 0.6 619.04 4.51561 33
ruby simstring-pure 3 0.7 306.536 1.61813 66
ruby simstring-pure 3 0.8 157.667 1.35844 100
ruby simstring-pure 4 0.6 560.841 3.20314 36
ruby simstring-pure 4 0.7 302.575 1.1942 67
ruby simstring-pure 4 0.8 162.244 1.12586 100
rust simstring-rust (native) 2 0.6 13.3813 0.182399 100
rust simstring-rust (native) 2 0.7 8.29851 0.062837 100
rust simstring-rust (native) 2 0.8 4.91053 0.119579 100
rust simstring-rust (native) 2 0.9 2.91681 0.0546593 100
rust simstring-rust (native) 3 0.6 12.1003 0.245566 100
rust simstring-rust (native) 3 0.7 8.06063 0.196348 100
rust simstring-rust (native) 3 0.8 4.88857 0.0506469 100
rust simstring-rust (native) 3 0.9 3.04975 0.0481216 100
rust simstring-rust (native) 4 0.6 12.1171 0.332807 100
rust simstring-rust (native) 4 0.7 8.71218 0.0927642 100
rust simstring-rust (native) 4 0.8 5.46905 0.102869 100
rust simstring-rust (native) 4 0.9 3.29832 0.0923071 100

github-actions bot commented Aug 6, 2025

Benchmark Results

This file is automatically generated by the CI. Do not edit manually.

System Specifications

  • OS: Linux 6.11.0-1018-azure
  • Architecture: x86_64
  • CPU Model: x86_64
  • CPU Cores: 4 logical, 2 physical
  • Memory: 15.62 GB

Insert Benchmark

language backend ngram_size mean stddev iterations
julia SimString.jl 2 77.6692 19.8978 100
julia SimString.jl 3 89.4087 20.3067 100
julia SimString.jl 4 108.858 25.2067 100
python simstring-fast 2 87.9563 3.58368 100
python simstring-fast 3 102.117 2.95784 100
python simstring-fast 4 111.084 3.70894 100
python simstring-rust (python bindings) 2 36.2493 0.361336 100
python simstring-rust (python bindings) 3 42.6312 0.717482 100
python simstring-rust (python bindings) 4 43.7351 0.758281 100
ruby simstring-pure 2 636.657 9.33667 32
ruby simstring-pure 3 697.446 10.4397 29
ruby simstring-pure 4 798.557 14.5165 26
rust simstring-rust (native) 2 34.4462 0.498537 100
rust simstring-rust (native) 3 39.5485 0.499606 100
rust simstring-rust (native) 4 40.3085 0.380062 100

Search Benchmark

language backend ngram_size threshold mean stddev iterations
julia SimString.jl 2 0.6 367.945 4.01445 55
julia SimString.jl 2 0.7 235.506 3.65399 85
julia SimString.jl 2 0.8 129.644 2.27017 100
julia SimString.jl 3 0.6 303.701 4.08859 66
julia SimString.jl 3 0.7 201.44 3.86126 100
julia SimString.jl 3 0.8 117.837 2.91487 100
julia SimString.jl 4 0.6 277.094 4.29347 73
julia SimString.jl 4 0.7 188.432 2.10071 100
julia SimString.jl 4 0.8 112.946 2.58249 100
python simstring-fast 2 0.6 103.434 2.32375 100
python simstring-fast 2 0.7 46.8205 0.551328 100
python simstring-fast 2 0.8 20.4447 0.328228 100
python simstring-fast 2 0.9 8.88377 0.151702 100
python simstring-fast 3 0.6 78.6086 3.52294 100
python simstring-fast 3 0.7 34.3366 0.610995 100
python simstring-fast 3 0.8 17.0185 0.246782 100
python simstring-fast 3 0.9 8.94838 0.0377482 100
python simstring-fast 4 0.6 66.6517 4.70636 100
python simstring-fast 4 0.7 32.7031 1.41806 100
python simstring-fast 4 0.8 16.4529 0.301787 100
python simstring-fast 4 0.9 9.17704 0.0362932 100
python simstring-rust (python bindings) 2 0.6 14.1127 1.02636 100
python simstring-rust (python bindings) 2 0.7 8.93835 0.494226 100
python simstring-rust (python bindings) 2 0.8 5.25181 0.157636 100
python simstring-rust (python bindings) 2 0.9 3.35812 0.0853041 100
python simstring-rust (python bindings) 3 0.6 12.5344 0.680733 100
python simstring-rust (python bindings) 3 0.7 8.26936 0.429493 100
python simstring-rust (python bindings) 3 0.8 5.31358 0.604958 100
python simstring-rust (python bindings) 3 0.9 3.61135 0.349833 100
python simstring-rust (python bindings) 4 0.6 12.1821 0.851323 100
python simstring-rust (python bindings) 4 0.7 8.77353 0.651165 100
python simstring-rust (python bindings) 4 0.8 5.6041 0.222874 100
python simstring-rust (python bindings) 4 0.9 3.78458 0.124901 100
ruby simstring-pure 2 0.6 790.417 8.92151 26
ruby simstring-pure 2 0.7 391.26 5.73673 52
ruby simstring-pure 2 0.8 179.054 1.8228 100
ruby simstring-pure 3 0.6 622.056 4.8602 33
ruby simstring-pure 3 0.7 308.686 3.4503 65
ruby simstring-pure 3 0.8 158.537 1.70569 100
ruby simstring-pure 4 0.6 570.493 13.3149 36
ruby simstring-pure 4 0.7 304.098 2.54006 66
ruby simstring-pure 4 0.8 163.295 2.96283 100
rust simstring-rust (native) 2 0.6 13.4511 0.107194 100
rust simstring-rust (native) 2 0.7 8.38545 0.0806073 100
rust simstring-rust (native) 2 0.8 4.90486 0.0564788 100
rust simstring-rust (native) 2 0.9 2.87002 0.128932 100
rust simstring-rust (native) 3 0.6 12.185 0.347008 100
rust simstring-rust (native) 3 0.7 8.0225 0.0928017 100
rust simstring-rust (native) 3 0.8 4.934 0.0597409 100
rust simstring-rust (native) 3 0.9 3.06689 0.0445847 100
rust simstring-rust (native) 4 0.6 12.1925 0.25438 100
rust simstring-rust (native) 4 0.7 8.79447 0.243556 100
rust simstring-rust (native) 4 0.8 5.42909 0.0696613 100
rust simstring-rust (native) 4 0.9 3.26807 0.116604 100

github-actions bot commented Aug 6, 2025

Benchmark Results

This file is automatically generated by the CI. Do not edit manually.

System Specifications

  • OS: Linux 6.11.0-1018-azure
  • Architecture: x86_64
  • CPU Model: x86_64
  • CPU Cores: 4 logical, 2 physical
  • Memory: 15.62 GB

Insert Benchmark

language backend ngram_size mean stddev iterations
julia SimString.jl 2 84.1444 20.2116 100
julia SimString.jl 3 95.6942 22.0706 100
julia SimString.jl 4 113.203 25.4671 100
python simstring-fast 2 91.3737 4.18052 100
python simstring-fast 3 105.872 4.57075 100
python simstring-fast 4 115.799 4.90834 100
python simstring-rust (python bindings) 2 36.6298 1.08402 100
python simstring-rust (python bindings) 3 43.5112 1.57261 100
python simstring-rust (python bindings) 4 48.0112 3.21654 100
ruby simstring-pure 2 675.001 19.1384 30
ruby simstring-pure 3 768.693 19.1382 27
ruby simstring-pure 4 889.909 32.9884 23
rust simstring-rust (native) 2 33.5697 0.472784 100
rust simstring-rust (native) 3 40.0742 1.46576 100
rust simstring-rust (native) 4 41.8456 1.47559 100

Search Benchmark

language backend ngram_size threshold mean stddev iterations
julia SimString.jl 2 0.6 367.562 10.3848 55
julia SimString.jl 2 0.7 233.235 7.93964 86
julia SimString.jl 2 0.8 127.981 3.86399 100
julia SimString.jl 3 0.6 301.322 6.05041 67
julia SimString.jl 3 0.7 196.998 5.88643 100
julia SimString.jl 3 0.8 115.151 9.12925 100
julia SimString.jl 4 0.6 274.727 6.57464 73
julia SimString.jl 4 0.7 188.58 5.98388 100
julia SimString.jl 4 0.8 115.782 4.30659 100
python simstring-fast 2 0.6 112.469 5.65682 100
python simstring-fast 2 0.7 51.09 2.30284 100
python simstring-fast 2 0.8 22.1778 1.40934 100
python simstring-fast 2 0.9 9.12684 0.435397 100
python simstring-fast 3 0.6 87.0382 5.91793 100
python simstring-fast 3 0.7 38.9809 2.36743 100
python simstring-fast 3 0.8 18.725 1.26962 100
python simstring-fast 3 0.9 9.45345 0.537891 100
python simstring-fast 4 0.6 70.6974 4.41439 100
python simstring-fast 4 0.7 35.1083 2.23206 100
python simstring-fast 4 0.8 17.5233 1.31403 100
python simstring-fast 4 0.9 9.06614 0.205041 100
python simstring-rust (python bindings) 2 0.6 14.5597 1.04092 100
python simstring-rust (python bindings) 2 0.7 9.27208 0.410339 100
python simstring-rust (python bindings) 2 0.8 5.67725 0.40152 100
python simstring-rust (python bindings) 2 0.9 3.52841 0.101915 100
python simstring-rust (python bindings) 3 0.6 12.8059 0.998621 100
python simstring-rust (python bindings) 3 0.7 8.6409 0.495265 100
python simstring-rust (python bindings) 3 0.8 5.28918 0.294188 100
python simstring-rust (python bindings) 3 0.9 3.57758 0.087587 100
python simstring-rust (python bindings) 4 0.6 12.6011 0.945537 100
python simstring-rust (python bindings) 4 0.7 8.87425 0.435921 100
python simstring-rust (python bindings) 4 0.8 5.8715 0.559856 100
python simstring-rust (python bindings) 4 0.9 3.88509 0.108448 100
ruby simstring-pure 2 0.6 827.675 9.44302 25
ruby simstring-pure 2 0.7 406.303 6.80401 50
ruby simstring-pure 2 0.8 186.584 4.11284 100
ruby simstring-pure 3 0.6 656.308 11.1972 31
ruby simstring-pure 3 0.7 323.624 8.45797 62
ruby simstring-pure 3 0.8 165.939 3.85368 100
ruby simstring-pure 4 0.6 585.914 11.2453 35
ruby simstring-pure 4 0.7 316.365 8.32168 64
ruby simstring-pure 4 0.8 171.673 4.7918 100
rust simstring-rust (native) 2 0.6 13.6769 0.481645 100
rust simstring-rust (native) 2 0.7 8.4575 0.361116 100
rust simstring-rust (native) 2 0.8 5.00815 0.336188 100
rust simstring-rust (native) 2 0.9 2.9535 0.0723608 100
rust simstring-rust (native) 3 0.6 12.2328 0.25625 100
rust simstring-rust (native) 3 0.7 8.13088 0.193527 100
rust simstring-rust (native) 3 0.8 4.98056 0.171614 100
rust simstring-rust (native) 3 0.9 3.06545 0.119795 100
rust simstring-rust (native) 4 0.6 12.5205 0.597156 100
rust simstring-rust (native) 4 0.7 8.87088 0.234975 100
rust simstring-rust (native) 4 0.8 5.38636 0.062473 100
rust simstring-rust (native) 4 0.9 3.33616 0.0663829 100

github-actions bot commented Aug 6, 2025

Benchmark Results

This file is automatically generated by the CI. Do not edit manually.

System Specifications

  • OS: Linux 6.11.0-1018-azure
  • Architecture: x86_64
  • CPU Model: x86_64
  • CPU Cores: 4 logical, 2 physical
  • Memory: 15.62 GB

Insert Benchmark

language backend ngram_size mean stddev iterations
julia SimString.jl 2 75.1315 19.5156 100
julia SimString.jl 3 89.2516 23.3662 100
julia SimString.jl 4 105.6 27.2153 100
python simstring-fast 2 85.2067 2.13548 100
python simstring-fast 3 99.3145 5.77524 100
python simstring-fast 4 108.427 3.79838 100
python simstring-rust (python bindings) 2 35.7809 0.415476 100
python simstring-rust (python bindings) 3 41.9651 0.438991 100
python simstring-rust (python bindings) 4 42.1236 0.405737 100
ruby simstring-pure 2 624.516 7.42419 33
ruby simstring-pure 3 677.849 7.82947 30
ruby simstring-pure 4 784.459 9.72532 26
rust simstring-rust (native) 2 33.3124 0.351372 100
rust simstring-rust (native) 3 38.3721 0.478983 100
rust simstring-rust (native) 4 40.244 3.24399 100

Search Benchmark

language backend ngram_size threshold mean stddev iterations
julia SimString.jl 2 0.6 346.926 2.83019 58
julia SimString.jl 2 0.7 221.385 1.69771 91
julia SimString.jl 2 0.8 122.473 1.44932 100
julia SimString.jl 3 0.6 287.79 2.25721 70
julia SimString.jl 3 0.7 190.528 2.96439 100
julia SimString.jl 3 0.8 112.615 2.42252 100
julia SimString.jl 4 0.6 265.644 8.03451 76
julia SimString.jl 4 0.7 180.777 2.12402 100
julia SimString.jl 4 0.8 108.05 2.64742 100
python simstring-fast 2 0.6 100.127 1.71055 100
python simstring-fast 2 0.7 45.7887 0.106501 100
python simstring-fast 2 0.8 19.8928 0.208479 100
python simstring-fast 2 0.9 8.65648 0.037088 100
python simstring-fast 3 0.6 73.8726 3.03744 100
python simstring-fast 3 0.7 33.3887 0.0978673 100
python simstring-fast 3 0.8 16.56 0.05992 100
python simstring-fast 3 0.9 8.77197 0.0441333 100
python simstring-fast 4 0.6 62.4556 3.86528 100
python simstring-fast 4 0.7 31.3141 0.59432 100
python simstring-fast 4 0.8 16.2826 0.0665898 100
python simstring-fast 4 0.9 9.0905 0.0426063 100
python simstring-rust (python bindings) 2 0.6 14.3298 1.54831 100
python simstring-rust (python bindings) 2 0.7 8.94484 0.556435 100
python simstring-rust (python bindings) 2 0.8 5.30066 0.14189 100
python simstring-rust (python bindings) 2 0.9 3.40202 0.0983849 100
python simstring-rust (python bindings) 3 0.6 12.3819 0.940079 100
python simstring-rust (python bindings) 3 0.7 8.50423 0.92888 100
python simstring-rust (python bindings) 3 0.8 5.15402 0.435975 100
python simstring-rust (python bindings) 3 0.9 3.50378 0.13205 100
python simstring-rust (python bindings) 4 0.6 12.0472 0.971295 100
python simstring-rust (python bindings) 4 0.7 8.71861 0.752842 100
python simstring-rust (python bindings) 4 0.8 5.58557 0.374837 100
python simstring-rust (python bindings) 4 0.9 3.74723 0.0885693 100
ruby simstring-pure 2 0.6 771.106 3.2249 26
ruby simstring-pure 2 0.7 378.367 3.28267 53
ruby simstring-pure 2 0.8 173.488 2.23041 100
ruby simstring-pure 3 0.6 607.963 3.00919 33
ruby simstring-pure 3 0.7 300.079 1.09873 67
ruby simstring-pure 3 0.8 153.721 0.830336 100
ruby simstring-pure 4 0.6 549.7 3.98486 37
ruby simstring-pure 4 0.7 296.264 0.757939 68
ruby simstring-pure 4 0.8 157.969 0.63191 100
rust simstring-rust (native) 2 0.6 13.2874 0.125435 100
rust simstring-rust (native) 2 0.7 8.25208 0.0640994 100
rust simstring-rust (native) 2 0.8 4.88292 0.0446596 100
rust simstring-rust (native) 2 0.9 2.88386 0.0535525 100
rust simstring-rust (native) 3 0.6 11.9949 0.188696 100
rust simstring-rust (native) 3 0.7 7.96013 0.127547 100
rust simstring-rust (native) 3 0.8 4.8936 0.0952878 100
rust simstring-rust (native) 3 0.9 2.95208 0.143157 100
rust simstring-rust (native) 4 0.6 12.0409 0.0795311 100
rust simstring-rust (native) 4 0.7 8.67798 0.0635088 100
rust simstring-rust (native) 4 0.8 5.38884 0.0548287 100
rust simstring-rust (native) 4 0.9 3.2659 0.0410311 100

PyDataBlog merged commit d53e85b into main Aug 6, 2025
33 checks passed
PyDataBlog deleted the feat/optimise-similarity-search branch August 6, 2025 20:36