
Replace pair_list with hash table #1128

Draft · wants to merge 29 commits into master from ht
Conversation

@asvetlov (Member) commented Apr 5, 2025

No description provided.

codspeed-hq bot commented Apr 7, 2025

CodSpeed Performance Report

Merging #1128 will degrade performance by 31.97%

Comparing ht (aa786a7) with master (1c5d240)

Summary

⚡ 48 improvements
❌ 12 regressions
✅ 184 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
|---|---|---|---|
| test_cimultidict_add_istr[c-extension-module] | 2.6 ms | 3.6 ms | -26.59% |
| test_cimultidict_delitem_istr[c-extension-module] | 86.1 µs | 66.2 µs | +30.17% |
| test_cimultidict_extend_istr[c-extension-module] | 2.4 ms | 3.3 ms | -27.01% |
| test_cimultidict_extend_istr_with_kwargs[c-extension-module] | 6.5 ms | 8.1 ms | -20.14% |
| test_cimultidict_fetch_istr[c-extension-module] | 57.9 µs | 46.9 µs | +23.3% |
| test_cimultidict_get_istr_hit[c-extension-module] | 69.6 µs | 58.4 µs | +19.11% |
| test_cimultidict_get_istr_hit_with_default[c-extension-module] | 71.6 µs | 60.4 µs | +18.47% |
| test_cimultidict_get_istr_miss[c-extension-module] | 73.2 µs | 48.2 µs | +51.94% |
| test_cimultidict_get_istr_with_default_miss[c-extension-module] | 75.3 µs | 50.2 µs | +50.01% |
| test_cimultidict_insert_istr[c-extension-module] | 65.1 µs | 50.3 µs | +29.57% |
| test_cimultidict_update_istr[c-extension-module] | 128.5 µs | 49 µs | ×2.6 |
| test_cimultidict_update_istr_with_kwargs[c-extension-module] | 292 µs | 152.7 µs | +91.25% |
| test_create_cimultidict_with_dict_istr[c-extension-module] | 40.2 µs | 45.9 µs | -12.43% |
| test_create_cimultidict_with_items_istr[c-extension-module] | 51.5 µs | 46.3 µs | +11.15% |
| test_create_multidict_with_dict[case-sensitive-c-extension-module] | 36.4 µs | 41.9 µs | -13.17% |
| test_create_multidict_with_items[case-sensitive-c-extension-module] | 48 µs | 42.6 µs | +12.54% |
| test_multidict_add_str[case-insensitive-c-extension-module] | 5.5 ms | 6.5 ms | -15.21% |
| test_multidict_add_str[case-sensitive-c-extension-module] | 2 ms | 3 ms | -31.97% |
| test_multidict_delitem_str[case-insensitive-c-extension-module] | 111 µs | 91.2 µs | +21.71% |
| test_multidict_delitem_str[case-sensitive-c-extension-module] | 77.2 µs | 57.4 µs | +34.43% |
| ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

@asvetlov (Member, Author) commented Apr 7, 2025

Heh. Appending new values to the multidict is more expensive; all other ops are faster.

We have a tradeoff, as usual. I think that lookup is a much more frequent operation than filling the multidict.
Also, please keep in mind that item replacement/deletion also requires a lookup.

@asvetlov (Member, Author) commented Apr 7, 2025

The PR is more or less done, but I'd like to do some polishing and a self-review later.
Please don't merge it yet: I'll be only partially available next month, so I cannot commit to having enough time to fix issues when the new version is released.

Careful testing is appreciated!

The new multidict is close to Python's dict, except for supporting multiple equal keys, of course.

It starts from an empty hashtable, which grows by powers of 2 starting from 8: 8, 16, 32, 64, 128, ...
The number of usable items is 2/3 of the hashtable size (the remaining 1/3 of the entry table is never allocated).

The table is resized when needed, and bulk updates (extend(), update(), and constructor calls) pre-allocate space for many items at once, reducing the number of potential hashtable resizes.
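As a rough pure-Python sketch of the sizing rule above (hypothetical helper names, not the actual C code):

```python
def usable(table_size: int) -> int:
    # At most 2/3 of the slots are ever used; the entry table is sized
    # to this fraction, so the last 1/3 is never allocated.
    return (table_size * 2) // 3

def table_size_for(n_items: int) -> int:
    # Smallest power-of-two table (>= 8) that can hold n_items.  A bulk
    # update can call this once up front instead of resizing step by step.
    size = 8
    while usable(size) < n_items:
        size *= 2
    return size
```

For example, a table of size 8 holds up to 5 items, so extending with 6 items would pre-allocate a size-16 table in one step.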

Item deletion puts the special DKIX_DUMMY index into the hashtable. Unlike the standard dict, DKIX_DUMMY is never replaced with the index of a new entry, except during a rebuild of the hashtable indices. This keeps the insertion order for multiple equal keys.
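A minimal pure-Python sketch of that tombstone behavior (hypothetical names and a simplified probe; the real implementation is a C extension): deleted slots become DUMMY and are never reclaimed by add(), so duplicate keys always appear in insertion order.

```python
EMPTY, DUMMY = -1, -2

class TinyMultiTable:
    def __init__(self, size=8):          # size must be a power of two
        self.indices = [EMPTY] * size    # index table over the entries
        self.entries = []                # (hash, key, value), append-only

    def _probe(self, h):
        # Full-cycle probe over a power-of-two table (no perturb term).
        mask = len(self.indices) - 1
        i = h & mask
        while True:
            yield i
            i = (i * 5 + 1) & mask

    def add(self, key, value):
        h = hash(key)
        for i in self._probe(h):
            if self.indices[i] == EMPTY:     # DUMMY slots are never reused
                self.indices[i] = len(self.entries)
                self.entries.append((h, key, value))
                return

    def getall(self, key):
        h = hash(key)
        out = []
        for i in self._probe(h):
            ix = self.indices[i]
            if ix == EMPTY:                  # end of the probe chain
                return out
            if ix != DUMMY and self.entries[ix][:2] == (h, key):
                out.append(self.entries[ix][2])

    def remove_first(self, key):
        h = hash(key)
        for i in self._probe(h):
            ix = self.indices[i]
            if ix == EMPTY:
                raise KeyError(key)
            if ix != DUMMY and self.entries[ix][:2] == (h, key):
                self.indices[i] = DUMMY      # tombstone, never reclaimed
                self.entries[ix] = None
                return
```

Because add() only claims EMPTY slots, a value added after a deletion lands later in the probe chain than the surviving duplicates, preserving their relative order.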

Iteration for operations like getall() is a little tricky: the next-index calculation could return an already visited index before reaching the end. To eliminate duplicates, the code marks already visited entries by setting entry->hash = -1; -1 is an invalid hash value, so it can serve as a marker. After the iteration finishes, all marked entries are restored.
The double iteration over the indices still has O(1) amortized time per step, which is fine.
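The mark-and-restore trick can be sketched like this (a simplified stand-in: probe_indices plays the role of the real next-index iteration and may yield the same index more than once):

```python
def getall_dedup(entries, probe_indices, h, key):
    # entries: list of (hash, key, value) tuples or None for deleted slots.
    found, marked = [], []
    for ix in probe_indices:             # may revisit an index
        entry = entries[ix]
        if entry is None or entry[0] == -1:
            continue                     # deleted slot, or already visited
        eh, ek, ev = entry
        if eh == h and ek == key:
            found.append(ev)
            entries[ix] = (-1, ek, ev)   # mark as visited (-1 = invalid hash)
            marked.append((ix, entry))
    for ix, entry in marked:             # second pass: restore real hashes
        entries[ix] = entry
    return found
```

Revisited entries are skipped by the hash == -1 check, and the restore pass leaves the table exactly as it was before the call.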

.add(), val = md[key], md[key] = val, and md.setdefault() are all O(1).
.getall() / .popall() are O(N), where N is the number of returned items.
.update() / .extend() are O(N+M), where N and M are the numbers of items in the left and right arguments (we had quadratic time here before).
popitem() is slightly slower because the function has to calculate the index of the last (deleted) entry. Since the method is really rare, I don't care too much.

.copy() is super fast; constructing a multidict from another multidict could be optimized to reuse the .copy() approach.

The performance is okay: multidict creation is slightly slower because the hashtable has to be recalculated and the indices table rebuilt, but all other operations are faster.

The multidict is still slightly slower than the regular dict because I don't want to use the private API for accessing internal Python structures, the most notable being the string's cached hash.

TODO:

  1. Optimize MultiDict(md) construction.
  2. GC-untrack the multidict if all keys/values are untracked. In aiohttp, values are usually untracked str instances.

Open question: should we use HT_PERTURB_SHIFT in the index calculation? It is crucial for storing integers, where hash(num) == num, but I doubt it decreases the number of collisions for str keys. While working on the PR, I saw many cases where idx = next(idx) didn't change the current idx, or the sequence looked like 1, 2, 1, 3 with a relatively high chance of duplication.
It may not be worth changing; I have no idea how to prove this except by calculating collision statistics on a large enough number of keys.
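For reference, a CPython-style perturbed probe looks roughly like this (PERTURB_SHIFT = 5, as in CPython's dict; whether the perturb term actually helps for str keys here is exactly the open question):

```python
PERTURB_SHIFT = 5

def probe_with_perturb(h, mask):
    # The perturb term gradually mixes the high bits of the hash into the
    # index, so two hashes that share low bits diverge after a few steps.
    perturb = h
    i = h & mask
    while True:
        yield i
        perturb >>= PERTURB_SHIFT
        i = (i * 5 + perturb + 1) & mask
```

For example, 0x10 and 0x1010 collide on the first slot of a size-8 table (both have low bits 0), but their probe sequences diverge once the high bits are shifted in; without the perturb term they would probe identical sequences.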

If anybody wants to play with the code, two things could help debugging.

  1. Uncomment the # CFLAGS = ["-O0", "-g3", "-UNDEBUG"] line in setup.py to make a debugging build. It enables asserts and the ASSERT_CONSISTENT(...) self-consistency check.
  2. The _ht_dump(...) function prints the current hashtable in a form useful for analyzing the internal structure.

Please feel free to experiment with the code and ask any questions.
