Skip to content
This repository has been archived by the owner on Nov 15, 2017. It is now read-only.

Performance: single char hash for ubiquitous hostnames in matrix filtering #259

Closed
gorhill opened this issue May 10, 2014 · 2 comments
Closed

Comments

@gorhill
Copy link
Owner

gorhill commented May 10, 2014

HTTP Switchboard is already significantly more efficient than all popular blockers memory and CPU wise. However this does not mean I should stop looking for further improvement.

Hostnames in liquid-dict.js are indexed using a 6- to 7-character string hash. This works toward reducing memory footprint given that out-of-the box, there are nearly 60,000 distinct hostnames in this dictionary (one dict entry can contains many hostnames).

Since a while, I've been entertaining the idea of hashing into a single (unicode) character, as this would benefit both memory and CPU efficiency, but I needed a prototype to roughly validate the idea.

So I now have a prototype, and it shows that this works, and there is a gain in both CPU and memory footprint (see http://jsperf.com/makekey-concat-vs-join/3 for the CPU aspect).

Current limitation of the new hashing function is that a hostname can't be longer than 255 characters (if it ever become an issue, worst-case is to generate a two-char hash instead of one-char).

@gorhill
Copy link
Owner Author

gorhill commented May 10, 2014

To validate that one-char hashing works as well as the 6-/7-char hashing, I expect both versions to result in exactly the same dictionary topology, the only difference being the key values. So I just need to validate this using temporary code to compare both the old and new dict.

gorhill added a commit that referenced this issue May 10, 2014
gorhill added a commit that referenced this issue May 10, 2014
@gorhill gorhill closed this as completed May 10, 2014
@gorhill
Copy link
Owner Author

gorhill commented May 18, 2014

hostname can't be longer than 255 characters (if it ever become an issue...

For the record, if it ever become an issue solution is to clamp length to 255 -- not to make hash longer.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant