Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avoid storing full keys in leafs #10

Open
wastl opened this issue Jun 6, 2014 · 3 comments
Open

avoid storing full keys in leafs #10

wastl opened this issue Jun 6, 2014 · 3 comments

Comments

@wastl
Copy link

wastl commented Jun 6, 2014

We have an application where we store lots of URIs and map them to integer IDs. Since almost all URIs start with the same prefix, we decided to use the ART trie to do this mapping. Unfortunately, I realised that ART still keeps a complete copy of the key in its leaf nodes - wouldn't it be possible to dynamically reconstruct keys by traversing the trie from the root when needed (e.g. for the iteration)? In our case this would safe tremendous amounts of main memory.

@armon
Copy link
Owner

armon commented Jun 6, 2014

I'm not sure if there is a good way to do this. The internal nodes will only store up to 10 bytes of the key, at which point they actually rely on the leaf to store the key.

The problem is this, suppose you insert a few hundred keys with prefix "aaa....aaaa", and then you come along and insert "aaa{36}b.....bbbb". To determine the split point, we need the full key to determine the 37th character diverges. Since you can pick any arbitrary number for that, there will always be some limit to how much can be stored in the art nodes directly. That said, there is probably some rather complex de-duplication logic that can be used, but I don't currently have time to tackle that.

@LouHubiao
Copy link

I need insert about 1,000,000,000 node, and the size of ART will like B Tree and the height of tree is much more than B Tree.

@KeeProMise
Copy link

We have an application where we store lots of URIs and map them to integer IDs. Since almost all URIs start with the same prefix, we decided to use the ART trie to do this mapping. Unfortunately, I realised that ART still keeps a complete copy of the key in its leaf nodes - wouldn't it be possible to dynamically reconstruct keys by traversing the trie from the root when needed (e.g. for the iteration)? In our case this would safe tremendous amounts of main memory.

How many urls do you store and how much memory is used?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants