Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate using different Collections libraries for filter speed-up #230

Closed
pjeli opened this issue May 31, 2019 · 3 comments · Fixed by #292
Closed

Investigate using different Collections libraries for filter speed-up #230

pjeli opened this issue May 31, 2019 · 3 comments · Fixed by #292
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@pjeli
Copy link
Collaborator

pjeli commented May 31, 2019

I have been reading up on and looking into the following repo for a while now: https://github.com/lacuna/bifurcan/.

There are some interesting properties here similar to the optimization done in the NameNode's LightweightGSet implementation; namely the use of contiguous memory regions to reduce memory size and speed up iteration.

I believe it may be possible to use this library to speedup:
(1) Bootstrap time.
(2) General collection time for all queries.
(3) Reduce memory usage overall from intermittent Map collections.

I believe #224 has done enough optimization for now and is not necessary for the next immediate release, but perhaps this can be for a 1.6.4 or 1.7.x release.

@pjeli pjeli added enhancement New feature or request help wanted Extra attention is needed labels May 31, 2019
@pjeli
Copy link
Collaborator Author

pjeli commented May 31, 2019

Some early numbers show that for a GSet of 1M inodes, the lacuna.bifurcan.Map implementation is 1/4 the memory cost of a java.util.HashMap and also about 4x as fast at filtering and collecting files and directories.

This could be pretty substantial.

@pjeli pjeli added this to the 1.6.4 milestone Jun 4, 2019
@pjeli
Copy link
Collaborator Author

pjeli commented Jun 5, 2019

Unfortunately the current lacuna.bifurcan Map, List, and Set implementations, and their derivatives, do not implement the java.util.Collection interface and therefore supporting them would involve re-writing the majority of NNA. Something I am not too keen on doing given the QueryEngine re-write I just went through.

I think I will hold off on this for now. Here's hoping that I find either a similar library that maps to Collection or that the author of lacuna.bifurcan's repo adds such implementations. 🤞

@pjeli pjeli removed this from the 1.6.4 milestone Jun 5, 2019
@pjeli pjeli changed the title Investigate using lacuna.bifurcan Maps for filter speed-up Investigate using different Collections libraries for filter speed-up Jun 13, 2019
@pjeli
Copy link
Collaborator Author

pjeli commented Jan 6, 2020

I've been experimenting with microbenchmarks and a couple libraries. I think the path forward is going to be allowing a configuration option to experiment with different Collections.

I think a simple path forward is, by default, to use ConcurrentHashMap (the current Collection). Then to allow regular HashMap. Then Trove.

Trove has shown to take about as long as ConcurrentHashMap for loading while using alot less memory. It make take longer to run queries on however; need to experiment at scale.

Meanwhile, HashMap is the fastest for loading - uses less memory but close to Trove. However I expect to see issues when iterating over HashMap for queries.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant