Mmap support for reference sequences. #187
This greatly reduces memory usage when many jobs are running on the same machine as the references are then shared between processes.
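To illustrate why mapping helps here, the following is a minimal sketch of read-only file mapping on a POSIX system. It is not the code from this pull request: `ref_map`, `ref_map_open`, `ref_map_fetch` and `ref_map_close` are hypothetical names invented for the example. Because the mapping is read-only and file-backed, the kernel can serve every process that maps the same reference file from the same page-cache pages rather than giving each process a private heap copy.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper type for this sketch; not part of htslib. */
typedef struct {
    const char *data;  /* start of the mapped file */
    size_t len;        /* file size in bytes */
} ref_map;

/* Map a whole file read-only. Returns 0 on success, -1 on failure. */
static int ref_map_open(const char *path, ref_map *m) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);  /* zero-length mappings are not permitted */
        return -1;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    if (p == MAP_FAILED) return -1;

    m->data = p;
    m->len = (size_t)st.st_size;
    return 0;
}

/* Copy up to n bytes starting at offset start into out.
 * Returns the number of bytes copied (clamped at end of file).
 * Only the pages actually touched need to be read from disk. */
static size_t ref_map_fetch(const ref_map *m, size_t start, size_t n, char *out) {
    if (start >= m->len) return 0;
    if (n > m->len - start) n = m->len - start;
    memcpy(out, m->data + start, n);
    return n;
}

/* Release the mapping. */
static void ref_map_close(ref_map *m) {
    if (m->data) munmap((void *)m->data, m->len);
    m->data = NULL;
    m->len = 0;
}
```

A caller would open the map once, call `ref_map_fetch` for each region of interest, and close the map at exit. Real reference files also need offset arithmetic for line breaks and sequence names, which this sketch ignores.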
This has been used so we can check HAVE_MMAP. The CRAM code already had a lot of legacy io_lib_config.h includes, which are now htslib_config.h. (The @defs@ method (with or without AC_CONFIG_HEADERS) is a far better way of handling the irods backend too as it doesn't need rather messy gmake extensions, but this hasn't been changed.)
Check AC_FUNC_MMAP in configure, but note that this invokes a bunch of tests for standard headers etc., which is not ideal in 2015. We use AC_CHECK_HEADER carefully to avoid pulling these tests in, and perhaps we can do something similar for AC_FUNC_*.
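As a rough illustration of how a `HAVE_MMAP` check can be consumed in C, the sketch below maps the file when the macro is defined and otherwise reads it into heap memory. `AC_FUNC_MMAP` does define `HAVE_MMAP` on success, but everything else here is an assumption for the example: `ref_buf`, `ref_buf_load` and `ref_buf_free` are hypothetical names, and this is not the code in the pull request.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* HAVE_MMAP would normally arrive via a configure-generated header or -D flag. */
#ifdef HAVE_MMAP
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#endif

/* Hypothetical container for this sketch; not part of htslib. */
typedef struct {
    char *data;   /* file contents, mapped or heap-allocated */
    size_t len;   /* number of bytes */
    int mapped;   /* 1 if data came from mmap, 0 if from malloc */
} ref_buf;

/* Load a file, preferring mmap when available. Returns 0 or -1. */
static int ref_buf_load(const char *path, ref_buf *b) {
    b->data = NULL;
    b->len = 0;
    b->mapped = 0;

#ifdef HAVE_MMAP
    int fd = open(path, O_RDONLY);
    if (fd >= 0) {
        struct stat st;
        if (fstat(fd, &st) == 0 && st.st_size > 0) {
            void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
            if (p != MAP_FAILED) {
                close(fd);
                b->data = p;
                b->len = (size_t)st.st_size;
                b->mapped = 1;
                return 0;
            }
        }
        close(fd);
    }
    /* Mapping failed or was not possible: fall through to a plain read. */
#endif

    FILE *fp = fopen(path, "rb");
    if (!fp) return -1;
    if (fseek(fp, 0, SEEK_END) != 0) { fclose(fp); return -1; }
    long sz = ftell(fp);
    if (sz < 0) { fclose(fp); return -1; }
    rewind(fp);

    char *buf = malloc(sz > 0 ? (size_t)sz : 1);
    if (!buf) { fclose(fp); return -1; }
    if (fread(buf, 1, (size_t)sz, fp) != (size_t)sz) {
        free(buf);
        fclose(fp);
        return -1;
    }
    fclose(fp);

    b->data = buf;
    b->len = (size_t)sz;
    return 0;
}

/* Release whichever kind of buffer ref_buf_load produced. */
static void ref_buf_free(ref_buf *b) {
#ifdef HAVE_MMAP
    if (b->mapped) {
        munmap(b->data, b->len);
        b->data = NULL;
        b->len = 0;
        b->mapped = 0;
        return;
    }
#endif
    free(b->data);
    b->data = NULL;
    b->len = 0;
}
```

Keeping the fallback in the same function means callers see one interface whether or not the platform provides `mmap`; only the memory-sharing benefit is lost on the fallback path.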
@jkbonfield Do you have examples of tests showing that mmap is better than seeking? I experimented quite a bit with this in fastahack and didn't notice any difference in performance. Similar results in rocksdb, which allows mmapping the db files although notes that this tends to decrease performance. In my experience, it seems mostly to make coding easier by allowing files to look like large contiguous regions of memory. In principle it seems great, so I'm curious if there might be a particular way to use it that yields better performance.
On Wed, Apr 01, 2015 at 03:30:31AM -0700, Erik Garrison wrote:
No I don't, but if doing lots of small fetches from within the same [...]
For seeking to larger offsets, fundamentally the OS has to do a [...]
The primary goal for adding mmap was robustness when dealing with [...]
We had a corner case of running lots of single threaded jobs on small [...]
The clear implication here is that the memory to hold the reference [...]
James
James Bonfield (jkb@sanger.ac.uk)
Thanks James, this really helps.
The mmap support is useful for reducing memory usage and also minimising disk I/O on querying small regions.
The second commit is rather orthogonal to this, but still desirable for portability as it means I can detect whether mmap is available.