
memmap reads from directory store #265

Closed

artttt opened this issue Jun 4, 2018 · 15 comments

Comments

@artttt

artttt commented Jun 4, 2018

I've only recently started using zarr, but I'm impressed. Well done.

I want to share an experience and a possible enhancement.
In one of my use cases I use vindex heavily across the whole array. I know this is probably a worst-case scenario for zarr, since it reads many, many chunks to get a small amount of data out of each one.
I was previously using numpy memmap arrays for a similar purpose and that was much faster, so I wondered whether an uncompressed DirectoryStore would read chunks as a memmap. No luck, it still reads full chunks. So I had a go at subclassing DirectoryStore to do this.


import os

import numpy as np
import zarr


class MemMapReadStore(zarr.DirectoryStore):
    """Directory store that memory-maps chunks for reading."""

    def __getitem__(self, key):
        filepath = os.path.join(self.path, key)
        if os.path.isfile(filepath):
            # are there only two kinds of files, .zarray and the chunks?
            if key == '.zarray':
                with open(filepath, 'rb') as f:
                    return f.read()
            else:
                # chunk file: return a read-only memory map instead of bytes
                return np.memmap(filepath, mode='r')
        else:
            raise KeyError(key)
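
Roughly how it gets used, in case that helps; this is only a sketch, the path and indices below are made up, and it assumes a 2-D array created with compressor=None:

store = MemMapReadStore('data/uncompressed.zarr')   # hypothetical path to an uncompressed array
z = zarr.Array(store, read_only=True)
# fancy indexing pulls a handful of elements out of many different chunks
values = z.vindex[[0, 1000, 250000], [5, 17, 42]]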

It's working well for me, but I don't really know the inner workings of zarr, so who knows what I might have broken and which other features it won't play well with. I thought the idea might be a basis for an enhancement, though. Worth sharing at least.

The speed-up depends on access pattern, compression, etc., but for the example I'm testing I'm seeing a 22× speed-up versus a compressed zarr array of the same dimensions and chunking.

It only works for reads, as that was all I needed, and I see that the way you write replaces the whole chunk, so memmap writes are not doable.

@alimanfoo
Member

alimanfoo commented Jun 4, 2018 via email

@artttt
Author

artttt commented Jun 5, 2018

Thanks for the pointer to #40. It looks like the possibility of blosc_getitem-style access into a compressed chunk would be a great feature.

My data is dense in parts and sparse in others, so it has many empty chunks; if I stored the full array it would be too big for my disk. Zarr deals with this nicely. So does a plain numpy memmap on platforms where a sparse file can be bigger than the disk, but on Windows it seems sparse files can't be bigger than the space available. So at a minimum zarr gives me cross-platform compatibility. Also, most of my use cases will benefit from compressed data, so it's nice to have both options in one package, even if I have to decide at creation time which storage is going to be best.

Additionally, I've been trying the LRU cache over the DirectoryStore and seeing some benefits, particularly for the uncompressed zarr (not using the memmapped store). I then made an LRU cache that takes a compressed store and caches the uncompressed data. It gives me about a 3× speed-up when I get cache hits, at the expense of more RAM usage.

Anyway, I feel I'm prematurely optimising as a distraction. I should just get on with using zarr for my work.

import zarr
from zarr.meta import decode_array_metadata, encode_array_metadata
from zarr.codecs import get_codec
from zarr.compat import OrderedDict_move_to_end


class LRUStoreCacheDecoded(zarr.LRUStoreCache):
    """Same as LRUStoreCache, but caches chunks already decoded, for
    faster access at the expense of higher RAM usage.

    NOTE: assumes no filter is used.
    NOTE: not fully tested; will likely break for use beyond reading a zarr.
    """

    def __getitem__(self, key):
        # This is not the right place for this, but it will do for trying the
        # idea out: grab the compressor from the array metadata, then tell the
        # next user in the chain that the data are uncompressed.
        if key == '.zarray':
            value = self._store[key]
            meta = decode_array_metadata(value)
            # set up compressor
            config = meta['compressor']
            if config is None:
                self._compressor = None
            else:
                self._compressor = get_codec(config)
                meta['compressor'] = None
                value = encode_array_metadata(meta)
            return value

        try:
            # first try to obtain the value from the cache
            with self._mutex:
                value = self._values_cache[key]
                # cache hit if no KeyError is raised
                self.hits += 1
                # treat the end as most recently used
                OrderedDict_move_to_end(self._values_cache, key)

        except KeyError:
            # cache miss, retrieve value from the store
            value = self._store[key]
            # added: decode anything read from the store straight away
            if self._compressor:
                value = self._compressor.decode(value)
            with self._mutex:
                self.misses += 1
                # need to check if key is not in the cache, as it may have been
                # cached while we were retrieving the value from the store
                if key not in self._values_cache:
                    self._cache_value(key, value)

        return value
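
Roughly how I've been using it (a sketch only; the path and max_size below are placeholders):

base = zarr.DirectoryStore('data/compressed.zarr')        # hypothetical compressed array on disk
cache = LRUStoreCacheDecoded(base, max_size=2 * 1024**3)  # keep up to ~2 GB of decoded chunks in RAM
z = zarr.Array(cache, read_only=True)
# repeated reads that land in already-cached chunks skip both disk and decompression
values = z.vindex[[0, 1000, 250000], [5, 17, 42]]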

@alimanfoo
Member

alimanfoo commented Jun 5, 2018 via email

@calclavia

calclavia commented Jul 17, 2018

I have a related use case that I believe would benefit from introducing a chunk-based decompressed cache. I use zarr for storing data used to train neural networks. For this use case, you often want to sample random (or almost random) rows from the dataset. If the sampling is mostly localized within a chunk, it would be great if the LRU cache could cache an entire chunk so we can take advantage of spatial locality.

For example, I would like to sample data points [1, 5, 8, 3, 2], and because these all reside in the same compressed chunk (cached by the LRU), only reading the first sample should be slow; the rest should already be cached in memory.
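
Concretely, the access pattern looks something like the sketch below (the path and cache size are made up). The existing LRUStoreCache gets part of the way there, since it caches whole compressed chunks, but each read still pays the decompression cost:

import zarr

base = zarr.DirectoryStore('data/training.zarr')            # hypothetical training dataset
cached = zarr.LRUStoreCache(base, max_size=512 * 1024**2)   # hold up to ~512 MB of chunks in RAM
z = zarr.Array(cached, read_only=True)

# rows 1, 5, 8, 3 and 2 live in the same chunk: the first read pays the storage
# cost, the remaining reads are served from the chunk cache
for i in [1, 5, 8, 3, 2]:
    row = z[i]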

@jakirkham
Member

jakirkham commented Nov 20, 2018

FWIW, I have been thinking about this more lately.

I'm wondering if we shouldn't try to use memory-mapped files in all directory store cases. Admittedly there are some exotic filesystems where we would need to fall back to reading the whole chunk into bytes. That said, in most cases chunks are probably large enough to benefit from memory-mapping.

Not to mention Python's mmap and NumPy's memmap implement the buffer protocol, meaning one can create views onto the data when slicing instead of copying, which is nice for avoiding some intermediate copies. In particular, we currently construct bytes objects, which introduces a copy.
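
As a small illustration (the chunk file name below is just a placeholder), slicing a memory map through a memoryview gives views rather than copies:

import mmap

# f.read() materialises the whole chunk as a separate bytes copy
with open('0.0', 'rb') as f:                     # hypothetical chunk file
    data = f.read()

# memory-mapping exposes the same bytes via the buffer protocol; slicing the
# memoryview creates views without copying the underlying pages
with open('0.0', 'rb') as f:
    buf = memoryview(mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ))
header = buf[:16]                                # a view, not a copy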

Admittedly this is less relevant for this issue, but with PR ( zarr-developers/numcodecs#121 ) we should be able to leverage memory-mapped files nicely in our compressors. The result is that we can stream data from disk (optionally through compressors) into NumPy arrays returned to the user. That should be useful for performance and for handling large chunks generally.

@alimanfoo
Member

@jakirkham nice ideas. Would be worth some simple benchmarking to verify using mmap does indeed improve performance. It may sound like it should reduce memory copies, but there could be subtleties that make it not so obvious.

@jakirkham
Member

Good point. What would we consider a fair dataset to use for these benchmarks?

@alimanfoo
Member

alimanfoo commented Nov 20, 2018 via email

@alimanfoo
Member

alimanfoo commented Nov 20, 2018 via email

@meggart
Member

meggart commented Nov 21, 2018

FWIW as long as using mmap doesn't hurt performance, I'd be tempted to use it.

One downside of memory-mapping is that you cannot really control the amount of memory the OS uses for caching (see e.g. numpy/numpy#7732). In a real use case on an HPC system, where there seems to be sufficient memory available, the library might use all of it and the job gets killed by the scheduler for using more memory than reserved. So keeping the possibility to opt out in certain cases might be desirable.

@jakirkham
Member

What if we just added a flag to DirectoryStore to optionally allow memory-mapping that defaults to False? That should be a pretty compact, useful change, which would take a step in this direction.

@alimanfoo
Member

> What if we just added a flag to DirectoryStore to optionally allow memory-mapping that defaults to False? That should be a pretty compact, useful change, which would take a step in this direction.

FWIW I'd be happy with that if the implementation is straightforward. I'd also be happy with adding a separate store class if that's easier/simpler.

@jakirkham
Member

I'm partly suggesting a flag because memory-mapping feels like a user optimization (one that different users may or may not want), as opposed to a fundamentally different way of storing the data (e.g. NestedDirectoryStore). Plus, if it is a flag, several existing stores based on DirectoryStore can provide the same benefit with no real change to them.
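
One possible shape for such a flag, purely as a hypothetical sketch (not necessarily what an actual implementation would look like), is a constructor argument that switches how chunk files are read:

import mmap
import os

import zarr


class MemmapFlagDirectoryStore(zarr.DirectoryStore):
    # hypothetical sketch: a memmap flag chooses between plain reads and memory-mapping
    def __init__(self, path, memmap=False):
        super().__init__(path)
        self.memmap = memmap

    def __getitem__(self, key):
        filepath = os.path.join(self.path, key)
        if not os.path.isfile(filepath):
            raise KeyError(key)
        with open(filepath, 'rb') as f:
            # metadata keys like '.zarray' stay as plain bytes (simplified check)
            if self.memmap and not key.split('/')[-1].startswith('.'):
                return memoryview(mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ))
            return f.read()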

@jakirkham
Member

Put together PR ( #377 ), which adds the memmap option so we can further the discussion by looking at an implementation.

@jakirkham
Member

Since PR ( #377 ) was opened, we added PR ( #503 ), which allows users to customize how reading occurs by overriding the staticmethod _fromfile of DirectoryStore. For example:

import mmap
from zarr import DirectoryStore

class MemoryMappedDirectoryStore(DirectoryStore):
    def _fromfile(self, fn):
        # mmap.PROT_READ is POSIX-only; on Windows use access=mmap.ACCESS_READ instead
        with open(fn, "rb") as fh:
            return memoryview(mmap.mmap(fh.fileno(), 0, prot=mmap.PROT_READ))

This store can then be used with Groups and Arrays.
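
For instance (the path below is just a placeholder):

import zarr

store = MemoryMappedDirectoryStore("data/example.zarr")   # hypothetical array on disk
z = zarr.open_array(store=store, mode="r")
part = z[:10]   # chunk reads now go through the memory-mapped _fromfile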

Given a user can do this on their own easily, I've turned this into a doc issue ( #1245 ). Closing this out.
