
memray-array

Measuring memory usage of Zarr array storage operations using memray.

In an ideal world array storage operations would be zero-copy, but many libraries do not achieve this in practice. The scripts here measure the actual behaviour across different filesystems (local/cloud), Zarr stores (local/s3fs/obstore), compression settings (using numcodecs), Zarr Python versions (v2/v3), and Zarr formats (2/3).

TL;DR

  • Writes using Zarr Python 3.0.8 and obstore now achieve the best achievable number of copies.
  • But s3fs writes could still be improved; see "Using the Python buffer protocol in pipe" (fsspec/s3fs#959).
  • Reads still need a lot of work; see "Codec pipeline memory usage" (zarr-developers/zarr-python#2904).

Updates

Summary

The workload is simple: create a random 100MB NumPy array and write it to Zarr storage in a single chunk. Then (in a separate process) read it back from storage into a new NumPy array.

  • Writes with no compression should not incur any buffer copies.
  • Writes with compression incur one buffer copy, since implementations first write the compressed bytes into a separate buffer, which has to be roughly the size of the uncompressed bytes (it is not known in advance how compressible the original data is).
  • Reads with no compression should be able to avoid buffer copies by reading directly into the array buffer. However, this is not currently implemented in any of the libraries tested. See zarr-developers/zarr-python#2904.
  • Reads with compression incur a second buffer copy for a separate decompress step, except for Zarr v2 reading from the local filesystem, which can decompress directly into the array buffer.
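
A minimal sketch of the workload, assuming the Zarr Python 3 API (zarr.create_array / zarr.open_array) and a local store; the actual script in this repo additionally selects the store/library, toggles compression, and records allocations with memray:

```python
import numpy as np
import zarr

shape, dtype = (25_000_000,), "float32"   # 25M float32 values ~= 100 MB
data = np.random.default_rng().random(shape, dtype=dtype)

# Write the array to Zarr storage as a single chunk.
z = zarr.create_array(
    store="data/example.zarr",
    shape=shape,
    chunks=shape,        # one chunk covering the whole array
    dtype=dtype,
    overwrite=True,
)
z[:] = data

# Read it back into a new NumPy array (in the real scripts this happens in
# a separate process, so the write buffers don't skew the measurement).
z = zarr.open_array("data/example.zarr")
result = z[:]
```

Running a script like this under memray (e.g. python -m memray run sketch.py) captures the allocations, and the number of extra ~100 MB buffers can be read off the resulting flamegraph.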

Writes

Number of extra copies needed to write an array to storage using Zarr. (Links are to memray flamegraphs. Bold indicates best achievable.)

| Filesystem | Store | Zarr Python/Numcodecs | Zarr format | Uncompressed | Compressed |
| --- | --- | --- | --- | --- | --- |
| Local | local | v2 (2.18.7/0.15.1) | 2 | 0 | 2 |
| Local | local | v3 (3.0.6/0.15.1) | 3 | 1 | 2 |
| Local | local | v3 (3.0.8/0.16.1) | 3 | 0 | 1 |
| Local | obstore | v3 (3.0.8/0.16.1) | 3 | 0 | 1 |
| S3 | s3fs | v2 (2.18.7/0.15.1) | 2 | 1 | 2 |
| S3 | s3fs | v3 (3.0.6/0.15.1) | 3 | 1 | 2 |
| S3 | obstore | v3 (3.0.8/0.16.1) | 3 | 0 | 1 |

Reads

Number of extra copies needed to read an array from storage using Zarr. (Links are to memray flamegraphs. Bold indicates best achievable.)

| Filesystem | Store | Zarr Python/Numcodecs | Zarr format | Uncompressed | Compressed |
| --- | --- | --- | --- | --- | --- |
| Local | local | v2 (2.18.7/0.15.1) | 2 | 1 | 1 |
| Local | local | v3 (3.0.6/0.15.1) | 3 | 1 | 2 |
| Local | obstore | v3 (3.0.8/0.16.1) | 3 | 1 | 2 |
| S3 | s3fs | v2 (2.18.7/0.15.1) | 2 | 2 | 2 |
| S3 | s3fs | v3 (3.0.6/0.15.1) | 3 | 2 | 2 |
| S3 | obstore | v3 (3.0.8/0.16.1) | 3 | 1 | 2 |

Discussion

Update: some of these paths have since been fixed, so the discussion below may not reflect the code in the latest releases.

This section delves into what is happening on the different code paths and suggests some remedies for reducing the number of buffer copies.

Writes

  • Local uncompressed writes (v2 only) - actual copies 0, desired copies 0

    • This is the only zero-copy case. The numpy array is passed directly to the file's write() method (in DirectoryStore), and since arrays implement the buffer protocol, no copy is made (see the sketch after this list).
  • S3 uncompressed writes (v2 only) - actual copies 1, desired copies 0

    • A copy of the numpy array is made by this code in fsspec (in maybe_convert, called from FSMap.setitems()): bytes(memoryview(value)).
    • Remedy: it might be possible to use the memory view in fsspec and avoid the copy (see fsspec/s3fs#959), but it is probably better to focus on improvements to v3 (see below).
  • Uncompressed writes (v3 only) - actual copies 1, desired copies 0

  • Compressed writes - actual copies 2, desired copies 1

    • It is surprising that there are two copies, not one, given that the uncompressed case has zero copies (for local v2, at least). What's happening is that the numcodecs blosc compressor is making an extra copy when it resizes the compressed buffer. A similar thing happens for lz4 and zstd.
    • Remedy: the issue is tracked in zarr-developers/numcodecs#717.
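
To make the buffer-protocol point above concrete, here is a sketch using plain file I/O rather than Zarr itself; the file name is arbitrary, but the memory behaviour mirrors the two write paths described above:

```python
import numpy as np

arr = np.random.default_rng().random(25_000_000, dtype="float32")  # ~100 MB

# Zero-copy path (what DirectoryStore effectively does for uncompressed v2
# writes): ndarray implements the buffer protocol, so write() consumes the
# array's memory directly and no extra array-sized buffer is allocated.
with open("chunk.bin", "wb") as f:
    f.write(arr)                    # equivalently f.write(memoryview(arr))

# Copying path (the fsspec maybe_convert pattern quoted above): bytes()
# materialises a new 100 MB bytes object from the memoryview before the
# data is handed to the store.
value = memoryview(arr)
copied = bytes(value)               # one extra array-sized allocation
```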

Reads

  • Local reads (v2 only) - actual copies 1, desired copies 0

    • The Zarr Python v2 read pipeline separates reading the bytes from storage from filling the output array (see _process_chunk()), so there is necessarily a buffer copy: the bytes are never read directly into the output array (see the sketch after this list).
    • Remedy: Zarr Python v2 is in bugfix mode now so there is no point in trying to change it to make fewer buffer copies. The changes would be quite invasive anyway.
  • Local reads (v3 only), plus obstore local and S3 - actual copies 1 (2 for compressed), desired copies 0 (1 for compressed)

    • The Zarr Python v3 CodecPipeline has a read() method that separates reading the bytes from storage from filling the output array (just like v2); the ByteGetter class has no way of reading directly into an output array.
    • Remedy: this could be fixed by zarr-developers/zarr-python#2904, but it is potentially a major change to Zarr's internals.
  • S3 reads (s3fs only) - actual copies 2, desired copies 0

    • Both the Python asyncio SSL library and aiohttp introduce a buffer copy when reading from S3 (using s3fs).
    • Remedy: unclear
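
As a sketch of what zero-copy uncompressed reads could look like (again with plain file I/O and an arbitrary file name), compare copying via an intermediate bytes object with reading directly into the output array's buffer:

```python
import numpy as np

shape, dtype = (25_000_000,), np.dtype("float32")   # ~100 MB

# Roughly what the current pipelines do: the whole chunk lands in an
# intermediate bytes object, which is then copied into the output array.
out = np.empty(shape, dtype=dtype)
with open("chunk.bin", "rb") as f:
    data = f.read()                         # extra array-sized allocation
out[:] = np.frombuffer(data, dtype=dtype)   # copy into the output buffer

# The desired behaviour: read straight into the output array's memory via
# the buffer protocol, with no intermediate buffer.
out2 = np.empty(shape, dtype=dtype)
with open("chunk.bin", "rb") as f:
    f.readinto(memoryview(out2).cast("B"))  # fills out2 in place
```

Supporting something like this readinto pattern at the store interface is roughly the kind of change discussed in zarr-developers/zarr-python#2904.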

Related issues

How to run

Create a new virtual env (for Python 3.11), then run:

pip install -r requirements.txt

Local

pip install -U 'zarr<3' 'numcodecs<0.16.0'
python memray-array.py write
python memray-array.py write --no-compress
python memray-array.py read
python memray-array.py read --no-compress

pip install -U 'zarr==3.0.6' 'numcodecs<0.16.0'
python memray-array.py write
python memray-array.py write --no-compress
python memray-array.py read
python memray-array.py read --no-compress

pip install -U 'zarr==3.0.8' 'numcodecs>=0.16.0'
python memray-array.py write
python memray-array.py write --no-compress

pip install -U 'zarr==3.0.8' 'numcodecs>=0.16.0'
python memray-array.py write --library obstore
python memray-array.py write --no-compress --library obstore
python memray-array.py read --library obstore
python memray-array.py read --no-compress --library obstore

S3

These can take a while to run (unless run from within AWS).

Note: change the URL to an S3 bucket you own and have already created.

pip install -U 'zarr<3' 'numcodecs<0.16.0'
python memray-array.py write --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py write --no-compress --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py read --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py read --no-compress --store-prefix=s3://cubed-unittest/mem-array

pip install -U 'zarr==3.0.6' 'numcodecs<0.16.0'
python memray-array.py write --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py write --no-compress --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py read --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py read --no-compress --store-prefix=s3://cubed-unittest/mem-array

pip install -U 'zarr==3.0.8' 'numcodecs>=0.16.0'
export AWS_DEFAULT_REGION=...
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
python memray-array.py write --library obstore --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py write --no-compress --library obstore --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py read --library obstore --store-prefix=s3://cubed-unittest/mem-array
python memray-array.py read --no-compress --library obstore --store-prefix=s3://cubed-unittest/mem-array

Memray flamegraphs

mkdir -p flamegraphs
(cd profiles; for f in $(ls *.bin); do echo $f; python -m memray flamegraph --temporal -f -o ../flamegraphs/$f.html $f; done)

Or just run make.
