TiffReader (and OmeTiffReader) slow for dask chunked reads #178

Closed
bryantChhun opened this issue Jan 4, 2021 · 24 comments

@bryantChhun

System and Software

  • aicsimageio Version: 3.3.1
  • Python Version: 3.7
  • Operating System: OSX 10.13.6

Description

I've generated OME-TIFF stack files from micro-manager 2.0 gamma. Pulling a single XY slice from a large stack using get_image_dask_data("YX", ...).compute() with random C/Z/S/T indices takes approx. 15 seconds, measured with the %timeit magic in Jupyter notebooks. Pulling a single XY slice from a small stack is more performant (approx. 700 ms) ... but isn't the point of dask delayed reading to be less sensitive to data file sizes? Is there some incompatibility with Jupyter that I'm not aware of?

Here's some sample code that I used:

import numpy as np
from aicsimageio import AICSImage

def random_pull_dask(c_max, z_max, s_max, t_max, aicsimage):
    # Pick a random index within each dimension's bounds
    c = np.random.randint(c_max)
    z = np.random.randint(z_max)
    s = np.random.randint(s_max)
    t = np.random.randint(t_max)
    # Build the delayed read for a single YX plane, then force the actual read
    return aicsimage.get_image_dask_data("YX", C=c, Z=z, S=s, T=t).compute()

img = AICSImage(path_to_top_ometiff)
%timeit -n 3 -r 5 data = random_pull_dask(4, 5, 4, 50, img)

The large dataset has max dimensions (C, Z, S, T) = (4, 5, 4, 50) with 2048x2048 16-bit images.
The small dataset has max dimensions (C, Z, S, T) = (4, 5, 4, 10).

Expected Behavior

I expected the dask delayed read time to scale weakly or not at all with the total dataset size. Instead, this is roughly a 20x slowdown in read time for only a 5x difference in data size.


@toloudis
Collaborator

toloudis commented Jan 5, 2021

Thanks for posting this issue. Performance can depend on a number of things, and some of that information is not in your post, but I'll do my best here.

First, some considerations for performance in aicsimageio, from more general to more specific:

  • Local files will always read faster than reads across a network.
  • The quickest read operation will always be .data and .get_image_data on a local file.
  • If your image is too large to fit in memory, use AICSImage.dask_data,
    AICSImage.get_image_dask_data, or the Reader equivalents.
  • Invoking dask will always incur overhead beyond the plain read commands.
  • Normally I would suggest using get_image_dask_data("CZYX", ...) to get a larger chunk into memory and then index into that chunk (see the sketch below). However, the tiff reader is not yet optimized for chunks bigger than YX. (Also: if you are well versed in dask usage and have dask distributed configured, then you have a chance at parallel reads when reading higher-dimensional dask arrays.)
  • I believe it is also the case that to get an arbitrary plane from a tiff file, any tiff reader potentially has to walk the whole file; the spec does not require the planes to be in a specific order.

To sum up, it is conceivable that the slowdown you are experiencing is normal under the usage conditions you are exercising, and we have some work to do to optimize tiff chunked reads for different chunk dimensions (for example, get_image_dask_data("CZYX", ...) instead of get_image_dask_data("YX", ...)). This is reflected in #132.
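
As a rough illustration of the "read a larger chunk, then index into it in memory" suggestion above (a minimal sketch assuming the 3.x API; the file path is hypothetical):

from aicsimageio import AICSImage

img = AICSImage("large_stack.ome.tiff")  # hypothetical path

# One delayed read pulls a whole CZYX block for a given scene/timepoint...
stack = img.get_image_dask_data("CZYX", S=0, T=0).compute()

# ...and any number of YX planes are then cheap in-memory slices.
plane = stack[1, 3]  # C=1, Z=3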

@bryantChhun
Author

Thank you for the very prompt reply. Some context: we find that saving data from micro-manager into ome-tiff is much more efficient than saving single-page tiffs (especially for high-framerate acquisitions), which makes sense. It also motivates our decision to move fully to ome-tiff as defined by micro-manager's Multi-Dimensional Acquisition.

As it turns out, one nice feature of aicsimageio is that its ome-tiff parser can read micro-manager position/scene info and create an appropriate, sliceable array; in my tests, I can't get tifffile.imread to do this. The downside of aicsimageio is that retrieving arrays through get_image_dask_data scales poorly with the added data from the positions, as mentioned above.

I understand there's the ability to do all our math on dask-delayed images (I think?), leaving the .compute() for the end. This is attractive especially for its parallelization aspect, and it is soon to be tested. However, tifffile.imread can read into a .zarr format, which enables compute as well, and seems to scale better with larger sizes (but does not include micro-manager positions). It's maybe a different topic, but do you think aicsimageio could support zarr files? This might be easy since it's just a flag to tifffile.imread.

@evamaxfield
Collaborator

evamaxfield commented Jan 6, 2021

It also motivates our decision to move fully to ome-tiff

This is (in my opinion) a great decision! Welcome to the OME club!

the ability to do all our math on dask-delayed images (I think?), leaving the .compute() for the end. This is attractive especially for its parallelization aspect

The power of dask and distributed! Yes, you can create a compute graph and then save all the computation for a single compute call. But I won't comment too much on that because their documentation is much better than my knowledge. That said, the one tricky bit that always gets beginners is: "when to call persist and when to call compute". The reason I mention this is that you may want to persist a whole Z-stack in memory, run some computations, then move on to the next Z-stack (see the sketch below).
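
A minimal sketch of the persist-then-compute pattern described above (plain dask; the array here is synthetic rather than read through aicsimageio):

import dask.array as da

# Stand-in for a lazily-read Z-stack: one chunk per Z plane
stack = da.random.random((65, 2048, 2048), chunks=(1, 2048, 2048))

# persist() materializes the stack in (distributed) memory once...
stack = stack.persist()

# ...so multiple downstream computations reuse it instead of re-reading
background = stack.mean(axis=0).compute()
noise = stack.std(axis=0).compute()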

Do you think aicsimageio could support zarr files?

Oh boy, do I have a great file for you: our roadmap. We have wanted to support Zarr for a while, and it will be coming in our 4.x release! That release should be about a month out, depending on my free time.

In case you don't know, OME is also looking at supporting (and has started prototyping) ome-zarr as the next generation after ome-tiff.

However, tifffile.imread can read into a .zarr format, which enables compute as well, and seems to scale better with larger sizes (but not include micro-manager positions)

I will have to look into that parameter and see what is happening. If it works well, I may simply make a quick PR that uses tifffile.imread with zarr, then converts zarr to dask, instead of our current dask array construction.

@evamaxfield
Collaborator

Did a bit of quick testing regarding the aszarr parameter:

In [1]: import dask.array as da

In [2]: from tifffile import imread

In [3]: from aicsimageio.readers import OmeTiffReader

In [4]: r = OmeTiffReader("aicsimageio/tests/resources/pipeline-4.ome.tiff")

In [5]: r.dims
Out[5]: <Dimensions [T: 1, C: 4, Z: 65, Y: 600, X: 900]>

In [6]: r.dask_data
Out[6]: dask.array<transpose, shape=(1, 4, 65, 600, 900), dtype=uint16, chunksize=(1, 1, 1, 600, 900), chunktype=numpy.ndarray>

In [7]: z = imread("aicsimageio/tests/resources/pipeline-4.ome.tiff", aszarr=True)

In [8]: d = da.from_zarr(z)

In [9]: d
Out[9]: dask.array<from-zarr, shape=(65, 4, 600, 900), dtype=uint16, chunksize=(1, 1, 1, 900), chunktype=numpy.ndarray>

In [10]: %timeit d[0, 0,].compute()
194 ms ± 1.95 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [11]: %timeit r.dask_data[0, 0, 0,].compute()
61.5 ms ± 467 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

This is using the 4.0 API, so it may look a bit different. I also tested this in the current stable release (3.3.3) with the same result.

So yes, I can very easily convert the ZarrTiffStore object returned from imread with aszarr=True to a dask array, but the chunks are really small: each chunk is a single X row, not even a YX plane. This means that when I run a quick single-plane read comparison, aicsimageio is much more efficient due to the chunk sizes we create. This is likely due to my not fully knowing how to use the ZarrTiffStore object, though; there may be a better way to cast it to dask than the naive cast.

From your original comment:

However, tifffile.imread can read into a .zarr format, which enables compute as well, and seems to scale better with larger sizes

I would love to see the timeit comparison of exactly above where you are getting better times from the Zarr store. Or let me know how you set it up so I can test.

@cgohlke

cgohlke commented Jan 6, 2021

but the chunks are really small: each chunk is a single X row, not even a YX plane. This means that when I run a quick single-plane read comparison, aicsimageio is much more efficient due to the chunk sizes we create.

Zarr has ~40 microseconds of overhead for reading a single chunk. Bio-Formats creates files with RowsPerStrip=1 segments/chunks by default (other TIFF writers are smarter). Both can add up to significant overhead, compared to the asarray interface, for access patterns that don't work well with small strips.
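
For anyone following along, a quick way to check how a given file was written (a small sketch using the tifffile API; the file name is hypothetical):

from tifffile import TiffFile

with TiffFile("example.ome.tiff") as tif:
    tag = tif.pages[0].tags.get("RowsPerStrip")
    if tag is not None:
        print("RowsPerStrip:", tag.value)  # 1 => many tiny strip chunks
    else:
        print("No RowsPerStrip tag (likely a tiled TIFF)")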

@evamaxfield
Collaborator

Bio-Formats creates files with RowsPerStrip=1 segments/chunks by default (other TIFF writers are smarter).

Just tested more files:

aicsimageio/tests/resources/pipeline-4.ome.tiff
dask.array<from-zarr, shape=(65, 4, 600, 900), dtype=uint16, chunksize=(1, 1, 1, 900), chunktype=numpy.ndarray>
tifffile plane read: 1.4302599430084229
aicsimageio plane read: 0.9705557823181152
--------------------------------------------------------------------------------
aicsimageio/tests/resources/s_1_t_1_c_1_z_1.ome.tiff
dask.array<from-zarr, shape=(325, 475), dtype=uint16, chunksize=(1, 475), chunktype=numpy.ndarray>
tifffile plane read: 0.1106119155883789
aicsimageio plane read: 0.09806299209594727
--------------------------------------------------------------------------------
aicsimageio/tests/resources/variance-cfe.ome.tiff
dask.array<from-zarr, shape=(65, 9, 600, 900), dtype=uint16, chunksize=(1, 1, 36, 900), chunktype=numpy.ndarray>
tifffile plane read: 0.08590221405029297
aicsimageio plane read: 1.7470948696136475
--------------------------------------------------------------------------------
aicsimageio/tests/resources/pre-variance-cfe.ome.tiff
dask.array<from-zarr, shape=(65, 9, 600, 900), dtype=uint16, chunksize=(1, 1, 36, 900), chunktype=numpy.ndarray>
tifffile plane read: 0.19701695442199707
aicsimageio plane read: 1.6554019451141357
--------------------------------------------------------------------------------
aicsimageio/tests/resources/s_1_t_1_c_10_z_1.ome.tiff
dask.array<from-zarr, shape=(10, 1736, 1776), dtype=uint16, chunksize=(1, 18, 1776), chunktype=numpy.ndarray>
tifffile plane read: 0.05503201484680176
aicsimageio plane read: 0.09723997116088867
--------------------------------------------------------------------------------
aicsimageio/tests/resources/3d-cell-viewer.ome.tiff
dask.array<from-zarr, shape=(74, 9, 1024, 1024), dtype=uint16, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray>
tifffile plane read: 0.06575918197631836
aicsimageio plane read: 1.6189160346984863
--------------------------------------------------------------------------------
aicsimageio/tests/resources/s_3_t_1_c_3_z_5.ome.tiff
dask.array<from-zarr, shape=(5, 3, 325, 475), dtype=uint16, chunksize=(1, 1, 1, 475), chunktype=numpy.ndarray>
tifffile plane read: 0.14586305618286133
aicsimageio plane read: 0.5717129707336426
--------------------------------------------------------------------------------
aicsimageio/tests/resources/actk.ome.tiff
dask.array<from-zarr, shape=(6, 65, 233, 345), dtype=float64, chunksize=(1, 1, 23, 345), chunktype=numpy.ndarray>
tifffile plane read: 0.040633201599121094
aicsimageio plane read: 0.4918510913848877
--------------------------------------------------------------------------------

The above is getting the first plane from each file; sometimes that is a single chunk and sometimes it is many chunks.

I will keep poking around, but it looks like the chunks determined by the image writer may be a safer bet than our current chunks, so thanks @cgohlke for that bit of knowledge. We may have an optimization PR come out of this.

@bryantChhun
Author

Wow thanks for all this quick work!

However, tifffile.imread can read into a .zarr format, which enables compute as well, and seems to scale better with larger sizes

I would love to see the timeit comparison of exactly above where you are getting better times from the Zarr store. Or let me know how you set it up so I can test.

I apologize. I made that statement but hadn't run through the same tests as earlier.

@evamaxfield changed the title from "ome-tiff reader is very slow on large datasets" to "TiffReader (and OmeTiffReader) slow for dask chunked reads" on Jan 6, 2021
@evamaxfield
Collaborator

Going to be running some more intensive benchmarks on this later today and will report back with my results.

Things I am planning on benchmarking:

  • random single plane read
  • random single z-stack read
  • full file read (using compute)

All benchmarks will be run on 3.3.3.

Before I even run these benchmarks, I also want to see if the "drop-in replacement" of self._dask_data = da.from_zarr(tifffile.imread(f, aszarr=True)) passes our unit tests. We have some specific tests for whether we can serialize everything, which I am curious about.

@evamaxfield self-assigned this on Jan 14, 2021
@evamaxfield added the "discussion (API changes open for discussion)" and "enhancement (New feature or request)" labels on Jan 14, 2021
@evamaxfield
Collaborator

evamaxfield commented Jan 14, 2021

Sorry for the great delay, @bryantChhun; I got distracted by other things.

I also didn't want to just benchmark this stuff as a one-off, so I am taking this issue as a chance to also resolve #105. Specifically, I am using asv, which should really help for benchmarking this and monitoring the library in the future.

Preliminary Results and Comments:

All tests were run on the 4.x TiffReader, as running these benchmarks on the 3.x series is a moot point, especially considering the new Reader specification and requirements in 4.x.

First Plane Read

[benchmark screenshot: first plane read timings]

Similar to my early results and comment above, it would seem that performance is highly dependent on the TIFF's RowsPerStrip. Of the files tested, the files written with tifffile generally have a RowsPerStrip > 1, while the files written by Bio-Formats have a RowsPerStrip of 1, and you can clearly see the performance difference.

  • For any of the files that have low RowsPerStrip values, current aicsimageio dask array construction and usage for "First Plane Reading" is generally faster than the imread(f, aszarr=True) implementation.
  • For any of the files that have high RowsPerStrip values, current aicsimageio dask array construction and usage for "First Plane Reading" is generally slower than the imread(f, aszarr=True) implementation.

Random Plane Read

[benchmark screenshot: random plane read timings]

All of the same comments from First Plane Reading apply.

  • For low RowsPerStrip value files, current aicsimageio dask array construction and usage is generally faster than the imread(f, aszarr=True) implementation.
  • For high RowsPerStrip value files, current aicsimageio dask array construction and usage is generally slower than the imread(f, aszarr=True) implementation.

Not Included

  • time_init, which benchmarks how long it takes to construct the TiffReader object, doesn't apply to this issue.
  • time_array_construct, which benchmarks how long it takes to construct the delayed dask array object itself, generally shows stable or better times with aszarr (which makes sense).
  • time_dask_read, which benchmarks how long it takes to do .dask_data.compute(), has variable times, with the obvious caveat that users should rarely use that codepath because, except in really specific distributed cluster configurations, simply using .data is faster.
  • For even more benchmarking, I plan to add a time_random_chunk_read that acts as time_random_plane_read on steroids.

Major Considerations

  • Notice in the benchmark images that all benchmark sets have a second parameter of LOCAL; this is the host parameter. I couldn't get any of the aszarr tests to work with a REMOTE host (s3) due to the next consideration...

  • Notice in the code of the changes to TiffReader that I am using self.path instead of the open resource from fsspec; I couldn't get aszarr to work with a buffer / fsspec-based open resource. This means that if we did want to switch over to aszarr, there would be one code path for local files and another for remote files. I don't really like that, but I could be convinced otherwise.

  • A side effect of using the aszarr implementation is that it makes the TiffReader unserializable due to a threadlock in the ZarrTiffStore object (iirc). One of our basic tests on the AICSImage and all base Reader objects is that they must be able to pass serdes tests with cloudpickle. This was originally created as a test because of my use of Prefect and Dask and the desire to pass an already constructed AICSImage object as a parameter to a function that runs on a distributed worker. For example:

from aicsimageio import AICSImage
from prefect import task

@task
def do_some_img_computation(img: AICSImage):
    ...

This would no longer be possible if we moved to the aszarr implementation, and considering the above issues I had with fsspec buffers, to me it's a non-starter. But maybe @cgohlke you can tell me if there is another parameter I need to pass to get that to work with remote reads.
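
(For context, the serdes requirement is essentially a round trip like the following sketch; the file path is hypothetical and this is not the exact test code:)

import cloudpickle
from aicsimageio import AICSImage

img = AICSImage("example.ome.tiff")  # hypothetical path

# The reader must survive a cloudpickle round trip so it can be shipped
# to distributed workers (e.g., as a Prefect task argument).
restored = cloudpickle.loads(cloudpickle.dumps(img))
assert restored.shape == img.shape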

Final Comments

I think my general statement about this particular problem is:

The TIFF format is just not meant for chunked reading... We as a community need to move to Zarr and NGFF as soon as we can to solve problems like these.

There is a bunch of info here, so please feel free to give me any and all comments, questions, and feedback on my implementation. Happy to hear them all. And @cgohlke, if you see anything that could potentially improve or change the results of these benchmarks please let me know!

@evamaxfield
Collaborator

Also tagging @joshmoore because I would be curious as to your thoughts on this problem as well.

@cgohlke

cgohlke commented Jan 14, 2021

The TIFF format is just not meant for chunked reading...

Actually TIFF was designed for chunked reading and rewriting. Just over 30 years ago...

I think tifffile's zarr interface works reasonably well for common tasks on many existing WSI, multi-resolution OME-TIFF, or LSM files in the hundreds of gigabytes on a local workstation. It depends on how the files were written and the access pattern.

if you see anything that could potentially improve or change the results of these benchmarks

The next version of tifffile will add an option (chunkmode='page') to use whole pages/IFDs instead of individual tiles/strips as zarr chunks. That significantly speeds up accessing frames in OME-TIFF produced by Bio-Formats with RowsPerStrip=1. Anyway, RowsPerStrip=1 might be beneficial for slicing along dimensions other than the image plane.

A side effect of using the aszarr implementation is that it makes the TiffReader unserializable due to a threadlock in the ZarrTiffStore object (iirc).

Please provide a simple Python/tifffile example that I could test. The lock is needed to synchronize seeking and reading from a single file handle.

@evamaxfield
Collaborator

evamaxfield commented Jan 14, 2021

Actually TIFF was designed for chunked reading and rewriting. Just over 30 years ago...

Hah. Fair.

I think tifffile's zarr interface works reasonably well for common tasks on many existing WSI, multi-resolution OME-TIFF, or LSM files in the hundreds of gigabytes on a local workstation. It depends on how the files were written and the access pattern.

Totally agree. I think in general, if your file is local, the tifffile aszarr route is probably the go-to; I would really like to use it as the basis for the dask array construction.

The next version of tifffile will add an option (chunkmode='page') to use whole pages/IFDs instead of individual tiles/strips as zarr chunks. That significantly speeds up accessing frames in OME-TIFF produced by Bio-Formats with RowsPerStrip=1. Anyway, RowsPerStrip=1 might be beneficial for slicing along dimensions other than the image plane.

I eagerly await that release then! Will keep building out the benchmark suite in the meantime to prepare for it.

Please provide a simple Python/tifffile example that I could test. The lock is needed to synchronize seeking and reading from a single file handle.

Will send later tonight!

@evamaxfield
Collaborator

@cgohlke I can post this as an issue over on the tifffile repo if you want but here is a short snippet for attempting to serialize the Zarr Store:

import dask.array as da
from distributed.protocol import serialize
from tifffile import imread

store = imread("aicsimageio/tests/resources/actk.ome.tiff", aszarr=True)
arr = da.from_zarr(store)

serialize(store)
# serialize(arr)  # also fails because it's just a reference to the store

deps:

dask
distributed
zarr
tifffile

@cgohlke

cgohlke commented Jan 15, 2021

Yes, pickling won't work as long as ZarrTiffStore is using a TiffFile instance to do the heavy lifting. To overcome this limitation one could implement the File Chunk Store approach. This could require duplicating/refactoring parts of the TiffFile implementation for those TIFF features that are not supported by the zarr specification/implementation. The overhead for accessing chunks would be even higher. I have to think about it some more...

@joshmoore
Contributor

joshmoore commented Jan 15, 2021

Also tagging @joshmoore because I would be curious as to your thoughts on this problem as well.

Thanks for the ping, @JacksonMaxfield. I assume you're primarily interested in the RowsPerStrip issue? Oddly enough, we're working on some latency metrics right now as well (incl. HDF5 with pytest-benchmark), so this is remarkably good timing either way. Here are some random end-of-the-week thoughts, then I'll need to get back to you next week:

  • Bio-Formats' API default of RowsPerStrip=1 is certainly unfortunate. All newer implementations should be writing by tile (cf. bioformats2raw & friends), but we haven't taken the step of breaking the API to force the fact.
  • I'll give scripts/benchmark.py a closer look & hopefully a run and see if I have any immediate feedback unless you think there's a more appropriate starting point.
  • I just stumbled on tifffile's ZarrTiffStore. Not sure how I missed (or forgot) it, but I am definitely looking forward to trying it out.
  • FileChunkStore is pretty exciting but like @cgohlke I still don't have my head around it completely. Will we be shipping additional index files with each TIFF? How will they be discovered? Etc.

cc: @dgault @sbesson

@evamaxfield
Collaborator

At the risk of saying this before it's ready to go... I think I have found a solution that enables near-aszarr levels of read time but keeps support for remote reading + serialization. The benchmarks will probably complete in about an hour or so; I will have more info then. (The short answer of how: simply larger chunks, to combat the ~40 ms overhead of aszarr.)

Addressing some comments:

To overcome this limitation one could implement the File Chunk Store approach. This could require duplicating/refactoring parts of the TiffFile implementation for those TIFF features that are not supported by the zarr specification/implementation. The overhead for accessing chunks would be even higher. I have to think about it some more...

I don't think this is the play. Or at least not for an aicsimageio implementation, because this would only work for a local file, whereas our 4.x spec supports remote reads as well.

Thanks for the ping, @JacksonMaxfield. I assume you're primarily interested in the RowsPerStrip issue?

Yep. Basically just wanted to see if there was a reason for BioFormats to have the specified RowsPerStrip as it is currently and if you all were working on "performant TIFFs" even though I know Zarr is coming.

Bio-Formats' API default of RowsPerStrip=1 is certainly unfortunate. All newer implementations should be writing by tile (cf. bioformats2raw & friends), but we haven't taken the step of breaking the API to force the fact.

Answering my above question / statement, seems like you have known and thought about it!

I'll give scripts/benchmark.py a closer look & hopefully a run and see if I have any immediate feedback unless you think there's a more appropriate starting point.

If you want to run the benchmarks described in this issue, it's a bit of a process... but asv is cool, so it may be worth just "having fun learning a new tool":

git checkout feature/benchmarks
pip install quilt3
python scripts/download_test_resources.py
pip install -e .[dev]
asv run {commitA}..{commitB}

The scripts/benchmark.py file is outdated now that I am using asv.

FileChunkStore is pretty exciting but like @cgohlke I still don't have my head around it completely. Will we be shipping additional index files with each TIFF? How will they be discovered? Etc.

My above comment still stands regarding this implementation: it may be good for local files but bad for remote files.

@cgohlke

cgohlke commented Jan 15, 2021

I released a new tifffile version, which can optionally use whole images from pages/IFDs instead of strips/tiles as zarr chunks by passing chunkmode='page' to the ZarrTiffStore constructor. The feature is experimental.
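
For readers of this thread, usage might look like the following sketch (assuming chunkmode is forwarded through TiffFile.aszarr; the file name is hypothetical):

import dask.array as da
from tifffile import TiffFile

with TiffFile("example.ome.tiff") as tif:
    # Whole pages/IFDs become zarr chunks instead of individual strips/tiles
    store = tif.aszarr(chunkmode="page")
    arr = da.from_zarr(store)
    plane = arr[0, 0].compute()  # one chunk == one full plane read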

I don't think this is the play. Or at least not for an aicsimageio implementation, because this would only work for a local file, whereas our 4.x spec supports remote reads as well.

By "File Chunk Store approach" I meant something like this. E.g., the store is initialized with the file path(s), offsets/bytecounts of the chunks in the file(s), and a decode function. Then __getitem__ maps the key to a specific file/offset/bytecount, reads the chunk from the local or remote file, and returns the decoded chunk to zarr. Why would that store not be serializable or not work for remote files?
The challenging part is to build a decode function and serialize it with the store.
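
A rough sketch of that idea (all names here are hypothetical, the decode function is left trivial, and a real store would also need to serve zarr's .zarray metadata key):

from collections.abc import MutableMapping

class FileChunkStore(MutableMapping):
    """Read-only zarr-style store mapping chunk keys to byte ranges in a file."""

    def __init__(self, path, index, decode=lambda raw: raw):
        self.path = path      # plain path + plain dict: trivially picklable
        self.index = index    # e.g. {"0.0.0": (offset, bytecount), ...}
        self.decode = decode  # per-chunk decompression/decode function

    def __getitem__(self, key):
        offset, bytecount = self.index[key]
        # Open per read: no shared file handle, so no lock to serialize
        with open(self.path, "rb") as f:
            f.seek(offset)
            return self.decode(f.read(bytecount))

    def __setitem__(self, key, value):
        raise TypeError("read-only store")

    def __delitem__(self, key):
        raise TypeError("read-only store")

    def __iter__(self):
        return iter(self.index)

    def __len__(self):
        return len(self.index)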

@evamaxfield
Collaborator

evamaxfield commented Jan 15, 2021

Alright some more updates!

I released a new tifffile version, which can optionally use whole images from pages/IFDs instead of strips/tiles as zarr chunks by passing chunkmode='page' to the ZarrTiffStore constructor. The feature is experimental.

The benchmarks I ran for comparison below use this new release and the chunkmode="page" param! Across the board, the new chunkmode param makes a massive difference so thank you so much @cgohlke 👏.

Unfortunately, screenshotting the benchmarks for this one looks a bit weird in some cases due to how I am creating certain commits, so I am summarizing instead.

  1. Comparing the current 4.x TiffReader to the "base tifffile Zarr impl":
     • average ~90% reduction in random chunk reads
     • average ~60% reduction in random plane reads
  2. Comparing the current 4.x TiffReader to the "optimized TiffReader + aszarr + larger chunks impl":
     • average ~70% reduction in random chunk reads
     • average ~20% reduction in random plane reads (this is expected, since in this impl each chunk is larger than a single plane)

I would say this is great news for a couple of reasons:

  1. I can continue down this path and add something like the existing chunk_by_dims parameter on the CziReader to the TiffReader (which was originally "Configure TiffReader to read ZYX chunks at a time", #132). In doing so, I would default to chunking ZYX, but you, as a user, could choose to chunk just YX (not recommended; details below), TYX, CZYX, or similar.
  2. With my additions on top of aszarr and chunkmode, we get a larger speedup and can read remote + can serialize! I didn't benchmark it yet, but I have tried reading a remote OME-TIFF with the changes and the speed-up applies there too, which is incredible.
  3. There are still more minor optimizations and bug fixes I need to make, so these numbers should either stay stable or go down even further.

Addressing the "overhead". The way I patched the serialization and remote read problems is that I reinitialize the Zarr Store every time I want to read a chunk, so basically add ~40ms overhead to each chunk read. This is why larger chunks are generally better. There is less time spent indexing relative to the total time of pulling that chunk of data. (That ~90% reduction vs ~70% reduction is largely due to just aicsimageio having to remake the Zarr store for every chunk and it simply adds up. If I benchmarked with even larger chunks it would probably be even closer a comparison). I will give the obvious note here that tifffile.imread + aszarr + chunkmode="page" will probably always be faster than aicsimageio because we have the added overhead and requirement of remote files + serializability.

I will keep tinkering with this over the weekend and try to drop these read times even more / get them closer to base tifffile, but I am pretty happy right now with where this is heading.


Responding to @cgohlke other comment about the "File Chunk Store approach":

By "File Chunk Store approach" I meant something like this. E.g., the store is initialized with the file path(s), offsets/bytecounts of the chunks in the file(s), and a decode function. Then getitem maps the key to a specific file/offset/bytecount, reads the chunk from the local or remote file, and returns the decoded chunk to zarr. Why would that store not be serializable or not work for remote files?

Ahhhh, I totally missed that. That makes sense, but more thought is required for sure. My mistake!

@evamaxfield
Collaborator

I have a working 4.x TiffReader optimization that allows for custom chunk_by_dims on the feature/benchmarks branch.

Here are results comparing current 4.x TiffReader against the optimized TiffReader:
[screenshot: benchmark table comparing the current 4.x TiffReader against the optimized TiffReader]

🟢 is good.

In short, for "real files" (any of the non-super-small test files), across the board our read times basically drop to ~5% of what they are on the current 4.x branch (which has comparable read times to the current release).

Reiterated in code, this:

from aicsimageio.readers import TiffReader

r = TiffReader("some-file.tiff")  # can even pass chunk_by_dims=["T", "Z", "Y", "X"] for even faster reads
delayed = r.get_image_dask_data("TZYX", C=0)  # or whatever dimension ordering you want
delayed.compute()

now takes just 5% of the time compared to current 4.x development head / current release (3.3.4).

Link to the most up-to-date benchmark file.
I updated the benchmarks to no longer test "random plane reads"; with the new parameter, they now test "random single chunk read" and "random many chunk read", which I think is a bit better of a benchmark because it reflects more typical user behavior. It is pretty rare to grab only a single plane, but rather common to retrieve a whole Z- or T-stack (at least in my personal experience).

All unit tests pass (including local and remote reading, and serialization) for the TiffReader, but I need to fix some bugs in the OmeTiffReader due to the new implementation.

Gotta give a massive thanks to @cgohlke again for the aszarr and chunkmode params and implementation work.

@evamaxfield
Collaborator

Closing this issue for now. @bryantChhun, we talked about this in our dev meeting and decided that we aren't going to backport the optimization. We hope to have a dev release out for you to use the 4.x optimization as soon as possible, and we have a tentative release schedule for 4.x of Feb. 16.

@bryantChhun
Author

Fantastic. Thanks for all the hard work. I look forward to the release!

@evamaxfield
Collaborator

evamaxfield commented Jan 28, 2021

@bryantChhun and everyone else who followed this thread who wants to give it a go:

AICSImageIO 4.0.0.dev0 has been released 🎉

You can install with: pip install aicsimageio==4.0.0.dev0

4.0.0 work is on branch: main
Docs for the entire repo have already been transitioned over to the 4.0 docs: https://allencellmodeling.github.io/aicsimageio/
Benchmarks are also now available: https://allencellmodeling.github.io/aicsimageio/_benchmarks/index.html#/

This release supports the 4.0.0 API for AICSImage, DefaultReader, TiffReader, and OmeTiffReader.
TiffReader and OmeTiffReader chunked read times have been improved by 95% compared to 3.3.5!

dev release 1 is scheduled for Feb. 3 (ish), which will bring in the ArrayLikeReader and the Writers module!

For anyone who sees this and wants to give the dev release a try, please do, but please read the README / docs. The API has changed a bit 😅

@bryantChhun
Author

Perfect timing! Thank you. I was about to pull the 4.0 branch, literally, 10 minutes ago :-)

@evamaxfield
Collaborator

Perfect timing! Thank you. I was about to pull the 4.0 branch, literally, 10 minutes ago :-)

Awesome. If you find any issues or want to give feedback on something, please just post a new GitHub issue! I will try to patch fixes as we go :)
