TiffReader (and OmeTiffReader) slow for dask chunked reads #178
Thanks for posting this issue. The performance can depend on a number of things, and some of that information is not in your post, but I'll do my best here. First, some considerations for performance in aicsimageio, from more general to more specific: …
To sum up, it is conceivable that the slowdown you experience is normal under the usage conditions you are exercising. And we have some work to do to optimize TIFF chunked reads for different chunk dimensions (for example, …).
Thank you for the very prompt reply. Some context: we find that saving data from micro-manager into OME-TIFF is much more efficient than single-page TIFFs (especially for high-framerate acquisitions). This makes sense. It also motivates our decision to move fully to OME-TIFF as defined by micro-manager's Multi-Dimensional Acquisition. As it turns out, one nice feature of aicsimageio is that its OME-TIFF parser can read micro-manager position/scene info and create an appropriate, sliceable array. In my tests, I can't get … I understand there's the ability to do all our math on dask-delayed images (I think?), leaving the …
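For readers following along, a minimal sketch of the "math on dask-delayed images" idea mentioned above (the file name, indices, and exact dimension keywords are hypothetical and depend on the reader/version in use):

```python
from aicsimageio import AICSImage

img = AICSImage("mm_acquisition.ome.tiff")  # hypothetical micro-manager OME-TIFF

# Build a lazy (dask-backed) plane selection; nothing is read from disk yet.
lazy_plane = img.get_image_dask_data("YX", C=0, Z=0, T=0)

# Chain lazy math on top of the delayed read; still no I/O.
lazy_result = (lazy_plane.astype("float32") - lazy_plane.mean()) / lazy_plane.std()

# Only this call actually reads the pixels and runs the computation.
result = lazy_result.compute()
```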
This is (in my opinion) a great decision! Welcome to the OME club!
The power of …
Oh boy do I have a great file for you! Our roadmap. We have wanted to support Zarr for a while and it will be coming in our 4.x release! That release should be about a month out, depending on my free time. If you don't know, OME is also looking at supporting (and has started prototyping) …
I will have to look into that parameter and see what is happening. If it works well I may simply make a quick PR that uses …
Did a bit of quick testing regarding the `aszarr` parameter:

```python
In [1]: import dask.array as da

In [2]: from tifffile import imread

In [3]: from aicsimageio.readers import OmeTiffReader

In [4]: r = OmeTiffReader("aicsimageio/tests/resources/pipeline-4.ome.tiff")

In [5]: r.dims
Out[5]: <Dimensions [T: 1, C: 4, Z: 65, Y: 600, X: 900]>

In [6]: r.dask_data
Out[6]: dask.array<transpose, shape=(1, 4, 65, 600, 900), dtype=uint16, chunksize=(1, 1, 1, 600, 900), chunktype=numpy.ndarray>

In [7]: z = imread("aicsimageio/tests/resources/pipeline-4.ome.tiff", aszarr=True)

In [8]: d = da.from_zarr(z)

In [9]: d
Out[9]: dask.array<from-zarr, shape=(65, 4, 600, 900), dtype=uint16, chunksize=(1, 1, 1, 900), chunktype=numpy.ndarray>

In [10]: %timeit d[0, 0,].compute()
194 ms ± 1.95 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [11]: %timeit r.dask_data[0, 0, 0,].compute()
61.5 ms ± 467 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```

This is using the 4.0 API so it may look a bit different. I also tested this in current stable (3.3.3) and got the same result. So yes, I can very easily convert the …

From your original comment:
I would love to see the timeit comparison of exactly the above where you are getting better times from the Zarr store. Or let me know how you set it up so I can test.
Zarr has a ~40 microsecond overhead for reading a single chunk. Bio-Formats creates files with RowsPerStrip=1 segments/chunks by default (other TIFF writers are smarter). Both can add up to significant overhead compared to the …
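A back-of-the-envelope illustration of how that per-chunk overhead compounds, using the ~40 µs estimate above and the 2048×2048 planes from the original post (this is only a rough sketch of the arithmetic, not a measurement):

```python
# With RowsPerStrip=1 every image row is its own strip, so every row is a chunk.
rows_per_strip = 1
plane_height = 2048          # rows in one plane, from the original post
per_chunk_overhead_s = 40e-6  # ~40 microseconds per chunk, from the comment above

chunks_per_plane = plane_height // rows_per_strip          # 2048 chunks per plane
overhead_per_plane_s = chunks_per_plane * per_chunk_overhead_s
print(f"~{overhead_per_plane_s * 1000:.0f} ms of pure chunk overhead per plane")  # ~82 ms
```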
Just tested more files:
The above is getting the first plane from each file. Sometimes that is a single chunk and sometimes that is many chunks. I will keep poking around, but it looks like the chunks determined by the image writer may be a safer bet than our current chunks, so thanks @cgohlke for that bit of knowledge. We may have an optimization PR come out of this.
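As a side note, a quick way to check how a given TIFF was chunked by its writer is to look at the first page's tags with tifffile (the path is hypothetical; the tag may be absent for tiled files):

```python
from tifffile import TiffFile

with TiffFile("some-file.ome.tiff") as tif:
    page = tif.pages[0]
    tag = page.tags.get("RowsPerStrip")
    print("plane shape:", page.shape)
    print("RowsPerStrip:", tag.value if tag else "n/a (tiled or single strip)")
```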
Wow thanks for all this quick work!
I apologize. I made that statement but hadn't run through the same tests as earlier.
Going to be running some more intensive benchmarks on this later today and will report back with my results. Things I am planning on benchmarking:
All benchmarks will be run on … Before I even run these benchmarks, I will also want to see if the "drop in replacement" of …
Sorry for the great delay @bryantChhun, I got distracted by other things. I also didn't want to just benchmark this stuff as a one-off, so I am taking this issue as a chance to also resolve #105; specifically, I am using asv, which should really help for benchmarking this and monitoring the library in the future.

Preliminary Results and Comments

All tests were run on the 4.x branch.
First Plane Read

Similar to my early results and comment above, it would seem that the performance is highly dependent on the TIFF's RowsPerStrip …
Random Plane Read

All of the same comments from First Plane Read apply.
Not Included
Major Considerations
```python
from aicsimageio import AICSImage
from prefect import task

@task
def do_some_img_computation(img: AICSImage):
    ...
```

Would no longer be possible if we moved to the …

Final Comments

I think my general statement about this particular problem is:
There is a bunch of info here, so please feel free to give me any and all comments, questions, and feedback on my implementation. Happy to hear them all. And @cgohlke, if you see anything that could potentially improve or change the results of these benchmarks, please let me know!
Also tagging @joshmoore because I would be curious as to your thoughts on this problem as well.
Actually TIFF was designed for chunked reading and rewriting. Just over 30 years ago... I think tifffile's zarr interface works reasonably well for common tasks on many existing WSI, multi-resolution OME-TIFF, or LSM files in the hundreds of gigabytes on a local workstation. It depends on how the files were written and the access pattern.
The next version of tifffile will add an option (…)
Please provide a simple Python/tifffile example that I could test. The lock is needed to synchronize seeking and reading from a single file handle.
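To make the point about the lock concrete, here is a generic sketch (not tifffile's internals): when several dask threads share one file handle, the seek and the read must happen atomically, otherwise another thread can move the file position in between.

```python
import threading

class LockedReader:
    """Wrap a shared file handle so seek + read is atomic per chunk."""

    def __init__(self, fh):
        self._fh = fh
        self._lock = threading.Lock()

    def read_chunk(self, offset, nbytes):
        with self._lock:            # without this, a concurrent reader could
            self._fh.seek(offset)   # reposition the handle between seek()
            return self._fh.read(nbytes)  # and read(), returning the wrong bytes
```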
Hah. Fair.
Totally agree. I think in general if your file is local the …
I eagerly await that release then! Will keep building out the benchmark suite in the meantime to prepare for it.
Will send later tonight!
@cgohlke I can post this as an issue over on the tifffile repo if you want, but here is a short snippet for attempting to serialize the Zarr Store:

```python
import dask.array as da
from distributed.protocol import serialize
from tifffile import imread

store = imread("aicsimageio/tests/resources/actk.ome.tiff", aszarr=True)
arr = da.from_zarr(store)

serialize(store)
# serialize(arr)  # also fails because it's just a reference to the store
```

deps:
Yes, pickling won't work as long as ZarrFileStore is using a TiffFile instance to do the heavy lifting. To overcome this limitation one could implement the File Chunk Store approach. This could require duplicating/refactoring parts of the TiffFile implementation for those TIFF features that are not supported by the zarr specification/implementation. The overhead for accessing chunks would be even higher. I have to think about it some more...
Thanks for the ping, @JacksonMaxfield. I assume you're primarily interested in the RowsPerStrip issue? Oddly enough, we're working on some latency metrics right now as well (incl. HDF5 with pytest-benchmark), so this is remarkably good timing either way. Here are some random end-of-the-week thoughts, then I'll need to get back to you next week:
At the risk of saying this before it's ready to go... I think I have found a solution that enables near …

Addressing some comments:
I don't think this is the play. Or at least not for an …
Yep. Basically just wanted to see if there was a reason for BioFormats to have the specified …
Answering my above question / statement, seems like you have known and thought about it!
If you wanted to run the benchmarks described in this issue, it's a bit of a process... but …
The …
My above comment still stands regarding this implementation. It may be good for local files, may be bad for remote files.
I released a new tifffile version, which can optionally use whole images from pages/IFDs instead of strips/tiles as zarr chunks by passing …
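A sketch of what that usage might look like, assuming the option referred to is the `chunkmode` parameter that tifffile's zarr store exposes (the exact name and accepted values here are an assumption and may differ by version):

```python
import dask.array as da
from tifffile import TiffFile

with TiffFile("aicsimageio/tests/resources/pipeline-4.ome.tiff") as tif:
    # Assumption: chunkmode="page" makes each whole page/IFD a single zarr
    # chunk instead of one chunk per strip/tile.
    store = tif.aszarr(chunkmode="page")
    arr = da.from_zarr(store)
    print(arr.chunksize)  # expect one full YX plane per chunk
```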
By "File Chunk Store approach" I meant something like this. E.g., the store is initialized with the file path(s), offsets/bytecounts of the chunks in the file(s), and a decode function. Then |
Alright some more updates!
The benchmarks I ran for comparison below use this new release and the … Unfortunately, screenshotting the benchmarks on this one looks a bit weird in some cases due to how I am creating certain commits, so I am summarizing.
I would say this is great news for a couple of reasons:
Addressing the "overhead". The way I patched the serialization and remote read problems is that I reinitialize the Zarr Store every time I want to read a chunk, so basically add ~40ms overhead to each chunk read. This is why larger chunks are generally better. There is less time spent indexing relative to the total time of pulling that chunk of data. (That ~90% reduction vs ~70% reduction is largely due to just I will keep tinkering with this over the weekened and try to drop these read times even more / get them closer to base Responding to @cgohlke other comment about the "File Chunk Store approach":
Ahhhh, I totally missed that. That makes sense, but more thought required for sure. My mistake!
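Back to the "reinitialize the Zarr Store per chunk read" idea described above, a rough sketch of that pattern (the path, the index, and the helper name are hypothetical, and this is not the actual patch):

```python
import dask.array as da
from dask import delayed
from tifffile import imread

def read_chunk(path, index):
    # Re-open the zarr store inside the task so the dask graph never has to
    # pickle a file handle, lock, or live store -- only the path and index travel.
    store = imread(path, aszarr=True)
    try:
        return da.from_zarr(store)[index].compute(scheduler="synchronous")
    finally:
        store.close()

# Hypothetical usage: one delayed task per chunk/plane.
lazy = delayed(read_chunk)("aicsimageio/tests/resources/actk.ome.tiff", (0, 0))
plane = lazy.compute()
```

The per-task re-open is what trades the ~40ms of extra setup per chunk for serializability and remote-read support.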
I have a working 4.x … Here are results comparing current 4.x … 🟢 is good.

In short, for "real files" (any of the non-super-small test files), across the board our read times basically drop to ~5% of what they are on the current 4.x branch (which has comparable read times to the current release). Reiterated in code, this:

```python
from aicsimageio.readers import TiffReader

r = TiffReader("some-file.tiff")  # can even include chunk_by_dims=["T", "Z", "Y", "X"] for even faster
delayed = r.get_image_dask_data("TZYX", C=0)  # or w.e. you want
delayed.compute()
```

now takes just 5% of the time compared to the current 4.x development head / current release (3.3.4).

Link to the most up-to-date benchmark file. All unit tests pass (including local and remote reading, and serialization) for the …

Gotta give a massive thanks to @cgohlke again for the …
Closing this issue for now. @bryantChhun, we talked about this in our dev meeting and decided that we aren't going to backport the optimization. We hope to have a …
Fantastic. Thanks for all the hard work. I look forward to the release!
@bryantChhun and everyone else who followed this thread who wants to give it a go: AICSImageIO 4.0.0.dev0 has been released 🎉

You can install with: …

4.0.0 work is on branch: main

This release supports the 4.0.0 API for … dev release 1 is scheduled for Feb. 3 (ish), which will bring in …

For anyone who sees this and wants to give the dev release a try, please do, but please read the README / docs. The API has changed a bit 😅
Perfect timing! Thank you. I was about to pull the 4.0 branch, literally, 10 minutes ago :-)
Awesome. If you find any issues / want to give feedback for something, please just post it as a new GitHub issue! I will try to patch fixes as we go :)
System and Software
Description
I've generated OME-TIFF stack files from micro-manager 2.0 gamma. Pulling a single XY slice from a stack using

get_image_dask_data("YX", czst=random).compute()

takes approx. 15 seconds. This was tested using the %timeit magic in Jupyter notebooks. Pulling a single XY slice from a small stack is more performant (approx. 700 ms)... but isn't the purpose of dask delayed reading to be less vulnerable to data file sizes? Is there some incompatibility with Jupyter that I'm not aware of?

Here's some sample code that I used:
The large dataset has max dimensions (C, Z, S, T) = (4, 5, 4, 50) with 2048x2048 16-bit images.
The small dataset has max dimensions (C, Z, S, T) = (4, 5, 4, 10).
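A minimal sketch of the kind of timing comparison described above (the file paths, the fixed indices, and the exact dimension keywords are hypothetical and depend on the installed aicsimageio version):

```python
import time
from aicsimageio import AICSImage

large = AICSImage("large_mda_stack.ome.tif")  # (C, Z, S, T) = (4, 5, 4, 50), 2048x2048 planes
small = AICSImage("small_mda_stack.ome.tif")  # (C, Z, S, T) = (4, 5, 4, 10)

for name, img in [("large", large), ("small", small)]:
    start = time.perf_counter()
    plane = img.get_image_dask_data("YX", C=0, Z=0, S=0, T=0).compute()
    print(name, plane.shape, f"{time.perf_counter() - start:.2f} s")
```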
Expected Behavior
What did you expect to happen instead?
I expected the dask delayed read to scale less or not at all with the size of the dataset. This appears to be a 20x slowdown in read for only a 5x difference in data size.
Reproduction
A minimal example that exhibits the behavior.
Environment
Any additional information about your environment