imread - investigate possible performance improvement #161
dask/dask#5913 has several comments describing how to accurately profile speedups |
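(A rough sketch of the kind of measurement discussed there, illustrative only and not code from the linked thread: time graph construction and computation separately, since the two can differ a lot between implementations.)

import time

def profile_imread(build, n=3):
    # `build` is a zero-argument callable returning a dask array,
    # e.g. one of the imread variants discussed in the comments below.
    for _ in range(n):
        t0 = time.perf_counter()
        darr = build()                        # graph construction only
        t1 = time.perf_counter()
        darr.compute(scheduler="threads")     # actual file reading
        t2 = time.perf_counter()
        print(f"build: {t1 - t0:.3f}s  compute: {t2 - t1:.3f}s")

# e.g. profile_imread(lambda: imread.imread('data/im_*.tif'))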
One potential blocker: jni thinks the reason
|
One thing worth noting (and I may have mentioned this to Juan as well) is |
Hi guys, just saw this here and remembered that I did stumble upon bad performance of `dask_image.imread.imread`.
So I added a `map_blocks`-based variant:
import itertools
import numbers
import warnings
import dask
import dask.array
import dask.delayed
import numpy
import pims
from dask_image.imread import _utils
def imread_mb(fname, nframes=1, *, arraytype="numpy"):
    """
    Read image data into a Dask Array.

    Provides a simple, fast mechanism to ingest image data into a
    Dask Array.

    Parameters
    ----------
    fname : str
        A glob like string that may match one or multiple filenames.
    nframes : int, optional
        Number of the frames to include in each chunk (default: 1).
    arraytype : str, optional
        Array type for dask chunks. Available options: "numpy", "cupy".

    Returns
    -------
    array : dask.array.Array
        A Dask Array representing the contents of all image files.
    """
    if not isinstance(nframes, numbers.Integral):
        raise ValueError("`nframes` must be an integer.")
    if (nframes != -1) and not (nframes > 0):
        raise ValueError("`nframes` must be greater than zero.")

    if arraytype == "numpy":
        arrayfunc = numpy.asanyarray
    elif arraytype == "cupy":   # pragma: no cover
        import cupy
        arrayfunc = cupy.asanyarray

    with pims.open(fname) as imgs:
        shape = (len(imgs),) + imgs.frame_shape
        dtype = numpy.dtype(imgs.pixel_type)

    if nframes == -1:
        nframes = shape[0]

    if nframes > shape[0]:
        warnings.warn(
            "`nframes` larger than number of frames in file."
            " Will truncate to number of frames in file.",
            RuntimeWarning
        )
    elif shape[0] % nframes != 0:
        warnings.warn(
            "`nframes` does not nicely divide number of frames in file."
            " Last chunk will contain the remainder.",
            RuntimeWarning
        )

    lower_iter, upper_iter = itertools.tee(itertools.chain(
        range(0, shape[0], nframes),
        [shape[0]]
    ))
    next(upper_iter)

    # Original approach: one dask.delayed call per chunk, joined with
    # dask.array.concatenate.
    # a = []
    # for i, j in zip(lower_iter, upper_iter):
    #     print(i, j)
    #     a.append(dask.array.from_delayed(
    #         dask.delayed(_utils._read_frame)(fname, slice(i, j),
    #                                          arrayfunc=arrayfunc),
    #         (j - i,) + shape[1:],
    #         dtype,
    #         meta=arrayfunc([])
    #     ))
    # a = dask.array.concatenate(a)

    # map_blocks approach: a single call builds all chunks; each block
    # works out which frames it covers from its location in the output array.
    def func(fname, arrayfunc, block_info=None):
        i, j = block_info[None]['array-location'][0]
        return _utils._read_frame(fname, slice(i, j), arrayfunc=arrayfunc)

    from dask.array.core import normalize_chunks

    a = dask.array.map_blocks(
        func,
        chunks=normalize_chunks((nframes,) + shape[1:], shape),
        fname=fname,
        arrayfunc=arrayfunc,
        meta=arrayfunc([]),
    )

    return a

Comparison:

# write some dummy data
import tifffile
import numpy as np
for t in range(10000):
    tmpim = np.random.randint(0, 1000, [2, 2]).astype(np.uint16)
    tifffile.imsave('data/im_t%03d.tif' % t, tmpim)

from dask_image import imread
%timeit imread.imread('data/im_*.tif')
from dask_image import imread
%timeit imread_mb('data/im_*.tif')
So there's a big performance difference! Also, indexing the resulting array is faster in the `map_blocks` version:

im = imread.imread('data/im_*.tif')
im_mb = imread_mb('data/im_*.tif')

def iterate_through_first_axis(im):
    for i in range(im.shape[0]):
        im[i]
    return

%timeit iterate_through_first_axis(im)
%timeit iterate_through_first_axis(im_mb)
So it seems that |
Related discussion: #121 |
Wow @m-albert those are some really significant speedups! A pull request would be welcome. FYI: the tests for imread are currently failing, which will block merging new |
I've just used @m-albert's code above to run a similar comparison, with two slight changes: fewer files (200 instead of 10000), but much larger frames (2000x2000 instead of 2x2).
# write some dummy data
import tifffile
import numpy as np
for t in range(200):
    tmpim = np.random.randint(0, 1000, [2000, 2000]).astype(np.uint16)
    tifffile.imsave('data/im_t%03d.tif' % t, tmpim)

Benchmark results

Baseline (current
|
@GenevieveBuckley Yes, as you say graph construction is faster using `map_blocks`. Comparing the number of graph dependencies for the two arrays:

[len(ar.dask.dependencies.keys()) for ar in [im, im_mb]]
That might explain why not only array creation but also tasks like indexing are faster as shown above (indexing is also the test case used in the issue you linked dask/dask#5913). |
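(A hypothetical complement to the snippet above, reusing `im` and `im_mb`: count the concrete tasks each graph expands to.)

# Task counts depend on the number of files and on the dask version.
print([len(dict(ar.__dask_graph__())) for ar in [im, im_mb]])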
It might be worth comparing performance to `tifffile.imread`:

In [1]: import tifffile
In [2]: %timeit tifffile.imread('data/im_*.tif')
1.61 s ± 22.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [3]: %timeit tifffile.imread('data/im_*.tif', ioworkers=16)
442 ms ± 7.39 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [4]: from dask_image import imread
In [5]: %timeit imread.imread('data/im_*.tif').compute()
2.6 s ± 9.42 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [7]: %timeit imread_mb('data/im_*.tif').compute()
2.55 s ± 17.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) |
@cgohlke cool. Replacing the pims-based `_utils._read_frame` with a tifffile-based reader

import tifffile, glob

def _read_frame_tifffile(fn, i, *, arrayfunc=numpy.asanyarray):
    fns = glob.glob(fn)
    fns = sorted(fns)
    x = tifffile.imread(fns[i])
    if i.stop - i.start == 1:
        # a single file comes back without the leading frame axis; add it
        x = x[None, ...]
    return arrayfunc(x)

leads to faster reading than with the pims-based reader:

%timeit imread_mb_tifffile('data/im_*.tif').compute(scheduler='threads')
%timeit imread_mb('data3/im_*.tif').compute(scheduler='threads')
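(`imread_mb_tifffile` isn't spelled out in this thread; presumably it is `imread_mb` with the pims-based reader swapped for `_read_frame_tifffile` above, roughly as in this hypothetical sketch.)

import glob

import dask.array
import numpy
import tifffile
from dask.array.core import normalize_chunks


def imread_mb_tifffile(fname, nframes=1, *, arrayfunc=numpy.asanyarray):
    # Hypothetical wiring; assumes one frame per file, as in the benchmark data.
    fns = sorted(glob.glob(fname))
    with tifffile.TiffFile(fns[0]) as tif:
        frame_shape = tif.pages[0].shape
    shape = (len(fns),) + frame_shape

    def func(block_info=None):
        i, j = block_info[None]['array-location'][0]
        return _read_frame_tifffile(fname, slice(i, j), arrayfunc=arrayfunc)

    return dask.array.map_blocks(
        func,
        chunks=normalize_chunks((nframes,) + shape[1:], shape),
        meta=arrayfunc([]),
    )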
Interestingly, despite threading, reading everything with plain tifffile is still faster:

%timeit tifffile.imread('data/im_*.tif')
In any case, I guess this issue is about minimising specifically the overhead due to task graph manipulations. |
Sure. It was just meant as a baseline. If you are concerned about oversubscribing threads, it's probably better to disable multi-threading in tifffile.

I guess dask_image performance should be comparable to this (requires latest version of tifffile):

import tifffile
import zarr
import dask.array
def imread(filenames):
    with tifffile.imread(filenames, aszarr=True) as storage:
        return dask.array.from_zarr(storage)
%timeit imread('data/im_*.tif').compute()
|
Do we have a sense of why the `map_blocks` version is faster? |
@cgohlke that makes sense.
@jakirkham I suspect it might have to do with the use of symbolic mappings (Blockwise layers) in the task graph:

[len(ar.dask.dependencies.keys()) for ar in [im, im_mb]]
[len(ar.dask.layers.keys()) for ar in [im, im_mb]]
[sys.getsizeof(ar.dask.layers) for ar in [im, im_mb]]
E.g. an entire layer of the `im_mb` graph is represented by a single symbolic object:

im_mb.dask.layers['func-019ae6ced5c051d76aced6041c13072a']

Blockwise<((<function imread_mb.<locals>.func at 0x10e66ef70>, None), ((<class 'tuple'>, ['block_info']), None), ('block-info-func-019ae6ced5c051d76aced6041c13072a', ('.0', '.1', '.2'))) -> func-019ae6ced5c051d76aced6041c13072a>

See also the docstring of the Blockwise layer class:

class Blockwise(Layer):
    """Tensor Operation

    This is a lazily constructed mapping for tensor operation graphs.
    This defines a dictionary using an operation and an indexing pattern.
    It is built for many operations like elementwise, transpose, tensordot, and
    so on. We choose to keep these as symbolic mappings rather than raw
    dictionaries because we are able to fuse them during optimization,
    sometimes resulting in much lower overhead.
    """

|
Maybe it's worth profiling? |
@jakirkham True, improving the performance of At least regarding the arrays they produce I found that calling |
Closed by #165 |
It's been found that the performance of `da.map_blocks` is much better than `da.stack` when joining large arrays: dask/dask#5913

It's unclear if `da.concatenate` (like we use in imread) is also slower, but this seems likely. We should investigate if we can get a performance benefit by switching to `da.map_blocks`.
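As a self-contained illustration of the difference described above (a toy sketch with in-memory blocks, not the dask_image code): joining many pieces via delayed + `da.concatenate` adds graph entries per piece, while a single `da.map_blocks` call covers all chunks with one blockwise operation.

import numpy as np

import dask
import dask.array as da

nblocks = 1000
block_shape = (1, 100, 100)


def make_block(i):
    # stand-in for reading one image from disk
    return np.full(block_shape, i, dtype=np.uint16)


# delayed + concatenate (similar in spirit to the current imread)
pieces = [
    da.from_delayed(dask.delayed(make_block)(i), shape=block_shape, dtype=np.uint16)
    for i in range(nblocks)
]
a_concat = da.concatenate(pieces, axis=0)


# a single map_blocks call (the approach suggested in this issue)
def fill_block(block_info=None):
    i = block_info[None]['chunk-location'][0]
    return np.full(block_shape, i, dtype=np.uint16)


a_map = da.map_blocks(
    fill_block,
    chunks=((1,) * nblocks,) + tuple((s,) for s in block_shape[1:]),
    dtype=np.uint16,
)

for label, arr in [("concatenate", a_concat), ("map_blocks", a_map)]:
    print(label, "->", len(dict(arr.__dask_graph__())), "tasks")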