tiffsequence with pattern and axesorder #76

dschneiderch · 2021-04-08T19:25:23Z

This seems like it's getting off topic from the original thread, but I wanted to ask more about higher-dimensional arrays.

If you are interested in organizing your files into a higher dimensional zarr array, TiffSequence takes an optional regular expression pattern that matches axes and sequence indices in the file names. That can get quite complicated:

tifffile/tests/test_tifffile.py

Lines 12686 to 12729 in 581d7a5

@pytest.mark.skipif(SKIP_PRIVATE or SKIP_LARGE or SKIP_CODECS, reason=REASON)
def test_sequence_wells_axesorder():
    """Test FileSequence with well plates and axes reorder."""
    ptrn = r'(?:_(z)_(\d+)).*_(?P<p>[a-z])(?P<a>\d+)(?:_(s)(\d))(?:_(w)(\d))'
    fnames = private_file('BBBC/BBBC006_v1_images_z_00/*.tif')
    fnames += private_file('BBBC/BBBC006_v1_images_z_01/*.tif')
    tifs = TiffSequence(fnames, pattern=ptrn, axesorder=(1, 2, 0, 3, 4))
    assert len(tifs) == 3072
    assert tifs.shape == (16, 24, 2, 2, 2)
    assert tifs.axes == 'PAZSW'
    data = tifs.asarray()
    assert isinstance(data, numpy.ndarray)
    assert data.flags['C_CONTIGUOUS']
    assert data.shape == (16, 24, 2, 2, 2, 520, 696)
    assert data.dtype == 'uint16'
    assert data[8, 12, 1, 0, 1, 256, 519] == 1579
    if not SKIP_ZARR:
        with tifs.aszarr() as store:
            assert_array_equal(data, zarr.open(store, mode='r'))


@pytest.mark.skipif(SKIP_PRIVATE or SKIP_LARGE, reason=REASON)
def test_sequence_tiled():
    """Test FileSequence with tiled OME-TIFFs."""
    # Dataset from https://github.com/tlambert03/tifffolder/issues/2
    ptrn = re.compile(
        r'\[(?P<U>\d+) x (?P<V>\d+)\].*(C)(\d+).*(Z)(\d+)', re.IGNORECASE
    )
    fnames = private_file('TiffSequenceTiled/*.tif', expand=False)
    tifs = TiffSequence(fnames, pattern=ptrn)
    assert len(tifs) == 60
    assert tifs.shape == (2, 3, 2, 5)
    assert tifs.axes == 'UVCZ'
    data = tifs.asarray(is_ome=False)
    assert isinstance(data, numpy.ndarray)
    assert data.flags['C_CONTIGUOUS']
    assert data.shape == (2, 3, 2, 5, 2560, 2160)
    assert data.dtype == 'uint16'
    assert data[1, 2, 1, 3, 1024, 1024] == 596
    if not SKIP_ZARR:
        with tifs.aszarr(is_ome=False) as store:
            assert_array_equal(
                data[1, 2, 1, 3:5], zarr.open(store, mode='r')[1, 2, 1, 3:5]
            )

This one works:

pattern = r'(.{2})-(.+)-\d{8}T\d{6}-PSII0-(\d)'
pngs = tifffile.TiffSequence('data/psII/dataset-A1-20200531/*.png', imread=imagecodecs.imread, pattern=pattern)

Can you clarify why this doesn't work? It doesn't like the lack of the 2nd group, even though that regex also matches at regex101.

pattern = r'(.{2})-.+-\d{8}T\d{6}-PSII0-(\d)'
pngs = tifffile.TiffSequence('data/psII/dataset-A1-20200531/*.png', imread=imagecodecs.imread, pattern=pattern)

FileSequence: failed to parse file names (axes do not match within image sequence)

Also, based on your example, I thought I could give one axis per group in the regex. Is that not correct?

pattern = r'(.{2})-(.+)-\d{8}T\d{6}-PSII0-(\d)'
pngs = tifffile.TiffSequence('data/psII/dataset-A1-20200531/*.png', imread=imagecodecs.imread, pattern=pattern, axesorder=(2,1,0))

gives IndexError: list index out of range

my files are:

A1-doi-20200531T210155-PSII0-1.png
B1-doi-20200531T210155-PSII0-2.png


cgohlke · 2021-04-08T21:46:25Z

Can you explain what shape and axes you are expecting, at which indices the files should be, and where in the file name the indices are encoded? Axes labels are single letters, indices are integers.
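Both patterns from the question match the file names, so the difference is in what the capture groups hand over, not in whether the regex matches. The sketch below uses only the standard `re` module on the two file names listed above; it does not call tifffile. Assuming the parser treats a captured non-numeric group as the axis label for the index that follows (an assumption about the parser, not something stated in this thread), the two-group pattern would yield the label A1 for one file and B1 for the other, which would be consistent with the "axes do not match" message.

```python
import re

# The two file names given in the question.
filenames = [
    'A1-doi-20200531T210155-PSII0-1.png',
    'B1-doi-20200531T210155-PSII0-2.png',
]

# The pattern that worked (three groups) and the one that failed (two groups).
three_groups = re.compile(r'(.{2})-(.+)-\d{8}T\d{6}-PSII0-(\d)')
two_groups = re.compile(r'(.{2})-.+-\d{8}T\d{6}-PSII0-(\d)')


def captured_groups(pattern, names):
    """Return the tuple of captured groups for each file name."""
    return [pattern.search(name).groups() for name in names]


with_experiment = captured_groups(three_groups, filenames)
without_experiment = captured_groups(two_groups, filenames)

print(with_experiment)     # the group just before the digit is 'doi' in both files
print(without_experiment)  # the group just before the digit differs: 'A1' vs 'B1'
```

With three groups, the text captured immediately before the final digit is the same in every file; with two groups it is not.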

dschneiderch · 2021-04-08T23:25:39Z

For every group I expected an extra dimension, even if length one.
pattern = r'(.{2})-(.+)-\d{8}T\d{6}-PSII0-(\d)'
would give shape (2,1,2,640,480)
sampleid = A1, B1
experiment = 'doi'
frameid = 1, 2
image dimensions are 640x480

and in this particular example, I would expect an array of NA for the case where sampleid=B1, frameid=1 and sampleid=A1, frameid=2

Sorry, I don't understand axes labels vs indices and it sounds like I'm missing something fundamental here.

cgohlke · 2021-04-09T00:37:53Z

Thank you. I understand now. The current implementation does not handle categories (like A1, B1) but requires indices in form of numbers or characters (which can be converted to numbers) for each dimension. You can probably work around this if your categories all have distinct characters at certain positions, e.g.:

import tifffile
import imagecodecs

pattern = r'(?P<A>[A-Z])\d-(?P<B>d)oi-\d{8}T\d{6}-PSII0-(?P<C>\d)'

with tifffile.TiffSequence(
    'dataset-A1-20200531/*.png', imread=imagecodecs.imread, pattern=pattern
) as pngs:
    print(pngs)
    print(pngs.asarray().shape)

Output:

FileSequence: files are missing. Missing data are zeroed
TiffSequence
 A1-doi-20200531T210155-PSII0-1.png
 files: 2
 shape: 2, 1, 2
 axes: ABC
(2, 1, 2, 200, 200)

You might be better off with a DataFrame or database than with a numpy or zarr array to handle your data. Check if similar tools like PIMS can handle categories...
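As a rough illustration of the tabular alternative, the sketch below parses each file name into one record using only the standard library. The field names `sampleid`, `experiment`, and `frameid` follow the description given earlier in the thread; the `timestamp` group and the `parse_record` helper are hypothetical names introduced here. A list of dicts like this is the kind of input `pandas.DataFrame` accepts, and the images themselves could be loaded per row on demand.

```python
import re

filenames = [
    'A1-doi-20200531T210155-PSII0-1.png',
    'B1-doi-20200531T210155-PSII0-2.png',
]

# Named groups keep categorical values (A1, doi) as plain strings.
pattern = re.compile(
    r'(?P<sampleid>.{2})-(?P<experiment>.+)-'
    r'(?P<timestamp>\d{8}T\d{6})-PSII0-(?P<frameid>\d)'
)


def parse_record(filename):
    """Return a dict of metadata parsed from one file name (hypothetical helper)."""
    match = pattern.search(filename)
    if match is None:
        raise ValueError(f'file name does not match pattern: {filename}')
    record = match.groupdict()
    record['frameid'] = int(record['frameid'])  # the only numeric field
    record['filename'] = filename
    return record


records = [parse_record(name) for name in filenames]
for record in records:
    print(record)
```

Because each file is its own row, combinations that do not exist (such as sample B1 with frame 1) simply have no row, and no zero-filled placeholder is needed.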

dschneiderch · 2021-04-09T15:32:33Z

Ok, Thanks. I did not realize axes could only be individual letters.

Thanks for the alternative suggestions. I was trying to avoid rolling my own solution, and zarr seemed like a potential fit, but it seems I am pushing it beyond what it currently supports. I will check out PIMS too.

cgohlke · 2021-10-10T03:17:04Z

This can be done with v2021.10.10:

import tifffile
import imagecodecs

with tifffile.FileSequence(
    imagecodecs.imread,
    'dataset-A1-20200531/*.png',
    pattern=(
        r'(?P<sampleid>.{2})-'
        r'(?P<experiment>.+)-\d{8}T\d{6}-PSII0-'
        r'(?P<frameid>\d)'
    ),
    categories={'sampleid': {'A1': 0, 'B1': 1}, 'experiment': {'doi': 0}},
) as pngs:
    print(pngs)
    print(pngs.asarray().shape)

FileSequence
 A1-doi-20200531T210155-PSII0-1.png
 files: 2 (2 missing)
 shape: 2, 1, 2
 labels: sampleid, experiment, frameid
(2, 1, 2, 200, 200)
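To make the reported shape and the "2 missing" count easier to follow, here is a minimal sketch of how a `categories` mapping could turn each file name into a position in the array. It uses only the standard library and is not tifffile's implementation; the `to_indices` helper is hypothetical, and shifting each axis so that its smallest index becomes 0 is an assumption inferred from the reported shape of 2, 1, 2.

```python
import re
from itertools import product

filenames = [
    'A1-doi-20200531T210155-PSII0-1.png',
    'B1-doi-20200531T210155-PSII0-2.png',
]
pattern = re.compile(
    r'(?P<sampleid>.{2})-'
    r'(?P<experiment>.+)-\d{8}T\d{6}-PSII0-'
    r'(?P<frameid>\d)'
)
categories = {'sampleid': {'A1': 0, 'B1': 1}, 'experiment': {'doi': 0}}

# Group names in the order they appear in the pattern.
labels = sorted(pattern.groupindex, key=pattern.groupindex.get)


def to_indices(filename):
    """Map one file name to a tuple of integer indices (hypothetical helper)."""
    match = pattern.search(filename)
    indices = []
    for label in labels:
        value = match.group(label)
        if label in categories:
            indices.append(categories[label][value])  # category to index
        else:
            indices.append(int(value))  # already numeric
    return tuple(indices)


raw = [to_indices(name) for name in filenames]

# Assumption: each axis is shifted so that its smallest index becomes 0.
minimums = [min(axis) for axis in zip(*raw)]
positions = [tuple(i - m for i, m in zip(idx, minimums)) for idx in raw]

shape = tuple(max(axis) + 1 for axis in zip(*positions))
all_positions = set(product(*(range(n) for n in shape)))
missing = sorted(all_positions - set(positions))

print(labels, shape, positions, missing)
```

Under these assumptions the two files land at opposite corners of a 2 x 1 x 2 grid, leaving two positions with no file.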

dschneiderch closed this as completed Apr 9, 2021

cgohlke added the enhancement New feature or request label Jun 10, 2021
