tiffsequence with pattern and axesorder #76

Closed
dschneiderch opened this issue Apr 8, 2021 · 5 comments
Labels
enhancement New feature or request

Comments

@dschneiderch

This seems like it's getting off topic from the original thread, but I wanted to ask more about higher-dimensional arrays.

If you are interested in organizing your files into a higher dimensional zarr array, TiffSequence takes an optional regular expression pattern that matches axes and sequence indices in the file names. That can get quite complicated:

tifffile/tests/test_tifffile.py

Lines 12686 to 12729 in 581d7a5

@pytest.mark.skipif(SKIP_PRIVATE or SKIP_LARGE or SKIP_CODECS, reason=REASON)
def test_sequence_wells_axesorder():
    """Test FileSequence with well plates and axes reorder."""
    ptrn = r'(?:_(z)_(\d+)).*_(?P<p>[a-z])(?P<a>\d+)(?:_(s)(\d))(?:_(w)(\d))'
    fnames = private_file('BBBC/BBBC006_v1_images_z_00/*.tif')
    fnames += private_file('BBBC/BBBC006_v1_images_z_01/*.tif')
    tifs = TiffSequence(fnames, pattern=ptrn, axesorder=(1, 2, 0, 3, 4))
    assert len(tifs) == 3072
    assert tifs.shape == (16, 24, 2, 2, 2)
    assert tifs.axes == 'PAZSW'
    data = tifs.asarray()
    assert isinstance(data, numpy.ndarray)
    assert data.flags['C_CONTIGUOUS']
    assert data.shape == (16, 24, 2, 2, 2, 520, 696)
    assert data.dtype == 'uint16'
    assert data[8, 12, 1, 0, 1, 256, 519] == 1579
    if not SKIP_ZARR:
        with tifs.aszarr() as store:
            assert_array_equal(data, zarr.open(store, mode='r'))


@pytest.mark.skipif(SKIP_PRIVATE or SKIP_LARGE, reason=REASON)
def test_sequence_tiled():
    """Test FileSequence with tiled OME-TIFFs."""
    # Dataset from https://github.com/tlambert03/tifffolder/issues/2
    ptrn = re.compile(
        r'\[(?P<U>\d+) x (?P<V>\d+)\].*(C)(\d+).*(Z)(\d+)', re.IGNORECASE
    )
    fnames = private_file('TiffSequenceTiled/*.tif', expand=False)
    tifs = TiffSequence(fnames, pattern=ptrn)
    assert len(tifs) == 60
    assert tifs.shape == (2, 3, 2, 5)
    assert tifs.axes == 'UVCZ'
    data = tifs.asarray(is_ome=False)
    assert isinstance(data, numpy.ndarray)
    assert data.flags['C_CONTIGUOUS']
    assert data.shape == (2, 3, 2, 5, 2560, 2160)
    assert data.dtype == 'uint16'
    assert data[1, 2, 1, 3, 1024, 1024] == 596
    if not SKIP_ZARR:
        with tifs.aszarr(is_ome=False) as store:
            assert_array_equal(
                data[1, 2, 1, 3:5], zarr.open(store, mode='r')[1, 2, 1, 3:5]
            )

This one works:

pattern = r'(.{2})-(.+)-\d{8}T\d{6}-PSII0-(\d)'
pngs = tifffile.TiffSequence('data/psII/dataset-A1-20200531/*.png', imread=imagecodecs.imread, pattern=pattern)

Can you clarify why this doesn't work? It doesn't like the lack of the 2nd group, even though that regex also matches on regex101.

pattern = r'(.{2})-.+-\d{8}T\d{6}-PSII0-(\d)'
pngs = tifffile.TiffSequence('data/psII/dataset-A1-20200531/*.png', imread=imagecodecs.imread, pattern=pattern)

FileSequence: failed to parse file names (axes do not match within image sequence)
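A quick way to see what differs between the two patterns is to look at what the regex engine captures, independent of tifffile. The sketch below uses only Python's `re` module and the two file names listed in this comment; the helper name `captured` is made up for illustration. It does not reproduce tifffile's parsing, it only shows the groups each pattern hands over.

```python
import re

# File names as listed in this comment.
filenames = [
    'A1-doi-20200531T210155-PSII0-1.png',
    'B1-doi-20200531T210155-PSII0-2.png',
]

# The two patterns from the question.
with_middle_group = r'(.{2})-(.+)-\d{8}T\d{6}-PSII0-(\d)'
without_middle_group = r'(.{2})-.+-\d{8}T\d{6}-PSII0-(\d)'


def captured(pattern, names):
    """Return the tuple of captured groups for each file name (hypothetical helper)."""
    regex = re.compile(pattern)
    result = []
    for name in names:
        match = regex.search(name)
        result.append(match.groups() if match else None)
    return result


for pattern in (with_middle_group, without_middle_group):
    # Number of capturing groups, then the captured strings per file.
    print(re.compile(pattern).groups, captured(pattern, filenames))
```

Both patterns match both file names, which is what regex101 reports. They differ only in the number of captured groups (three versus two), and in both cases the first group is a two-character string such as `A1` rather than a single letter or an integer.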

Also, based on your example, I thought I could give an axis per group in the regex. Is that not correct?

pattern = r'(.{2})-(.+)-\d{8}T\d{6}-PSII0-(\d)'
pngs = tifffile.TiffSequence('data/psII/dataset-A1-20200531/*.png', imread=imagecodecs.imread, pattern=pattern, axesorder=(2,1,0))

This gives IndexError: list index out of range

My files are:

A1-doi-20200531T210155-PSII0-1.png
B1-doi-20200531T210155-PSII0-2.png
@cgohlke
Owner

cgohlke commented Apr 8, 2021

Can you explain what shape and axes you are expecting, at which indices the files should be, and where in the file name the indices are encoded? Axes labels are single letters, indices are integers.

@dschneiderch
Author

For every group I expected an extra dimension, even if length one.
pattern = r'(.{2})-(.+)-\d{8}T\d{6}-PSII0-(\d)'
would give shape (2,1,2,640,480)
sampleid = A1, B1
experiment = 'doi'
frameid = 1, 2
image dimensions are 640x480

In this particular example, I would expect an array of NA for the cases where sampleid=B1, frameid=1 and sampleid=A1, frameid=2.

Sorry, I don't understand axes labels vs indices and it sounds like I'm missing something fundamental here.

@cgohlke
Owner

cgohlke commented Apr 9, 2021

Thank you. I understand now. The current implementation does not handle categories (like A1, B1) but requires indices in the form of numbers or characters (which can be converted to numbers) for each dimension. You can probably work around this if your categories all have distinct characters at certain positions, e.g.:

import tifffile
import imagecodecs

pattern = r'(?P<A>[A-Z])\d-(?P<B>d)oi-\d{8}T\d{6}-PSII0-(?P<C>\d)'

with tifffile.TiffSequence(
    'dataset-A1-20200531/*.png', imread=imagecodecs.imread, pattern=pattern
) as pngs:
    print(pngs)
    print(pngs.asarray().shape)

Output:

FileSequence: files are missing. Missing data are zeroed
TiffSequence
 A1-doi-20200531T210155-PSII0-1.png
 files: 2
 shape: 2, 1, 2
 axes: ABC
(2, 1, 2, 200, 200)

You might be better off with a DataFrame or database than with a numpy or zarr array to handle your data. Check if similar tools like PIMS can handle categories...

@dschneiderch
Author

OK, thanks. I did not realize axes could only be individual letters.

Thanks for the alternative suggestions. I was trying to avoid rolling my own solution, and zarr seemed like a potential fit, but it seems I am pushing this beyond what it currently supports. I will check out PIMS too.

@cgohlke added the enhancement (New feature or request) label on Jun 10, 2021
@cgohlke
Owner

cgohlke commented Oct 10, 2021

This can be done with v2021.10.10:

import tifffile
import imagecodecs

with tifffile.FileSequence(
    imagecodecs.imread,
    'dataset-A1-20200531/*.png',
    pattern=(
        r'(?P<sampleid>.{2})-'
        r'(?P<experiment>.+)-\d{8}T\d{6}-PSII0-'
        r'(?P<frameid>\d)'
    ),
    categories={'sampleid': {'A1': 0, 'B1': 1}, 'experiment': {'doi': 0}},
) as pngs:
    print(pngs)
    print(pngs.asarray().shape)

Output:

FileSequence
 A1-doi-20200531T210155-PSII0-1.png
 files: 2 (2 missing)
 shape: 2, 1, 2
 labels: sampleid, experiment, frameid
(2, 1, 2, 200, 200)
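
To make the role of the `categories` mapping more concrete, here is a minimal stdlib-only sketch of the idea: each named group becomes one dimension, category strings are looked up in the mapping, and any group without a mapping is converted with `int()`. The helper `to_indices` and the step that offsets each dimension by its smallest index are assumptions of this sketch, chosen so that it reproduces the shape and missing-file count reported above. It is not tifffile's implementation.

```python
import re
from itertools import product

filenames = [
    'A1-doi-20200531T210155-PSII0-1.png',
    'B1-doi-20200531T210155-PSII0-2.png',
]

pattern = re.compile(
    r'(?P<sampleid>.{2})-'
    r'(?P<experiment>.+)-\d{8}T\d{6}-PSII0-'
    r'(?P<frameid>\d)'
)

# Same mapping as in the comment above: category string -> integer index.
categories = {'sampleid': {'A1': 0, 'B1': 1}, 'experiment': {'doi': 0}}

# Dimension labels in the order the groups appear in the pattern.
labels = sorted(pattern.groupindex, key=pattern.groupindex.get)


def to_indices(name):
    """Turn one file name into a tuple of integer indices (hypothetical helper)."""
    match = pattern.search(name)
    if match is None:
        raise ValueError(f'pattern does not match {name!r}')
    indices = []
    for label in labels:
        value = match.group(label)
        if label in categories:
            indices.append(categories[label][value])  # category lookup
        else:
            indices.append(int(value))  # plain numeric index
    return tuple(indices)


raw = [to_indices(name) for name in filenames]

# Assumption of this sketch: each dimension starts at its smallest index.
start = tuple(min(column) for column in zip(*raw))
shape = tuple(max(column) - min(column) + 1 for column in zip(*raw))
occupied = {tuple(i - s for i, s in zip(index, start)) for index in raw}

# Grid positions for which no file exists.
missing = [
    cell for cell in product(*(range(n) for n in shape))
    if cell not in occupied
]

print(labels, shape, len(missing))
```

With the two files from this issue, the sketch yields a 2 x 1 x 2 grid with two unoccupied positions, corresponding to (A1, frame 2) and (B1, frame 1).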
