Prairie fixes #508

ctrueden · 2013-05-07T15:03:45Z

This PR is the same as #217, but for the dev_4_4 branch.

This avoids a spurious "Content is not allowed in prolog" error that Java normally emits when attempting to parse an XML document with UTF-8 encoding that includes a BOM: http://www.rgagnon.com/javadetails/java-handle-utf8-file-with-bom.html
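
As a rough illustration of the workaround (not necessarily how this PR implements it), the sketch below strips a leading U+FEFF from already-decoded XML text before handing it to the JDK's DOM parser. The class and method names (`BomStripExample`, `stripBom`, `rootName`) and the sample input are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class BomStripExample {

  /** Removes a single leading byte order mark (U+FEFF), if present. */
  static String stripBom(String xml) {
    return xml.startsWith("\uFEFF") ? xml.substring(1) : xml;
  }

  /** Parses the XML text and returns the name of its root element. */
  static String rootName(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        .parse(new InputSource(new StringReader(stripBom(xml))));
      return doc.getDocumentElement().getNodeName();
    }
    catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalStateException("Could not parse XML", e);
    }
  }

  public static void main(String[] args) {
    // Hypothetical sample input: a BOM followed by a minimal document.
    String xml = "\uFEFF<?xml version=\"1.0\" encoding=\"utf-8\"?><PVScan/>";
    System.out.println(rootName(xml));
  }
}
```

Stripping the mark from the decoded text keeps the parser itself untouched; the same idea could be applied at the byte level before decoding.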

The usage of parallel arrays makes it very difficult to guarantee robust behavior in the case of missing and malformed data. The code is also harder than necessary to understand. The new Prairie metadata structure provides a "1st class citizen" for accessing Prairie-specific information. It will also ease later migration to the SCIFIO structure.

This change utilizes the new PrairieMetadata data structure, improving on the behavior of the previous reader implementation as follows:

1. PrairieView is capable of acquiring multiple stage positions (i.e., multiple Images), but does not differentiate them in the metadata in any way. Rather, all time points at all positions are simply numbered sequentially as "cycles" and conflated. However, we can disentangle stage positions from time points by comparing stage positions. If we notice a repeating pattern of stage positions, we record them as multiple series in the core metadata. PrairieView acquires planes for all positions at a given time point before moving on to the next time point. Hence, the rasterization order is always P faster, T slower; e.g.: P1T1, P2T1, P3T1, P1T2, P2T2, P3T2

2. PrairieView numbers each Sequence (i.e., stage position / time point pair) with a "cycle" value, each Frame (i.e., focal plane of a Sequence) with an "index" value, and each File (i.e., channel of a Frame) with a "channel" value. These values are 1-based. In our experience, the XML is always recorded in sequential order. However, the new implementation does not assume that will always be the case. Rather, it attempts to gracefully handle missing Sequences, Frames and Files, using the cycle, index and channel metadata as the canonical definition of how each TIFF file fits into the dataset.

   One useful consequence of this flexibility is that the reader now supports partial Prairie datasets: if the run is interrupted before completion, PrairieView produces a partial dataset, with no XML elements or TIFF files beyond the point of interruption. This is useful if, for example, the desired phenomenon or occurrence is observed taking place before the acquisition is complete; there may be no reason to continue the run after that point.
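
The disentangling of stage positions from time points described above could be sketched roughly as follows. This is a simplified illustration, not the reader's actual code: it assumes positions are compared for exact equality (real data might need a tolerance), that the pattern repeats starting from the first cycle, and all names (`StagePositionExample`, `countPositions`, `positionIndex`, `timeIndex`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StagePositionExample {

  /**
   * Returns the number of distinct stage positions, taken as the index of
   * the first cycle whose position equals that of the very first cycle.
   * Assumes exact equality of coordinates.
   */
  static int countPositions(List<double[]> xyzPerCycle) {
    if (xyzPerCycle.isEmpty()) return 0;
    double[] first = xyzPerCycle.get(0);
    for (int i = 1; i < xyzPerCycle.size(); i++) {
      if (Arrays.equals(first, xyzPerCycle.get(i))) return i;
    }
    // No repeat found: every cycle is at a different position.
    return xyzPerCycle.size();
  }

  /** Position index of the i-th cycle (0-based), with P varying fastest. */
  static int positionIndex(int i, int sizeP) {
    return i % sizeP;
  }

  /** Time index of the i-th cycle (0-based), with T varying slowest. */
  static int timeIndex(int i, int sizeP) {
    return i / sizeP;
  }

  public static void main(String[] args) {
    // Hypothetical stage coordinates for three positions, visited twice.
    double[] p1 = {0, 0, 0}, p2 = {10, 0, 0}, p3 = {10, 10, 0};
    List<double[]> cycles =
      new ArrayList<>(Arrays.asList(p1, p2, p3, p1, p2, p3));
    int sizeP = countPositions(cycles);
    int sizeT = cycles.size() / sizeP;
    System.out.println("sizeP=" + sizeP + ", sizeT=" + sizeT);
    for (int i = 0; i < cycles.size(); i++) {
      System.out.println(
        "P" + (positionIndex(i, sizeP) + 1) + "T" + (timeIndex(i, sizeP) + 1));
    }
  }
}
```

With the six sample cycles, the loop prints the labels in the same P-faster, T-slower order given in item 1.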

If linesPerFrame or pixelsPerLine is missing, we fall back to using the first TIFF file's Y or X size, respectively.

This method returns the list of Sequences ordered by key. It will be useful to handle cases where the cycle numbering does not increment one at a time (e.g., for metadata with cycle=2,5,8,11,... or even for non-linear increment patterns).

PrairieView is capable of producing data where the cycle numbering does not increment one at a time (e.g., for metadata with cycle=2,5,8,11,...). We have not yet observed any datasets with variable, non-linear increment patterns (e.g., cycle=2,4,5,11,12,...), but this new approach should handle that too. The new approach works by explicitly creating and sorting a list of existing sequences, then using that list as the basis for sizeT & sizeP, rather than assuming that Sequence#getCycleCount() will be sensible.
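
To illustrate the idea of indexing by the sorted list of existing sequences rather than by raw cycle number, here is a minimal sketch. It is not the PrairieMetadata implementation: the map of cycle numbers to placeholder strings stands in for the real Sequence objects, and the names `SequenceOrderExample` and `sortedCycles` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SequenceOrderExample {

  /** Returns the cycle numbers that actually exist, in ascending order. */
  static List<Integer> sortedCycles(Map<Integer, String> sequencesByCycle) {
    return new ArrayList<>(new TreeMap<>(sequencesByCycle).keySet());
  }

  public static void main(String[] args) {
    // Hypothetical sequences with gaps in the cycle numbering,
    // inserted out of order on purpose.
    Map<Integer, String> sequences = new HashMap<>();
    sequences.put(8, "seq-c");
    sequences.put(2, "seq-a");
    sequences.put(11, "seq-d");
    sequences.put(5, "seq-b");

    List<Integer> cycles = sortedCycles(sequences);
    // The list index, not the cycle number, serves as the sequence index.
    for (int i = 0; i < cycles.size(); i++) {
      System.out.println("index " + i + " -> cycle " + cycles.get(i));
    }
    System.out.println("sequence count = " + cycles.size());
  }
}
```

Because only the relative order of the keys matters, the same approach covers regular gaps (2, 5, 8, 11) and irregular ones (2, 4, 5, 11, 12) alike.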

In the case where a Sequence is flagged as a TSeries, the Frames of that Sequence must be treated as time points rather than focal planes. This logic was previously not fully propagated through the code.

ctrueden · 2013-05-07T15:21:18Z

Sadly, this branch does not currently build. Investigating...

This is required to properly handle cases where some channels are active and others are not, as defined in the CFG metadata. For example, we have datasets with "*_Ch1_*.tif" and "*_Ch3_*.tif" planes acquired, but no "*_Ch2_*.tif" files.
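
A minimal sketch of how active-channel flags might be turned into a mapping from dataset channel index to PrairieView channel number is shown below. How the flags are actually read from the CFG file is not shown, and the names `ActiveChannelExample` and `activeChannels` are hypothetical.

```java
import java.util.stream.IntStream;

class ActiveChannelExample {

  /** Converts per-channel "active" flags into 1-based channel numbers. */
  static int[] activeChannels(boolean[] activeFlags) {
    return IntStream.range(0, activeFlags.length)
      .filter(i -> activeFlags[i])
      .map(i -> i + 1)
      .toArray();
  }

  public static void main(String[] args) {
    // Hypothetical flags: channels 1 and 3 active, channel 2 inactive.
    int[] channels = activeChannels(new boolean[] {true, false, true});
    // Dataset channel index c maps to PrairieView channel number channels[c].
    for (int c = 0; c < channels.length; c++) {
      System.out.println("c=" + c + " -> Ch" + channels[c]);
    }
  }
}
```

With such a mapping, a dataset containing only Ch1 and Ch3 files has two channels, and no lookup is ever attempted for the absent Ch2 files.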

We no longer care about which channel min/maxes exist per frame, as we rely totally on the CFG metadata to determine the desired channels. To be absolutely sure that all of our sample Prairie datasets have channel metadata given in CFG, I did a quick check:

    find . -name '*.cfg' -print0 | xargs -0 grep -L channel_

And indeed they do. If we come across a dataset in the future with this information missing, it would be pretty easy to synthesize based on the per-frame channel min/maxes again, but until then, we don't need it.

The reader now ignores the X stage position inversion flag, because our sample data does not appear to respect it anyway.

ctrueden · 2013-05-07T16:56:51Z

Build errors fixed, and the code is working in my manual tests (using Fiji).

ghost · 2013-05-08T16:46:31Z

--test prairie

ctrueden added 8 commits May 7, 2013 09:57

PrairieReader: avoid NPE from missing attributes

17748f0

If linesPerFrame or pixelsPerLine is missing, we fall back to using the first TIFF file's Y or X size, respectively.

PrairieReader: fix bug in logging statement

291a680

PrairieReader: Add PrairieMetadata#getSequences

23584f8

This method returns the list of Sequences ordered by key. It will be useful to handle cases where the cycle numbering does not increment one at a time (e.g., for metadata with cycle=2,5,8,11,... or even for non-linear increment patterns).

PrairieReader: fix handling of inverted Z/T

f9cfcd8

In the case where a Sequence is flagged as a TSeries, the Frames of that Sequence must be treated as time points rather than focal planes. This logic was previously not fully propagated through the code.

ctrueden added 12 commits May 7, 2013 10:25

PrairieReader: use Boolean.parseBoolean

c73ee3b

PrairieReader: use Locations instead of Files

87b6f50

PrairieReader: reset singleTiffMode when closing

cb27318

PrairieReader: fix bugs in sequence indexing

ac3f2af

Thanks to Melissa Linkert for noticing.

PrairieReader: always populate AcquisitionDate

746b475

Thanks to Melissa Linkert for pointing this out.

PrairieReader: parse acquisition date

b331f72

I somehow missed this before, so it was always null.

PrairieReader: add missing final keywords

aff3bf7

PrairieReader: parse active channels from CFG file

5da0574

PrairieReader: use active channels metadata

b30391e

This is required to properly handle cases where some channels are active and others are not, as defined in the CFG metadata. For example, we have datasets with "*_Ch1_*.tif" and "*_Ch3_*.tif" planes acquired, but no "*_Ch2_*.tif" files.

PrairieReader: add a hack for X position inversion

93f28f6

More specifically, we ignore the X stage position inversion flag, because our sample data does not appear to respect it anyway.

PrairieReader: fix imports for dev_4_4

3a6d64b

melissalinkert added a commit that referenced this pull request May 9, 2013

Merge pull request #508 from ctrueden/prairie-fixes-dev_4_4

d1ff996

Prairie fixes

melissalinkert merged commit d1ff996 into ome:dev_4_4 May 9, 2013

hflynn pushed a commit to hflynn/bioformats that referenced this pull request Oct 11, 2013

Merge pull request ome#508 from hflynn/rebased/dev_4_4/ldap-filters

b6e4851

Ldap filters (rebased onto dev_4_4)
