ctrueden commented May 7, 2013

This PR is the same as #217, but for the dev_4_4 branch.

ctrueden added 8 commits May 7, 2013 09:57
This avoids a spurious "Content is not allowed in prolog" error that
Java normally emits when attempting to parse an XML document with UTF-8
encoding that includes a BOM:

http://www.rgagnon.com/javadetails/java-handle-utf8-file-with-bom.html
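
As a rough illustration of the BOM handling described above (not necessarily how the reader implements it), the marker can be dropped before the text reaches the XML parser. The class and method names here are hypothetical. Either helper would be applied to the file contents just before parsing; input without a BOM passes through unchanged.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Bio-Formats: strips a leading byte order mark.
final class BomStripper {

  /** Removes a leading U+FEFF from already-decoded text, if present. */
  static String stripBom(String xml) {
    return xml.startsWith("\uFEFF") ? xml.substring(1) : xml;
  }

  /** Removes a leading UTF-8 BOM (bytes EF BB BF) from raw data, if present. */
  static byte[] stripBom(byte[] data) {
    boolean hasBom = data.length >= 3
      && (data[0] & 0xFF) == 0xEF
      && (data[1] & 0xFF) == 0xBB
      && (data[2] & 0xFF) == 0xBF;
    return hasBom ? Arrays.copyOfRange(data, 3, data.length) : data;
  }
}
```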

The usage of parallel arrays makes it very difficult to guarantee robust
behavior in the case of missing and malformed data. The code is also
harder than necessary to understand. The new Prairie metadata structure
provides a first-class citizen for accessing Prairie-specific
information. It will also ease later migration to the SCIFIO structure.

This change utilizes the new PrairieMetadata data structure, improving
on the behavior of the previous reader implementation as follows:

1. PrairieView is capable of acquiring multiple stage positions (i.e.,
   multiple Images), but does not differentiate them in the metadata in
   any way. Rather, all time points at all positions are simply numbered
   sequentially as "cycles" and conflated.

   However, we can disentangle stage positions from time points by
   comparing stage positions. If we notice a repeating pattern of stage
   positions, we record them as multiple series in the core metadata.

   PrairieView acquires planes for all positions at a given time point
   before moving on to the next time point. Hence, the rasterization
   order is always P faster, T slower; e.g.:

     P1T1, P2T1, P3T1, P1T2, P2T2, P3T2

2. PrairieView numbers each Sequence (i.e., stage position / time point
   pair) with a "cycle" value, each Frame (i.e., focal plane of a
   Sequence) with an "index" value, and each File (i.e., channel of a
   Frame) with a "channel" value. These values are 1-based.

   In our experience, the XML is always recorded in sequential order.
   However, the new implementation does not assume that will always be
   the case. Rather, it attempts to gracefully handle missing Sequences,
   Frames and Files, using the cycle, index and channel metadata as the
   canonical definition of how each TIFF file fits into the dataset. One
   useful consequence of this flexibility is that the reader now
   supports partial Prairie datasets: if the run is interrupted before
   completion, PrairieView produces a partial dataset, with no XML
   elements or TIFF files beyond the point of interruption. This is
   useful if, for example, the phenomenon of interest is observed
   before the acquisition is complete; there may be no reason to
   continue the run after that point.
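
The stage-position comparison from point 1 can be sketched as a search for the shortest repeating period. This is a minimal illustration with hypothetical names, not the reader's actual code. It assumes positions are compared for exact equality (real data may need a tolerance), and that a list with no repeat is treated as all positions at a single time point. Because P varies faster than T, sequence `i` (0-based) then maps to position `i % sizeP` and time point `i / sizeP`; `sizeP` must be positive for those two helpers.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: infer the number of stage positions from per-sequence coordinates.
final class StagePattern {

  /**
   * Returns the shortest period p such that every sequence has the same stage
   * coordinates as the sequence p steps earlier. A trailing partial time point
   * (interrupted run) still matches. Returns the list size if nothing repeats.
   */
  static int positionCount(List<double[]> stage) {
    int n = stage.size();
    for (int p = 1; p < n; p++) {
      boolean repeats = true;
      for (int i = p; i < n; i++) {
        if (!Arrays.equals(stage.get(i), stage.get(i - p))) {
          repeats = false;
          break;
        }
      }
      if (repeats) return p;
    }
    return n;
  }

  /** Position index of a 0-based sequence index, with P varying fastest. */
  static int positionIndex(int sequenceIndex, int sizeP) {
    return sequenceIndex % sizeP;
  }

  /** Time point index of a 0-based sequence index, with P varying fastest. */
  static int timeIndex(int sequenceIndex, int sizeP) {
    return sequenceIndex / sizeP;
  }
}
```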

If linesPerFrame or pixelsPerLine is missing, we fall back to using the
first TIFF file's Y or X size, respectively.

This method returns the list of Sequences ordered by key. It will be
useful to handle cases where the cycle numbering does not increment one
at a time (e.g., for metadata with cycle=2,5,8,11,... or even for
non-linear increment patterns).

PrairieView is capable of producing data where the cycle numbering does
not increment one at a time (e.g., for metadata with
cycle=2,5,8,11,...). We have not yet observed any datasets with
variable, non-linear increment patterns (e.g., cycle=2,4,5,11,12,...),
but this new approach should handle that too.

The new approach works by explicitly creating and sorting a list of
existing sequences, then using that list as the basis for sizeT & sizeP,
rather than assuming that Sequence#getCycleCount() will be sensible.
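
A minimal sketch of the sorted-list idea, using hypothetical names rather than the actual PrairieMetadata API: collect the cycle values that really exist, sort them, and derive the counts from that list instead of from the largest cycle number. The `sizeT` helper assumes integer division, which drops a trailing incomplete time point; that is an illustrative choice, not a statement of what the reader does. With cycles 2, 5, 8 and 11 and two stage positions, this gives four sequences and two time points.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: base dimension sizes on the cycles that actually exist.
final class CycleIndex {

  /** Returns the existing cycle values in ascending order, whatever their spacing. */
  static <S> List<Integer> sortedCycles(Map<Integer, S> sequencesByCycle) {
    return new ArrayList<>(new TreeMap<>(sequencesByCycle).keySet());
  }

  /**
   * Number of complete time points for the given sequence count and position
   * count. Integer division is an assumption made for this sketch.
   */
  static int sizeT(int sequenceCount, int sizeP) {
    return sizeP <= 0 ? 0 : sequenceCount / sizeP;
  }
}
```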

In the case where a Sequence is flagged as a TSeries, the Frames of that
Sequence must be treated as time points rather than focal planes. This
logic was previously not fully propagated through the code.

ctrueden commented May 7, 2013

Sadly, this branch does not currently build. Investigating...

ctrueden added 12 commits May 7, 2013 10:25
Thanks to Melissa Linkert for noticing.
Thanks to Melissa Linkert for pointing this out.
I somehow missed this before, so it was always null.

This is required to properly handle cases where some channels are active
and others are not, as defined in the CFG metadata. For example, we have
datasets with "*_Ch1_*.tif" and "*_Ch3_*.tif" planes acquired, but no
"*_Ch2_*.tif" files.
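
To illustrate the gap handling, here is a sketch that turns per-channel "active" flags into the list of 1-based channel numbers to expect on disk. How the flags are read from the CFG file is not shown, and the names are hypothetical. For flags `{true, false, true}` it yields channels 1 and 3, matching the "*_Ch1_*.tif" / "*_Ch3_*.tif" case above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: map CFG "active" flags to the channel numbers in use.
final class ActiveChannels {

  /** activeFlags[i] states whether channel i + 1 (1-based) is active. */
  static List<Integer> activeChannelNumbers(boolean[] activeFlags) {
    List<Integer> channels = new ArrayList<>();
    for (int i = 0; i < activeFlags.length; i++) {
      if (activeFlags[i]) channels.add(i + 1);
    }
    return channels;
  }
}
```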

We no longer care about which channel min/maxes exist per frame, as we
rely entirely on the CFG metadata to determine the desired channels.

To be absolutely sure that all of our sample Prairie datasets have
channel metadata given in CFG, I did a quick check:

    find . -name '*.cfg' -print0 | xargs -0 grep -L channel_

And indeed they do. If we come across a dataset in the future with this
information missing, it would be pretty easy to synthesize based on the
per-frame channel min/maxes again, but until then, we don't need it.

More specifically, we ignore the X stage position inversion flag,
because our sample data does not appear to respect it anyway.

ctrueden commented May 7, 2013

Build errors fixed, and the code is working in my manual tests (using Fiji).


ghost commented May 8, 2013

--test prairie

melissalinkert added a commit that referenced this pull request May 9, 2013
melissalinkert merged commit d1ff996 into ome:dev_4_4 May 9, 2013
hflynn pushed a commit to hflynn/bioformats that referenced this pull request Oct 11, 2013