-
Thanks for raising this (and so quickly) @jpivarski. I will read through it very slowly later on; feel free to ping me if I forget to respond. cc'ing @jbusecke @mgrover1 as fellow SciPy lunch participants, and those who indicated interest in the previous thread: @joshmoore, @milancurcic, @SimonHeybrock, + @shoyer for opinions.
-
@jpivarski Can you share some code to define
-
import awkward as ak
import numpy as np

def shape(t):
    # Outermost type: carries the total length of the array.
    if isinstance(t, ak.types.ArrayType):
        return (t.length,) + shape(t.content)
    # Fixed-size lists contribute a known integer dimension.
    elif isinstance(t, ak.types.RegularType):
        return (t.size,) + shape(t.content)
    # Variable-length lists contribute None for "variable".
    elif isinstance(t, ak.types.ListType):
        return (None,) + shape(t.content)
    # Leaf type: no further dimensions.
    elif isinstance(t, ak.types.NumpyType):
        return ()
    else:
        raise TypeError("oops, is_ragged_type would have returned False")

and then

>>> shape(some_data.type)
(3, None, None)

and

>>> regular_data = ak.Array(np.arange(2*3*5).reshape(2, 3, 5))
>>> shape(regular_data.type)
(2, 3, 5)
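
For readers without Awkward installed, the same "None means variable" convention can be sketched on plain nested lists. This is only an illustration: `nested_shape` is a hypothetical helper, and unlike the type-based `shape` above it inspects the data, so a variable-length list type whose lists all happen to have equal length would be reported with an integer here.

```python
from __future__ import annotations


def nested_shape(data: list) -> tuple[int | None, ...]:
    """Shape of a nested list, using None for any level whose lengths vary."""
    dims: list[int | None] = [len(data)]
    level = data
    # Descend while every element at the current level is itself a list.
    while level and all(isinstance(item, list) for item in level):
        lengths = {len(item) for item in level}
        # One common length gives an integer dimension; otherwise None.
        dims.append(lengths.pop() if len(lengths) == 1 else None)
        # Flatten one level so the next pass looks inside the sublists.
        level = [inner for item in level for inner in item]
    return tuple(dims)


ragged = [[[1, 2], [3]], [], [[4, 5, 6]]]          # made-up data
regular = [[[0] * 5 for _ in range(3)] for _ in range(2)]

print(nested_shape(ragged))   # (3, None, None)
print(nested_shape(regular))  # (2, 3, 5)
```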
-
Not suggesting it (since I don't know your requirements), but maybe it is useful to compare to a similar but different approach, taken by Scipp, illustrated below:
[Figures: "Binned" and "Grouped"]
-
It would be great to see support for ragged arrays. We use them all the time for ML on graphs. Wouldn't an easier approach be to build a ragged array class that is backed by multiple non-ragged xarrays? This is the approach that TensorFlow, for example, uses: RaggedTensors are composite arrays, each of which is backed by more than one normal tensor. Perhaps a ragged "view" of the dataset derived from all the ragged tensors could even be constructed automatically if the right metadata is stored in the attrs. By building on top of xarray this way, you'd avoid the challenge of navigating all the xarray internals.
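
As a rough illustration of the "ragged array backed by flat arrays" idea, here is a minimal sketch in plain Python. It follows the values plus row_splits layout that `tf.RaggedTensor` uses, but `RaggedRows` and its methods are hypothetical names for this example, not an xarray or TensorFlow API; in an xarray setting the two flat sequences would presumably be ordinary 1D variables.

```python
from dataclasses import dataclass


@dataclass
class RaggedRows:
    """Hypothetical ragged 2D container backed by two flat sequences."""

    values: list      # all elements, concatenated row after row
    row_splits: list  # row i spans values[row_splits[i]:row_splits[i + 1]]

    @classmethod
    def from_lists(cls, rows):
        values, row_splits = [], [0]
        for row in rows:
            values.extend(row)
            row_splits.append(len(values))  # running end offset of each row
        return cls(values, row_splits)

    def __len__(self):
        return len(self.row_splits) - 1

    def __getitem__(self, i):
        # Only non-negative row indices are supported in this sketch.
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self.values[self.row_splits[i]:self.row_splits[i + 1]]

    def to_lists(self):
        return [self[i] for i in range(len(self))]


rows = RaggedRows.from_lists([[1, 2, 3], [], [4, 5]])
print(rows.values)      # [1, 2, 3, 4, 5]
print(rows.row_splits)  # [0, 3, 3, 5]
print(rows[2])          # [4, 5]
```

Both backing sequences are rectangular, which is what would let them live in existing non-ragged containers while the ragged "view" is rebuilt on demand.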
-
The most recent Array Cast Episode seems to describe a RaggedArray type structure with named axes, implemented in Mathematica. https://www.arraycast.com/episodes/episode66-tali-beynon
-
Follow-up on this topic at scikit-hep/ragged#6
-
Seems like we should fall back to the older protocols when we can. Especially since this will get us
-
I'd probably generate a rectilinear array first, then convert to ragged, and finally truncate each sublist at a chosen length or to the last non-fillvalue. This makes the interpretation of the underlying choices somewhat more stable, which in turn improves performance in generating variations on known examples - and shrinking as a special case of that. We use a similar trick inside
More precisely, it's because "an array dimension must be scikit-hep/ragged#6 says:
If a future version of the Array API standard includes ragged arrays, we will of course aim to support it in Hypothesis. Until then, I think strategies for generating ragged arrays would be best-placed in the libraries defining those structures, or a third-party package (e.g.
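
The generate-rectilinear-then-truncate idea from the start of this comment can be sketched as follows. This is a plain-Python illustration, not a Hypothesis strategy: the standard `random` module stands in for the test-case generator, and `FILL`, `truncate_to_lengths`, `truncate_at_last_non_fill`, and `random_ragged` are hypothetical names chosen for this example.

```python
import random

FILL = 0  # assumed fill value for this sketch


def truncate_to_lengths(rect, lengths):
    """Cut each row of a rectangular nested list at its chosen length."""
    return [row[:n] for row, n in zip(rect, lengths)]


def truncate_at_last_non_fill(rect, fill=FILL):
    """Drop trailing fill values from each row, keeping interior ones."""
    out = []
    for row in rect:
        keep = len(row)
        while keep > 0 and row[keep - 1] == fill:
            keep -= 1
        out.append(row[:keep])
    return out


def random_ragged(n_rows, max_len, rng):
    """Draw a full rectangle first, then pick a length for every row."""
    rect = [[rng.randint(0, 9) for _ in range(max_len)] for _ in range(n_rows)]
    lengths = [rng.randint(0, max_len) for _ in range(n_rows)]
    return truncate_to_lengths(rect, lengths)


print(truncate_to_lengths([[1, 2, 3], [4, 5, 6]], [2, 0]))
# [[1, 2], []]
print(truncate_at_last_non_fill([[1, 0, 2, 0, 0], [0, 0, 0], [3, 4, 5]]))
# [[1, 0, 2], [], [3, 4, 5]]
```

Because the rectangle is drawn before the lengths, changing one row's length leaves every element value where it was, which is the kind of stability the comment argues helps with variations and shrinking.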
-
From a conversation with @TomNicholas, @dcherian, and others at lunch today (and, ultimately, following up on #4285): we could perhaps get ragged arrays into xarray by
`shape` to return type `tuple[int | None]` (safer than interpreting `-1` or some other number to mean "variable").

To explain (3) above, here's an example of coordinates that work with `some_data`:

The first dimension, `x`, looks familiar; it's a one-dimensional array with the same length as `some_data`. The second dimension, `y`, is an array of lists in which each list has the same length as the first-level lists in `some_data`. The third dimension, `z`, has the same-length lists of lists as `some_data`.

This structure allows for alignment between coordinates and the data for the same reason that Awkward broadcasting works, demonstrated with `+` below:

This isn't a direct extension of aligning rectilinear arrays, but 1D array × 1D array × 1D array is inherently rectilinear. Whereas in rectilinear alignment, we might have a 1D array `z` like

this alignment is a generalization of using a `z` like

@dcherian thought that xarray handles things in a sufficiently abstract way that we might be able to introduce this as a generalization. I'd like to see how that would go (and can provide ideas, implementations, and features on the Awkward end to help).
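
The alignment rule described in this post (a coordinate lines up with the data when its list lengths match the data's lengths at each level it reaches) can be sketched in plain Python. This is not Awkward's implementation; `broadcast_to` and `add` are hypothetical helpers, and the values of `some_data`, `x`, and `y` are made up to match the shape `(3, None, None)` quoted elsewhere in the thread.

```python
def broadcast_to(coord, data):
    """Repeat `coord` so that its nesting matches that of `data`."""
    if not isinstance(data, list):
        if isinstance(coord, list):
            raise ValueError("coordinate is nested more deeply than the data")
        return coord
    if isinstance(coord, list):
        # Both are lists at this level: their lengths must agree.
        if len(coord) != len(data):
            raise ValueError(f"cannot align lengths {len(coord)} and {len(data)}")
        return [broadcast_to(c, d) for c, d in zip(coord, data)]
    # coord is a scalar: repeat it for every element below this point.
    return [broadcast_to(coord, d) for d in data]


def add(a, b):
    """Elementwise sum of two identically nested lists."""
    if isinstance(a, list):
        return [add(p, q) for p, q in zip(a, b)]
    return a + b


some_data = [[[1, 2], [3]], [], [[4, 5, 6]]]  # shape (3, None, None)
x = [10, 20, 30]                              # same length as some_data
y = [[100, 200], [], [300]]                   # matches the first-level lists

print(add(some_data, broadcast_to(x, some_data)))
# [[[11, 12], [13]], [], [[34, 35, 36]]]
print(add(some_data, broadcast_to(y, some_data)))
# [[[101, 102], [203]], [], [[304, 305, 306]]]
```

A coordinate whose lengths disagree with the data at some level raises `ValueError`, the ragged analogue of a dimension-size mismatch in rectilinear alignment.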
Also, go ahead and ping anyone here!