-
I've been on the fence for a while about whether I should be involved in this effort. I'm the lead developer of Awkward Array, which is an array library for complex data structures. Its interface was designed to be a strict generalization of a subset of NumPy's, so in that sense, I'm very much involved in array library APIs. On the other hand, permitting arrays like

>>> import awkward1 as ak
>>> array = ak.Array([[{"x": 0.0, "y": []}, None, {"x": 1.1, "y": [1]}], [], [{"x": 2.2, "y": [1, 2]}]])
>>> ak.type(array) # using Datashape to describe non-rectilinear data types
3 * var * ?{"x": float64, "y": var * int64}
>>> ak.to_list(array)
[[{'x': 0.0, 'y': []}, None, {'x': 1.1, 'y': [1]}], [], [{'x': 2.2, 'y': [1, 2]}]]

would break some of the standardization goals you're trying to achieve. For instance, these arrays can't have dtypes or shapes:

>>> array.dtype
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jpivarski/irishep/awkward-1.0/awkward1/highlevel.py", line 1082, in __getattr__
    raise AttributeError(
AttributeError: no field named 'dtype'
>>> array.shape
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jpivarski/irishep/awkward-1.0/awkward1/highlevel.py", line 1082, in __getattr__
    raise AttributeError(
AttributeError: no field named 'shape'

However, the same would be true of Apache Arrow, and I see that they're represented in the list of members.

>>> ak.to_arrow(array)
<pyarrow.lib.ListArray object at 0x7f7b1edbfca0>
[
-- is_valid:
[
true,
false,
true
]
-- child 0 type: double
[
0,
0,
1.1
]
-- child 1 type: list<item: int64>
[
[],
[],
[
1
]
],
-- is_valid: all not null
-- child 0 type: double
[]
-- child 1 type: list<item: int64>
[],
-- is_valid: all not null
-- child 0 type: double
[
2.2
]
-- child 1 type: list<item: int64>
[
[
1,
2
]
]
]

Maybe the role that projects like Awkward Array and Apache Arrow can play would be to define subsets of array APIs that can apply to non-rectilinear arrays? In today's announcement, you also mention mutability; Awkward Array (and maybe Apache Arrow?) would fall into the immutable subset. Perhaps "allowing for irregularity" is another subset?
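
One way to picture the "subset" idea from the consumer's side is a capability check: code that needs `shape` and `dtype` asks whether an object provides them instead of assuming every array does. The sketch below uses only the standard library; `HasShapeAndDtype`, `RegularArray`, `RaggedArray`, and `supports_rectilinear_subset` are hypothetical names for illustration and are not part of Awkward Array, Apache Arrow, or the array API standard.

```python
from typing import Any, Protocol, Tuple, runtime_checkable


@runtime_checkable
class HasShapeAndDtype(Protocol):
    """Hypothetical 'rectilinear' subset: objects exposing shape and dtype."""

    @property
    def shape(self) -> Tuple[int, ...]: ...

    @property
    def dtype(self) -> Any: ...


class RegularArray:
    """Toy stand-in for a rectilinear 2-D array (not a real library class)."""

    def __init__(self, rows):
        self._rows = [list(r) for r in rows]

    @property
    def shape(self):
        ncols = len(self._rows[0]) if self._rows else 0
        return (len(self._rows), ncols)

    @property
    def dtype(self):
        return "float64"  # placeholder label, assumed for the example


class RaggedArray:
    """Toy stand-in for variable-length data: it has a length but no shape or dtype."""

    def __init__(self, rows):
        self._rows = [list(r) for r in rows]

    def __len__(self):
        return len(self._rows)


def supports_rectilinear_subset(obj) -> bool:
    # isinstance() against a runtime-checkable Protocol only checks that the
    # named attributes exist; it does not validate their types or values.
    return isinstance(obj, HasShapeAndDtype)
```

A consumer could branch on `supports_rectilinear_subset(x)` and fall back to a more general code path for irregular data.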
-
Do awkward arrays have a shape and a dtype when they represent basic NumPy-like arrays? A big part of the standard is that it defines a minimal subset of APIs that are required for array libraries. But a lot of things are explicitly left out of the spec because the libraries are allowed to do different things there. For something like awkward, an option would be to make it so that "standard" arrays work as the spec says, but if the array is one of the more generalized types, then it doesn't have some of the properties like
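
From the library's side, the option described in this reply might look roughly like the following: an attribute that behaves normally for regular data and raises `AttributeError` for generalized data, so that `hasattr` reports its absence. `NestedList` is a hypothetical toy class over nested Python lists, assumed purely for illustration; it is not how Awkward Array is implemented.

```python
class NestedList:
    """Toy container over nested lists (hypothetical, not Awkward Array's API)."""

    def __init__(self, rows):
        self._rows = [list(r) for r in rows]

    def _is_regular(self):
        # Regular means every inner list has the same length.
        return len({len(r) for r in self._rows}) <= 1

    @property
    def shape(self):
        if not self._is_regular():
            # Raising AttributeError makes hasattr(obj, "shape") return False.
            raise AttributeError("ragged data has no fixed shape")
        ncols = len(self._rows[0]) if self._rows else 0
        return (len(self._rows), ncols)
```

With this design, `NestedList([[1, 2], [3, 4]]).shape` is `(2, 2)`, while `hasattr(NestedList([[1], [2, 3]]), "shape")` is `False`.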
-
Follow up on this topic at scikit-hep/ragged#6
-
Speaking from the position of writing the test suite, the `dtype` and `shape` attributes are needed to check if the library conforms to the type promotion rules and broadcasting rules, respectively. Being able to create arrays with the `dtype` keyword argument and dtype literals like `mod.ones(dtype=mod.float64)` is also important for testing purposes (I don't know if awkward has those).

(an aside: why does the library name for Awkward have a number in it? I thought that was a typo the first time I saw it, until I noticed it was like that everywhere)
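
To illustrate why a conformance test needs `shape`, here is a minimal sketch of the NumPy-style broadcasting rule written against plain tuples. It is not the test suite's code, and `broadcast_shapes` is a name chosen for this example. A test would compare the result's reported `shape` against the value such a function predicts, which is impossible if the attribute does not exist.

```python
from itertools import zip_longest


def broadcast_shapes(a, b):
    """Return the broadcast of two shape tuples, or raise ValueError."""
    result = []
    # Align shapes from the trailing dimension; missing dimensions count as 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            result.append(x)
        elif x == 1:
            result.append(y)
        else:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
    return tuple(reversed(result))
```

The type promotion checks depend on `dtype` in the same way: the expected result type is computed from the input dtypes and compared with what the library reports.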