I've been looking at how record types can be integrated in rust-numpy and here's an unsorted collection of thoughts for discussion.
Let's look at `Element`:

    pub unsafe trait Element: Clone + Send {
        const DATA_TYPE: DataType;
        fn is_same_type(dtype: &PyArrayDescr) -> bool;
        fn npy_type() -> NPY_TYPES { ... }
        fn get_dtype(py: Python) -> &PyArrayDescr { ... }
    }
- `npy_type()` is used in `PyArray::new()` and the like. Instead, one should use `PyArray_NewFromDescr()` to make use of the custom descriptor. Should all places where `npy_type()` is used split between "simple type, use `New`" and "user type, use `NewFromDescr`"? Or, alternatively, should arrays always be constructed from a descriptor? (In which case `npy_type()` becomes redundant and should be removed.)
- Why is `is_same_type()` needed at all? It is only used in `FromPyObject::extract`, where one could simply use `PyArray_EquivTypes` (like it's done in pybind11). Isn't it largely redundant? (Or does it exist for optimization purposes? In which case, is the difference even noticeable performance-wise?)
- The `DATA_TYPE` constant is really only used to check whether the type is an object or not, in 2 places, like this:

      if T::DATA_TYPE != DataType::Object

  Isn't this redundant as well? One can always do:

      T::get_dtype().get_datatype() != Some(DataType::Object)
      // or, can add something like: T::get_dtype().is_object()
- With all the notes above, `Element` essentially is just:

      pub unsafe trait Element: Clone + Send {
          fn get_dtype(py: Python) -> &PyArrayDescr;
      }
- For structured types, do we want to stick the type descriptor into `DataType`? E.g.:

      enum DataType { ..., Record(RecordType) }

  Or, alternatively, just keep it as `DataType::Void`? In which case, how does one recover the record type descriptor? (It can always be done through the numpy C API of course, via `PyArrayDescr`.)
- In order to enable user-defined record dtypes, having to return `&PyArrayDescr` would probably require:
  - Maintaining a global static thread-safe registry of registered dtypes (kind of like it's done in pybind11)
  - Initializing this registry somewhere
  - Any other options?
- `Element` should probably be implemented for tuples and fixed-size arrays.
- In order to implement structured dtypes, we'll inevitably have to resort to proc-macros. A few random thoughts and examples of how it can be done (any suggestions?):

      #[numpy(record)]
      #[derive(Clone, Copy)]
      #[repr(packed)]
      struct Foo { x: i32, u: Bar } // where Bar is a registered numpy dtype as well
      // dtype = [('x', '<i4'), ('u', ...)]

  - We probably have to require one of `#[repr(C)]`, `#[repr(packed)]` or `#[repr(transparent)]`.
  - If repr is required, it can be an argument of the macro, e.g. `#[numpy(record, repr = "C")]` (or not).
  - We also have to require `Copy`? (Or not? Technically, you could have object-type fields inside.)
  - For wrapper types, we can allow something like this:

        #[numpy(transparent)]
        #[repr(transparent)]
        struct Wrapper(pub i32);
        // dtype = '<i4'
- For object types, the current suggestion in the docs is to implement a wrapper type and then impl `Element` for it manually. This seems largely redundant, given that the `DATA_TYPE` will always be `Object`. It would be nice if any `#[pyclass]`-wrapped types could automatically implement `Element`, but it would be impossible due to the orphan rule. An alternative would be something like this:

      #[pyclass]
      #[numpy] // i.e., #[numpy(object)]
      struct Foo {}
- How does one register dtypes for foreign (remote) types? E.g., `OrderedFloat<f32>` or `Wrapping<u64>` or some `PyClassFromOtherCrate`? We can try doing something like what serde does for remote types.
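
To make the registry idea from the list above a bit more concrete, here is a rough, std-only sketch of how a reduced `Element` (just `get_dtype`) could sit on top of a global thread-safe registry. Everything in it is hypothetical: `Descr` is a stand-in for `PyArrayDescr`, `registry()` and `descr_of()` are made-up helpers, and the trait drops `unsafe` and the `Python` token so that the example compiles without numpy or pyo3. The hand-written impls for `Foo` and `Bar` are roughly what a `#[numpy(record)]` macro might expand to.

```rust
use std::any::TypeId;
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Hypothetical stand-in for a NumPy descriptor; a real implementation
/// would wrap `PyArrayDescr` instead.
#[derive(Clone, Debug, PartialEq)]
pub enum Descr {
    /// Scalar dtype given by its NumPy type string, e.g. "<i4".
    Scalar(&'static str),
    /// Record dtype: a list of (field name, field descriptor).
    Record(Vec<(&'static str, Descr)>),
}

/// Global registry mapping Rust types to descriptors (hypothetical helper).
fn registry() -> &'static Mutex<HashMap<TypeId, Descr>> {
    static REGISTRY: OnceLock<Mutex<HashMap<TypeId, Descr>>> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Returns the descriptor registered for `T`, building it on first use.
pub fn descr_of<T: 'static, F: FnOnce() -> Descr>(build: F) -> Descr {
    let key = TypeId::of::<T>();
    // Look up under the lock; the guard is released at the end of this statement.
    let cached = registry().lock().unwrap().get(&key).cloned();
    if let Some(found) = cached {
        return found;
    }
    // Build without holding the lock, so nested records can register themselves.
    let built = build();
    let mut map = registry().lock().unwrap();
    // If another thread registered `T` in the meantime, keep its entry.
    let stored = map.entry(key).or_insert(built).clone();
    drop(map);
    stored
}

/// Reduced `Element` with a single method (no `unsafe`, no `Python` token here).
pub trait Element: Clone + Send + 'static {
    fn get_dtype() -> Descr;
}

impl Element for i32 {
    fn get_dtype() -> Descr {
        Descr::Scalar("<i4")
    }
}

impl Element for f64 {
    fn get_dtype() -> Descr {
        Descr::Scalar("<f8")
    }
}

#[derive(Clone, Copy)]
#[repr(C)]
pub struct Bar {
    pub a: f64,
}

#[derive(Clone, Copy)]
#[repr(C)]
pub struct Foo {
    pub x: i32,
    pub u: Bar,
}

// Written by hand; a derive/attribute macro could generate these impls.
impl Element for Bar {
    fn get_dtype() -> Descr {
        descr_of::<Bar, _>(|| Descr::Record(vec![("a", f64::get_dtype())]))
    }
}

impl Element for Foo {
    fn get_dtype() -> Descr {
        descr_of::<Foo, _>(|| {
            Descr::Record(vec![("x", i32::get_dtype()), ("u", Bar::get_dtype())])
        })
    }
}
```

With this, `Foo::get_dtype()` builds the nested descriptor once and later calls return the cached copy. The builder runs outside the lock so that nested record types can register themselves without deadlocking; the price is that two threads may occasionally build the same descriptor, in which case the first one inserted wins.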