Discussion: dtype system and integrating record types #254

Closed
@aldanor

I've been looking at how record types can be integrated in rust-numpy and here's an unsorted collection of thoughts for discussion.

Let's look at Element:

pub unsafe trait Element: Clone + Send {
    const DATA_TYPE: DataType;
    fn is_same_type(dtype: &PyArrayDescr) -> bool;
    fn npy_type() -> NPY_TYPES { ... }
    fn get_dtype(py: Python) -> &PyArrayDescr { ... }
}
  • npy_type() is used in PyArray::new() and the like. Instead, one should use PyArray_NewFromDescr() to make use of the custom descriptor. Should all places where npy_type() is used split between "simple type, use New" and "user type, use NewFromDescr"? Or, alternatively, should arrays always be constructed from descriptor? (in which case, npy_type() becomes redundant and should be removed)
  • Why is is_same_type() needed at all? It is only used in FromPyObject::extract, where one could simply use PyArray_EquivTypes (as is done in pybind11). Isn't it largely redundant? (Or does it exist for optimization purposes? In which case, is the difference even noticeable performance-wise?)
  • The DATA_TYPE constant is really only used to check whether the type is an object, in 2 places, like this:
    if T::DATA_TYPE != DataType::Object
    Isn't this redundant as well? Given that one can always do
    T::get_dtype(py).get_datatype() != Some(DataType::Object)
    // or, can add something like: T::get_dtype(py).is_object()
  • With all the notes above, Element is essentially just
     pub unsafe trait Element: Clone + Send {
         fn get_dtype(py: Python) -> &PyArrayDescr;
     }
  • For structured types, do we want to stick the type descriptor into DataType? E.g.:
    enum DataType { ..., Record(RecordType) }
    Or, alternatively, just keep it as DataType::Void? In which case, how does one recover the record type descriptor? (It can always be done through the numpy C API, of course, via PyArrayDescr.)
  • In order to enable user-defined record dtypes, having to return &PyArrayDescr would probably require:
    • Maintaining a global static thread-safe registry of registered dtypes (kind of like it's done in pybind11)
    • Initializing this registry somewhere
    • Any other options?
  • Element should probably be implemented for tuples and fixed-size arrays.
  • In order to implement structured dtypes, we'll inevitably have to resort to proc-macros. A few random thoughts and examples of how it can be done (any suggestions?):
    • #[numpy(record)]
      #[derive(Clone, Copy)]
      #[repr(packed)]
      struct Foo { x: i32, u: Bar } // where Bar is a registered numpy dtype as well
      // dtype = [('x', '<i4'), ('u', ...)]
    • We probably have to require one of #[repr(C)], #[repr(packed)] or #[repr(transparent)]
    • If repr is required, it can be an argument of the macro, e.g. #[numpy(record, repr = "C")]. (or not)
    • We also have to require Copy? (or not? technically, you could have object-type fields inside)
    • For wrapper types, we can allow something like this:
      #[numpy(transparent)]
      #[repr(transparent)]
      struct Wrapper(pub i32);
      // dtype = '<i4'
    • For object types, the current suggestion in the docs is to implement a wrapper type and then impl Element for it manually. This seems largely redundant, given that DATA_TYPE will always be Object. It would be nice if any #[pyclass]-wrapped type could automatically implement Element, but that would be impossible due to the orphan rule. An alternative would be something like this:
      #[pyclass]
      #[numpy] // i.e., #[numpy(object)]
      struct Foo {}
    • How does one register dtypes for foreign (remote) types? E.g., OrderedFloat<f32> or Wrapping<u64> or some PyClassFromOtherCrate? We can try doing something like what serde does for remote types.
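
To make the registry idea above a bit more concrete, here is a minimal std-only sketch of the trimmed-down Element backed by a global thread-safe registry. Everything in it is hypothetical: Descr is a stand-in for PyArrayDescr (no Python or numpy C API involved), get_or_register is an invented helper, and the '<i4' format string assumes a little-endian target. The registry initializes itself lazily on first use, which is one possible answer to "initializing this registry somewhere".

```rust
use std::any::TypeId;
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Hypothetical stand-in for a numpy descriptor (NOT the real PyArrayDescr).
#[derive(Debug, PartialEq, Eq)]
pub struct Descr {
    pub format: String,
    pub itemsize: usize,
}

/// The trimmed-down trait, returning a 'static descriptor instead of &PyArrayDescr.
pub unsafe trait Element: Clone + Send + 'static {
    fn get_dtype() -> &'static Descr;
}

/// Lazily initialized global registry, keyed by the Rust type.
fn registry() -> &'static Mutex<HashMap<TypeId, &'static Descr>> {
    static REGISTRY: OnceLock<Mutex<HashMap<TypeId, &'static Descr>>> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Return the descriptor registered for T, building and leaking it on first use.
pub fn get_or_register<T: 'static>(build: impl FnOnce() -> Descr) -> &'static Descr {
    let key = TypeId::of::<T>();
    // The lock guard is a temporary and is released at the end of this statement.
    let existing = registry().lock().unwrap().get(&key).copied();
    if let Some(descr) = existing {
        return descr;
    }
    // Build outside the lock so that a builder which registers nested
    // field types does not deadlock on the same mutex.
    let fresh = build();
    let mut map = registry().lock().unwrap();
    // If another thread won the race, keep its entry and drop ours.
    *map.entry(key).or_insert_with(|| {
        let leaked: &'static Descr = Box::leak(Box::new(fresh));
        leaked
    })
}

unsafe impl Element for i32 {
    fn get_dtype() -> &'static Descr {
        // Assumption: little-endian target, hence '<i4'.
        get_or_register::<i32>(|| Descr { format: "<i4".to_string(), itemsize: 4 })
    }
}

/// What a #[numpy(transparent)] expansion might boil down to: delegate to the inner type.
#[derive(Clone, Copy)]
#[repr(transparent)]
pub struct Wrapper(pub i32);

unsafe impl Element for Wrapper {
    fn get_dtype() -> &'static Descr {
        <i32 as Element>::get_dtype()
    }
}
```

Repeated calls to <i32 as Element>::get_dtype() return the same leaked reference, so descriptor identity is stable for the lifetime of the process. In the real crate the descriptor would presumably have to be tied to the GIL rather than leaked, which this sketch does not attempt.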

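On the repr question, a rough sketch of the field information a derive macro would need to collect, and of how it differs between #[repr(C)] and a packed layout. The Field struct and the two helper functions are hypothetical, the format strings assume a little-endian target, and std::mem::offset_of! needs Rust 1.77 or later. Note that plain #[repr(packed)] does not by itself fix field order, so the sketch uses #[repr(C, packed)]; a macro that always records explicit offsets would not depend on the order either way.

```rust
use std::mem::{offset_of, size_of};

/// Hypothetical description of one record field: name, format string, byte offset.
#[derive(Debug, PartialEq, Eq)]
pub struct Field {
    pub name: &'static str,
    pub format: &'static str,
    pub offset: usize,
}

/// C layout: `y` is padded up to the alignment of i32.
#[derive(Clone, Copy)]
#[repr(C)]
pub struct Aligned {
    pub x: u8,
    pub y: i32,
}

/// Packed C layout: no padding between fields.
#[derive(Clone, Copy)]
#[repr(C, packed)]
pub struct Packed {
    pub x: u8,
    pub y: i32,
}

/// Fields and total item size for `Aligned`, as a macro expansion might emit them.
pub fn aligned_fields() -> (Vec<Field>, usize) {
    let fields = vec![
        Field { name: "x", format: "|u1", offset: offset_of!(Aligned, x) },
        Field { name: "y", format: "<i4", offset: offset_of!(Aligned, y) },
    ];
    (fields, size_of::<Aligned>())
}

/// Same for `Packed`; offset_of! takes no references, so it is fine on packed fields.
pub fn packed_fields() -> (Vec<Field>, usize) {
    let fields = vec![
        Field { name: "x", format: "|u1", offset: offset_of!(Packed, x) },
        Field { name: "y", format: "<i4", offset: offset_of!(Packed, y) },
    ];
    (fields, size_of::<Packed>())
}
```

For Packed the offsets are 0 and 1 with an item size of 5, while for Aligned the second field starts at the alignment of i32. Either way, the (name, format, offset) triples plus the item size are what an offsets-based record dtype needs, so the macro may not have to care which repr was chosen as long as the layout is stable.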