Numpy arrays ctors and flags rework #338

Open
@aldanor

@wjakob This is a discussion / brainstorming issue for flags-related stuff in numpy api. Here's an unordered collection of my thoughts about it resulting from digging around numpy/pybind11 source, please feel free to comment:

  • We currently have NPY_ARRAY_FORCECAST on by default, which is very bad. NumPy will then happily convert anything to anything even if it's complete bollocks (this triggers the unsafe casting mode), which works poorly for both input arguments and return values and quite often yields surprising results.
  • I would actually advocate for removing the forcecast option completely, as it doesn't make much sense and is contradictory. You can only sensibly use it for a strongly typed array_t<T, array::forcecast>, which on the one hand implies that you actually do want T, but on the other hand almost completely disregards the array's dtype because of forcecast. If you want this type of behaviour, you can always accept just an array and then do .astype() (see below), which would be a lot more precise because you can specify the casting rules. I can't think of a single legitimate example where you would use forcecast for either input arguments or return values; if you can think of any, I'm all ears :)
  • The default flag combination in NumPy is NPY_ARRAY_DEFAULT, which consists of:
    • NPY_ARRAY_C_CONTIGUOUS
    • NPY_ARRAY_WRITEABLE
    • NPY_ARRAY_ALIGNED (this in particular is a very sensible default)
  • NumPy also defines NPY_ARRAY_OUT_ARRAY, which is the same as NPY_ARRAY_DEFAULT, and NPY_ARRAY_IN_ARRAY, which is the same thing but without the "writeable" bit. If you think about it, most of the time input arguments should not require writeability unless the purpose is to mutate them (dropping the writeable flag from the requirements would let NumPy avoid an unneeded copy in some cases). It would be nice to be able to specify that easily.
  • The constructor of array_t calls PyArray_FromAny, which is a universal conversion function "from anything". While it's nice on its own and it would be beneficial to expose it separately (e.g. a hypothetical ::from_object() static method), I believe it shouldn't be called in the ctor. Instead, the ctor should check that the object is already an array (PyArray_Check) and then call the array conversion routine (PyArray_FromArray), which also benefits from checking the casting rules (only two are available here, safe and force, but that should be sufficient for ctor purposes).
  • It would also be nice to have an ::astype(dtype, casting = safe) -> array method on the array and also an ::astype<T>(casting = safe) -> array_t<T> method (the flags should be preserved from the caller). Here we can accept all 5 casting types (e.g. array::casting::same_kind).
  • It's currently impossible to specify flags for the untyped array, whereas it may sometimes be beneficial (at least for controlling writeability). Obviously, the forcecast flag doesn't apply here, but NumPy already tolerates redundant flags in the same way: some routines simply ignore some flags. This would mean that the ctor of array would be almost the same as that of array_t, calling PyArray_Check and then PyArray_FromArray.
  • Would it make sense to add an ensurecopy flag? When would it be used? (In light of the coming changes that would allow providing an owner for the array so that data is not copied.)
  • API-wise, it would be nice to be able to easily specify input/output arrays without fussing with type parameters. Not sure about it at all yet, but maybe some type aliases like array::in(...) or array_t<int>::out(...) or array_t<int, array::forcecast>::out that would set the proper set of flags (with array_t() being essentially the same as array_t::out). Most of the time just these two sets of flags would be used, I believe (input arguments and output values, respectively).
  • Code-wise, this shouldn't be a huge change; all of this is likely to be contained just within the ctors of array and array_t, plus maybe the aliases and a few new methods as described above.
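
To make the flag sets above a bit more concrete, here is a minimal standalone sketch of how the "in" and "out" combinations relate. It mirrors the flag values from NumPy's headers so that it builds without NumPy or pybind11; the namespace npy_flags and the helper satisfies() are hypothetical names used only for illustration, not existing pybind11 API.

```cpp
// Flag values mirrored from NumPy's C headers so this sketch builds on its own.
// The namespace and helper names are hypothetical, not part of pybind11.
namespace npy_flags {
constexpr int c_contiguous = 0x0001; // NPY_ARRAY_C_CONTIGUOUS
constexpr int aligned      = 0x0100; // NPY_ARRAY_ALIGNED
constexpr int writeable    = 0x0400; // NPY_ARRAY_WRITEABLE

// NPY_ARRAY_IN_ARRAY: contiguous and aligned, writeability not required.
constexpr int in_array = c_contiguous | aligned;
// NPY_ARRAY_OUT_ARRAY (same as NPY_ARRAY_DEFAULT): in_array plus writeable.
constexpr int out_array = in_array | writeable;
} // namespace npy_flags

// True when an array whose flags are `have` already meets every bit in
// `required`, i.e. no copy would be needed on account of these flags.
constexpr bool satisfies(int have, int required) {
    return (have & required) == required;
}

// A read-only, contiguous, aligned array (e.g. a view over const data).
constexpr int read_only = npy_flags::c_contiguous | npy_flags::aligned;

static_assert(satisfies(read_only, npy_flags::in_array),
              "a read-only array is acceptable as an input argument");
static_assert(!satisfies(read_only, npy_flags::out_array),
              "requiring writeability would force a copy of the same array");
```

The two static_asserts capture the argument for separate input and output flag sets: the same read-only array passes as an input as-is, but fails the default (output) requirements only because of the writeable bit.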
