Description
Proposed new feature or change:
Motive
The default dtype of arrays is platform-dependent (#9464).
When tests run in a continuous integration context on multiple platforms (Windows, macOS, Linux), they must take into account that the default dtype of arrays can vary.
The issue affects tests that rely on NumPy array representations: when an array has the default dtype, that dtype is not displayed in its representation. This means that the expected output representation depends on the platform, and writing OS-specific tests becomes unavoidable.
What I would like is to be able to write platform-independent, repeatable outputs that can be used for automated testing.
Example
Actual
On my machine, the default dtype for integer arrays is int64. Here are some examples of creating arrays, together with their representations:
In [3]: import numpy as np
In [4]: np.array([1, 2, 3])
Out[4]: array([1, 2, 3])
In [5]: np.array([1, 2, 3], dtype=np.int64)
Out[5]: array([1, 2, 3])
In [6]: np.array([1, 2, 3], dtype=np.int32)
Out[6]: array([1, 2, 3], dtype=int32)
We can see that:
- When creating an array with no dtype kwarg, the default dtype is used. The array representation alone is not enough to know the actual dtype.
- When creating an array with a dtype kwarg that matches the platform's default integer dtype, the representation is identical, and the dtype is again implicit.
- The last case is the most explicit: the user provides the expected dtype, and the representation reflects that. This only works for non-default dtypes.
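The three observations above can be checked programmatically without hard-coding the platform's default. A minimal sketch, relying only on the public repr() output: dtype_in_repr is a hypothetical helper that simply looks for the "dtype=" substring.

```python
import numpy as np


def dtype_in_repr(arr: np.ndarray) -> bool:
    """Return True if repr(arr) spells out the dtype explicitly."""
    return "dtype=" in repr(arr)


# Platform-dependent: int64 on the author's machine, may differ elsewhere.
default_int = np.array([1, 2, 3]).dtype
print("default integer dtype:", default_int)

cases = {
    "no dtype kwarg": None,
    "default dtype passed explicitly": default_int,
    "non-default dtype (int8)": np.int8,
}
for label, dt in cases.items():
    arr = np.array([1, 2, 3], dtype=dt)
    print(f"{label}: {arr!r} (dtype shown: {dtype_in_repr(arr)})")
```

Only the last case prints the dtype, whatever the platform's default happens to be.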
Desired
In [3]: import numpy as np
In [4]: np.array([1, 2, 3])
Out[4]: array([1, 2, 3], dtype=int64)
In [5]: np.array([1, 2, 3], dtype=np.int64)
Out[5]: array([1, 2, 3], dtype=int64)
In [6]: np.array([1, 2, 3], dtype=np.int32)
Out[6]: array([1, 2, 3], dtype=int32)
The dtype is always printed, so whether it happens to be the platform's default no longer influences the representation. The default dtype still depends on the platform, but the representation no longer depends on the default dtype: the chain is broken, and for a given dtype the representation no longer depends on the platform. Writing platform-independent tests that rely on the representation becomes easier.
from
platform <- default dtype <- repr
=> platform <- repr
to
platform <- default dtype </- repr
=> platform </- repr
Existing solutions I looked into
np.set_printoptions
I first looked into https://numpy.org/doc/stable/reference/generated/numpy.set_printoptions.html
I experimented with the kwargs legacy='1.13' and legacy='1.21', without success. Also, even if this had worked, I would have disliked relying on a kwarg named legacy, which strongly implies it should not be used in new code.
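This check can be reproduced with np.printoptions, the context-manager form of np.set_printoptions. The sketch assumes NumPy 1.22 or later, where legacy='1.21' is accepted; in both legacy modes the default integer dtype is still omitted from the representation.

```python
import numpy as np

arr = np.array([1, 2, 3])  # default integer dtype for this platform

results = {}
for legacy in ("1.13", "1.21"):
    # np.printoptions restores the previous options when the block exits.
    with np.printoptions(legacy=legacy):
        results[legacy] = repr(arr)
    print(f"legacy={legacy!r}: {results[legacy]}")
```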
Proposed solution
Adding a new dtype printing option:
import numpy as np
np.set_printoptions(dtype="default") # current behaviour
np.set_printoptions(dtype="always") # always print dtype
np.set_printoptions(dtype="never") # never print dtype
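Until such an option exists, its intended behaviour can be emulated outside NumPy. A minimal sketch: repr_with_dtype is a hypothetical helper, not a NumPy API, and it assumes native numeric dtypes (it uses arr.dtype.name, which differs from NumPy's repr for string and structured dtypes). It formats the values with np.array2string using the same ", " separator and "array(" prefix as the built-in repr, but unlike NumPy it never moves a long dtype suffix onto its own line.

```python
import numpy as np

MODES = ("default", "always", "never")


def repr_with_dtype(arr: np.ndarray, mode: str = "default") -> str:
    """Hypothetical emulation of the proposed set_printoptions(dtype=...) modes."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    if mode == "default":
        # Current NumPy behaviour: the dtype is shown only if it is not implied.
        return repr(arr)
    # Format the values like array_repr does: ", " separator, and continuation
    # lines indented to line up after the "array(" prefix.
    body = np.array2string(arr, separator=", ", prefix="array(", suffix=")")
    if mode == "never":
        return f"array({body})"
    # mode == "always": append the dtype name unconditionally.
    return f"array({body}, dtype={arr.dtype.name})"


print(repr_with_dtype(np.array([1, 2, 3], dtype=np.int64), "always"))
# -> array([1, 2, 3], dtype=int64)
print(repr_with_dtype(np.array([1, 2, 3], dtype=np.int32), "never"))
# -> array([1, 2, 3])
```

In a test, comparing repr_with_dtype(result, "always") against a literal such as "array([1, 2, 3], dtype=int64)" gives the same expected string on every platform, as long as result has that dtype.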
Technical Analysis
The function _array_repr_implementation implements the array representation logic, including the code that appends the dtype suffix. There is currently no way to force the dtype to be printed, or to force it to be omitted.
Allowing callers to override the skipdtype decision could help:
 def _array_repr_implementation(
         arr, max_line_width=None, precision=None, suppress_small=None,
+        skipdtype: bool | None = None,
         array2string=array2string):
     ...
-    skipdtype = dtype_is_implied(arr.dtype) and arr.size > 0
+    if skipdtype is None:
+        skipdtype = dtype_is_implied(arr.dtype) and arr.size > 0
Role of the proposed new three-valued skipdtype kwarg:
- None: current behaviour, platform-dependent
- False: always print the ", dtype=..." suffix
- True: never print the ", dtype=..." suffix
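The three-valued logic can be sketched in isolation. Everything here is hypothetical: DTYPE_OPTION_TO_SKIPDTYPE maps the proposed set_printoptions(dtype=...) values onto skipdtype, and IMPLIED_DTYPES is a simplified stand-in for NumPy's internal dtype_is_implied, which additionally checks byte order and structured dtypes.

```python
from typing import Optional

import numpy as np

# Hypothetical mapping from the proposed set_printoptions(dtype=...) values
# onto the proposed skipdtype kwarg (neither exists in NumPy today).
DTYPE_OPTION_TO_SKIPDTYPE = {"default": None, "always": False, "never": True}

# Simplified stand-in for dtype_is_implied(): dtypes whose repr currently
# omits "dtype=" (an assumption; ignores byte order and structured dtypes).
IMPLIED_DTYPES = (
    np.dtype(np.bool_),
    np.dtype(np.float64),
    np.dtype(np.complex128),
    np.array([0]).dtype,  # platform default integer dtype
)


def skipdtype_for_option(option: str) -> Optional[bool]:
    """Translate a proposed printing option into a skipdtype value."""
    try:
        return DTYPE_OPTION_TO_SKIPDTYPE[option]
    except KeyError:
        raise ValueError(f"unknown dtype printing option: {option!r}") from None


def resolve_skipdtype(arr: np.ndarray, skipdtype: Optional[bool] = None) -> bool:
    """Return True if the ', dtype=...' suffix should be left out."""
    if skipdtype is None:
        # Current behaviour: hide the dtype only if it is implied and the
        # array is non-empty (empty arrays always show their dtype).
        return arr.dtype in IMPLIED_DTYPES and arr.size > 0
    # An explicit True/False overrides the platform-dependent decision.
    return skipdtype
```

Passing None keeps today's behaviour, so existing callers of the internal function would be unaffected.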