Ensure argument validation is being done for the Tensor APIs #102267

Closed
@tannergooding

Description

Work is underway to introduce Tensor<T> and its supporting types. As part of that, many APIs take complex user inputs that describe the general shape and iteration mechanics of the underlying memory wrapped by a tensor.

There are a number of edge cases that need to be accounted for and are not currently protected against, such as lengths whose product overflows the computed contiguous length.
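As a rough illustration of the overflow case, the product of the lengths can be computed in a `checked` context so that it throws instead of silently wrapping. This is only a sketch: `ShapeMath` and `FlattenedLength` are hypothetical names, not part of the Tensor APIs, and the use of `nint` for lengths is an assumption.

```csharp
using System;

// Hypothetical helper for illustration; not an API from System.Numerics.Tensors.
public static class ShapeMath
{
    // Multiplies all lengths together and throws OverflowException
    // if the running product does not fit in nint.
    public static nint FlattenedLength(ReadOnlySpan<nint> lengths)
    {
        nint total = 1;
        foreach (nint length in lengths)
        {
            if (length < 0)
            {
                throw new ArgumentOutOfRangeException(nameof(lengths), "Lengths must be non-negative.");
            }

            // checked(...) turns silent wraparound into an OverflowException.
            total = checked(total * length);
        }

        return total;
    }
}
```

With this sketch, `ShapeMath.FlattenedLength(new nint[] { 1080, 1920 })` returns 2073600, while `new nint[] { nint.MaxValue, 2 }` throws an `OverflowException`. Note that it is conservative: it also throws when an intermediate product overflows even if a later length is 0.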

Additionally, while there is a wide variety of "safe" (as in will not cause an IndexOutOfRange or AccessViolation exception) ways to specify the shape and iteration mechanics, many such inputs are likely representative of user error, and the public API points should check for and prevent them.

A simple example of the latter is that given a T[] with length == 2073600 and lengths: [1080, 1920], the natural strides should be [1920, 1]. This means columns are 1 apart and rows are 1920 apart, which results in 2073600 total elements, no skipped elements, and everything being valid. But if a user got the order wrong and passed strides of [1, 1920], then we'd have columns 1920 apart, which means not every column can be indexed within the bounds of the underlying array and we would start accessing out of bounds memory. Another scenario is strides of [1920, 2], which says columns are 2 elements apart but rows are 1920 apart. Even where such a layout stays in bounds and never AVs, it does not uniquely represent indices, so various [y, x] will overlap with another [a, b] (for example, [0, 960] and [1, 0] both resolve to offset 1920). This is a likely user error and something we should be protecting against.
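The two failure modes above could be checked roughly as follows. This is a sketch under stated assumptions: `StrideChecks` and its methods are hypothetical names, strides are assumed to be positive, and the overlap test is a conservative sufficient condition (each stride, taken in ascending order, must be at least the extent covered by the previous dimension), so it may reject some unusual layouts that do not actually overlap.

```csharp
using System;

// Hypothetical helper for illustration; not an API from System.Numerics.Tensors.
public static class StrideChecks
{
    // True when the largest reachable offset fits inside the backing buffer.
    // Assumes positive strides and lengths.Length == strides.Length.
    public static bool IsInBounds(ReadOnlySpan<nint> lengths, ReadOnlySpan<nint> strides, nint bufferLength)
    {
        if (lengths.Length != strides.Length)
        {
            throw new ArgumentException("Rank mismatch between lengths and strides.");
        }

        nint maxOffset = 0;
        for (int i = 0; i < lengths.Length; i++)
        {
            if (lengths[i] == 0)
            {
                return true; // an empty tensor touches no memory
            }

            maxOffset = checked(maxOffset + (lengths[i] - 1) * strides[i]);
        }

        return maxOffset < bufferLength;
    }

    // Conservative check that no two distinct indices map to the same offset.
    public static bool IsNonOverlapping(ReadOnlySpan<nint> lengths, ReadOnlySpan<nint> strides)
    {
        if (lengths.Length != strides.Length)
        {
            throw new ArgumentException("Rank mismatch between lengths and strides.");
        }

        int[] order = new int[lengths.Length];
        for (int i = 0; i < order.Length; i++)
        {
            if (lengths[i] == 0)
            {
                return true; // an empty tensor cannot overlap
            }

            order[i] = i;
        }

        // Sort dimension indices by ascending stride.
        nint[] keys = strides.ToArray();
        Array.Sort(keys, order);

        nint required = 1; // smallest stride the next dimension may use
        foreach (int dim in order)
        {
            if (lengths[dim] == 1)
            {
                continue; // the stride of a length-1 dimension is never applied
            }

            if (strides[dim] < required)
            {
                return false;
            }

            required = checked(strides[dim] * lengths[dim]);
        }

        return true;
    }
}
```

For lengths [1080, 1920] and a buffer of 2073600 elements, strides [1920, 1] pass both checks, strides [1, 1920] fail `IsInBounds`, and strides [1920, 2] fail `IsNonOverlapping`.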

-- There may be cases where such odd layouts are beneficial to the end user, but they can lead to all kinds of non-deterministic and generally undefined behavior, so if we were to support them it should only be via some TensorMarshal.CreateTensorSpan API that is explicitly hidden away and unsafe.

-- It's worth noting there is a special case, "implicit broadcast", where a stride of 0 does represent overlapping memory. This is purely an optimization for read-only views of data and can be trivially detected, documented, and accounted for. This is safe/sensible, and our APIs can robustly handle it to ensure non-deterministic behavior isn't encountered (such as if an implicit broadcast span were passed into AddInPlace). It should be considered distinct from the problem described above, where the user likely made an error in describing the tensor strides.
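Detecting the implicit broadcast case might look something like the sketch below. The names `BroadcastChecks`, `HasImplicitBroadcast`, and `ThrowIfBroadcastDestination` are hypothetical, and treating a broadcast layout as invalid for in-place destinations is an assumption drawn from the AddInPlace example above rather than a confirmed design.

```csharp
using System;

// Hypothetical helper for illustration; not an API from System.Numerics.Tensors.
public static class BroadcastChecks
{
    // A stride of 0 on a dimension with more than one element means every
    // index along that dimension reads the same memory.
    public static bool HasImplicitBroadcast(ReadOnlySpan<nint> lengths, ReadOnlySpan<nint> strides)
    {
        if (lengths.Length != strides.Length)
        {
            throw new ArgumentException("Rank mismatch between lengths and strides.");
        }

        for (int i = 0; i < lengths.Length; i++)
        {
            if (strides[i] == 0 && lengths[i] > 1)
            {
                return true;
            }
        }

        return false;
    }

    // Guard an in-place operation might run on its destination, since writing
    // through a broadcast view would update many logical elements at once.
    public static void ThrowIfBroadcastDestination(ReadOnlySpan<nint> lengths, ReadOnlySpan<nint> strides)
    {
        if (HasImplicitBroadcast(lengths, strides))
        {
            throw new ArgumentException("Destination must not be an implicit broadcast view.");
        }
    }
}
```

For example, lengths [3, 4] with strides [0, 1] describe one row of 4 elements viewed as 3 identical rows, so the check reports a broadcast, while strides [4, 1] do not.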
