QOL suggestion for dpctl.tensor.usm_ndarray #1192

Closed
@fcharras

It would be nice to have an .offset attribute that gives (in bytes or in number of items of the given dtype) the position of the first address of a view within the underlying usm_data buffer. Alternatively, usm_ndarray could pick up the offset automatically from a view passed via the buffer= parameter. Or is there already a practical way to do this?

Example: I have a use case (radix sorting) where I want to reinterpret a float32 buffer as a uint32 buffer. This is possible with:

import numpy as np
import dpctl.tensor as dpt

array = dpt.arange(10, dtype=np.float32)
array_uint32 = dpt.usm_ndarray(shape=(10,), dtype=np.uint32, buffer=array)

print(array)
print(array_uint32)

output:

[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
[         0 1065353216 1073741824 1077936128 1082130432 1084227584 1086324736 1088421888 1090519040 1091567616]
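As a sanity check on the values above, here is a minimal standard-library sketch that computes the IEEE 754 bit pattern of each float32 value on the host. It does not use dpctl at all; the helper name `float32_bits` is made up for illustration.

```python
import struct


def float32_bits(value):
    """Return the bits of `value` stored as float32, read back as a uint32."""
    # Pack as little-endian float32, then unpack the same 4 bytes as uint32.
    return struct.unpack("<I", struct.pack("<f", value))[0]


# Same values as dpt.arange(10, dtype=np.float32), but in plain Python.
bits = [float32_bits(float(i)) for i in range(10)]
print(bits)
```

Ten input values give ten bit patterns, with 5.0 mapping to 1084227584.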

Now, there are cases where I'd like to reinterpret only part of the buffer:

sub_array = array[-2:]
sub_array_uint32_wrong = dpt.usm_ndarray(shape=(2,), dtype=np.uint32, buffer=sub_array)
print(sub_array_uint32_wrong)

But this doesn't work. The output is:

[         0 1065353216]

I get the first two values of the buffer rather than the last two. This means that when passing buffer=sub_array, it is not the portion of the buffer actually used by the view that is registered, but the whole base buffer given by sub_array.usm_data.

However, the offset option can make it work:

sub_array_uint32_good = dpt.usm_ndarray(shape=(2,), dtype=np.uint32, buffer=sub_array, offset=8)
print(sub_array_uint32_good)

output:

[1090519040 1091567616]
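In the call above, offset=8 selects the last two of ten items, so the value reads as a count of items rather than bytes. The following standard-library sketch mimics that behaviour on a host bytes object standing in for the USM buffer. The helper `reinterpret_uint32` and its `offset_items` parameter are hypothetical names, not part of dpctl.

```python
import struct

# Stand-in for the float32 buffer: ten little-endian float32 values.
raw = struct.pack("<10f", *[float(i) for i in range(10)])

ITEMSIZE = struct.calcsize("<f")  # 4 bytes, same as uint32


def reinterpret_uint32(buffer, count, offset_items=0):
    """Read `count` uint32 items starting `offset_items` items into `buffer`."""
    # struct.unpack_from expects a byte offset, so convert from items.
    byte_offset = offset_items * ITEMSIZE
    return list(struct.unpack_from(f"<{count}I", buffer, byte_offset))


# Without an offset we always start from the beginning of the base buffer.
first_two = reinterpret_uint32(raw, 2)
# With an offset of 8 items (32 bytes) we reach the last two values.
last_two = reinterpret_uint32(raw, 2, offset_items=8)

print(first_two)
print(last_two)
```

The first call reproduces the unwanted result and the second the wanted one, which is why the view's offset has to be known at the point of reinterpretation.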

This is OK, but since I don't see a way to get the offset value from a usm_ndarray attribute, user code must maintain an additional offset quantity for each view and pass it through layers of code whenever such a conversion is needed later on.

It would be nice if either usm_ndarray could work from a view without explicitly passing offset, or if usm_ndarray exposed an offset attribute to make this possible:

sub_array_uint32_good = dpt.usm_ndarray(shape=(2,), dtype=np.uint32, buffer=sub_array, offset=sub_array.offset)

which of course currently fails with:

AttributeError: 'dpctl.tensor._usmarray.usm_ndarray' object has no attribute 'offset'

For NumPy arrays, NumPy offers this kind of tool with byte_bounds, which is another nice way of exposing it.
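To illustrate what this looks like on the NumPy side, here is a small sketch that recovers a view's offset from byte_bounds and uses it to reinterpret only part of the base buffer. It assumes byte_bounds lives at numpy.lib.array_utils in NumPy 2.x and at np.byte_bounds in NumPy 1.x. Note that np.ndarray takes its offset in bytes, unlike the item count used in the usm_ndarray call above.

```python
import numpy as np

try:
    # Assumed location in NumPy >= 2.0
    from numpy.lib.array_utils import byte_bounds
except ImportError:
    # NumPy 1.x
    byte_bounds = np.byte_bounds

array = np.arange(10, dtype=np.float32)
sub_array = array[-2:]

# byte_bounds returns (low, high) memory addresses of the array's data.
base_start, _ = byte_bounds(array)
view_start, _ = byte_bounds(sub_array)

offset_bytes = view_start - base_start          # distance of the view from the base
offset_items = offset_bytes // array.itemsize   # same distance, counted in items

# Reinterpret only the part of the base buffer covered by the view.
reinterpreted = np.ndarray(
    shape=sub_array.shape, dtype=np.uint32, buffer=array, offset=offset_bytes
)

print(offset_bytes, offset_items)
print(reinterpreted)
```

An equivalent attribute on usm_ndarray would let the offset be derived from the view itself instead of being carried alongside it.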
