Skip to content

Added support for argument axis with value None for dpctl.tensor.concat(). #1125

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 17, 2023

Conversation

npolina4
Copy link
Contributor

@npolina4 npolina4 commented Mar 15, 2023

Closes #1122

  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • If this PR is a work in progress, are you filing the PR as a draft?

@github-actions
Copy link

@coveralls
Copy link
Collaborator

coveralls commented Mar 15, 2023

Coverage Status

Coverage: 82.516% (+0.09%) from 82.427% when pulling cc7f714 on fix_gh_1122 into 8f828f2 on master.

@github-actions
Copy link

Array API standard conformance tests for dpctl=0.14.2=py310h76be34b_41 ran successfully.
Passed: 46
Failed: 788
Skipped: 280

@oleksandr-pavlyk
Copy link
Contributor

For readability sake it may be better to separate implementation of axis=None case into a separate function, and in concat simply do, at the beginning of the function:

def concat(arrays, axis=0):
    "docstring"
    if axis is None:
        return _concat_axis_None(arrays)

    # code to handle integral axis here

In principle, the implementation of axis=None should be using composition of _copy_usm_ndarray_into_usm_ndarray and reshape, since reshape itself may need to perform a copy. For example:


In [2]: x0 = dpt.ones((3,4,), dtype="i4")[:, :2]

In [3]: with dpctl.utils.onetrace_enabled():
   ...:     y = dpt.reshape(x0, -1)
   ...:
Device Timeline (queue: 0x55b0214962d0): zeCommandListAppendMemoryCopy(H2D)<2.1> [ns] = 43370460122 (append) 43375682898 (submit) 43376338418 (start) 43376351126 (end)
Device Timeline (queue: 0x55b0214962d0): dpctl::tensor::kernels::copy_and_cast::copy_for_reshape_generic_kernel<int><3.1> [ns] = 43531609850 (append) 43531947296 (submit) 43532798910 (start) 43532857972 (end)

I think _concat_axis_None could be rewritten to use _copy_usm_ndarray_for_reshape if array is not C-contiguous. For C-contiguous arrays, a simple memcopy would do, hence _copy_usm_ndarray_into_usm_ndarray is appropriate.

Hence, I suggest to implement the _conca_axis_None as follows:

def _concat_axis_None_alt(arrays):
    "Implementation of concat(arrays, axis=None)"
    res_dtype, res_usm_type, exec_q = _arrays_validation(
        arrays, check_ndim=False
    )
    res_shape = 0
    for array in arrays:
        res_shape += array.size
    res = dpt.empty(
        res_shape, dtype=res_dtype, usm_type=res_usm_type, sycl_queue=exec_q
    )

    hev_list = []
    fill_start = 0
    for array in arrays:
        fill_end = fill_start + array.size
        if array.flags.c_contiguous:
            hev, _ = ti._copy_usm_ndarray_into_usm_ndarray(
                src=dpt.reshape(array, -1),
                dst=res[fill_start:fill_end],
                sycl_queue=exec_q,
            )
        else:
            hev, _ = ti._copy_usm_ndarray_for_reshape(
                src=array,
                dst=res[fill_start:fill_end],
                shift=0,
                sycl_queue=exec_q,
            )
        fill_start = fill_end
        hev_list.append(hev)

    dpctl.SyclEvent.wait_for(hev_list)
    return res

This offers some tangible performance gains:

In [1]: import dpctl.tensor as dpt, dpctl.utils

In [2]: import dpctl.tensor._manipulation_functions as mf

In [3]: x0 = dpt.ones((10**6,4,), dtype="i4")[:, :2]

In [4]: x1 = dpt.full((171,2*171,171,), 2, dtype="i4")[:,::2]

In [5]: x2 = dpt.full((171,171,171,), 2, dtype="i4")

In [6]: %time y0 = mf._concat_axis_None((x0,x1,))     # original implementation,  includes cost of JITting
CPU times: user 376 ms, sys: 73.9 ms, total: 450 ms
Wall time: 466 ms

In [7]: %time y0 = mf._concat_axis_None((x0,x1,))
CPU times: user 89.1 ms, sys: 99.6 ms, total: 189 ms
Wall time: 196 ms

In [8]: %time y0 = mf._concat_axis_None((x0,x1,))
CPU times: user 86.5 ms, sys: 107 ms, total: 194 ms
Wall time: 198 ms

In [9]: %time y0 = mf._concat_axis_None((x0,x2,))
CPU times: user 32.6 ms, sys: 32.1 ms, total: 64.7 ms
Wall time: 67.3 ms

In [10]: %time y0 = mf._concat_axis_None((x0,x2,))
CPU times: user 41.6 ms, sys: 21.1 ms, total: 62.7 ms
Wall time: 64.3 ms

In [11]: %time y0 = mf._concat_axis_None_alt((x0,x1,))   # proposed implementation,  may include cost of JITting
CPU times: user 73.6 ms, sys: 113 ms, total: 187 ms
Wall time: 186 ms

In [12]: %time y0 = mf._concat_axis_None_alt((x0,x1,))
CPU times: user 75.5 ms, sys: 116 ms, total: 191 ms
Wall time: 190 ms

In [13]: %time y0 = mf._concat_axis_None_alt((x0,x2,))
CPU times: user 11.2 ms, sys: 51 ms, total: 62.2 ms
Wall time: 61.9 ms

Copy link
Contributor

@oleksandr-pavlyk oleksandr-pavlyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please apply changes as outlined in the earlier comment, and also please add a test to covert concat(..., axis=None) for both C-contiguous and strided inputs.

@github-actions
Copy link

Array API standard conformance tests for dpctl=0.14.2=py310h76be34b_51 ran successfully.
Passed: 46
Failed: 788
Skipped: 280

Copy link
Contributor

@oleksandr-pavlyk oleksandr-pavlyk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @npolina4

@oleksandr-pavlyk oleksandr-pavlyk merged commit 2273287 into master Mar 17, 2023
@oleksandr-pavlyk oleksandr-pavlyk deleted the fix_gh_1122 branch March 17, 2023 00:33
@github-actions
Copy link

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞

@github-actions
Copy link

Array API standard conformance tests for dpctl=0.14.2=py310h76be34b_51 ran successfully.
Passed: 46
Failed: 788
Skipped: 280

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dpctl.tensor.concat does not support axis=None mandated by array API
3 participants