Skip to content

In-place addition, multiplication, subtraction of usm_ndarrays #1237

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jun 13, 2023

Conversation

ndgrigorian
Copy link
Collaborator

@ndgrigorian ndgrigorian commented Jun 12, 2023

Closes #1229
This pull request implements in-place addition, subtraction, and multiplication of usm_ndarray instances.

By avoiding the additional allocation of an output array, the performance advantage compared compared to applying the operator with a scalar, a la dpt.add(x, 2), is considerable.

For example:

In [8]: X = dpt.ones((16768, 16768), dtype="f4")

In [9]: %time X += 2
CPU times: user 58.5 ms, sys: 78.1 ms, total: 137 ms
Wall time: 133 ms

In [10]: %time dpt.add(X, 2)
CPU times: user 238 ms, sys: 147 ms, total: 385 ms
Wall time: 392 ms
  • Have you provided a meaningful PR description?
  • Have you added a test, reproducer or referred to an issue with a reproducer?
  • Have you tested your changes locally for CPU and GPU devices?
  • Have you made sure that new changes do not introduce compiler warnings?
  • Have you checked performance impact of proposed changes?
  • If this PR is a work in progress, are you opening the PR as a draft?

@github-actions
Copy link

@ndgrigorian ndgrigorian force-pushed the inplace-operator-initial-impl branch from e9a4f30 to cd3f169 Compare June 12, 2023 18:43
@coveralls
Copy link
Collaborator

coveralls commented Jun 12, 2023

Coverage Status

coverage: 84.117% (+0.007%) from 84.11% when pulling da5f2f7 on inplace-operator-initial-impl into 43f3b7b on master.

@github-actions
Copy link

Array API standard conformance tests for dpctl=0.14.3dev3=py310h7bf5fec_36 ran successfully.
Passed: 363
Failed: 637
Skipped: 119

@ndgrigorian ndgrigorian marked this pull request as ready for review June 12, 2023 20:05
@ndgrigorian ndgrigorian requested review from npolina4 and vtavana June 12, 2023 20:05
@ndgrigorian ndgrigorian changed the title In-place addition of usm_ndarrays In-place addition, multiplication, subtraction of usm_ndarrays Jun 12, 2023
@github-actions
Copy link

Array API standard conformance tests for dpctl=0.14.3dev3=py310h7bf5fec_38 ran successfully.
Passed: 372
Failed: 628
Skipped: 119

@github-actions
Copy link

Array API standard conformance tests for dpctl=0.14.3dev3=py310h7bf5fec_39 ran successfully.
Passed: 373
Failed: 627
Skipped: 119

@ndgrigorian ndgrigorian requested a review from vtavana June 13, 2023 08:37
@github-actions
Copy link

Array API standard conformance tests for dpctl=0.14.3dev3=py310h7bf5fec_42 ran successfully.
Passed: 375
Failed: 625
Skipped: 119

@ndgrigorian ndgrigorian force-pushed the inplace-operator-initial-impl branch from 2a98e1d to da5f2f7 Compare June 13, 2023 16:18
Copy link
Collaborator

@vtavana vtavana left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @ndgrigorian

@github-actions
Copy link

Array API standard conformance tests for dpctl=0.14.3dev3=py310h7bf5fec_50 ran successfully.
Passed: 386
Failed: 614
Skipped: 119

@ndgrigorian ndgrigorian merged commit 81553f8 into master Jun 13, 2023
@github-actions
Copy link

Deleted rendered PR docs from intelpython.github.com/dpctl, latest should be updated shortly. 🤞

@github-actions
Copy link

Array API standard conformance tests for dpctl=0.14.3dev4=py310h7bf5fec_10 ran successfully.
Passed: 388
Failed: 612
Skipped: 119

Copy link

@AlexanderKalistratov AlexanderKalistratov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just small comments. You can ignore it, if you want.

sycl::vec<argT, vec_sz> arg_vec;
sycl::vec<resT, vec_sz> res_vec;

#pragma unroll

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried to compare performance against something simple like this:

int grid = ndit.get_group_linear_id();
int llid = ndit.get_local_linear_id();
int local_size = ndit.get_local_range(0); // is this 1-d task?

int base = grid*local_size*vec_sz*n_vecs;

#pragma unroll
for (int i = 0; i < vec_sz*n_vecs; ++i)
    size_t k = base + local_size*i + llid;
    if (k < nitems)
        op(lhs[k], rhs[k]);

I'm pretty sure compiler can place load and store operations for you. You don't need to put it manually.

You probably need to split in two loops (over n_vecs and over vec_sz)

}
}
else {
for (size_t k = base + sg.get_local_id()[0]; k < nelems_;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one doesn't feels right.
Are you iterating over all items inside one subgroup?
You are comparing against nelems_, not against vec_sz*n_vecs.

Have you validated this version?

X = dpt.zeros((10, 10), dtype=dtype, sycl_queue=q)
dt_kind = X.dtype.kind
if dt_kind in "ui":
X += int(0)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You really shouldn't test +=0 and *=1. Use something, what would modify original data. E.g. +=1 and *=2

Also, I don't see results checking


sz = 127
ar1 = dpt.ones(sz, dtype=op1_dtype)
ar2 = dpt.ones_like(ar1, dtype=op2_dtype)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is much better to use something like arange or random data if possible. Testing on arrays where all elements are the same could play a bad joke with you.

v = dpt.arange(5, dtype="i4")

m += v
assert (dpt.asnumpy(m) == np.arange(1, 6, dtype="i4")[np.newaxis, :]).all()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one good one 👍

but it tests only one dtype.

@ndgrigorian ndgrigorian deleted the inplace-operator-initial-impl branch August 8, 2023 07:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement in place functions
5 participants