support of pipelined accesses in the runtime 

I have a question about the runtime support needed to migrate the following CUDA code, which uses `cuda::pipeline` and `cuda::memcpy_async`. Thanks.
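```cpp
#include <cuda/pipeline>  // cuda::make_pipeline, cuda::memcpy_async, cuda::aligned_size_t

// Per-thread pipeline (cuda::thread_scope_thread).
auto pipe = cuda::make_pipeline();

// Pipeline: load W/X into shared memory, then compute WX (compute not shown).
// Each thread copies one vec_size-element chunk of W and one of X.
pipe.producer_acquire();
cuda::memcpy_async(W_shared + (threadIdx.y * tx + threadIdx.x) * vec_size,
                   W + (idx * feat_out + j) * feat_in +
                       (threadIdx.y * tx + threadIdx.x) * vec_size,
                   cuda::aligned_size_t<W_copy_size>(W_copy_size), pipe);
cuda::memcpy_async(X_shared + (threadIdx.y * tx + threadIdx.x) * vec_size,
                   X + (batch_idx * feat_in) +
                       (threadIdx.y * tx + threadIdx.x) * vec_size,
                   cuda::aligned_size_t<X_copy_size>(X_copy_size), pipe);
// Submit both copies as a single pipeline stage.
pipe.producer_commit();
```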
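
The snippet shows only the producer half of a `cuda::pipeline` stage. Before reading `W_shared` or `X_shared`, each thread would normally call `pipe.consumer_wait()` and later `pipe.consumer_release()`, and reading chunks copied by other threads typically also needs a block-wide barrier such as `__syncthreads()`. As a rough illustration of what a runtime has to preserve, here is a minimal host-side C++ sketch of a synchronous fallback for one stage, assuming `float` elements and copy sizes of `vec_size * sizeof(float)` bytes. `SyncPipeline`, `sync_memcpy_async`, and `load_stage` are hypothetical names, not part of CUDA, SYCL, or any migration tool's API; the nested loops stand in for the threads of a `tx`-by-`ty` block.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical synchronous stand-in for cuda::pipeline<cuda::thread_scope_thread>.
// It mirrors the member names used in the CUDA snippet, but every copy
// completes immediately, so these calls only track stage bookkeeping.
class SyncPipeline {
 public:
  void producer_acquire() { ++acquired_; }
  void producer_commit() {
    assert(acquired_ > committed_);  // commit must follow an acquire
    ++committed_;
  }
  void consumer_wait() {
    // Copies were done eagerly, so any committed stage is already complete.
    assert(committed_ > released_);
  }
  void consumer_release() { ++released_; }

 private:
  std::size_t acquired_ = 0, committed_ = 0, released_ = 0;
};

// Hypothetical counterpart of cuda::memcpy_async(dst, src, bytes, pipe):
// the copy is performed synchronously.
inline void sync_memcpy_async(void* dst, const void* src, std::size_t bytes,
                              SyncPipeline& /*pipe*/) {
  std::memcpy(dst, src, bytes);
}

// Emulates every (threadIdx.x, threadIdx.y) of a tx-by-ty block with a loop,
// using the same index arithmetic as the CUDA snippet.
void load_stage(const float* W, const float* X, float* W_shared,
                float* X_shared, int tx, int ty, int vec_size, int feat_in,
                int feat_out, int idx, int j, int batch_idx) {
  const std::size_t copy_bytes =
      static_cast<std::size_t>(vec_size) * sizeof(float);
  for (int y = 0; y < ty; ++y) {
    for (int x = 0; x < tx; ++x) {
      SyncPipeline pipe;         // one pipeline per thread, like make_pipeline()
      const int t = y * tx + x;  // threadIdx.y * tx + threadIdx.x
      pipe.producer_acquire();
      sync_memcpy_async(W_shared + t * vec_size,
                        W + (idx * feat_out + j) * feat_in + t * vec_size,
                        copy_bytes, pipe);
      sync_memcpy_async(X_shared + t * vec_size,
                        X + batch_idx * feat_in + t * vec_size,
                        copy_bytes, pipe);
      pipe.producer_commit();
      pipe.consumer_wait();  // this thread's chunks are ready here
      pipe.consumer_release();
    }
  }
}
```

With this fallback every copy finishes before `consumer_wait()` returns, so the acquire/commit/wait/release calls reduce to bookkeeping. The loaded data is the same, but it gives up the overlap of global-to-shared copies with computation that motivates using `cuda::pipeline` in the first place.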

support of pipelined accesses in the runtime #1554
