Clarify semantics of `ExecSpace` parameter for communication interfaces

KokkosComm currently exposes communication functions with an `ExecSpace` parameter whose purpose is not semantically clear, neither in the code nor in the documentation, and that doesn't map to the behavior we expect.

The documentation for, e.g. `KokkosComm::isend`, states:
> ```cpp
> template<CommunicationMode SendMode, KokkosExecutionSpace ExecSpace, KokkosView SendView>
> Req isend(const ExecSpace &space, const SendView &sv, int dest, int tag, MPI_Comm comm)
> ```
>
> Parameters:
> - `space`: The execution space to operate in
> ...
>
> Template Parameters:
> - `ExecSpace`: A Kokkos execution space to operate
> ...

This lets users rightfully think that the `space` parameter is used for specifying in which execution space the communication has to happen, not the one they must sync with for their communication to be correctly processed.
However, the actual implementation of `KokkosComm::isend` (for contiguous views that don't require packing) does something like:
```cpp
  space.fence();  // can't issue isend until work in space is complete
  MPI_Isend(/*...*/);
```

Why can't we submit more work in the space? What does this fence has to do with the execution space in which the communication operates?

Let's demonstrate these unclear semantics with a code example:
```cpp
// Partition the execution space into two
auto instances = Kokkos::partition_space(space, 1, 1);
auto compute_space = instances[0];
auto comm_space = instances[1];

// Dispatch some parallel work that prepares our data `v` on the compute exec space
Kokkos::parallel_for(Kokkos::RangePolicy(compute_space, 0, v.extent(0)), [=](int const i) {
  v(i) = i;
});

// Send the prepared data using the communication exec space
KokkosComm::isend(comm_space, v, dest, tag, comm).wait(); // WARNING: `v` may not have been ready!
```

In this snippet, we split an execution space into two. Because we're specifying the `compute_space` execution space to the `parallel_for`, it is asynchronous and we _must_ fence on it so that `v` is done initializing _before_ calling `KokkosComm::isend`.
However, the implementation of the latter fences on the given execution space (here, `comm_space`) before doing the actual send. In this example, this is the wrong space to fence on, **and** the fence is useless.

Passing an execution space can be useful — e.g. for specifying where to pack a non-contiguous view, or use a particular CUDA stream once we have the NCCL backend — but it shouldn't be just for us to manually fence on an execution space that may have nothing to do with the view we operate on.

I propose that we:
1. Clarify the purpose and semantics of the execution space parameter;
2. Explain to users that is up to them to guarantee that their data is ready to be used by KokkosComm (i.e. in the example above, they must call `compute_space.fence()` themselves before calling us);
3. Remove the call to `space.fence()` from our functions that take an execution space (in the contiguous case).

Given that we would not fence on the execution space anymore, the `plans` proposed in #100 won't be needed, as we remove all pointless fences from our implementations.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Clarify semantics of `ExecSpace` parameter for communication interfaces #108

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Clarify semantics of ExecSpace parameter for communication interfaces #108

Description

Activity