Skip to content

Clarify semantics of ExecSpace parameter for communication interfaces #108

Open
@dssgabriel

Description

KokkosComm currently exposes communication functions with an ExecSpace parameter whose purpose is not semantically clear, neither in the code nor in the documentation, and that doesn't map to the behavior we expect.

The documentation for, e.g. KokkosComm::isend, states:

template<CommunicationMode SendMode, KokkosExecutionSpace ExecSpace, KokkosView SendView>
Req isend(const ExecSpace &space, const SendView &sv, int dest, int tag, MPI_Comm comm)

Parameters:

  • space: The execution space to operate in
    ...

Template Parameters:

  • ExecSpace: A Kokkos execution space to operate
    ...

This lets users rightfully think that the space parameter is used for specifying in which execution space the communication has to happen, not the one they must sync with for their communication to be correctly processed.
However, the actual implementation of KokkosComm::isend (for contiguous views that don't require packing) does something like:

  space.fence();  // can't issue isend until work in space is complete
  MPI_Isend(/*...*/);

Why can't we submit more work in the space? What does this fence has to do with the execution space in which the communication operates?

Let's demonstrate these unclear semantics with a code example:

// Partition the execution space into two
auto instances = Kokkos::partition_space(space, 1, 1);
auto compute_space = instances[0];
auto comm_space = instances[1];

// Dispatch some parallel work that prepares our data `v` on the compute exec space
Kokkos::parallel_for(Kokkos::RangePolicy(compute_space, 0, v.extent(0)), [=](int const i) {
  v(i) = i;
});

// Send the prepared data using the communication exec space
KokkosComm::isend(comm_space, v, dest, tag, comm).wait(); // WARNING: `v` may not have been ready!

In this snippet, we split an execution space into two. Because we're specifying the compute_space execution space to the parallel_for, it is asynchronous and we must fence on it so that v is done initializing before calling KokkosComm::isend.
However, the implementation of the latter fences on the given execution space (here, comm_space) before doing the actual send. In this example, this is the wrong space to fence on, and the fence is useless.

Passing an execution space can be useful — e.g. for specifying where to pack a non-contiguous view, or use a particular CUDA stream once we have the NCCL backend — but it shouldn't be just for us to manually fence on an execution space that may have nothing to do with the view we operate on.

I propose that we:

  1. Clarify the purpose and semantics of the execution space parameter;
  2. Explain to users that is up to them to guarantee that their data is ready to be used by KokkosComm (i.e. in the example above, they must call compute_space.fence() themselves before calling us);
  3. Remove the call to space.fence() from our functions that take an execution space (in the contiguous case).

Given that we would not fence on the execution space anymore, the plans proposed in #100 won't be needed, as we remove all pointless fences from our implementations.

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Assignees

No one assigned

    Labels

    documentationImprovements or additions to documentationperformanceSomething related to performancesemanticsSomething related to code semantics

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions