Skip to content

Conversation

mhoemmen
Copy link
Contributor

@mhoemmen mhoemmen commented May 20, 2025

Implement draft R2 of P3663 (Future-proof submdspan_mapping).

Priorities, in decreasing order:

  1. Demonstrate a working implementation (DONE)
  2. Compare performance with no-P3663 baseline (IN PROGRESS)
  3. Fix merge conflicts (only needed if WG21 votes P3663 into the Standard)

Feature is off by default. To enable it, set the CMake option MDSPAN_ENABLE_P3663 to ON.

There are two versions of the implementation. The default version requires C++20. The nondefault version requires the C++23 feature "deducing this," and optionally can make use of the C++26 feature "pack indexing" (P2662R3) if available. In order to support these features, I made some changes to mdspan's CMake logic to permit -std=c++2c with Clang 21.0.0 and GCC 15.1.0.

I've tested the default version with Clang 14 and GCC 11.4.0, and the nondefault version with Clang 21.0.0 (development Clang) and GCC 15.1.0. To enable the default version, please use the following CMake options.

  -DMDSPAN_ENABLE_P3663=ON \
  -DCMAKE_CXX_FLAGS="-std=c++20" \
  -DMDSPAN_CXX_STANDARD=20 \
  -DMDSPAN_CONSTANT_WRAPPER_WORKAROUND=ON

To enable the nondefault version, please use the following CMake options.

  -DMDSPAN_ENABLE_P3663=ON \
  -DCMAKE_CXX_FLAGS="-std=c++2c" \
  -DMDSPAN_CXX_STANDARD=26 \
  -DMDSPAN_CONSTANT_WRAPPER_WORKAROUND=OFF

mhoemmen added 5 commits May 6, 2025 14:23
It currently requires the C++23 feature "deducing this"
in order to compile.  I've tested it with dev Clang (21.0.0).
It should work fine with Clang 20.
Also implement exposition-only __mdspan_integral_constant_like
concept, which is useful for canonicalization and Mandates.
sizeof...(Slices) == 0 case builds and passes tests.
Set the following CMake options.

  -DCMAKE_CXX_FLAGS="-std=c++2c"

  -DMDSPAN_CXX_STANDARD=26

You'll get a warning, but it seems to work.
* canonical-ice
* subtract-ice
* de-ice
* check-static-bounds
mhoemmen added 2 commits May 21, 2025 16:09
submdspan_canonicalize_slices now works for one strided_slice.
Fix canonical_ice (wrong order in subtraction; wording is fine).
mhoemmen added 8 commits May 21, 2025 16:49
Attempt to change over submdspan to use canonicalization
without doing much else.  It fails to build; a lot of
layout mappings seem to get the wrong mapping result type.

It builds and passes tests with P3663 support off.
submdspan builds now when submdspan_mapping is called with
canonical slices, but the submdspan test fails.

Investigate whether this is a bug in the implementation.
Fix extent computation for std::complex.
It builds and passes tests.

NEXT STEPS:

* Make it ill-formed to call submdspan_mapping with non-canonical
  slice types
* Make P3663 version only use constant_wrapper and canonical
  slice types in submdspan_mapping_impl functions
mhoemmen added 10 commits May 23, 2025 11:29
All submdspan_mapping_impl functions now check that every k-th slice
type is a canonical k-th slice type.  This ensures that
submdspan_mapping_impl is only well-formed for canonical k-th slice
types, and is ill-formed otherwise.
in P3663 branch only.
Don't need to specialize for all integral-constant-like,
just for std::constant_wrapper.
@mhoemmen mhoemmen changed the title Implement P3663R1 (Future-proof submdspan_mapping) Implement P3663R2 (Future-proof submdspan_mapping) May 27, 2025
@mhoemmen mhoemmen changed the title Implement P3663R2 (Future-proof submdspan_mapping) Implement D3663R2 (Future-proof submdspan_mapping) May 27, 2025
mhoemmen added 21 commits June 3, 2025 16:07
It currently exercises the following cases for a variety
of extents types.

* full_extent_t

* integral_constant<IndexType, Value>

* IndexType
Improve testing for the case where a structured binding of the
slice into two elements is valid.  Test includes both aggregates
and non-aggregates that opt into the tuple protocol.
The test should now have complete coverage of all the cases,
for all combinations of static and/or dynamic extents.
Currently only the CPU version works.
The current (non-P3663) mdspan implementation incorrectly
cannot handle index or slice types that are convertible to
index_type, but are not integral-non-bool.

This commit fixes that, and adds unit tests for both
layout mapping indexing and submdspan slicing.
Make sure that the benchmark gets the MDSPAN_ENABLE_P3663
definition (it wasn't before).

Add new benchmarks that make it harder for the compiler
to optimize away code.  (They actually change all the values
in the mdspan and check that the changes are mathematically
correct.)

In benchmark, for the slice type that's convertible to
full_extent_t, make the conversion operator noexcept.
CUDA version is as yet untested.

It probably won't compile, at least because the P3663 implementation
currently relies on C++23 or 26 language features that may not yet be
implemented in NVCC.
... in constant_wrapper.  The issue is with the default argument
of the second template argument of constant_wrapper.

typename unspecified =
  typename decltype(exposition_only::cw_fixed_value(X))::type

Replacing that expression with use of the following alias doesn't help.

namespace exposition_only {
  template<auto X>
  using unspecified_t = typename decltype(cw_fixed_value(X))::type;
}

Moving the definitions of the various specializations of cw_fixed_value
above this point doesn't help either.

This generally works fine even with Clang 21.
Work around lack of parameter pack indexing
and fix constant_wrapper's assignment operator.

After some adjustments to the constant_wrapper implementation,
I found out that its overloaded arithmetic operators --
specifically, binary operator+ -- do not work.

.../submdspan_extents.hpp:96:22: error: static assertion failed
   96 |   static_assert(std::is_same_v<
      |                 ~~~~~^~~~~~~~~~
   97 |       decltype(counter + std::cw<size_t(1)>),
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   98 |       std::constant_wrapper<counter_value + size_t(1)>
      |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   99 |     >);

Binary operator+ doesn't kick in; the first branch of the
`is_same_v` is `unsigned long`, i.e., `size_t`.
Define MDSPAN_CONSTANT_WRAPPER_GCC_WORKAROUND to 1
to get the work-around for GCC 11.4.0 C++20.

The key is not to rely on constant_wrapper's binary operator-.
I've tried a few designs for overloaded operators on
constant_wrapper, but none of them work with GCC 11.4.0.
Remove constant_wrapper's overloaded arithmetic operators,
since the work-around doesn't need them.
* Rename option for working around constant_wrapper compiler
  issues from MDSPAN_CONSTANT_WRAPPER_GCC_WORKAROUND to
  MDSPAN_CONSTANT_WRAPPER_WORKAROUND.

* Make MDSPAN_CONSTANT_WRAPPER_WORKAROUND a CMake option.
  It's OFF by default.  It only has an effect if the CMake
  option MDSPAN_ENABLE_P3663 is ON.
GCC 15.1.0 can now build with the following options.

MDSPAN_ENABLE_P3663=ON
MDSPAN_CONSTANT_WRAPPER_WORKAROUND=OFF
CMAKE_CXX_FLAGS="-std=c++2c"
MDSPAN_CXX_STANDARD=26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant