Optimizations for unreachable sentinels#1810
Optimizations for unreachable sentinels#1810StephanTLavavej merged 14 commits intomicrosoft:mainfrom
Conversation
miscco
left a comment
There was a problem hiding this comment.
I am a bit concerned about the additional derivations of _Copy_memcpy
Also I believe the _STL_ASSERTs should bee _STATIC_ASSERT(_Always_false, ...)
| && (_Is_sized1 || same_as<_Se1, unreachable_sentinel_t>) // | ||
| #pragma warning(suppress : 6287) // Redundant code: the left and right subexpressions are identical | ||
| &&(_Is_sized2 || same_as<_Se2, unreachable_sentinel_t>) // | ||
| &&same_as<_Pj1, identity> && same_as<_Pj2, identity>) { |
There was a problem hiding this comment.
In the above case you put those constraints on top. I would prefer to keep them there but downgrade to is_same_v or _Same_impl
There was a problem hiding this comment.
All of the ranges algorithm optimizations I've written already use same_as.
| template <class _InIt, class _OutIt, typename _DistIt> | ||
| in_out_result<_InIt, _OutIt> _Copy_memcpy_distance( | ||
| _InIt _IFirst, _OutIt _OFirst, _DistIt _DFirst, _DistIt _DLast) noexcept { | ||
| const auto _IFirstPtr = _To_address(_IFirst); | ||
| const auto _OFirstPtr = _To_address(_OFirst); | ||
| const auto _DFirstPtr = _To_address(_DFirst); | ||
| const auto _DLastPtr = _To_address(_DLast); | ||
| const auto _IFirst_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_IFirstPtr)); | ||
| const auto _OFirst_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_OFirstPtr)); | ||
| const auto _DFirst_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_DFirstPtr)); | ||
| const auto _DLast_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_DLastPtr)); | ||
| const auto _Count = static_cast<size_t>(_DLast_ch - _DFirst_ch); | ||
| _CSTD memcpy(_OFirst_ch, _IFirst_ch, _Count); | ||
| if constexpr (is_pointer_v<_InIt>) { | ||
| _IFirst = reinterpret_cast<_InIt>(_IFirst_ch + _Count); | ||
| } else { | ||
| _IFirst += _Count / sizeof(iter_value_t<_InIt>); | ||
| } | ||
|
|
||
| if constexpr (is_pointer_v<_OutIt>) { | ||
| _OFirst = reinterpret_cast<_OutIt>(_OFirst_ch + _Count); | ||
| } else { | ||
| _OFirst += _Count / sizeof(iter_value_t<_OutIt>); | ||
| } | ||
| return {_STD move(_IFirst), _STD move(_OFirst)}; | ||
| } |
There was a problem hiding this comment.
I believe that overload is superfluous. The only difference is how we get to _Count
I guess we can get to that easier that duplicating all that machinery
There was a problem hiding this comment.
We might just use the _count version but that might have worse codegen. #1433 (comment)
There was a problem hiding this comment.
I thought more about determining _Count via the right method and passing that around. That should give the optimal codegen all the time
There was a problem hiding this comment.
FWIW, I'm happy with this as-is. I think extracting the "distance" behavior from this function would either produce less efficient code or something even more confusing.
| template <class _InIt, class _OutIt, typename _DistIt> | ||
| in_out_result<_InIt, _OutIt> _Copy_memcpy_distance( | ||
| _InIt _IFirst, _OutIt _OFirst, _DistIt _DFirst, _DistIt _DLast) noexcept { | ||
| const auto _IFirstPtr = _To_address(_IFirst); | ||
| const auto _OFirstPtr = _To_address(_OFirst); | ||
| const auto _DFirstPtr = _To_address(_DFirst); | ||
| const auto _DLastPtr = _To_address(_DLast); | ||
| const auto _IFirst_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_IFirstPtr)); | ||
| const auto _OFirst_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_OFirstPtr)); | ||
| const auto _DFirst_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_DFirstPtr)); | ||
| const auto _DLast_ch = const_cast<char*>(reinterpret_cast<const volatile char*>(_DLastPtr)); | ||
| const auto _Count = static_cast<size_t>(_DLast_ch - _DFirst_ch); | ||
| _CSTD memcpy(_OFirst_ch, _IFirst_ch, _Count); | ||
| if constexpr (is_pointer_v<_InIt>) { | ||
| _IFirst = reinterpret_cast<_InIt>(_IFirst_ch + _Count); | ||
| } else { | ||
| _IFirst += _Count / sizeof(iter_value_t<_InIt>); | ||
| } | ||
|
|
||
| if constexpr (is_pointer_v<_OutIt>) { | ||
| _OFirst = reinterpret_cast<_OutIt>(_OFirst_ch + _Count); | ||
| } else { | ||
| _OFirst += _Count / sizeof(iter_value_t<_OutIt>); | ||
| } | ||
| return {_STD move(_IFirst), _STD move(_OFirst)}; | ||
| } |
There was a problem hiding this comment.
FWIW, I'm happy with this as-is. I think extracting the "distance" behavior from this function would either produce less efficient code or something even more confusing.
Co-authored-by: Casey Carter <cartec69@gmail.com>
StephanTLavavej
left a comment
There was a problem hiding this comment.
Looks good - I'll validate and push changes for the very minor things I noticed.
|
@AdamBucior @CaseyCarter FYI I added |
|
Also had to merge |
|
Thanks for increasing the STL's performance to previously unreachable levels! 😹 🚀 📈 |
Enables optimizations for contiguous iterator +
unreachable_sentinel_tranges.The only thing I'm concerned about is users overloading
operator==(custom contiguous iterator, unreachable_sentinel_t)which doesn't always returnfalse. Such code should definitely be forbidden by the standard but I'm not sure if it is.