Ref 314 #454

Merged
merged 4 commits into from
Apr 20, 2021

Conversation

serge-sans-paille
Contributor

This is a recommit of #314, rebased with some code cleanup and the commits split up, with hopefully a few bug fixes to come.

@@ -176,6 +176,45 @@ namespace xsimd
return _mm512_sub_epi32(lhs, rhs);
}

static batch_type sadd(const batch_type& lhs, const batch_type& rhs)
Contributor Author

I'll benchmark that approach compared to the one based on a comparison + a blend.

Contributor Author

@serge-sans-paille serge-sans-paille Apr 14, 2021

Actually, found a nice and efficient solution based on min/max for the unsigned version \o/
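For readers unfamiliar with the trick, a min-based saturated add for unsigned values can be sketched per lane in plain scalar C++ as follows. This is only an illustration of the idea under the assumption that the increment is clamped to the remaining headroom; `sadd_unsigned` is a hypothetical name, not xsimd's API, and the real batch code in the PR may be written differently.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper (not part of xsimd): saturated add for one unsigned lane.
template <class T>
constexpr T sadd_unsigned(T lhs, T rhs)
{
    static_assert(std::is_unsigned<T>::value, "unsigned types only");
    // Headroom left before lhs reaches the largest representable value.
    const T headroom = static_cast<T>(std::numeric_limits<T>::max() - lhs);
    // Clamp the increment to that headroom, so the addition can never wrap.
    return static_cast<T>(lhs + std::min(rhs, headroom));
}
```

In a SIMD version the subtraction, the min and the final addition would each map to one vector instruction, with no comparison mask or blend needed.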

@serge-sans-paille serge-sans-paille force-pushed the ref_314 branch 3 times, most recently from 7df8a5e to b02d901 Compare April 16, 2021 07:32
@serge-sans-paille
Contributor Author

@JohanMabille ready for review. I did some benchmarking of the generic adds method, applying it to int16_t where we have a reference intrinsic. The home-made implementation still achieves ~3 instructions per cycle, so no pipeline stall. It's roughly 4 times slower than the intrinsic, which is somewhat expected, and 4 times faster than the sequential naive version, which is good too :-)
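The benchmark code is not shown in this thread. As a point of reference, a sequential naive saturated add on int16_t might look like the sketch below, assuming the usual widen-then-clamp approach; `sadd_i16` and `sadd_i16_naive` are hypothetical names chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical scalar baseline (assumption: widen to 32 bits, then clamp).
constexpr std::int16_t sadd_i16(std::int16_t a, std::int16_t b)
{
    // The 32-bit sum of two 16-bit values cannot overflow.
    const std::int32_t sum = static_cast<std::int32_t>(a) + static_cast<std::int32_t>(b);
    const std::int32_t lo = std::numeric_limits<std::int16_t>::min();
    const std::int32_t hi = std::numeric_limits<std::int16_t>::max();
    // Clamp the result back into the int16_t range.
    return static_cast<std::int16_t>(sum < lo ? lo : (sum > hi ? hi : sum));
}

// Element-by-element loop over n values, one lane at a time.
inline void sadd_i16_naive(const std::int16_t* a, const std::int16_t* b,
                           std::int16_t* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
    {
        out[i] = sadd_i16(a[i], b[i]);
    }
}
```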

@serge-sans-paille serge-sans-paille force-pushed the ref_314 branch 2 times, most recently from 4ff5cb2 to e613094 Compare April 16, 2021 13:22
serge-sans-paille and others added 2 commits April 16, 2021 19:13
It's good to have xsimd_scalar as standalone as possible.
* int8,uint8,int16,uint16,int32,uint32,int64,uint64,float,double
* sse2/sse4
* avx/avx2
* avx512
* fallback
* neon
@serge-sans-paille
Contributor Author

@JohanMabille ready for another round of review :-)

JohanMabille
JohanMabille previously approved these changes Apr 17, 2021
@JohanMabille JohanMabille dismissed their stale review April 17, 2021 04:55

I missed things

Faster and simpler saturated add / sub for unsigned types when the builtin
doesn't exist. It uses a min/max instead of an explicit comparison, since these
instructions have a nice latency of 1 on sse, avx2 and avx512.

Also add a doc entry.
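The subtraction side of this commit can be sketched the same way in scalar C++. The version below assumes the subtrahend is clamped with a min so the result bottoms out at zero; `ssub_unsigned` is a hypothetical name and the committed batch code may differ.

```cpp
#include <algorithm>
#include <cstdint>
#include <type_traits>

// Hypothetical helper (not part of xsimd): saturated subtract for one unsigned lane.
template <class T>
constexpr T ssub_unsigned(T lhs, T rhs)
{
    static_assert(std::is_unsigned<T>::value, "unsigned types only");
    // Never subtract more than lhs itself, so the result cannot wrap below 0.
    return static_cast<T>(lhs - std::min(lhs, rhs));
}
```

An equivalent formulation is `max(lhs, rhs) - rhs`; either way a single min or max replaces the compare-and-select pair.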
@serge-sans-paille
Contributor Author

@JohanMabille green and cleaned-up \o/

* Distributed under the terms of the BSD 3-Clause License. *
* *
* The full license is in the file LICENSE, distributed with this software. *
****************************************************************************/
Member

I think this file should live in the types subfolder instead of math to guarantee the non cyclic dependency math -> types

Contributor Author

This file is only included from the math folder, so I think it's the right place.

@@ -16,6 +16,8 @@
#include <cmath>
#include <utility>

#include "xsimd/math/xsimd_scalar.hpp"
Member

@JohanMabille JohanMabille Apr 20, 2021

@serge-sans-paille I mixed up with this include, sorry.

@JohanMabille
Member

Awesome!

@JohanMabille JohanMabille merged commit e04cd93 into xtensor-stack:master Apr 20, 2021
@JohanMabille JohanMabille mentioned this pull request Apr 20, 2021
3 participants