-
Notifications
You must be signed in to change notification settings - Fork 11
Implement Exponentiation via Associative Iteration #75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
inc/zoo/swar/associative_iteration.h
Outdated
|
||
exponent = S{static_cast<T>(exponent.value() << (NB - ActualBits))}; | ||
return associativeOperatorIterated_regressive( | ||
x, S{1}, exponent, S{S::MostSignificantBit}, operation, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your neutral is, lane wise, { 0, ..., 1 }, so, the upper lanes are initialized to 0
You could S{meta::BitmaskMaker<BaseType, 1, BitsPerLane>::value()}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ooh thanks for that, that makes total sense!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would broadcast(S{1})
be more readable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly, but using broadcast
is a complication because it relies on multiplication.
I have been slowly internalizing what needs to be done to support SIMD: no assumptions about the availability of whole-base-type multiplication. In ARM NEON, x86-64 SSE of 128 bits, there is no such thing, but at most lane-wise 64 bit multiplication, same with AVX 256 bits.
BitmaskMaker can be upgraded to support longer types.
The problem is that SIMD negate constexpr
, it's going to give us pain to make the non-constexpr flavor of the library
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is, indeed, subtle.
Perhaps not so subtle in the SIMD constexpr
negation...
@thecppzoo would you have some time tomorrow to go over some of this code together? |
@@ -41,6 +41,15 @@ static_assert( | |||
multiplication_OverflowUnsafe_SpecificBitCount<3>(Micand, Mplier).value() | |||
); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GOOD NEWS. Looks like my initial, first-pass implementation was correct after all!
Seems like part of the issue was how I'm trying to compare the binary literals.
All value I tried when evaluating using the hex idiom you guys have already got provides consistently correct answers.
.idea | ||
**cmake-build** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does your IDE really require building inline with the sources, or within the source tree?
I am loathe to pollute the repository with support for the anti-pattern of building within the sources; however, Visual Studio Code forced me to yield. VSCode is a poor tool that glitches in very frustrating ways when asked to build out of the source tree, that's the only reason for the test/.vscode
, build
and .cache
.
So, unless your tool glitches, adopt the practice of not building in-tree and remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this such a bad thing? I think most tools / LSPs expect to build in the source tree? I think CLion does this by default and I think clangd
by default looks for ./build
in the root of the project?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just two things I dislike:
Most tools like the command line grep
, find
, etc. are extremely hard to configure for understanding .gitignore
, so, you just slow down your work flow by having to treat the non-sources directories as explicit special cases with every non-git tool you use.
Let's say you want to build with GCC and also with Clang.
Who gets to build in-tree? both?
So, for every flavor of build you may want you need to provide an entry point within the tree, and add it to the .gitignore.
How's that helpful instead of segregating builds from sources?
These are two instances of the more general problem of disorganization. A freshly cloned repository ought to contain only files generated by people or by tools that are so sophisticated they may as well. Further processing of "sources" ought not to dump the results into a place where it is hard to tell them apart from files made by people.
It baffles me that colleagues expect that whatever configuration is good for holding source files is going to be just as good for built files. For example, what if I have a configuration of very small blocks at the file system level expecting that source code is among the very small files in the distribution of file size, but keep the build filesystem with large blocks, because of the same reason on the opposite?
See? there are multiplicity of day to day advantages of keeping things organized. Very fundamental stuff for Software Engineers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry took a while to get back to this.
I understand both of these points, I think the overriding point from me regardless of my opinion is that this is how it is going to be used by people, so we might as well make it convenient for people to contribute. Many IDEs and tools like the build folder to be in the source tree. I'm using the clangd
LSP which is hooking into nvim which I believe requires the compile_commands.json
to be in ./build
. JetBrains also will build in the source tree by default into cmake-build-[BUILD_TYPE]
.
Secondly, you can use tools like ripgrep
in place of grep which are .gitignore
sensitive, which is awesome.
Though I hear your points well and agree, this addition doesn't prevent more optimal workflows and enables anyone to contribute more easily.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's continue this conversation online, at this point, it is an item that should not go in this PR, but in another PR if we decide to support building in-tree.
Let's have PRs with a "single responsibility".
Thanks for this effort. |
test/swar/BasicOperations.cpp
Outdated
constexpr auto base = SWAR<8, u32>{0x02'01'05'06}; // 2 | 1 | 5 | 6 | ||
constexpr auto exponent = SWAR<8, u32>{0x07'00'02'03}; // 7 | 0 | 2 | 3 | ||
constexpr auto expected = SWAR<8, u32>{0x80'01'19'D8}; // 128 | 1 | 25 | 216 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Am I reviewing code made by ChatGPT?
It's technically correct to substitute the undefined 0^0 by the perfectly defined 1^0. Still, aren't there more useful examples? Like 3^5 = 243?
Also, think about this suggestion:
expected = SWAR<8, u32>{(127 << 24) + (1 << 16) + (25 << 8) + (216 << 0)}
help us with some notation, macros may be considered, to initialize the lanes of a SWAR.
The explicit meta::BitmaskMaker<....>::value
has been very tedious, error prone, time consuming for just repeating a constant, let alone initialize lanes to dissimilar values.
inc/zoo/swar/SWAR.h
Outdated
@@ -70,6 +71,19 @@ struct SWAR { | |||
|
|||
constexpr T value() const noexcept { return m_v; } | |||
|
|||
constexpr static T baseFromLaneLiterals(std::initializer_list<T> args) noexcept { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thecppzoo i wonder what you think about this utility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, we could do real literals for very popular cases such as
s4_64, s8_64, s16_64, s32_64;
s4_32, s8_32 , s16_32;
s4_16, s8_16;
s4_8;
A constructor from initializer list would be ambiguous since you could have an initializer list of only one element and then it would not be possible to distinguish between a literal with one element or a whole-swar initialization with a base type...
I am not able to decide this minute, but I thank you for posing the question.
I would like to improve the ergonomics, do you have ideas further down the road in this direction or is this just as far as your vision goes at the moment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i was trying a few variants with variadic arguments but that was a little tricky when trying to iterate over them, i feel like there is yet a better solution.
this is one i was inspired for right now, but i think i will defo have more ideas with respect to ergonomics!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thecppzoo how does this update look? makes it unambiguous that the number of lanes to use.
how would you go about using the different types for the arguments here dynamically?
inc/zoo/swar/SWAR.h
Outdated
T result = 0; | ||
for (auto arg: args) { | ||
result = (result << NBits) | arg; | ||
} | ||
return result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is big endian, have you thought about which one ought to be the "canonical" endianness for literals?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should read most significant on the left, least significant on the right, like it is now.
41d0815
to
449de1f
Compare
'overflow unsafe' points out the follow-up to this PR that exponentiation via repeated multiplication via repeated saturating addition is an implementation option. |
I don't know, Scott, perhaps it is better to focus on the flow of doubling the precision, saturating, and halving the precision. In general, the "precision doubling" (casting to a pair of SWARs with double lane width, --how do we call this better than "precision doubling"?--) is a very useful technique, that we ought to illustrate often; see what I did with Lemire's conversion of ASCII to integer, see what the assembler of GLIBC did to implement |
Here is the headline test of this PR.