Implement Exponentiation via Associative Iteration #75

jamierpond · 2024-02-27T07:31:44Z

Here is the headline test of this PR.

using S = SWAR<8, u32>;
constexpr auto base     = S::fromLaneLiterals({2,   3,  5,  6});
constexpr auto exponent = S::fromLaneLiterals({7,   4,  2,  3});
constexpr auto expected = S::fromLaneLiterals({128, 81, 25, 216});
constexpr auto actual = exponentiation_OverflowUnsafe(base, exponent);
static_assert(expected.value() == actual.value());
CHECK(expected.value() == actual.value());

thecppzoo · 2024-02-27T07:46:26Z

inc/zoo/swar/associative_iteration.h

+
+    exponent = S{static_cast<T>(exponent.value() << (NB - ActualBits))};
+    return associativeOperatorIterated_regressive(
+        x, S{1}, exponent, S{S::MostSignificantBit}, operation,


Your neutral is, lane wise, { 0, ..., 1 }, so, the upper lanes are initialized to 0
You could S{meta::BitmaskMaker<BaseType, 1, BitsPerLane>::value()}

ooh thanks for that, that makes total sense!

Would broadcast(S{1}) be more readable?

Possibly, but using broadcast is a complication because it relies on multiplication.
I have been slowly internalizing what needs to be done to support SIMD: no assumptions about the availability of whole-base-type multiplication. In ARM NEON, x86-64 SSE of 128 bits, there is no such thing, but at most lane-wise 64 bit multiplication, same with AVX 256 bits.

BitmaskMaker can be upgraded to support longer types.
The problem is that SIMD negate constexpr, it's going to give us pain to make the non-constexpr flavor of the library

That is, indeed, subtle.

Perhaps not so subtle in the SIMD constexpr negation...

inc/zoo/swar/associative_iteration.h

jamierpond · 2024-02-27T08:02:32Z

@thecppzoo would you have some time tomorrow to go over some of this code together?

jamierpond · 2024-02-28T23:13:12Z

test/swar/BasicOperations.cpp

@@ -41,6 +41,15 @@ static_assert(
    multiplication_OverflowUnsafe_SpecificBitCount<3>(Micand, Mplier).value()
 );



GOOD NEWS. Looks like my initial, first-pass implementation was correct after all!
Seems like part of the issue was how I'm trying to compare the binary literals.

All value I tried when evaluating using the hex idiom you guys have already got provides consistently correct answers.

inc/zoo/swar/associative_iteration.h

test/swar/BasicOperations.cpp

thecppzoo · 2024-02-28T23:38:40Z

.gitignore

+.idea
+**cmake-build**


Does your IDE really require building inline with the sources, or within the source tree?
I am loathe to pollute the repository with support for the anti-pattern of building within the sources; however, Visual Studio Code forced me to yield. VSCode is a poor tool that glitches in very frustrating ways when asked to build out of the source tree, that's the only reason for the test/.vscode, build and .cache.
So, unless your tool glitches, adopt the practice of not building in-tree and remove this.

Why is this such a bad thing? I think most tools / LSPs expect to build in the source tree? I think CLion does this by default and I think clangd by default looks for ./build in the root of the project?

Just two things I dislike:
Most tools like the command line grep, find, etc. are extremely hard to configure for understanding .gitignore, so, you just slow down your work flow by having to treat the non-sources directories as explicit special cases with every non-git tool you use.

Let's say you want to build with GCC and also with Clang.
Who gets to build in-tree? both?
So, for every flavor of build you may want you need to provide an entry point within the tree, and add it to the .gitignore.
How's that helpful instead of segregating builds from sources?

These are two instances of the more general problem of disorganization. A freshly cloned repository ought to contain only files generated by people or by tools that are so sophisticated they may as well. Further processing of "sources" ought not to dump the results into a place where it is hard to tell them apart from files made by people.

It baffles me that colleagues expect that whatever configuration is good for holding source files is going to be just as good for built files. For example, what if I have a configuration of very small blocks at the file system level expecting that source code is among the very small files in the distribution of file size, but keep the build filesystem with large blocks, because of the same reason on the opposite?

See? there are multiplicity of day to day advantages of keeping things organized. Very fundamental stuff for Software Engineers.

Sorry took a while to get back to this.

I understand both of these points, I think the overriding point from me regardless of my opinion is that this is how it is going to be used by people, so we might as well make it convenient for people to contribute. Many IDEs and tools like the build folder to be in the source tree. I'm using the clangd LSP which is hooking into nvim which I believe requires the compile_commands.json to be in ./build. JetBrains also will build in the source tree by default into cmake-build-[BUILD_TYPE].

Secondly, you can use tools like ripgrep in place of grep which are .gitignore sensitive, which is awesome.

Though I hear your points well and agree, this addition doesn't prevent more optimal workflows and enables anyone to contribute more easily.

Let's continue this conversation online, at this point, it is an item that should not go in this PR, but in another PR if we decide to support building in-tree.

Let's have PRs with a "single responsibility".

inc/zoo/swar/associative_iteration.h

thecppzoo · 2024-02-28T23:57:34Z

Thanks for this effort.
Part of the value of re-expressing known techniques as what we call "associative iteration" is that this gets us closer to use any advantage we can find, later, about associative operations.
For example, we can identify cases in which the "counts" for the iteration are constants and device optimal addition chains for them.
Take cryptographic techniques: they may involve exponentiation (of some weird form of multiplication) by a constant: if it is a constant, chances are we could have compilation-time code that comes up with the optimal addition chain...

thecppzoo · 2024-02-29T00:25:16Z

test/swar/BasicOperations.cpp

+    constexpr auto base     = SWAR<8, u32>{0x02'01'05'06}; //   2 | 1 |  5 |   6
+    constexpr auto exponent = SWAR<8, u32>{0x07'00'02'03}; //   7 | 0 |  2 |   3
+    constexpr auto expected = SWAR<8, u32>{0x80'01'19'D8}; // 128 | 1 | 25 | 216


Am I reviewing code made by ChatGPT?
It's technically correct to substitute the undefined 0^0 by the perfectly defined 1^0. Still, aren't there more useful examples? Like 3^5 = 243?
Also, think about this suggestion:
expected = SWAR<8, u32>{(127 << 24) + (1 << 16) + (25 << 8) + (216 << 0)}
help us with some notation, macros may be considered, to initialize the lanes of a SWAR.
The explicit meta::BitmaskMaker<....>::value has been very tedious, error prone, time consuming for just repeating a constant, let alone initialize lanes to dissimilar values.

jamierpond · 2024-03-19T18:04:07Z

inc/zoo/swar/SWAR.h

@@ -70,6 +71,19 @@ struct SWAR {

    constexpr T value() const noexcept { return m_v; }

+   constexpr static T baseFromLaneLiterals(std::initializer_list<T> args) noexcept {


@thecppzoo i wonder what you think about this utility?

Ah, we could do real literals for very popular cases such as

s4_64, s8_64, s16_64, s32_64;
s4_32, s8_32 , s16_32;
s4_16, s8_16;
s4_8;

A constructor from initializer list would be ambiguous since you could have an initializer list of only one element and then it would not be possible to distinguish between a literal with one element or a whole-swar initialization with a base type...

I am not able to decide this minute, but I thank you for posing the question.

I would like to improve the ergonomics, do you have ideas further down the road in this direction or is this just as far as your vision goes at the moment?

yeah i was trying a few variants with variadic arguments but that was a little tricky when trying to iterate over them, i feel like there is yet a better solution.

this is one i was inspired for right now, but i think i will defo have more ideas with respect to ergonomics!

@thecppzoo how does this update look? makes it unambiguous that the number of lanes to use.
how would you go about using the different types for the arguments here dynamically?

thecppzoo · 2024-03-19T18:12:54Z

inc/zoo/swar/SWAR.h

+       T result = 0;
+       for (auto arg: args) {
+           result = (result << NBits) | arg;
+       }
+       return result;


This is big endian, have you thought about which one ought to be the "canonical" endianness for literals?

I think it should read most significant on the left, least significant on the right, like it is now.

rm unused multiply more tests

Scottbruceheart · 2024-03-25T21:59:59Z

'overflow unsafe' points out the follow-up to this PR that exponentiation via repeated multiplication via repeated saturating addition is an implementation option.

thecppzoo · 2024-03-25T22:56:16Z

'overflow unsafe' points out the follow-up to this PR that exponentiation via repeated multiplication via repeated saturating addition is an implementation option.

I don't know, Scott, perhaps it is better to focus on the flow of doubling the precision, saturating, and halving the precision.

In general, the "precision doubling" (casting to a pair of SWARs with double lane width, --how do we call this better than "precision doubling"?--) is a very useful technique, that we ought to illustrate often; see what I did with Lemire's conversion of ASCII to integer, see what the assembler of GLIBC did to implement strlen "masks".

jamierpond added 5 commits February 26, 2024 22:42

some progress...

530543a

hmmm

c600842

add wip from fmtgp

3b9f000

add notes

07952f5

add explicity typing to make compiler happy

bc27994

jamierpond requested review from thecppzoo and Scottbruceheart February 27, 2024 07:31

cleanup

264dce9

thecppzoo reviewed Feb 27, 2024

View reviewed changes

inc/zoo/swar/associative_iteration.h Show resolved Hide resolved

neutral lane wise

9943917

jamierpond added 5 commits February 27, 2024 19:34

wip

acdaa6f

woop seems to be working!

4eace36

tidy up

88360cf

tidy

4801e56

rm

582affe

jamierpond commented Feb 28, 2024

View reviewed changes

inc/zoo/swar/associative_iteration.h Show resolved Hide resolved

jamierpond commented Feb 28, 2024

View reviewed changes

inc/zoo/swar/associative_iteration.h Outdated Show resolved Hide resolved