Skip to content

Implement Exponentiation via Associative Iteration #75

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 28 commits into
base: master
Choose a base branch
from

Conversation

jamierpond
Copy link
Collaborator

@jamierpond jamierpond commented Feb 27, 2024

Here is the headline test of this PR.

using S = SWAR<8, u32>;
constexpr auto base     = S::fromLaneLiterals({2,   3,  5,  6});
constexpr auto exponent = S::fromLaneLiterals({7,   4,  2,  3});
constexpr auto expected = S::fromLaneLiterals({128, 81, 25, 216});
constexpr auto actual = exponentiation_OverflowUnsafe(base, exponent);
static_assert(expected.value() == actual.value());
CHECK(expected.value() == actual.value());


exponent = S{static_cast<T>(exponent.value() << (NB - ActualBits))};
return associativeOperatorIterated_regressive(
x, S{1}, exponent, S{S::MostSignificantBit}, operation,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your neutral is, lane wise, { 0, ..., 1 }, so, the upper lanes are initialized to 0
You could S{meta::BitmaskMaker<BaseType, 1, BitsPerLane>::value()}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ooh thanks for that, that makes total sense!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would broadcast(S{1}) be more readable?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, but using broadcast is a complication because it relies on multiplication.
I have been slowly internalizing what needs to be done to support SIMD: no assumptions about the availability of whole-base-type multiplication. In ARM NEON, x86-64 SSE of 128 bits, there is no such thing, but at most lane-wise 64 bit multiplication, same with AVX 256 bits.

BitmaskMaker can be upgraded to support longer types.
The problem is that SIMD negate constexpr, it's going to give us pain to make the non-constexpr flavor of the library

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is, indeed, subtle.

Perhaps not so subtle in the SIMD constexpr negation...

@jamierpond
Copy link
Collaborator Author

@thecppzoo would you have some time tomorrow to go over some of this code together?

@@ -41,6 +41,15 @@ static_assert(
multiplication_OverflowUnsafe_SpecificBitCount<3>(Micand, Mplier).value()
);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GOOD NEWS. Looks like my initial, first-pass implementation was correct after all!
Seems like part of the issue was how I'm trying to compare the binary literals.

All value I tried when evaluating using the hex idiom you guys have already got provides consistently correct answers.

Comment on lines +8 to +9
.idea
**cmake-build**
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does your IDE really require building inline with the sources, or within the source tree?
I am loathe to pollute the repository with support for the anti-pattern of building within the sources; however, Visual Studio Code forced me to yield. VSCode is a poor tool that glitches in very frustrating ways when asked to build out of the source tree, that's the only reason for the test/.vscode, build and .cache.
So, unless your tool glitches, adopt the practice of not building in-tree and remove this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this such a bad thing? I think most tools / LSPs expect to build in the source tree? I think CLion does this by default and I think clangd by default looks for ./build in the root of the project?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just two things I dislike:
Most tools like the command line grep, find, etc. are extremely hard to configure for understanding .gitignore, so, you just slow down your work flow by having to treat the non-sources directories as explicit special cases with every non-git tool you use.

Let's say you want to build with GCC and also with Clang.
Who gets to build in-tree? both?
So, for every flavor of build you may want you need to provide an entry point within the tree, and add it to the .gitignore.
How's that helpful instead of segregating builds from sources?

These are two instances of the more general problem of disorganization. A freshly cloned repository ought to contain only files generated by people or by tools that are so sophisticated they may as well. Further processing of "sources" ought not to dump the results into a place where it is hard to tell them apart from files made by people.

It baffles me that colleagues expect that whatever configuration is good for holding source files is going to be just as good for built files. For example, what if I have a configuration of very small blocks at the file system level expecting that source code is among the very small files in the distribution of file size, but keep the build filesystem with large blocks, because of the same reason on the opposite?

See? there are multiplicity of day to day advantages of keeping things organized. Very fundamental stuff for Software Engineers.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry took a while to get back to this.

I understand both of these points, I think the overriding point from me regardless of my opinion is that this is how it is going to be used by people, so we might as well make it convenient for people to contribute. Many IDEs and tools like the build folder to be in the source tree. I'm using the clangd LSP which is hooking into nvim which I believe requires the compile_commands.json to be in ./build. JetBrains also will build in the source tree by default into cmake-build-[BUILD_TYPE].

Secondly, you can use tools like ripgrep in place of grep which are .gitignore sensitive, which is awesome.

Though I hear your points well and agree, this addition doesn't prevent more optimal workflows and enables anyone to contribute more easily.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's continue this conversation online, at this point, it is an item that should not go in this PR, but in another PR if we decide to support building in-tree.

Let's have PRs with a "single responsibility".

@thecppzoo
Copy link
Owner

Thanks for this effort.
Part of the value of re-expressing known techniques as what we call "associative iteration" is that this gets us closer to use any advantage we can find, later, about associative operations.
For example, we can identify cases in which the "counts" for the iteration are constants and device optimal addition chains for them.
Take cryptographic techniques: they may involve exponentiation (of some weird form of multiplication) by a constant: if it is a constant, chances are we could have compilation-time code that comes up with the optimal addition chain...

Comment on lines 45 to 47
constexpr auto base = SWAR<8, u32>{0x02'01'05'06}; // 2 | 1 | 5 | 6
constexpr auto exponent = SWAR<8, u32>{0x07'00'02'03}; // 7 | 0 | 2 | 3
constexpr auto expected = SWAR<8, u32>{0x80'01'19'D8}; // 128 | 1 | 25 | 216
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I reviewing code made by ChatGPT?
It's technically correct to substitute the undefined 0^0 by the perfectly defined 1^0. Still, aren't there more useful examples? Like 3^5 = 243?
Also, think about this suggestion:
expected = SWAR<8, u32>{(127 << 24) + (1 << 16) + (25 << 8) + (216 << 0)}
help us with some notation, macros may be considered, to initialize the lanes of a SWAR.
The explicit meta::BitmaskMaker<....>::value has been very tedious, error prone, time consuming for just repeating a constant, let alone initialize lanes to dissimilar values.

@@ -70,6 +71,19 @@ struct SWAR {

constexpr T value() const noexcept { return m_v; }

constexpr static T baseFromLaneLiterals(std::initializer_list<T> args) noexcept {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thecppzoo i wonder what you think about this utility?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, we could do real literals for very popular cases such as

s4_64, s8_64, s16_64, s32_64;
s4_32, s8_32 , s16_32;
s4_16, s8_16;
s4_8;

A constructor from initializer list would be ambiguous since you could have an initializer list of only one element and then it would not be possible to distinguish between a literal with one element or a whole-swar initialization with a base type...

I am not able to decide this minute, but I thank you for posing the question.

I would like to improve the ergonomics, do you have ideas further down the road in this direction or is this just as far as your vision goes at the moment?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i was trying a few variants with variadic arguments but that was a little tricky when trying to iterate over them, i feel like there is yet a better solution.

this is one i was inspired for right now, but i think i will defo have more ideas with respect to ergonomics!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thecppzoo how does this update look? makes it unambiguous that the number of lanes to use.
how would you go about using the different types for the arguments here dynamically?

Comment on lines 75 to 79
T result = 0;
for (auto arg: args) {
result = (result << NBits) | arg;
}
return result;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is big endian, have you thought about which one ought to be the "canonical" endianness for literals?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should read most significant on the left, least significant on the right, like it is now.

@jamierpond jamierpond requested a review from thecppzoo March 20, 2024 20:00
@jamierpond jamierpond marked this pull request as ready for review March 20, 2024 20:06
@jamierpond jamierpond changed the title Exponentiation WIP Implement Exponentiation via Associative Iteration Mar 20, 2024
rm unused multiply

more tests
@Scottbruceheart
Copy link
Collaborator

'overflow unsafe' points out the follow-up to this PR that exponentiation via repeated multiplication via repeated saturating addition is an implementation option.

@thecppzoo
Copy link
Owner

'overflow unsafe' points out the follow-up to this PR that exponentiation via repeated multiplication via repeated saturating addition is an implementation option.

I don't know, Scott, perhaps it is better to focus on the flow of doubling the precision, saturating, and halving the precision.

In general, the "precision doubling" (casting to a pair of SWARs with double lane width, --how do we call this better than "precision doubling"?--) is a very useful technique, that we ought to illustrate often; see what I did with Lemire's conversion of ASCII to integer, see what the assembler of GLIBC did to implement strlen "masks".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants