Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide float and double versions of fmin and fmax. Improve docs. #7604

Merged
merged 4 commits into from
Sep 9, 2020

Conversation

andralex
Copy link
Member

@andralex andralex requested a review from ibuclaw as a code owner August 22, 2020 19:18
@dlang-bot
Copy link
Contributor

Thanks for your pull request, @andralex!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + phobos#7604"

@kinke
Copy link
Contributor

kinke commented Aug 22, 2020

Thx; @thewilsonator has taken care of this too: #7463

IMO, the main obstacle for providing all/most overloads is D's poor overload resolution, see dlang/dmd#11271 for a draft.

@andralex
Copy link
Member Author

Thx; @thewilsonator has taken care of this too: #7463

I'd be glad if that were rebased and pulled then.

IMO, the main obstacle for providing all/most overloads is D's poor overload resolution, see dlang/dmd#11271 for a draft.

That's a much taller order.

@kinke
Copy link
Contributor

kinke commented Aug 22, 2020

Thx; @thewilsonator has taken care of this too: #7463

I'd be glad if that were rebased and pulled then.

It is (was?) blocked by insufficient overload resolution breaking a Buildkite project (and probably more projects using integer literals).

IMO, the main obstacle for providing all/most overloads is D's poor overload resolution, see dlang/dmd#11271 for a draft.

That's a much taller order.

The overload resolution wrt. integer literals seems to work better with a template, i.e., fmax(float, <integer literal>) is an fmax!float instantiation and not ambiguous like 3 explicit float/double/real overloads.

I previously preferred explicit overloads because I hate [float, const float, immutable float, shared float] etc. instantiations, but const F is a nice reduction (and well, I don't use shared at all). And isFloatingPoint!ifloat is false too.

@andralex
Copy link
Member Author

Now depends on #7605

@kinke
Copy link
Contributor

kinke commented Aug 23, 2020

What's the reason for the latest complication? Mixing types is probably a runtime performance killer (e.g., preventing vectorization), makes using LLVM intrinsics more complicated, and compile time probably suffers too. IMO, having to explicitly cast when really operating with mixed types is to be expected and forces the user to think about stuff like that ('accidental' double literals seem quite common).

@andralex
Copy link
Member Author

What's the reason for the latest complication? Mixing types is probably a runtime performance killer (e.g., preventing vectorization), makes using LLVM intrinsics more complicated, and compile time probably suffers too. IMO, having to explicitly cast when really operating with mixed types is to be expected and forces the user to think about stuff like that ('accidental' double literals seem quite common).

As far as I understand from your and others' notes, the functions needed to accept an integer. Maybe I misunderstood. I'd be glad to remove support for integrals.

@kinke
Copy link
Contributor

kinke commented Aug 23, 2020

Integer literals Integers are already accepted and implicitly promoted to the other operand's floating-point type, nicely preventing the overload resolution problem: https://run.dlang.io/is/yffc8F

Edit: Btw, __traits(isFloating) allows vector types too. That's beneficial for LDC, as the llvm_maxnum intrinsic works for vectors too.

@andralex
Copy link
Member Author

@kinke great, back to 1 template parameter. Thanks!

@kinke
Copy link
Contributor

kinke commented Aug 27, 2020

I very much prefer your first version, without std.algorithm.comparison.max - that's IMO unnecessary template bloat, see the codegen AST: https://run.dlang.io/is/6tIO0f - 53 lines vs. 226 lines.

Edit: Or with -vtemplates:

  Number   Unique   Name
       1        1   Unqual(T)
       2        1   Demangle(T)
       1        1   isNaN(X)(X x) if (isFloatingPoint!X)
       6        1   Unqual(T)
       1        1   fmax(F)(const F x, const F y) if (__traits(isFloating, F))
       1        1   isFloatingPoint(T)

vs.

  Number   Unique   Name
       1        1   genericIndexOf(args...) if (args.length >= 1)
       7        1   Unqual(T)
       1        1   max(T...)(T args) if (T.length >= 2)
       1        1   fmax(F)(const F x, const F y) if (__traits(isFloating, F))
       2        0   AliasSeq(TList...)
       1        1   __equals(T1, T2)(T1[] lhs, T2[] rhs)
       1        1   isIntegral(T)
       2        1   at(R)(R[] r, size_t i)
       1        1   safeOp(T0, T1)(auto ref T0 a, auto ref T1 b)
       1        1   IntegralTypeOf(T)
       2        1   isAggregateType(T)
       3        2   Unqual(T)
       2        1   Demangle(T)
       1        1   OriginalType(T)
       1        1   ModifyTypePreservingTQ(alias Modifier, T)
       1        1   Impl(T)
       8        8   isSame(ab...) if (ab.length == 2)
       2        2   Flag(string name)
       2        2   CommonType(T...)
       1        1   staticIndexOf(T, TList...)
       1        1   isFloatingPoint(T)
       1        1   safeOp(string S) if (S == "<" || S == ">" || S == "<=" || S == ">=" || S == "==" || S == "!=")
       1        1   isNaN(X)(X x) if (isFloatingPoint!X)
       1        1   MaxType(T...) if (T.length >= 1)

@PetarKirov
Copy link
Member

Ping @andralex

@andralex
Copy link
Member Author

Well I kinda like it with using min and max because whatever improvements we bring to those would be reflected in fmin and fmax. One counterargument is there aren't many improvements that could possibly be brought to floating point comparison. One other counterargument was made about template bloat - I very much believe we should resist subscribing to a programming-by-fear-of-bloat methodology.

I'll make the change because it does at a level feel excessive to reuse a one-liner.

@kinke
Copy link
Contributor

kinke commented Aug 30, 2020

I very much believe we should resist subscribing to a programming-by-fear-of-bloat methodology.

Well I'd say the fear is clearly justified in this case, and min/max (and probably a lot more Phobos stuff) definitely need some work to account for the probably vast majority of trivial cases (2 operands of the same arithmetic type), the bloat is IMO embarrassing and doesn't exactly favor compiler performance (possibly something for @n8sh?). And as all of that would now have been instantiated with templated f{min,max} as opposed to the previous 'import' of a precompiled function in Phobos, this wouldn't have set a nice precedent for more templatizing of std.math.

Thanks, lgtm now.

@andralex
Copy link
Member Author

I very much believe we should resist subscribing to a programming-by-fear-of-bloat methodology.

Well I'd say the fear is clearly justified in this case, and min/max (and probably a lot more Phobos stuff) definitely need some work to account for the probably vast majority of trivial cases (2 operands of the same arithmetic type), the bloat is IMO embarrassing and doesn't exactly favor compiler performance (possibly something for @n8sh?). And as all of that would now have been instantiated with templated f{min,max} as opposed to the previous 'import' of a precompiled function in Phobos, this wouldn't have set a nice precedent for more templatizing of std.math.

Thanks, lgtm now.

There'd be two counterarguments to this.

Per measurements done my myself and others, the real pain in templates most often comes from machine-generated recursive instantiations, not the meager instantiations created by user code. (ctRegex would be a sort of an exception as even one little use will instantiate a bunch of stuff). So this meager thing with a handful of instances creates no trouble one way or another, no two ways about it.

Second, it's just a defeatist way to go about things, a bad case of premature optimization. I remember people were worried about template bloat in C++ since like 1994. And some folks wrote and promoted libraries built with fear of template bloat (I remember a bunch of container libraries with inheritance and virtuals, oh boy) and - well they're in the dustbin of history. Nobody gives a rat's tail. And then there were folks who used templates everywhere and went from strength to strength - the standard library/STL, Boost, Eigen, you name it - and now pretty much anybody in C++ can't write a dozen lines of code without using some template. And guess what, some people kept on saying that template bloat will end up being the undoing of C++, and coders and compilers just keep on getting better and everybody's exchanging high fives.

So I'm serious: don't code in fear of template bloat. It's a losing proposition.

@andralex
Copy link
Member Author

What's the matter with the SumType failure?

@thewilsonator
Copy link
Contributor

Unrelated, fails on multiple PRs.

@andralex
Copy link
Member Author

Thanks, Nick. I won't make the mistake of force merging again though :o).

@wilzbach
Copy link
Member

Thanks, Nick. I won't make the mistake of force merging again though :o).

Well, someone else did it this time - #7611 ;-)
We should really remove the ability to do that...
Anyhow, Buildkite should be green soon.

@kinke
Copy link
Contributor

kinke commented Aug 30, 2020

Per measurements done my myself and others, ...

Some hard numbers with LDC v1.23 on Win64 for ldc2 -c [-d-version=Simple] and

F fmin(F)(const F x, const F y) @safe pure nothrow @nogc
if (__traits(isFloating, F))
{
    import std.math : isNaN;
    if (isNaN(x)) return y;
    version (Simple)
    {
        return y < x ? y : x;
    }
    else
    {
        import std.algorithm.comparison : min;
        return min(x, y);
    }
}

alias T = float;
T foo(T a, T b) { return fmin(a, b); }

~80 ms (Simple) vs. ~100 ms. When putting isNaN() in the same module instead of importing std.math, the same ~20ms difference (~60ms vs. ~80ms).

@andralex
Copy link
Member Author

andralex commented Aug 30, 2020

@kinke thanks for continuing what I think is an important discussion. So important, in fact, that I sat down and reproduced the measurements to get and share more insight into the matter. Hopefully we will be able to debunk a couple of myths together.

(To be clear, the discussion in this PR is settled; I agree it's excessive to import a module just for a one-liner.)

The numbers you posted consolidate two distinct overheads that would be good to distinguish. They are:

  • the overhead of importing a module; and
  • the overhead of instantiating a template from that module.

My baseline for the simple version is 39 ms (as time measures) for a warm build (repeat the compilation a few times, then measure). The version with the import clocks at 49 ms, i.e. clearly measurable.

It turns out, as one would probably expect, that in this case virtually the entire overhead is the sheer import of the module. To check that hypothesis, I reworked the code as follows:

F fmin(F)(const F x, const F y) @safe pure nothrow @nogc
if (__traits(isFloating, F))
{
    import std.math : isNaN;
    if (isNaN(x)) return y;
    version (Simple)
    {
        return y < x ? y : x;
    }
    else
    {
        import std.algorithm.comparison : min;
        return y < x ? y : x;
    }
}

alias T = float;
T foo(T a, T b) { return fmin(a, b); }

i.e. the non-simple version does the import but uses nothing from it. Time: 49 ms, virtually indistinguishable from the version that instantiates templates.

So importing a module has overhead. But it is understood and scales well, and that's why we don't go around and tell people to avoid imports and copypaste code instead. Within the context of actually using std.math, I didn't bother to measure but I think it's pretty obvious you won't be able to measure a percentage difference whether you import std.algorithm.comparison or not. There are already 5 top-level imports and a lot more local imports (among which std.algorithm.comparison itself). So, not a scaling problem.

I should add numbers for memory consumption (as /bin/time measures). Here they are, lowest of 5 consecutive runs:

  • simplest: 62852 KB
  • import but do not instantiate: 69216 KB
  • import and instantiate: 69544 KB

So the overhead of instantiating straightforward templates is hardly a blip on the radar. Of course we know that scales poorly for recursive templates. This all seems to support the basic point: we should not tell people to not use templates, just the same as we should not tell people not to import modules.

@kinke
Copy link
Contributor

kinke commented Aug 30, 2020

You're right, and I was aware of the obvious import factor and maybe should have emphasized that it's not just about the template bloat, but the potential extra import (when using fmin alone, it is indeed an extra import and causes ~20 ms overhead on my box, after adapting fmin in actual std.math). The bloat alone seems hardly measurable in this case, too much measuring noise.

This all seems to support the basic point: we should not tell people to not use templates, just the same as we should not tell people not to import modules.

Just to be clear, I've never suggested avoiding the min/max template per se, but tried to show that it's not optimized for the trivial case of 2 operands with the same arithmetic type, instantiating a lot of other extra templates (edit: incl. further importing std.functional, but that's already imported anyway), that's the bloat I was getting at.

@andralex
Copy link
Member Author

andralex commented Aug 30, 2020

@kinke yes, thanks very much for these insights!

Independently of all (and ironically): I looked through this module and it looks like a lot of the functions (sin, cos etc) follow the three-overloads pattern, not the templated functio pattern. Should we go with this instead of the current code for uniformity?

static foreach (F; AliasSeq!(float, double, real))
    F fmin(F x) { ... }
static foreach (F; AliasSeq!(float, double, real))
    F fmax(F x) { ... }

That would be more in keeping with the other functions, and we can use this pattern in the entire module. Cons:

  • more work
  • not sure what documentation will look like
  • unclear whether there's a real advantage

Pros:

  • we can use this instead of the current troikas everywhere in the module
  • there is a nice minimalism about the approach

@kinke
Copy link
Contributor

kinke commented Aug 31, 2020

The biggest problem with the 3 explicit overloads I've previously preferred is the overload resolution wrt. integers (some common literals like 0, 1, -1 etc.) mentioned earlier on (but only applying to non-unary functions); working around that (and so potential breakages) is IMO the biggest advantage of the single template, besides less (verbose) code. I think this template approach here sets a nice precedent, and functions currently still only accepting real should be modified accordingly, and later also the existing troikas, at least for small functions. The only catch is probably the template constraint, as __traits(isFloatingPoint) includes floating-point vectors too, so something like is(F == float) || is(F == double) || is(F == real) might be better suited in other instances.

@andralex
Copy link
Member Author

andralex commented Sep 8, 2020

Is the failure in SumType related? There are warnings about unreachable code in here:

@safe unittest {
	import std.algorithm.mutation: move;

	static struct NoCopy
	{
		@disable this(this);
	}

	alias MySum = SumType!NoCopy;

	NoCopy lval = NoCopy();

	MySum x = NoCopy();
	MySum y = NoCopy();

	assert(__traits(compiles, SumType!NoCopy(NoCopy())));
	assert(!__traits(compiles, SumType!NoCopy(lval)));

	assert(__traits(compiles, y = NoCopy()));
	assert(__traits(compiles, y = move(x))); // UNREACHABLE
	assert(!__traits(compiles, y = lval)); // UNREACHABLE
	assert(!__traits(compiles, y = x)); // UNREACHABLE

	assert(__traits(compiles, x == y)); // UNREACHABLE
}

@andralex
Copy link
Member Author

andralex commented Sep 8, 2020

CC @pbackus

@pbackus
Copy link
Contributor

pbackus commented Sep 8, 2020

Can't reproduce locally and buildkite is green now, so I guess there's no problem?

@Geod24
Copy link
Member

Geod24 commented Sep 9, 2020

The failure was due to a regression introduced in #7609
Well technically it was an unintentional bugfix which uncovered an issue in SumType, which @pbackus fixed.

@dlang-bot dlang-bot merged commit f97c2c0 into dlang:master Sep 9, 2020
@UplinkCoder
Copy link
Member

Caused a regression in our code.
@andralex you changed the API.

@andralex
Copy link
Member Author

andralex commented Sep 11, 2020

Yah, the breakage is in this code:

	static foreach(f;AliasSeq!( 	fabs,		sqrt,		sin,		cos,		tan,		asin,
					acos,		atan,		sinh,		cosh,		tanh,		asinh,
					acosh,		atanh, 		/*log*/		log2,		log10,		logb,
					log1p,		exp,		exp2,		expm1,		ceil,		floor,
					round,		lround,		trunc,		rint,		lrint,		nearbyint,
					rndtol,		gamma,		logGamma,	sgnGamma,	digamma,	logmdigamma,
					logmdigammaInverse,		erf,		erfc,		normalDistribution,
					normalDistributionInverse,	atan2,		fmod,		remainder,	isIdentical,
					fmax,		fmin,		gammaIncomplete, gammaIncompleteCompl,		gammaIncompleteComplInverse,
	))
		handlers.registerHandler!(doubleOverloadOf!f);

The code attempts to pick up the address of each function, including fmin and fmax, which causes compilation to fail. If we introduce three overloads as per #7604 (comment), I just tested and the build works. However, @kinke mentioned (#7604 (comment)) that the three overloads introduce other breakages wrt literals.

So that leaves us between a rock and hard place. Insights welcome.

Somewhat independently of that, we can - and should - fix the problem on Symmetry's side. The code is very fragile at the moment and makes assumptions it oughtn't:

// this assumes that exactly one double overload actually exists and is the first arg...
// safe for std.math but not in general.
private template doubleOverloadOfHelper(alias what) {
	static foreach(overload; __traits(getOverloads, __traits(parent, what), __traits(identifier, what))) {
		static if(is(Parameters!(overload)[0] == double)) {
			//pragma(msg, "****************", __traits(identifier, what));
			alias doubleOverloadOfHelper = overload;
		}
	}
}
// so this extra wrapper will fallback to old behavior if there isn't a double overload
private template doubleOverloadOf(alias what) {
	static if(!is(typeof(doubleOverloadOfHelper!what) ==  void))
		alias doubleOverloadOf = doubleOverloadOfHelper!what;
	else
		alias doubleOverloadOf = what; // best we can do for now
}

@wilzbach
Copy link
Member

So that leaves us between a rock and hard place. Insights welcome.

Considering that all the projects on Buildkite passed (it shows that there likely not going to be many projects affected) and you have to fix the Symmetry code anyhow, I would just do that and move on.

@UplinkCoder
Copy link
Member

@wilzbach yes, that seems okay.
though I'd still like to get our SIL project on the buildkite soonish.

@wilzbach
Copy link
Member

@wilzbach yes, that seems okay.
though I'd still like to get our SIL project on the buildkite soonish.

Nobody is preventing you from doing that. Though as I mentioned, you don't want it to be main Buildkite project, because logs would be public.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.