Skip to content

Refactor std.range.chain to reduce symbol size bloat. #5611

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

quickfur
Copy link
Member

See PR #5610 for further explanation.

This is another commonly-used range function that causes symbol size bloat. This fix alone caused a 28-30% reduction in executable size in my project.

@dlang-bot
Copy link
Contributor

Thanks for your pull request, @quickfur!

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

@andralex
Copy link
Member

cc @rainers

As a matter of strategy: we want to improve implementation to make Voldemort types more convivial. Only if that is not practical we should start a refactoring of the standard library.

@quickfur
Copy link
Member Author

Actually, Voldemort types have other issues that have already forced the refactoring of several Phobos algorithms, such as map. Ironically, that has slowed down the symbol bloat problem enough that I was able to write my code up to this point, because had map returned a Voldemort type, my code would have exceeded the 10MB symbol size limit long ago.

One issue that I can think of off the top of my head is that functions with alias parameters cannot return Voldemort types, because it will cause access errors: a nested struct cannot access local variables passed via the alias parameter, whereas a module-global template can.

In general, I agree with your sentiment that we need to improve the implementation of Voldemort types... but making them usable seems a long way away (Rainers' PR has been rotting in the queue for almost a year now, and as I mentioned there are other related issues that have already been worked around in Phobos), and I need my code to work now. I find it hard to swallow, for example, that my code is only ~8000 lines (a good part of which are comments and unittests, mind you), yet I'm already running into symbol sizes so huge they are overflowing dmd's symbol size limit and causing linker errors (segfaults and other ELF errors). Where would larger projects be, once they start buying into the range-based idiom we're promoting?

So I'd say, also as a matter of strategy, that it's a wiser idea to put in the current stop-gap measures to make Phobos at least nearer to usable first, and then clean things up once we actually have some hope of making Voldemort types work as they ought to.

@andralex
Copy link
Member

@rainers has been actively working on this and essentially we'll do what it takes to get his changes in, so let's hear him out. cc @WalterBright @MartinNowak

@JackStouffer
Copy link
Member

I have to agree with quickfur. From my perspective, the range piping idiom is such a huge selling point of D that either the DMD PR gets priority or we start adding fixes to Phobos. The benefit of PRs like this is that can be easily reverted after the DMD PR gets pulled.

@quickfur
Copy link
Member Author

P.S. As an addendum, @schveiguy's "horcrux" technique actually retains basically all of the benefits of Voldemort types, without its drawbacks:

  1. Since the type is hidden inside the template namespace of an eponymous template, it's actually impossible for outside code to access it: templateName.typeName is illegal because, being an eponymous template, templateName always resolves to the function rather than the template. Thus, it retains the encapsulation of Voldemort types.

  2. Having the template scope explicitly present allows us to define symbols used either by the function or the struct, without risking symbol hijacking.

    • For example, writing import std.range; inside the function body risks hijacking local variables by unintentional name collisions in std.range; you have to either use static imports or import specific symbols to prevent this. Writing the import in the template scope, OTOH, prevents this problem because local variables would shadow any symbols pulled in by the import.
    • Furthermore, any imports in the struct run the risk of leaking imported symbols to the caller, due to the way imports work in D; putting the import in template scope hides it in the inaccessible eponymous template so that there's no chance of leakage.
  3. Since the struct inside the eponymous template qualifies as a module-global template struct, it is not affected by problems with Voldemort types like alias parameters having access permission issues. So it enjoys the benefit of a separate, module-global template struct, yet without the drawback of longer symbol lengths, and also has better encapsulation since you can't name it from outside, whereas a module-global template can be named from outside, and the only protection is to put private on it.

  4. Due to the way mangling currently works, putting the type inside the namespace of the eponymous template makes the resulting symbol much smaller -- even smaller than a module-global template (because in the latter case we have to instantiate two templates rather than just one). By applying this very simple technique to my range-based code, combined with the Phobos refactorings in this PR and the linked PR, I have managed to reduce a 600MB executable to 8MB. This works today, rather than in some nebulous future when Rainers' PR finally gets merged. (And even then, Rainers' PR coupled with this refactoring will actually have additional benefits, since the smaller symbols produced will further compressed by Rainers' PR, as opposed to the ridiculously huge Voldermort symbols that a self-referencing mangling scheme can only go so far to compress.)

Given all of the above points, I'd say this is a win-win situation. The only "loss" is the ego of discovering that Voldemort types are perhaps not the best of idioms they first appeared to be. But the bright side of that is, horcrux types may turn out to be the better idiom that we're looking for. ;-)

@andralex
Copy link
Member

One possibility is to have the compiler do the same transformation automatically, and use the same name for mangling that results from the use of the template.

Voldemort types are the path of least resistance of achieving a need in a principled manner: the function defines a type and returns a value of it. The template is a mechanical trick that obscures intent rather than clarifying it. We need to make implementation better.

@schveiguy
Copy link
Member

I think the patch from @rainers is imminent. I kind of want to wait until that's merged, because I still think we can get some savings from this PR, and I want to see how much. It might still be worth merging this, even with the mangling changes.

But at the same time, this changes 0 documentation, and can be done orthogonal to the mangling update (and will fix a bunch of problems today). This PR has no conflicts, whereas the mangling update may break some code, and has lots of hairy issues.

The downside is the readability of the source -- this is harder to understand than straight-up function templates.

I'm mixed on whether this should be merged. If this were last year when the mangling was a proof-of-concept, I'd say do it now.

If we do do this, I'd say reorder the declaration so the Result comes first.

@schveiguy
Copy link
Member

One possibility is to have the compiler do the same transformation automatically

As long as the struct doesn't use compile-time definitions from inside the function, this should be trivial. Should be the majority of cases since most wrapper things like this just spend most of the function defining the range struct, and then the implementation is to return one. I have the same experience in iopipe.

Would be a nice project.

@jmdavis
Copy link
Member

jmdavis commented Jul 14, 2017

One possibility is to have the compiler do the same transformation automatically, and use the same name for mangling that results from the use of the template.

I've wondered about that ever since Steven first brought up the issues with Voldemort types. I would have thought that the obvious solution would be to transform them into the other at least as far as mangling goes, but I figured that there was something I was missing that made that not work. Still, it at least seems like if all it takes to fix the mangling problem is to move the type out of the function that it would be possible to do it automatically.

@schveiguy
Copy link
Member

schveiguy commented Jul 14, 2017

As long as the struct doesn't use compile-time definitions from inside the function

You know, I'm actually thinking we don't even need to require this for the transformation. It's really just a guarantee that the name doesn't conflict with another name (i.e. the symbol is unique). I think all we require is that there is NOTHING in the template namespace besides the eponymous function, and then we can just simplify the symbol without having to pretend it's outside the function itself. I think you could even have non-static structs be named better.

I bet the DMD changes wouldn't even be that hard.

Update: Hm... there could be multiple template function overloads. In that case, there would be a conflict. So I'm not sure it works in all cases.

@schveiguy
Copy link
Member

I tested, and actually it doesn't work at all. Redefining "Result" inside another identical template (but with different runtime parameters) is completely ignored:

template foo(T)
{
    struct Result
    {
        int x;
    }
    auto foo(T t)
    {
        return Result(1);
    }
}

template foo(T)
{
    struct Result
    {
        long x;
    }
    auto foo(T t, int y)
    {
        //return Result(long(y) << 32L); // error! thinks I'm referring to Result in first overload!
        return Result(y);
    }
}

void main()
{
    import std.stdio;
    auto s1 = foo("hi");
    auto s2 = foo("hi", 3);
    static assert(s1.sizeof == s2.sizeof);
    static assert(typeof(s1).mangleof == typeof(s2).mangleof);
}

I think there's a bug here somewhere. Either the second Result should be used for the second overload, or it should be an error to redefine.

@quickfur
Copy link
Member Author

quickfur commented Jul 14, 2017

@schveiguy said:

If we do do this, I'd say reorder the declaration so the Result comes first.

Sure, that's trivial to do. I kinda like moving Result to the bottom, though, so that it's more clear that this is an eponymous template. Besides, Result is an implementation detail that needn't occupy the prime real estate right after the template declaration. But that's just a matter of taste, I suppose.

@wilzbach
Copy link
Member

Quoting @UplinkCoder from the forum:

You will be excited to hear that my template work will fix the described problem as a side effect.

Without incurring run-time overhead. Or forcing the usage of OO wrappers.
The downside however is that the scheme I have in mind is very complex and might take over a year to implement.

While of course it would be nice to do this stuff automatically in the compiler, applying the horcrux trick to the critical bits will have a huge impact now.

I kinda like moving Result to the bottom, though, so that it's more clear that this is an eponymous template. Besides, Result is an implementation detail that needn't occupy the prime real estate right after the template declaration. But that's just a matter of taste, I suppose.

I can't agree more, but whenever sth. like this pops up, we should declare one way as standard in the style guide and stop wasting time on such matters. In some cases we are even able to automatically enforce a specific style.

@rainers
Copy link
Member

rainers commented Jul 15, 2017

IMO the new mangling is ready for prime time. It would be great if someone actually hurt by the huge symbols could build dmd with dlang/dmd#5855 and the runtime with dlang/druntime#1830 and see how it works out.

@dukc
Copy link
Contributor

dukc commented Jul 18, 2017

I once tried to make it so that chain(a, chain(b, c)) would be inferred as chain(a, b, c) but did not manage to. I believe that if somebody manages to do that it could also shorten the symbols quite a bit.

@JackStouffer
Copy link
Member

Now that dlang/dmd#5855 has been pulled, what's the process for getting it into master so that this patch isn't needed?

@quickfur
Copy link
Member Author

@JackStouffer Not so fast; that pull was only merged into a special branch. It hasn't landed in master yet. That won't be until dlang/dmd#6998 is merged.

@JackStouffer
Copy link
Member

Right

what's the process for getting it into master

@wilzbach
Copy link
Member

what's the process for getting it into master

Hehe ensuring that it doesn't cause performance regressions or other issues. Otherwise I think everyone is in favor of this PR and it looks like we might see this PR getting merged quite soon. It could even be this weekend :)

@JackStouffer
Copy link
Member

Adding the blocked label because it's still unclear what the consensus is regarding pulling this vs getting the new mangling in master

@schveiguy
Copy link
Member

pulling this vs getting the new mangling in master

The two are not mutually exclusive. I'd like to see the effect of both.

@quickfur
Copy link
Member Author

Finally, dlang/dmd#6998 has merged into master!

I did a quick test with git HEAD dmd/druntime/phobos on my range-heavy project. Here are the executable sizes, compared with the sizes compiled with a local branch of Phobos containing this PR and #5610:

The difference is 57192 bytes and 46824 bytes, respectively. Percentage-wise, that's a pretty small improvement, but in terms of size, that's an average of 52 KB savings just in symbol sizes alone; the code is exactly the same. I'll leave it to you guys to decide whether or not this is worthwhile. :D

@JackStouffer
Copy link
Member

Hmm, seems like the transformation that Andrei mentioned will still give us some savings then.

Here's the things blocking this in my mind:

  1. Do we want to lose some readability in Phobos or not
  2. Is there a problem with promoting the Voldemort pattern in D evangelization when we won't use it in our own code

@wilzbach
Copy link
Member

wilzbach commented Dec 1, 2017

From @JackStouffer:

Here's the things blocking this in my mind:

  1. Do we want to lose some readability in Phobos or not
  2. Is there a problem with promoting the Voldemort pattern in D evangelization when we won't use it in our own code

I agree on this. As the Voldemort pattern is frequently used in Phobos and other libraries, I don't think we should go on the road of addressing this problem per function, but once in the compiler for all symbols. If people still complain about the symbol size after using rainer's great backreference mangling, we could also look into going the "ugly" road for the most important symbols.

What do you think/vote? Merge or close?
CC @schveiguy @rainers @MetaLang @jmdavis @andralex

@MetaLang
Copy link
Member

MetaLang commented Dec 1, 2017

Is this still necessary now that the symbol backref patch has been merged?

@schveiguy
Copy link
Member

It's not necessary, the question is, is it worth the incremental improvement? I'd also much rather see the compiler do this, as it would immediately generate lots of savings all places voldemort types are used.

I would say let's close this, especially since it only addresses one function. At the same time, I reopened the enhancement request that I closed, noting the issues involved. If someone wants to tackle that, they can: https://issues.dlang.org/show_bug.cgi?id=17379

@schveiguy schveiguy closed this Dec 1, 2017
@quickfur
Copy link
Member Author

quickfur commented Dec 1, 2017

Even with the symbol backref patch, this change still does give some amount of savings. Not that much anymore, but still some. But yeah, it's probably better to do it once in the compiler rather than many times for each function in Phobos. And then for each function in user code.

@quickfur quickfur deleted the horcrux-chain branch December 1, 2017 16:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants