Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an expect intrinsic #1131

Merged
merged 2 commits into from
Jun 10, 2015
Merged

Conversation

Aatch
Copy link
Contributor

@Aatch Aatch commented May 20, 2015

`if expect(foo == bar, false) { .. }`.

The expected value is required to be a constant value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you mean to use something like llvm.expect.i8 and converting the booleans to i8s, but this could be included in the Detailed Design.

Likewise, when I tried this myself by creating extern fn expect_i8 etc functions, linking against llvm, I couldn't find the optimizations in the asm. It'd be nice to see the improvements that such a function could possibly bring in this RFC.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seanmonstar actually, you can use llvm.expect.i1. As for improvements, that's hard to show since benchmark code tends to warm up the prediction table pretty quickly. It also isn't always visible in the asm if the code was already structured such that the blocks were ordered appropriately.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but we don't have an i1, the smallest in rust is i8.

True about the benchmarks. I meant the compiled asm having the jump altered, before and after using expect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seanmonstar actually, bools are now treated as i1s when generating LLVM IR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, very cool. That confusion may be even more reason to include the implementation details.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, and I left it out of the detailed design because I don't think the precise implementation details for a specific backend are important. Just the semantics of the intrinsic.

@seanmonstar
Copy link
Contributor

However, 👍 assuming there is a gain to found in there. This is one of the bigger pieces missing so that httparse can be as fast picohttpparser.

@nrc nrc added the T-lang Relevant to the language team, which will review and decide on the RFC. label May 20, 2015
@nrc
Copy link
Member

nrc commented May 20, 2015

This is something I'd like to see in Rust. However, the suggested 'syntax' seems weird (though some what traditional and I'm guessing close to llvm) from both from a use and implementation point of view: why is the intrinsic on the condition when the hint is about which branch is taken? If we start doing our own optimisations, I assume we'd have to be careful about any kind of code motion here in order to not break hinting to llvm.

Could we instead have an attribute on the if expression? Or could we make guarantees not to re-order branches and rely on llvm always treating the if case as more likely than the else case? (I realise that is not very discoverable, but this is a feature for advanced developers).

@pythonesque
Copy link
Contributor

why is the intrinsic on the condition when the hint is about which branch is taken

IIRC, this is because expect is somewhat more general than likely / unlikely (so you can tell LLVM to expect a value even if a jump isn't immediately dependent on it).

I'm fully in favor of this RFC... this is very important for getting optimal codegen in many cases and I've personally been wishing for it. Right now I've been resorting to #[cold] to get the desired behavior in a few cases, but it's too blunt of an instrument to use everywhere.

@nagisa
Copy link
Member

nagisa commented May 20, 2015

fn(bool, bool) -> bool does not allow to easily “expect” for some variant of enum unless is_variant methods are provided.

@quantheory
Copy link
Contributor

To me, the fact that a constant expression is necessary seems to argue for likely/unlikely. (Or maybe for making it an expect! macro, which can insert something to assign the RHS to a constant, and thus force the compiler to prove that it gets a constant expression.) In any case, the RFC should probably at least mention what happens if you incorrectly use a non-constant expression as an argument to expect. I.e., is that an error that the compiler detects, or is the hint just ignored?

@nagisa The enum variant case could be done with dependent types, sort of:

fn expect<const C: T, T=bool>(expr: T) {
    /* ... */
    expr
}
if expect::<true>( /* some expr */ ) { /* ... */ }
match expect::<Variant1, MyEnum>( /* some expr */ ) {
    Variant1 => { /* ... */ }
    _ => { /* ... */ }
}

But that's a bit clunky and doesn't deal with patterns (e.g. if you want to state that a branch for Variant1 | Variant2 is likely). I think that match patterns probably could only be dealt with as in @nrc's solution, i.e. somehow annotating the construct that's doing the branching, rather than an expression used in the condition being tested.

Edit: My original example didn't make any sense at all. Hopefully it's better now.

@Aatch
Copy link
Contributor Author

Aatch commented May 20, 2015

@quantheory yeah, I'd be happy with likely/unlikely too, it's pretty much how we handled similar restrictions for atomic operations (since the ordering has to be a constant).

@nrc Ok, where would the attribute go? On the if itself? Where would you put it if you wanted to hint an else if branch? What if you're taking advantage of short-circuiting and only want to hint the second part of an and or or? The only reasonable place is to add the attribute to the condition, since you're not hinting branch, you're hinting the likelyhood of the condition the branch is dependent on. Unconditional branches don't need hinting.

@nagisa most of the time use of branch hinting doesn't help at all. The fact that you can't hint about match expressions isn't really a concern I have. This is quite simply to help tune code where the penalty of a misprediction can end up quite large. Most of the time, the only difference this will make to codegen is block reordering (for example, we hint the overflow checks as being unlikely, which pushes the panics down to the bottom of the function).

@llogiq
Copy link
Contributor

llogiq commented May 21, 2015

@quantheory 👍 for likely(b: bool)/unlikely(b: bool) – it's certainly more readable than if expect!(my_predicate, true) { … }, does not need any special handling and will be completely sufficient to get the majority of the (rare) cases where giving the compiler a hint about branches matters.

@nrc
Copy link
Member

nrc commented May 21, 2015

@Aatch Seems reasonable to have the annotation on the condition. What about the idea of always assuming the if branch is always expected? Do you think that is impractical?

@seanmonstar
Copy link
Contributor

I'd rather it not be an attribute, as that would make it harder when an if statement includes different conditions with different likelihoods. Take this example: https://github.com/h2o/picohttpparser/blob/master/picohttpparser.c#L173

@Stebalien
Copy link
Contributor

@nrc

What about the idea of always assuming the if branch is always expected? Do you think that is impractical?

That would lead to bad codegen when checking error conditions:

if some_unexpected_error_condition {
    return Err(...);
}

Aside:

It would also be nice to get likely/unlikely attributes for enum variants:

enum Result<V, E> {
    #[likely]
    Ok(V),
    #[unlikely]
    Err(E)
}

These wouldn't override an explicit likely/unlikely call on a branch but would provide sane defaults. Unfortunately, this would likely take quite a bit more work to implement.

@Aatch
Copy link
Contributor Author

Aatch commented May 22, 2015

@nrc having the attribute on the condition is ultimately identical to having a function call wrapping the condition, no? I'd argue that an attribute on the condition itself would be really unclear, especially since you'd probably need to wrap the condition in parens anyway to be sure that it's applying to the correct expression.

What this means is that instead of if likely(foo == bar) { .. } you have if #[likely] (foo == bar) { .. }. How is that any better?

@seanmonstar
Copy link
Contributor

I really like the attribute on variants idea by @Stebalien

@nrc
Copy link
Member

nrc commented May 22, 2015

@Aatch sorry, I was asking about the idea of always assuming the if branch is expected, not the attribute idea. But I think @Stebalien answered pretty well. I'm pretty much convinced that the suggested path is the best one, but I just want to check there isn't some way the implicit solution could work - could we guarantee that if there is an if and else branch then we expect the if branch otherwise there is no expectation? AIUI, this is the way that optimising compilers and branch prediction hardware work (plus a whole bunch of heuristics on top of that). I imagine that any level of guarantees for this kind of optimisation is just making things complicated between Rust and LLVM though. I guess I'm just a little bit afraid of preserving behaviour here if we add our own optimising passes in the future - having a special case function call that the compiler needs to treat differently to everything else scares me.

@Aatch
Copy link
Contributor Author

Aatch commented May 22, 2015

@nrc, I don't think we should have any guarantees for this feature beyond "doesn't change the observable behaviour of the function", meaning we could just ignore it if we wanted without technically doing anything wrong. In fact, there's no guarantee that it will do anything anyway. LLVM might have ordered the blocks the same way with or without the hint.

I guess what I mean is that there isn't any behaviour to preserve for branch hinting. The point is that stuff like ordering blocks is pretty much arbitrary (especially in LLVM IR where you have to have a terminator, no fallthrough) so compilers make a best guess at how to do it. This just gives them another piece of information they can use to do so. If using expect, likely or unlikely changes what your code actually does, that's a bug.

@bluss
Copy link
Member

bluss commented May 22, 2015

It would also be nice to get likely/unlikely attributes for enum variants:

@Stebalien These hints shouldn't be used in a blanket way like this. The default should always be to let the branch prediction do its job.

likely and unlikely seems like the most readable API. I would like if the eventual docs recommend profiling and not using these indiscriminately.

I'm skeptical of this. Are we giving programmers the illusion of control? Isn't profile guided optimization the correct approach? It also appears that branch prediction hints are being obsoleted with newer microarchitectures.

@bluss
Copy link
Member

bluss commented May 22, 2015

I tested the basic benchmark in picohttpparser with and without the likely/unlikely hints. Just to show that it makes a significant difference. I don't know the specifics of this code. Both clang-3.5 and gcc-4.9 reach the same performance with hints.

clang-3.5 -O3 -DNDEBUG -Wall bench.c picohttpparser.c
perf stat -r 3 -- ./a.out

clang-3.5 with likely/unlikely

       8726,429083      task-clock (msec)         #    0,999 CPUs utilized            ( +-  0,48% )
                15      context-switches          #    0,002 K/sec                    ( +- 13,33% )
                 6      cpu-migrations            #    0,001 K/sec                    ( +- 25,64% )
                42      page-faults               #    0,005 K/sec                  
    14 896 309 916      cycles                    #    1,707 GHz                      ( +-  0,76% )
       665 293 679      stalled-cycles-frontend   #    4,47% frontend cycles idle     ( +- 17,51% )
       217 128 725      stalled-cycles-backend    #    1,46% backend  cycles idle     ( +- 18,56% )
    50 496 839 534      instructions              #    3,39  insns per cycle        
                                                  #    0,01  stalled cycles per insn  ( +-  0,00% )
    13 091 246 811      branches                  # 1500,184 M/sec                    ( +-  0,00% )
        10 015 406      branch-misses             #    0,08% of all branches          ( +-  0,25% )

       8,732213677 seconds time elapsed                                          ( +-  0,48% )

clang-3.5 without likely/unlikely

       9119,914947      task-clock (msec)         #    0,999 CPUs utilized            ( +-  0,06% )
                37      context-switches          #    0,004 K/sec                    ( +- 58,36% )
                 8      cpu-migrations            #    0,001 K/sec                    ( +- 35,59% )
                43      page-faults               #    0,005 K/sec                    ( +-  1,56% )
    15 524 708 328      cycles                    #    1,702 GHz                      ( +-  0,05% )
       228 164 325      stalled-cycles-frontend   #    1,47% frontend cycles idle     ( +-  2,36% )
       217 936 659      stalled-cycles-backend    #    1,40% backend  cycles idle     ( +-  0,34% )
    51 077 255 602      instructions              #    3,29  insns per cycle
                                                  #    0,00  stalled cycles per insn  ( +-  0,00% )
    13 581 323 829      branches                  # 1489,194 M/sec                    ( +-  0,00% )
        10 611 136      branch-misses             #    0,08% of all branches          ( +-  3,26% )

       9,126001095 seconds time elapsed                                          ( +-  0,06% )

the difference is small but it's there.

@nikomatsakis
Copy link
Contributor

First, this seems like an important area to have support for in general. When I was at the NY meetup, I was approached by a few people working in the "HFT" area who had all sorts of questions about low-level codegen details like this. In most cases, the answer was basically "well, doing that optimization would be LLVM's job, but we'd probably have to expose some primitive".

Anyway, I am currently in favor of likely/unlikely (in preference to the current text) as well, but @Aatch I think it would be great to see some discussion of the approaches taken by other compilers, or projects. We've seen already what picohttpparser uses, but I know that many codebases (the linux kernel, the firefox code base, etc) incorporate some sort of hints, and it'd be nice to know what they use.

  • Data point: Firefox uses MOZ_LIKELY and MOZ_UNLIKELY

@nikomatsakis
Copy link
Contributor

Oh, the other question I was wondering: how fragile are LLVM's optimizations with regard to the expected value? (i.e., with regard to @nrc's point about us having to be careful when we reorder code etc). I would sort of expect that the expectation is associated with an SSA value produced by the expect intrinsic, or something similar, and that hence it is fairly robust.

@nikomatsakis nikomatsakis self-assigned this May 22, 2015
@Aatch
Copy link
Contributor Author

Aatch commented May 23, 2015

@nikomatsakis well C/C++ exposes __builtin_expect, which matches the signature of the LLVM intrinsic (probably not a coincidence). Most projects I've seen define likely and unlikely macros around it. I don't know about other languages, a quick search suggests that the C/C++ are the only languages that provide these hints.

As for how LLVM actually does it, it alters the weights of branches where the condition is the result of the intrinsic. It processes all the branches in the function first before replacing llvm.expect with the passed value.

@nikomatsakis
Copy link
Contributor

@Aatch I think you mean gcc exposes __builtin_expect, not C/C++, right? But I was wondering whether any projects actually use it directly: as you say, all examples we've seen so far do not. I am also therefore in favor of exposing likely and unlikely instead.

@huonw
Copy link
Member

huonw commented May 27, 2015

It seems to me that likely(x == 20) is essentially the same as expect(x, 20)? (It certainly seems like a peephole optimisation/normalisation that LLVM might do, or, if not, rustc could do.)

@bluss
Copy link
Member

bluss commented May 27, 2015

likely(x == 20) is the same as expect(x == 20, true), the expectation is on the conditional.

@huonw
Copy link
Member

huonw commented May 27, 2015

Yeah, I understand the desugaring, I'm more interested in the semantics. (If you're talking about semantics, what's the difference between expectation on the conditional vs. expectation on the values?)

@eternaleye
Copy link

@bluss, @huonw

On the one hand, LLVM does have the i8 expect intrinsic. In that sense, @huonw may be correct.

On the other hand, that will in many cases get expanded right back out again for the underlying machine, to some variety of jump-if-equal. (or a branch prediction hint of some sort taking similar arguments)

@bluss
Copy link
Member

bluss commented May 27, 2015

@huonw consider likely(x > 0).

@huonw
Copy link
Member

huonw commented May 27, 2015

@bluss while true, I'm actually focusing on the == case. It seems to me that likely/unlikely are more natural in many ways (e.g. your example), so I'm trying to explore reasons for why we might possibly want a raw expect. The only reason I can see is if expect(x, y) is somehow "better" than likely(x == y), but, as my parenthetical stated, I don't think it would be, since, e.g. the compiler can untangle it. (Basically just looking for confirmation.) In other words, does implementing expect in terms of likely lose us anything?

I'm only second guessing myself because the C compilers expose it as expect rather than likely/unlikely.

@eternaleye
Copy link

Michael Kerrisk's post regarding the performance impact of likely()/unlikely() may be worth mining for documentation notes.

He maintains the Linux manpages project, and one particularly useful point in his post is where the breakeven point is on the benchmark he provides: 1/10^5 (EDIT: typo, actually 1/10^4) incorrect predictions is about where likely() and unlikely() begin to hurt performance more than they help.

@Aatch
Copy link
Contributor Author

Aatch commented Jun 5, 2015

@eternaleye that is interesting. The fact that 1/10^5 is the "break-even" point is surprising. At any rate, there's a reason my motivating example is a branch not taken in about 1/(2*10^19) cases. I'll keep stuff like this in mind when documenting the intrinsics though.

@Aatch
Copy link
Contributor Author

Aatch commented Jun 6, 2015

Since the end of the final comment period is approaching and it's likely that this will be accepted when it does, I started on implementation. Turns out that the way we translated function calls meant that the intrinsics didn't actually work, but I improved that, so it all works now. The branch is here: https://github.com/Aatch/rust/tree/branch-hinting

@llogiq
Copy link
Contributor

llogiq commented Jun 6, 2015

👍 after having read the diff.

@brson
Copy link
Contributor

brson commented Jun 8, 2015

I'm ok with the unstable intrinsics but still share @nrc's concern about how these are exposed. In Rust they will just look like functions, which I think is pretty damn misleading. At least in gcc they look like magic functions.

@Aatch
Copy link
Contributor Author

Aatch commented Jun 9, 2015

@brson sure, the lower-level __builtin_expect looks a little bit special, but the vast majority of projects create macros that wrap it in a way that they look exactly function calls. Furthermore, how does gcc putting __builtin at the front make it seem any more magic than our std::intrinsics?

In gcc, nobody uses __builtin_expect directly in their code. All code I've seen using branch hinting has if (LIKELY(cond)) { ... } or similar. Should I make the intrinsics capitalised? Not that all projects use capitalised macros anyway.

I only think them being functions is misleading if you don't actually know what they do. In which case I ask: why the hell are you using them then?! The actual use of them is pretty clear: if likely(cond) { .. }, don't deviate from that and you'll be fine. And remember, the absolute worst case scenario for using these incorrectly is that they change nothing at all. Unlike, say, assume where using it incorrectly can break your program completely!

@Valloric
Copy link

Valloric commented Jun 9, 2015

Agreed with @Aatch. Let's not forget why gcc uses a __builtin_ prefix here: the C & C++ standards say that identifiers that start with __ are reserved for the compiler. builtin is used as a namespacing prefix on top of __. So the only reason why we don't have namespace::expect() in C is because there's no namespacing in C.

Let's not try to infer some sort of grand design for the weird-looking names of intrinsic functions in C; there's no such thing, the names are weird because of technical restrictions that Rust thankfully doesn't have.

@nikomatsakis
Copy link
Contributor

It's official... the language subteam has decided to accept this RFC. Thanks @Aatch!

@nikomatsakis
Copy link
Contributor

Tracking issue: rust-lang/rust#26179

@nikomatsakis nikomatsakis merged commit 31d230b into rust-lang:master Jun 10, 2015
@nikomatsakis
Copy link
Contributor

@ticki
Copy link
Contributor

ticki commented Jan 16, 2016

What is the state of this?

@Aatch
Copy link
Contributor Author

Aatch commented Jan 26, 2016

@ticki, implementation turned out to be rather hard, if you actually wanted the intrinsics to do anything at least. It's turned out to simply be easier to wait for work on MIR to progress and build them on top of that.

@ticki
Copy link
Contributor

ticki commented Jan 27, 2016

@Aatch, the intrinsics will currently just send an instruction to LLVM to assume a statement to be true. (I thought about the assume intrinsics).

@Aatch
Copy link
Contributor Author

Aatch commented Jan 29, 2016

@ticki the macro you added? These intrinsics aren't implemented in the compiler at all yet, so I'm not sure what your macro is doing. The likely/unlikely intrinsics work differently to assume, and LLVM is very finicky about the relationship between the generated expect and the branch being hinted. The way the compiler currently generates LLVM code isn't what it wants so the intrinsics end up just being ignored.

@ticki
Copy link
Contributor

ticki commented Jan 29, 2016

Oh! My bad. I was speaking about the assume intrinsics.

I wonder why the intrinsics cannot just use LLVM's branch weight metadata?

@jonas-schievink
Copy link
Contributor

The feature name of the RFC still says expect_intrinsic

@ofek
Copy link

ofek commented May 5, 2017

implementation turned out to be rather hard, if you actually wanted the intrinsics to do anything at least. It's turned out to simply be easier to wait for work on MIR to progress and build them on top of that.

@Aatch Does this do anything now that we have MIR?

@joshlf
Copy link
Contributor

joshlf commented Sep 6, 2017

Are there thoughts on how to get this to work with if let? I haven't seen any mention of if let in the doc or this discussion.

@emberian
Copy link
Member

emberian commented Sep 6, 2017 via email

@Centril Centril added A-intrinsic Proposals relating to intrinsics. A-hint Hints related proposals & ideas A-optimization Optimization related proposals & ideas labels Nov 23, 2018
@elichai
Copy link

elichai commented Oct 18, 2019

Any news on that?
been a couple of years, MIR is in stable, anything going on in that direction?

@SimonSapin
Copy link
Contributor

Merged PRs are generally considered done and their comment thread not a great place to have further discussion. After accepting a Rust RFC (merging the PR) we open a “tracking issue” to discuss its implementation. This one is at rust-lang/rust#26179

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-hint Hints related proposals & ideas A-intrinsic Proposals relating to intrinsics. A-optimization Optimization related proposals & ideas final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.