
introduce @branchWeight builtin with numerical branch weights #20642

Closed
andrewrk opened this issue Jul 15, 2024 · 17 comments

Labels
breaking Implementing this issue could cause existing code to no longer compile or have different behavior. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.

Milestone
0.14.0

Comments

@andrewrk
Member

andrewrk commented Jul 15, 2024

Originally specified at #489 (comment).

This proposal supplants:

Summary:

  • No "expect" builtin or similar
  • Each branch gets a default weight of 10
  • Error branches (weight error branches against errors #84) get a default weight of 1
  • Branch weight can be overridden with @branchWeight builtin.
  • Branch weights are of type u32

This generalizes better to switch expressions.

This allows source code to be annotated with PGO data.
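As a rough sketch of how the proposal might read in source (hypothetical: the builtin was never implemented, and the function names and weights below are made up for illustration):

if (cache.get(key)) |value| {
    @branchWeight(90); // hit path, expected far more often than a default-weighted branch
    return value;
} else {
    return loadSlowPath(key); // keeps the default weight of 10
}

switch (message.tag) {
    .data => {
        @branchWeight(80); // the common case
        handleData(message);
    },
    .ping => handlePing(message), // default weight of 10
    .shutdown => {
        @branchWeight(1); // roughly as rare as an error branch
        handleShutdown();
    },
}

Presumably only the relative magnitudes would matter; whether that is guaranteed is one of the questions raised further down the thread.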

@andrewrk andrewrk added the proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. label Jul 15, 2024
@andrewrk andrewrk added this to the 0.14.0 milestone Jul 15, 2024
@mlugg
Member

mlugg commented Jul 16, 2024

The only issue I see here is that this doesn't work so well with #8220, because @branchWeight tells us nothing about the source of the branch; perhaps it is unlikely for a given prong to be reached from the initial switch, but very likely to be reached from a labeled continue. You could also argue that this applies to loops: maybe a loop body is unlikely to be initially entered, but highly likely to be repeated once entered (although this case can't really be nicely modeled with @expect either). However, on the whole I agree that this is a better model than @expect.
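To make that concrete, here is a hypothetical sketch (made-up tags, using the labeled-continue syntax proposed in #8220); a single weight attached to the .b prong cannot express both facts at once:

dispatch: switch (first_tag) {
    .a => {
        // From .a, the next state is almost always .b...
        continue :dispatch .b;
    },
    .b => {
        // ...yet .b is rarely the *initial* state, so what should a
        // single @branchWeight placed here say?
        handleB();
    },
    else => {},
}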

@rohlem
Contributor

rohlem commented Jul 16, 2024

@mlugg If I understand your comment correctly, the actual distinction here would be to mark the branches = code paths rather than the target block they lead to.
I think this could be modeled by introducing additional blocks along the way, and only specifying @branchWeight for these intermediate blocks.

Current syntax might make this more verbose than would be ideal, but here are some examples anyway:

if (iterator.hasNext()) {
  @branchWeight(2); //unlikely to enter
  while(true) {
    const next = iterator.next().?;

    // actual logic - no @branchWeight directly here

    if (!iterator.hasNext()) {
      @branchWeight(1); //very unlikely to exit
      break;
    }
    @branchWeight(20); //very likely to continue looping
  }
}

switch(nextToken()) {
  .end => {},
  else => |tok| {
    switch (tok) {
      .a => @branchWeight(1), //unlikely to be .a initially
      else => {},
    }
    inner: switch(tok) {
      .a => {

        // actual logic - no @branchWeight directly here

        const next_token = nextToken();
        switch(next_token) {
          .a => @branchWeight(20), //very likely to be .a as a followup
          // other cases...
        }
        continue :inner next_token; //or alternatively inline this continue into the switch above with `@branchWeight` to make it easier for the optimizer to understand this?
      },
      // other cases...
    }
  }
}

@kprotty
Member

kprotty commented Jul 16, 2024

The idea of a builtin on the block instead of the condition makes sense to support cases where the condition isn't visible (or would be annoying to make visible) in the source code like captures (while (x) |y|, if (x) |y|) and monadic control flow (orelse {}, catch {}).

One thing that still feels fuzzy is the idea of having arbitrary weight values. The original comment mentions that probability values are hard to keep consistent across code edits. But more concretely, there are arguments around switch cases and profiling:

Switch cases in reality have little control over how they're dispatched: if a case is marked as "likely/unlikely" but a jump table would be ideal, what should codegen do here? If cases are sparse (so a jump table is unavailable), what is the codegen for two equally "likely/unlikely" cases? Branch hints primarily affect which path falls through a conditional and which is jumped to. By default, switch statements don't execute comparisons in source order (you can have else => first), so there's no concept of fall-through. Labeled continue may allow expressing this, but branching within the prongs is still a matter of likeliness (binary), not probability/weight.

Profiling (Profile Guided Optimization) tracks which branches are taken to form a probability model, but how this affects codegen is another story: PGO can either 1) reorganize branching so the most/least probable paths become the fall-through/jumped-to ones, or 2) eliminate the (conditional) branch entirely. The latter is aggressive and only correct when the optimized program will run again with inputs similar to those it was profiled with. The former is still a decision that's binary, not probability/weight-based.

Note

It's also unclear how PGO could take advantage of "source code annotations" here, since it bases optimizations on what's executed, not what's speculated. Maybe it could report when a branch marked unlikely was profiled to be more likely? Either way, that implies no codegen changes this builtin wouldn't already allow.


I propose the builtin be renamed to something like @{set}branchHint(x) which takes in a comptime enum with variant names like .likely/.unlikely or .fall_through/.jumped_to. The semantic edge cases could then be defined:

  • Q: How to simulate old @setCold(x) behavior?
    A: The old "coldness" is more of a property of inlining (a type of branching, with more implications). At the function level, use noinline or callconv(if (x) .NoInline else .Auto) if x is comptime-computed (this requires a new CallingConvention.NoInline). At the scope level, use this branching hint (see the sketch after this list). We could also instead define branchHint at the function level, analogous to inline from C and __attribute__((cold)) from GCC-likes respectively.

  • Q: What happens if two branchHint()s are called in the same scope?
    A: We should mimic whatever @setRuntimeSafety does here. I would propose just ignoring any calls after the first.

  • Q: What happens if branchHint occurs at function level of inline fn?
    A: Inline function bodies IIRC are semantically copied to the caller's invocation-site. It should then either be cancelled out (as noted above) or affect the branch hint for the caller's body which contains the invocation.

  • Q: What to do if branching to the same location must be jumped-to in one path and fall-through in another?
    A: Borrowing from @rohlem, add the respective branch hints to the conditional block at each path before branching to the same location. Example:

    while (true) {
        if (x) break; // unlikely
        if (y) break; // likely
    }
    // @branchHint(???)
    sameLocation();
    
    // should become
    
    while (true) {
        if (x) { @branchHint(.unlikely); break; }
        if (y) { @branchHint(.likely); break; }
    }
    sameLocation();
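A hypothetical sketch of the scope/function-level answer from the first bullet above (the @branchHint builtin did not exist at the time of this comment; the name, semantics, and the panic helper are only illustrative):

const std = @import("std");

// Stand-in for what used to be written as @setCold(true) at the top of a function:
// the hint marks the whole body as an unlikely branch from any call site.
fn unrecoverable(msg: []const u8) noreturn {
    @branchHint(.unlikely);
    std.debug.panic("unrecoverable: {s}", .{msg});
}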

@MatthiasPortzel

MatthiasPortzel commented Jul 16, 2024

I’m not a fan of “weight” values with undefined semantics (in general), because it leads to paranoid developers doing things like @branchWeight(999999999) or @branchWeight(4_294_967_295).

(It may be too early, but) it would be nice if this had documentation stating that:

  1. the compiler would use these values only relative to each other, and would never implement thresholds where weights over some magic number trigger different code-gen, and
  2. these values would be compared only locally, so that if I call a function that uses large branch weights, I don’t have to increase the branch weights in my code.

These are just the concerns that pop into my head, I don’t know how it would actually be implemented.

@Snektron
Collaborator

Each branch gets a default weight of 10

This seems a little bit arbitrary. Why not use floats and specify 1.0 as the default weight?

@andrewrk
Member Author

LLVM's branch weight metadata encodes probability. To those who suggest to toss out this data, can you explain why LLVM devs have decided to include this information, and why they have made a mistake by doing so?

@Rexicon226
Contributor

LLVM optimizes branches using weights to be compatible with PGO:

Branch weights might be fetched from the profiling file, or generated based on __builtin_expect.

The weights are also influenced by variables like what's inside the branch. LLVM internally warns against using arbitrary weights given the clash with implementation thresholds. We should take their advice and prefer the binary likely/unlikely API.

@dweiller
Contributor

dweiller commented Jul 18, 2024

The only issue I see here is that this doesn't work so well with #8220, because @branchWeight tells us nothing about the source of the branch; perhaps it is unlikely for a given prong to be reached from the initial switch, but very likely to be reached from a labeled continue.

Is there any kind of control flow that would want information about the source of a branch other than replicated dispatch switch loops? In a replicated dispatch switch loop you probably would want to have more control than simply saying which is the most likely target anyway, so I don't think @expect would be the right thing. If there aren't other control flow constructs that benefit from the source information, and we want to give the programmer enough control to specify per-branch dispatch weightings/expectations, I think that @branchWeight could generalise to this situation by allowing it to annotate the continue expression in a labelled switch loop:

sw_loop: switch (instructions[i].tag) {
    .add => {
        i += 1;
        continue :sw_loop @branchWeight(.{ .add, 4 }, .{ .mul, 2 }, .{ .div, 1 }) instructions[i].tag;
    },
    .mul => {
        i += 1;
        continue :sw_loop @branchWeight(.{ .add, 2 }, .{ .mul, 3 }) instructions[i].tag;
    },
    .div => {
        i += 1;
        continue :sw_loop instructions[i].tag;
    },
}

That looks a bit unwieldy, but maybe it could take an anonymous struct instead, allowing the weight map to be specified elsewhere and leaving just continue :sw_loop @branchWeight(add_weights) instructions[i].tag;. There might still be a readability issue: it's not obvious whether branch weights provided to a continue statement merely override @branchWeight statements in the top-level body of switch cases, or whether @branchWeight in a prong only applies to the initial switch and to any continues without a @branchWeight annotation.
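For illustration only (this syntax never existed; add_weights and the tuple format are made up), that variant might look like:

const add_weights = .{ .{ .add, 4 }, .{ .mul, 2 }, .{ .div, 1 } };

// ...then inside the .add prong:
i += 1;
continue :sw_loop @branchWeight(add_weights) instructions[i].tag;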

You could also argue that this applies to loops: maybe a loop body is unlikely to be initially entered, but highly likely to be repeated once entered (although this case can't really be nicely modeled with @expect either). However, on the whole I agree that this is a better model than @expect.

This could be modelled with either @branchWeight or @expect by manually doing loop inversion, though readability suffers:

if (condition) {
    @branchWeight(1);
    while (true) {
        @branchWeight(2);
        // loop code
        if (!condition) { 
            @branchWeight(1);
            break;
        }
    }
} else {
    @branchWeight(2); // shouldn't be needed given default weight of 10
}

It may be sensible to just make the above the interpretation of an 'unexpected' loop body so it could be simply written as

while (condition) {
    @branchWeight(1);
    // loop code
} else {
    @branchWeight(2); //shouldn't be needed given default weight of 10
}

as I'm not sure it makes sense (and as far as I know it's not possible) to have a loop continually predicted to exit.

Switch cases in reality have little control over how they're dispatched: if a case is marked as "likely/unlikely" but a jump table would be ideal, what should codegen do here? If cases are sparse (so a jump table is unavailable), what is the codegen for two equally "likely/unlikely" cases? Branch hints primarily affect which path falls through a conditional and which is jumped to. By default, switch statements don't execute comparisons in source order (you can have else => first), so there's no concept of fall-through. Labeled continue may allow expressing this, but branching within the prongs is still a matter of likeliness (binary), not probability/weight.

I would expect the branch weightings to just be additional input to assist codegen heuristics - there doesn't have to be an order-preserving map between the weightings and the branch's preference in the generated code; rather, I would expect them to be used to try to optimise something like a branch-weight-weighted average performance. If a jump table would be ideal, just ignore them and generate a jump table. For sparse cases there could be a whole rabbit hole of techniques here, but the weights provide the information needed to make decisions. It should be relatively straightforward to map branch weights to a binary decision tree that optimises fall-through/jump behaviour relative to the provided weights; while any specific branch in that tree is binary the best way to arrange the whole tree requires the per-branch weighting information precisely because there is no comparison order information in a switch statement.
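As a sketch of that idea (purely illustrative; the weights, functions, and lowering below are made up, not anything a compiler guarantees), a sparse switch annotated with weights could be arranged so the heaviest case is compared first:

switch (x) {
    100 => { @branchWeight(8); a(); },
    205 => { @branchWeight(2); b(); },
    999 => { @branchWeight(1); c(); },
    else => d(),
}

// One possible weight-guided arrangement of the comparison tree:
if (x == 100) {
    a(); // heaviest case: tested first
} else if (x == 205) {
    b();
} else if (x == 999) {
    c();
} else {
    d();
}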

LLVM optimizes branches using weights to be compatible with PGO:

Branch weights might be fetched from the profiling file, or generated based on __builtin_expect.

The weights are also influenced by variables like what's inside the branch. LLVM internally warns against using arbitrary weights given the clash with implementation thresholds. We should take their advice and prefer the binary likely/unlikely API.

Even if we use the binary API, we can still use something like @branchWeight to decide when a branch is likely/unlikely - the weight doesn't necessarily need to be mapped directly to LLVM's branch weight metadata probability.

@kprotty
Member

kprotty commented Jul 18, 2024

In a replicated dispatch switch loop you probably would want to have more control than simply saying which is the most likely target anyway

An efficient dispatch switch should have dense comparison targets so that it generates a jump table. If the cases are sparse, then dispatch is no longer efficient and hints won't help: it then requires at most log(N) comparisons, and choosing which case to fall through into with a continue :sw weight can be cancelled out by that target choosing another, even if Zig added a guaranteed comparison order.

the best way to arrange the whole tree requires the per-branch weighting information precisely because there is no comparison order information in a switch statement.

A weight can dictate the level in the comparison tree at which a conditional appears (and maybe the order within that level), but not which side or the depth of the tree. So in these tree-based sparse comparisons, one still can't accurately control branching codegen.

As a practical example, something like this could be written using a switch statement, but it's unclear how weights at each case could achieve the desired codegen compared to its explicit use of likely ifs / writing out the desired comparison tree.

as I'm not sure it makes sense (and as far as I know it's not possible) to have a loop continually predicted to exit.

I think so too. A while loop at codegen is two conditionals: 1) the condition to enter the loop and 2) the condition to keep going. For a loop to jump back up, 2) must conditionally branch, so it's fundamentally "unlikely". The branch hint would then apply to 1), i.e. whether the loop body is the fall-through path or the jumped-to path.
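A rough sketch of that lowering, written as comments (illustrative only, not actual compiler output; cond and body are placeholders):

// `while (cond) body();` lowers roughly to:
//
//   if (!cond) goto exit;  // (1) entry check - this is where a hint could apply
// loop:
//   body();
//   if (cond) goto loop;   // (2) backward branch - taken on every repeat
// exit: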

@rohlem
Contributor

rohlem commented Jul 18, 2024

A while loop at codegen is two conditionals: 1) the condition to enter the loop and 2) the condition to keep going. For a loop to jump back up, 2) must conditionally branch, so it's fundamentally "unlikely".

Technically unrolling the loop is a mechanism to model 2) as the likely branch via inlining, although the resulting code size limits the scalability of this.
(No idea how small a loop would have to be to make the likely-unlikely tradeoff realistically worth the size increase - I imagine pretty small.)
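For illustration (hypothetical body() and n), unrolling by a factor of two halves the number of backward branches the loop takes:

var i: usize = 0;
while (i + 2 <= n) : (i += 2) {
    body(i);
    body(i + 1); // second copy runs with no backward branch in between
}
if (i < n) body(i); // remainder iteration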

@dweiller
Contributor

the best way to arrange the whole tree requires the per-branch weighting information precisely because there is no comparison order information in a switch statement.

A weight can dictate the level in the comparison tree at which a conditional appears (and maybe the order within that level), but not which side or the depth of the tree. So in these tree-based sparse comparisons, one still can't accurately control branching codegen.

I agree that the programmer is not going to have precise control over the branching and I don't think they ever will with a sparse switch; what I'm getting at is that I would expect a 'sparse switch optimizer' to be minimizing some (potentially target specific) objective function of branch weights.

In a replicated dispatch switch loop you probably would want to have more control than simply saying which is the most likely target anyway

An efficient dispatch switch should have dense comparison targets so that it generates a jump table. If the cases are sparse, then dispatch is no longer efficient and hints won't help: it then requires at most log(N) comparisons, and choosing which case to fall through into with a continue :sw weight can be cancelled out by that target choosing another, even if Zig added a guaranteed comparison order.

I think I agree with you, though I might have to think about that last sentence a bit more (I'm not 100% sure at the moment what you mean by the cancelling out). I wasn't really trying to suggest that branch weights in switch-continue loops are going to be very useful, because, as you say, when you care you'll make jump tables. I was just saying that if someone did end up wanting source and target branch information for some kind of optimization pass here, a special form of @branchWeight could be used to do it.

@andrewrk
Member Author

@jacobly0 I would value your input here if you have the energy to spare. Otherwise I'll go ahead and make an executive decision.

@jacobly0
Member

I was mostly just exploring ideas that hadn't been discussed yet, so I don't have much more to say than I already did, since it seems to have accomplished the goal of creating more discussion. I can reiterate the points that I personally give extra weight to, which are that "expected value"-based builtins are, in my experience, difficult to use as a programmer, especially during refactors, and potentially impossible to make use of by a self-hosted backend. There needs to be a direct way of associating the annotation with the (air) instructions that it is supposed to affect.

with variant names like .likely/.unlikely

This would also allow adding things like .unpredictable to encourage an optimizer to perform branchless speculation of both prongs, something which isn't representable with probabilities/weights.

The old "coldness" is a more a property of inlining (a type of branching, with more implications).

Just want to explicitly mention that one of the extra implications of cold over unlikely is that cold code can be loaded to different pages of memory than other code. This allows each page fault to load more hot code with the expectation that a cold code path is allowed to be very expensive the first time it is executed, and locality with nearby code is not desirable. This means that it is possible for an unlikely annotation to be beneficial where a cold annotation would be harmful.

@mlugg
Member

mlugg commented Jul 27, 2024

@jacobly0, regarding coldness allowing code to live in different memory pages, perhaps this could be another enum variant, e.g.:

const BranchWeight = enum {
    /// This branch of control flow is more likely to be reached than its peers.
    /// The optimizer should optimize for reaching it.
    likely,
    /// This branch of control flow is less likely to be reached than its peers.
    /// The optimizer should optimize for not reaching it.
    unlikely,
    /// It is difficult to predict whether this branch of control flow will be reached.
    /// The optimizer should avoid branching behavior with expensive mispredictions.
    unpredictable,
    /// This branch of control flow is unlikely to *ever* be reached.
    /// The optimizer may place it in a different page of memory to optimize other branches.
    cold,
};
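A hypothetical usage sketch combining these variants with the @branchHint name proposed earlier in the thread (nothing here existed at the time; the Header type and error names are made up):

const Header = struct { tag: u8, len: u8 };

fn parseHeader(bytes: []const u8) !Header {
    if (bytes.len < 2) {
        @branchHint(.cold); // rarely reached; fine to push onto a different code page
        return error.Truncated;
    }
    if (bytes[0] & 1 == 1) {
        @branchHint(.unpredictable); // data-dependent bit; avoid costly mispredictions
        return error.UnsupportedFlag;
    }
    return .{ .tag = bytes[0], .len = bytes[1] };
}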

@nickelca

@kprotty

Q: What happens if two branchHint()s are called in the same scope?
A: We should mimic whatever @setRuntimeSafety does here. I would propose just ignoring any calls after the first.

If possible, I think multiple calls of this builtin in the same branch should be a compile error. I can't think of a reason to have multiple calls in the same branch other than to add confusion.

@andrewrk andrewrk added the breaking Implementing this issue could cause existing code to no longer compile or have different behavior. label Aug 13, 2024
@andrewrk
Member Author

Discussion continues at #21058

@andrewrk andrewrk changed the title introduce @branchWeight builtin introduce @branchWeight builtin with numerical branch weights Aug 20, 2024
@andrewrk
Member Author

Rejected in favor of #21148.

@andrewrk andrewrk closed this as not planned Aug 20, 2024