Skip to content

match an std::cmp::Ordering generates less optimized code in nightly #86511

Open
@yume-chan

Description

@yume-chan

I'm interested in this comment:

let cmp = f(unsafe { self.get_unchecked(mid) });
// The reason why we use if/else control flow rather than match
// is because match reorders comparison operations, which is perf sensitive.
// This is x86 asm for u8: https://rust.godbolt.org/z/8Y8Pra.
if cmp == Less {
left = mid + 1;
} else if cmp == Greater {
right = mid;
} else {
// SAFETY: same as the `get_unchecked` above
unsafe { crate::intrinsics::assume(mid < self.len()) };
return Ok(mid);
}

So I did some testing:


The following code snippet is the "bad" one (match an std::cmp::Ordering) in above comment, but with f manually inlined.

use std::cmp::Ordering;
use std::cmp::Ordering::*;

pub fn custom_binary_search_1<'a>(slice: &'a [u8], x: &u8) -> Result<usize, usize> {
    let s = slice;
    let mut left = 0;
    let mut right = s.len();
    while left < right {
        // never overflow because `slice::len()` max is `isize::MAX`.
        let mid = (left + right) / 2;
        // SAFETY: the call is made safe by the following invariants:
        // - `mid >= 0`
        // - `mid < size`: `mid` is limited by `[left; right)` bound.
        let cmp = unsafe{ s.get_unchecked(mid).cmp(x) };
        match cmp {
            Less => left = mid + 1,
            Greater => right = mid,
            Equal => return Ok(mid),
        }
    }
    Err(left)
}

In version 1.53.0-beta.12 (2021-06-12 e7a67cc), release mode, it generates the following x86 assembly (related parts only):

        jmp	.LBB0_3
.LBB0_7:
	add	rdx, 1                     # mid += 1
	mov	rcx, rdx                   # left = mid
	cmp	rcx, rsi                   # while left < right
	jae	.LBB0_9                    # (end loop)
.LBB0_3:
	lea	rdx, [rcx + rsi]           # mid = left + right
	shr	rdx                        # mid /= 2
	cmp	byte ptr [rdi + rdx], r8b  # match s[mid].cmp(x)
	jb	.LBB0_7                    # Less => ...
	je	.LBB0_6                    # Equal => ...
	mov	rsi, rdx                   # Greater => right = mid
	cmp	rcx, rsi                   # while left < right
	jb	.LBB0_3                    # (continue loop)
.LBB0_6:
	xor	eax, eax                   # return (0, mid)
	ret

(Which is identical to manually expanding to if..else if..else, however it's not the point of this issue)


In version 1.55.0-nightly (2021-06-20 e82b650), release mode, the generated assembly code is:

	mov	r9, -1                       # constant of std::cmp::Ordering::Less
	jmp	.LBB0_3

.LBB0_5:                                #   in Loop: Header=BB0_3 Depth=1
	mov	rcx, rdx                     # left = mid         
	add	rcx, 1                       # left += 1
	mov	rdx, rsi                     # temp = right

.LBB0_6:                                #   in Loop: Header=BB0_3 Depth=1
	mov	rsi, rdx                     # right = mid
	cmp	rcx, rdx                     # while left < right
	jae	.LBB0_7                      # (end loop)

.LBB0_3:                                # =>This Inner Loop Header: Depth=1
	lea	rdx, [rcx + rsi]             # mid = left + right
	shr	rdx                          # mid /= 2
	xor	eax, eax                     # cmp = 0
	cmp	byte ptr [rdi + rdx], r8b
	setne	al                           # cmp = s[mid] == x ? 0 : 1
	cmovb	rax, r9                      # cmp = s[mid] < x ? -1 : temp
	cmp	rax, -1                      # if cmp == Less
	je	.LBB0_5                      # Less => ...
# %bb.4:                                #   in Loop: Header=BB0_3 Depth=1
	cmp	rax, 1                       # if cmp == Greater
	je	.LBB0_6                      # Greater => ...
# %bb.9:
	ret

It's same as the "bad" result in original comment, and obviously much unoptimized.

I think match an Ordering should be a quite common use case, so shouldn't be deoptimized.


Version it worked on

Rust Playground: 1.53.0

Rust Playground: 1.53.0-beta.12 (2021-06-12 e7a67cc)

Version with regression

a55748f

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-codegenArea: Code generationA-mir-optArea: MIR optimizationsC-bugCategory: This is a bug.E-mentorCall for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion.I-slowIssue: Problems and improvements with respect to performance of generated code.P-mediumMedium priorityT-compilerRelevant to the compiler team, which will review and decide on the PR/issue.regression-from-stable-to-stablePerformance or correctness regression from one stable version to another.

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions