Stop peeling the last iteration of the loop in `Vec::resize_with` #104818

scottmcm · 2022-11-24T11:19:24Z

resize_with uses the ExtendWith code that peels the last iteration:

Lines 2525 to 2529 in 341d8b8

    
    if n > 0 {
        // We can write the last element directly without cloning needlessly
        ptr::write(ptr, value.last());
        local_len.increment_len(1);
    }

But that's kinda weird for `ExtendFunc`, because its `last` does exactly the same thing as `next` anyway:

rust/library/alloc/src/vec/mod.rs

Lines 2494 to 2502 in 341d8b8

    
    struct ExtendFunc<F>(F);
    impl<T, F: FnMut() -> T> ExtendWith<T> for ExtendFunc<F> {
        fn next(&mut self) -> T {
            (self.0)()
        }
        fn last(mut self) -> T {
            (self.0)()
        }
    }

So this PR just has `resize_with` use the normal extend-from-`TrustedLen` code instead.
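A minimal, safe sketch of the difference, assuming a simplified `Vec::push`-based model. `ExtendWith` and `ExtendFunc` copy the shape of the quoted std code, while `extend_peeled` and `extend_unpeeled` are hypothetical helpers, not the actual std functions (the real code writes through raw pointers and tracks the length for panic safety). The peeled version calls the closure from two places; the unpeeled one calls it from one.

```rust
// Simplified model of the std trait quoted above (not the real std code).
trait ExtendWith<T> {
    fn next(&mut self) -> T;
    fn last(self) -> T;
}

struct ExtendFunc<F>(F);

impl<T, F: FnMut() -> T> ExtendWith<T> for ExtendFunc<F> {
    fn next(&mut self) -> T {
        (self.0)()
    }
    fn last(mut self) -> T {
        // Identical to `next`, so peeling this call saves nothing here.
        (self.0)()
    }
}

/// Hypothetical peeled loop: `n - 1` calls through `next`, then one through
/// `last`. The closure ends up with two call sites.
fn extend_peeled<T, E: ExtendWith<T>>(v: &mut Vec<T>, n: usize, mut value: E) {
    v.reserve(n);
    for _ in 1..n {
        v.push(value.next());
    }
    if n > 0 {
        v.push(value.last());
    }
}

/// Hypothetical unpeeled loop: a single call site, driven by an iterator
/// whose length is known up front.
fn extend_unpeeled<T>(v: &mut Vec<T>, n: usize, f: impl FnMut() -> T) {
    v.extend(std::iter::repeat_with(f).take(n));
}
```

With `ExtendFunc`, both versions produce the same elements and call the closure exactly `n` times; peeling only pays off when `last` can move a stored value instead of cloning it, as the "without cloning needlessly" comment describes.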

r? @ghost


scottmcm · 2022-11-24T11:38:59Z

@bors try @rust-timer queue

bors · 2022-11-24T11:39:08Z

⌛ Trying commit a8954f1 with merge 8575d87094db626f2ff3e7a4b8d1af877b9a3095...

bors · 2022-11-24T13:50:53Z

☀️ Try build successful - checks-actions
Build commit: 8575d87094db626f2ff3e7a4b8d1af877b9a3095 (8575d87094db626f2ff3e7a4b8d1af877b9a3095)

rust-timer · 2022-11-24T15:42:46Z

Finished benchmarking commit (8575d87094db626f2ff3e7a4b8d1af877b9a3095): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.2%	[0.2%, 0.2%]	2
Regressions ❌ (secondary)	0.2%	[0.2%, 0.2%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-0.4%	[-0.6%, -0.2%]	6
All ❌✅ (primary)	0.2%	[0.2%, 0.2%]	2

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.6%	[2.3%, 2.8%]	2
Regressions ❌ (secondary)	3.0%	[0.9%, 4.2%]	4
Improvements ✅ (primary)	-3.3%	[-3.3%, -3.3%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.6%	[-3.3%, 2.8%]	3

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.4%	[2.2%, 2.6%]	3
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

scottmcm · 2022-11-24T20:32:53Z

Those perf changes look quite tolerable to me.

LLVM can't optimize out the split loop here today (https://rust.godbolt.org/z/b5KEc6r8G), leaving two callsites for the closure in the loop.

So even if LLVM sometimes takes a bit longer to optimize now, that seems fine to me: plausibly that's because it's now willing to inline more, since having more callsites (as there were before this PR) suppresses inlining.

r? @the8472

rustbot · 2022-11-24T20:32:58Z

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

Stabilizing library features
Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
Changing public documentation in ways that create new stability guarantees
Changing observable runtime behavior of library APIs

the8472 · 2022-11-24T21:39:18Z

Have you checked the new assembly? It switches from external to internal iteration, which then goes through `Take::try_fold`, and I don't know how well that optimizes.

> So even if LLVM takes a bit longer to optimize now sometimes, that seems fine to me because it's plausible that it's from it being willing to inline more, for example, because more callsites (as it was before this PR) suppresses inlining.

I'm a bit more concerned about await-call-tree check incr-unchanged, since it takes a massive wall-time hit and also a smaller one on cycles. That may be a fluke; it had a similar spike a few weeks ago. Maybe rerun just that one?

Other than that it looks fine.

scottmcm · 2022-11-24T22:11:07Z

Hmm, good call. I'd checked that it has the one callsite with a tight loop (https://rust.godbolt.org/z/747xco7dq), but it looks like there's slightly more stack traffic that way: something doesn't get fully register-promoted.

I'll take a look and see if I can improve Take::fold on the way by.

@rustbot author

scottmcm · 2022-11-25T03:25:26Z

library/alloc/src/vec/mod.rs

-                    ptr = ptr.add(1);
-                    // Since the loop executes user code which can panic we have to bump the pointer
-                    // after each step.
+                    ptr::write(ptr.add(local_len.current_len()), element);


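This looks pointless, but it seems to reduce register pressure: we have to maintain `local_len` for panic safety anyway, so indexing from it means a separately bumped pointer no longer has to be kept alive (and spilled) across the call.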

Before:

    .LBB4_3:
        mov qword ptr [rbp - 16], rax
        mov qword ptr [rbp - 24], rcx   ; Note: spill
        call make_thing
        mov rcx, qword ptr [rbp - 24]
        mov dword ptr [rcx], eax
        add rcx, 4
        mov rax, qword ptr [rbp - 16]
        dec rax
        cmp rsi, rax
        jne .LBB4_3

After:

    .LBB6_3:
        mov qword ptr [rbp - 16], rax
        call make_thing
        mov rdx, qword ptr [rbp - 16]
        lea rcx, [rdx + 1]
        mov dword ptr [rbx + 4*rdx], eax   ; Note: addressing mode
        mov rax, rcx
        dec rsi
        jne .LBB6_3

(And LLVM understands that kind of loop very well too, since it's a `for i in A..B { v[i] = foo(); }` loop, a shape that's common all over the place.)
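A hedged sketch of the loop shape after the change, assuming a free function over `Vec<T>`. `LenGuard` is a hypothetical stand-in for the length-tracking guard behind `local_len` in the diff, and `extend_n_with` is not the real `extend_trusted`. Each element is written at `base + local_len`, so the only loop-carried state is the length the guard must keep anyway for panic safety.

```rust
/// Hypothetical guard: commits the tracked length back to the Vec on drop,
/// including when the element-producing closure panics.
struct LenGuard<'a, T> {
    vec: &'a mut Vec<T>,
    local_len: usize,
}

impl<T> Drop for LenGuard<'_, T> {
    fn drop(&mut self) {
        // SAFETY: every slot below `local_len` has been initialized, and
        // `local_len` never exceeds the capacity reserved up front.
        unsafe { self.vec.set_len(self.local_len) }
    }
}

/// Hypothetical helper: append `n` elements produced by `f`.
fn extend_n_with<T>(v: &mut Vec<T>, n: usize, mut f: impl FnMut() -> T) {
    v.reserve(n);
    let mut guard = LenGuard { local_len: v.len(), vec: v };
    for _ in 0..n {
        // If `f` panics here, the guard keeps only the finished elements.
        let element = f();
        // Write at base + current length (the `v[i] = foo()` shape) instead
        // of carrying a separately bumped pointer through the loop.
        unsafe {
            guard.vec.as_mut_ptr().add(guard.local_len).write(element);
        }
        guard.local_len += 1;
    }
    // `guard` drops here and stores the final length.
}
```

If the closure panics partway through, the guard's `Drop` still runs and exposes only the elements that were fully written.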

scottmcm · 2022-11-25T03:31:48Z

@bors try @rust-timer queue

bors · 2022-11-25T03:31:57Z

⌛ Trying commit 9d68a1a with merge 9fa29ccfda5f8e07d329c703730ac4f591d738c4...

bors · 2022-11-25T05:40:21Z

☀️ Try build successful - checks-actions
Build commit: 9fa29ccfda5f8e07d329c703730ac4f591d738c4 (9fa29ccfda5f8e07d329c703730ac4f591d738c4)

rust-timer · 2022-11-25T06:58:20Z

Finished benchmarking commit (9fa29ccfda5f8e07d329c703730ac4f591d738c4): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.5%	[0.4%, 0.6%]	2
Regressions ❌ (secondary)	1.1%	[1.1%, 1.1%]	2
Improvements ✅ (primary)	-0.5%	[-1.1%, -0.2%]	6
Improvements ✅ (secondary)	-1.5%	[-3.1%, -0.1%]	13
All ❌✅ (primary)	-0.3%	[-1.1%, 0.6%]	8

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.6%	[2.6%, 2.6%]	1
Regressions ❌ (secondary)	3.4%	[2.2%, 5.2%]	3
Improvements ✅ (primary)	-2.2%	[-3.7%, -0.8%]	3
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.0%	[-3.7%, 2.6%]	4

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.0%	[-1.0%, -1.0%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.0%	[-1.0%, -1.0%]	1

scottmcm · 2022-11-25T07:16:01Z

Ah, there we go, that's much better. Net win on instructions for both primary and secondary, the scary 90% wall time hit to a secondary is gone, and the opt-full regression for image is all in LLVM taking longer, seemingly due to a different split of modules.

@rustbot ready

the8472 · 2022-11-25T19:48:11Z

library/core/src/iter/adapters/take.rs

+        fn check<'a, Item>(
+            mut action: impl FnMut(Item) + 'a,
+        ) -> impl FnMut(usize, Item) -> Option<usize> + 'a {
+            move |more, x| {
+                action(x);
+                more.checked_sub(1)
+            }
+        }


A pity that the duplication-by-unused-generics issues still haven't been fixed 😮‍💨
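For context on the `check` helper, here is a sketch of how "take `n` items and run an action on each" can be driven by internal iteration: a countdown is folded through `try_fold`, so the underlying iterator runs the loop. `take_for_each` is a hypothetical free function modeled on the diff, not the std `Take::for_each` itself.

```rust
/// Hypothetical helper: run `action` on at most `n` items of `iter`,
/// using `try_fold` (internal iteration) rather than repeated `next()` calls.
fn take_for_each<I: Iterator>(mut iter: I, n: usize, mut action: impl FnMut(I::Item)) {
    if n == 0 {
        return;
    }
    // The accumulator counts how many more items may follow the current one;
    // `checked_sub` yields `None` once it would go below zero, which stops
    // the fold right after the n-th item has been processed.
    let _ = iter.try_fold(n - 1, |more, x| {
        action(x);
        more.checked_sub(1)
    });
}
```

Starting the countdown at `n - 1` means the item that brings it to zero is still processed before `None` ends the fold, so exactly `n` items are consumed (fewer if the iterator runs out first).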

the8472 · 2022-11-25T19:50:57Z

@bors r+ rollup=never

bors · 2022-11-25T19:50:58Z

📌 Commit 9d68a1a has been approved by the8472

It is now in the queue for this repository.

bors · 2022-11-26T21:43:13Z

⌛ Testing commit 9d68a1a with merge b181d0623de0e9446e16accca83ee9d2eb736dd5...

jyn514 · 2022-11-26T21:47:11Z

@bors retry (yield to #104950)

rust-log-analyzer · 2022-11-26T23:09:35Z

A job failed! Check out the build log: (web) (plain)

bors · 2022-11-27T00:58:53Z

⌛ Testing commit 9d68a1a with merge faf1891...

bors · 2022-11-27T04:09:47Z

☀️ Test successful - checks-actions
Approved by: the8472
Pushing faf1891 to master...

rust-timer · 2022-11-27T05:29:32Z

Finished benchmarking commit (faf1891): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Next Steps: If you can justify the regressions found in this perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please open an issue or create a new PR that fixes the regressions, add a comment linking to the newly created issue or PR, and then add the perf-regression-triaged label to this PR.

@rustbot label: +perf-regression
cc @rust-lang/wg-compiler-performance

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.8%	[0.8%, 0.8%]	1
Regressions ❌ (secondary)	0.3%	[0.3%, 0.3%]	1
Improvements ✅ (primary)	-0.3%	[-0.3%, -0.3%]	3
Improvements ✅ (secondary)	-0.3%	[-0.4%, -0.1%]	7
All ❌✅ (primary)	-0.0%	[-0.3%, 0.8%]	4

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-3.0%	[-3.7%, -2.0%]	3
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-3.0%	[-3.7%, -2.0%]	3

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.1%	[1.8%, 2.3%]	3
Improvements ✅ (primary)	-2.4%	[-2.8%, -2.1%]	4
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-2.4%	[-2.8%, -2.1%]	4

nnethercote · 2022-11-27T22:08:57Z

Perf changes are very small, and wins slightly outweigh losses.

@rustbot label: +perf-regression-triaged

scottmcm added 2 commits November 24, 2022 03:12

Extract the logic for TrustedLen to a named method that can be called directly

1c966e7

Stop peeling the last iteration of the loop in Vec::repeat_with

a8954f1

rustbot added S-waiting-on-review and T-libs labels Nov 24, 2022

rustbot added the S-waiting-on-perf label Nov 24, 2022

rustbot added perf-regression and removed S-waiting-on-perf labels Nov 24, 2022

rustbot assigned the8472 Nov 24, 2022

scottmcm marked this pull request as ready for review November 24, 2022 20:32

rustbot added S-waiting-on-author and removed S-waiting-on-review labels Nov 24, 2022

Tune RepeatWith::try_fold and Take::for_each and Vec::extend_trusted

9d68a1a

scottmcm commented Nov 25, 2022

rustbot added the S-waiting-on-perf label Nov 25, 2022

rustbot added S-waiting-on-review and removed S-waiting-on-perf labels Nov 25, 2022

the8472 reviewed Nov 25, 2022

bors added the merged-by-bors label Nov 27, 2022

bors merged commit faf1891 into rust-lang:master Nov 27, 2022

rustbot added this to the 1.67.0 milestone Nov 27, 2022

bors mentioned this pull request Nov 27, 2022

vec: add try_* methods and a try_vec! macro to make Vec usable in without infallible allocation methods #95051

Closed

scottmcm deleted the refactor-extend-func branch November 27, 2022 05:12

rustbot added the perf-regression-triaged label Nov 27, 2022
