Skip to content
This repository was archived by the owner on Feb 25, 2025. It is now read-only.

Vectorize rrect_blur #55576

Merged
merged 13 commits into from
Oct 3, 2024
Merged

Vectorize rrect_blur #55576

merged 13 commits into from
Oct 3, 2024

Conversation

gaaclarke
Copy link
Member

@gaaclarke gaaclarke commented Oct 1, 2024

issue: flutter/flutter#148496

28% speed improvement in the rrect_blur shader.

before

Screenshot 2024-10-01 at 3 45 46 PM

after

Screenshot 2024-10-01 at 3 42 30 PM

Pre-launch Checklist

  • I read the Contributor Guide and followed the process outlined there for submitting PRs.
  • I read the Tree Hygiene wiki page, which explains my responsibilities.
  • I read and followed the Flutter Style Guide and the C++, Objective-C, Java style guides.
  • I listed at least one issue that this PR fixes in the description above.
  • I added new tests to check the change I am making or feature I am adding, or the PR is test-exempt. See testing the engine for instructions on writing and running engine tests.
  • I updated/added relevant documentation (doc comments with ///).
  • I signed the CLA.
  • All existing and new tests are passing.

If you need help, consider asking for advice on the #hackers-new channel on Discord.

@chinmaygarde
Copy link
Member

Really curious to see the malioc diff.

@gaaclarke
Copy link
Member Author

Really curious to see the malioc diff.

Here's one from before my last (and final) vectorization: https://logs.chromium.org/logs/flutter/buildbucket/cr-buildbucket/8735240889996581441/+/u/test:_malioc_diff/stdout

@gaaclarke
Copy link
Member Author

gaaclarke commented Oct 2, 2024

One thing that is unfortunate is that it seems to falling into "stack spilling". I would have expected less or equal register usage. It possibly could be faster if we could avoid that.

@chinmaygarde
Copy link
Member

chinmaygarde commented Oct 2, 2024

The stack spills are concerning. And thats happening on Mali-T880 which is our low-end representative. Perhaps this makes things faster all the way till we run out of stack space.

It's faster on iOS because there are likely no spills.

@gaaclarke
Copy link
Member Author

It's faster on iOS because there are likely no spills.

Or there is spilling on iOS too but the savings from the vectorization are larger than the load/store calls. I could try to reuse variables more to see if that reduces register pressure but I can't run malioc locally. Do we have an easy way to do that? I have it installed locally but it doesn't handle includes and doesn't seem to have an argument to make them work.

$ /Applications/Arm_Performance_Studio_2024.3/mali_offline_compiler/malioc --opengles --fragment --core Mali-G78 impeller/entity/shaders/rrect_blur.frag 
Mali Offline Compiler v8.4.1 (Build 215478)
Copyright (c) 2007-2024 Arm Limited. All rights reserved.

Configuration
=============

Hardware: Mali-G78 r1p1
Architecture: Valhall
Driver: r48p0-00rel0
Shader type: OpenGL ES Fragment

Compiler messages
=================

ERROR: 0:7: P0001: Unknown preprocessing directive 'include'
     
  >> #include <impeller/gaussian.glsl>
     #include <impeller/types.glsl>

ERROR: 0:8: P0001: Unknown preprocessing directive 'include'
     #include <impeller/gaussian.glsl>
  >> #include <impeller/types.glsl>
     
ERROR: 0:10: L0001: Typename expected, found 'FragInfo'
     
  >> uniform FragInfo {
       f16vec4 color;

ERROR: 0:10: L0001: Expected identifier, found '{'
     
  >> uniform FragInfo {
       f16vec4 color;

Compilation failed.

@gaaclarke
Copy link
Member Author

For mali-t880 longest/shortest path, (14.85 + 9.0) < (25.74 + 1.0) so it still seems like a net positive.

@chinmaygarde
Copy link
Member

If you have malioc installed locally, then it should be possible to just run the same script the bots are running to generate the diff. If malioc not in your path perhaps?

@chinmaygarde
Copy link
Member

My sense was that a stack spill was an instakill to performance. And so, I don't think we have any other shaders that spill the stack even on low end devices. I do not think you are seeing any spills on iOS but we could verify.

But, I think this may still be fine. Thanks to you your patch, the arith unit is more hungry now. Hungry enough that the data can't fit the stack (on low-end devices). So the load-store unit does more work to ferry stuff between the stack and memory. But the shader is still arith bound. Perhaps my intuition about spills being bad is misinformed and based on cases where the spills cause the shader to be load-store bound?

I'll ask around for some guidance since I do feel skill-checked here. I'd feel more comfortable doing this only on iOS or testing on the representative low-end device. I'll also followup on more informed guidance on how to interpret this.

Either way, nicely done! Good spot on the loop count fitting in a vector.

Screenshot 2024-10-02 at 10 28 17 AM

@jonahwilliams
Copy link
Contributor

So I'd be careful with vectorization. Most GPUs are SIMT now, and so despite data types being vec4 based all of the math ops are still scalar.

My sense was that a stack spill was an instakill to performance.

This is correct, a stack spill is absolutely catastrophic for performance.

@jonahwilliams
Copy link
Contributor

I would like to spend some time verifying the reported performance numbers. I had previously investigate this and on iOS found that it was net negative.

@gaaclarke
Copy link
Member Author

I got rid of the stack spill by reusing the integral variable:

$ malioc --fragment --core Mali-T880 ../out/host_debug_unopt_arm64/gen/flutter/impeller/entity/gles/rrect_blur.frag.gles
Mali Offline Compiler v8.4.1 (Build 215478)
Copyright (c) 2007-2024 Arm Limited. All rights reserved.

Configuration
=============

Hardware: Mali-T880 r2p0
Architecture: Midgard
Driver: r23p0-00rel0
Shader type: OpenGL ES Fragment

Main shader
===========

Work registers: 7 (87% used at 50% occupancy)
Uniform registers: 3 (17% used)
Stack spilling: false

                                A      LS       T    Bound
Total instruction cycles:   13.33    1.00    0.00        A
Shortest path cycles:       12.87    1.00    0.00        A
Longest path cycles:        12.87    1.00    0.00        A

A = Arithmetic, LS = Load/Store, T = Texture

Shader properties
=================

Has uniform computation: false

Note: This tool shows only the shader-visible property state.
API configuration may also impact the value of some properties.

@gaaclarke gaaclarke marked this pull request as ready for review October 2, 2024 18:26
@jonahwilliams
Copy link
Contributor

Numbers check out locally for me. Good to hear that you got rid of the stack spill!

LGTM

@chinmaygarde
Copy link
Member

I got rid of the stack spill by reusing the integral variable

Awesome. I have no concerns past that. Seems like Jonah had some prior experience not seeing expected results with vectorization so get a +1 from him as well.

@chinmaygarde
Copy link
Member

Numbers check out locally for me.

Nevermind 🙂

@gaaclarke gaaclarke added the autosubmit Merge PR when tree becomes green via auto submit App label Oct 2, 2024
Copy link
Contributor

auto-submit bot commented Oct 2, 2024

auto label is removed for flutter/engine/55576, due to - The status or check suite Linux linux_unopt has failed. Please fix the issues identified (or deflake) before re-applying this label.

@auto-submit auto-submit bot removed the autosubmit Merge PR when tree becomes green via auto submit App label Oct 2, 2024
@flutter-dashboard
Copy link

Golden file changes have been found for this pull request. Click here to view and triage (e.g. because this is an intentional change).

If you are still iterating on this change and are not ready to resolve the images on the Flutter Gold dashboard, consider marking this PR as a draft pull request above. You will still be able to view image results on the dashboard, commenting will be silenced, and the check will not try to resolve itself until marked ready for review.

Changes reported for pull request #55576 at sha cbb3198

@chinmaygarde
Copy link
Member

Hmm, the edges don't look quite right
Screenshot 2024-10-02 at 3 40 38 PM

@gaaclarke
Copy link
Member Author

The error was introduced in 56a32c7 when i tried to avoid the stack spilling. Let me double check that diff.

This reverts commit 56a32c7.

fixed stack spilling change
@gaaclarke
Copy link
Member Author

gaaclarke commented Oct 2, 2024

fixed, I got my swizzles swizzled

@gaaclarke gaaclarke added the autosubmit Merge PR when tree becomes green via auto submit App label Oct 2, 2024
@auto-submit auto-submit bot merged commit 60c3ca0 into flutter:main Oct 3, 2024
30 checks passed
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Oct 3, 2024
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Oct 3, 2024
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Oct 3, 2024
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Oct 3, 2024
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Oct 3, 2024
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Oct 3, 2024
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Oct 3, 2024
engine-flutter-autoroll added a commit to engine-flutter-autoroll/flutter that referenced this pull request Oct 3, 2024
auto-submit bot pushed a commit to flutter/flutter that referenced this pull request Oct 3, 2024
…156144)

flutter/engine@bd44b58...247bc68

2024-10-03 skia-flutter-autoroll@skia.org Roll Dart SDK from 8b45ba3fdbfa to 1c2e6ad84af1 (1 revision) (flutter/engine#55610)
2024-10-03 skia-flutter-autoroll@skia.org Roll Skia from f074e2bd0aba to 68fea8aa589b (1 revision) (flutter/engine#55609)
2024-10-03 skia-flutter-autoroll@skia.org Roll Skia from df2e478e9f8d to f074e2bd0aba (1 revision) (flutter/engine#55608)
2024-10-03 skia-flutter-autoroll@skia.org Roll Skia from ba95ec201dfd to df2e478e9f8d (2 revisions) (flutter/engine#55607)
2024-10-03 skia-flutter-autoroll@skia.org Roll Dart SDK from 14b931c40076 to 8b45ba3fdbfa (1 revision) (flutter/engine#55606)
2024-10-03 skia-flutter-autoroll@skia.org Roll Skia from 03431ca9d337 to ba95ec201dfd (3 revisions) (flutter/engine#55599)
2024-10-03 30870216+gaaclarke@users.noreply.github.com Vectorize rrect_blur (flutter/engine#55576)

If this roll has caused a breakage, revert this CL and stop the roller
using the controls here:
https://autoroll.skia.org/r/flutter-engine-flutter-autoroll
Please CC matanl@google.com,zra@google.com on the revert to ensure that a human
is aware of the problem.

To file a bug in Flutter: https://github.com/flutter/flutter/issues/new/choose

To report a problem with the AutoRoller itself, please file a bug:
https://issues.skia.org/issues/new?component=1389291&template=1850622

Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+doc/main/autoroll/README.md
nick9822 pushed a commit to nick9822/flutter that referenced this pull request Dec 18, 2024
issue: flutter#148496

28% speed improvement in the rrect_blur shader.

## before
<img width="1728" alt="Screenshot 2024-10-01 at 3 45 46�PM" src="https://github.com/user-attachments/assets/643068a5-ab1e-4fa3-bc03-184b2ee4a6cf">

## after
<img width="1728" alt="Screenshot 2024-10-01 at 3 42 30�PM" src="https://github.com/user-attachments/assets/41445231-ffea-4279-8142-ce126df8187c">

[C++, Objective-C, Java style guides]: https://github.com/flutter/engine/blob/main/CONTRIBUTING.md#style
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
autosubmit Merge PR when tree becomes green via auto submit App e: impeller will affect goldens
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants