GS/HW: Reduce number of copies for HDR #12254

Open: wants to merge 1 commit into master

Member

@refractionpcsx2 refractionpcsx2 commented Jan 30, 2025

THIS WILL ONLY AFFECT SPECIFIC GAMES!

Description of Changes

Tries to reduce the number of copies for HDR draws.

Rationale behind Changes

Master would copy the render target to a new texture, draw, then copy back, every single time. But a lot of games do many separate draws in a row to the same target, so it's better to keep it in one place until the end of the chain. This PR keeps the target in HDR as long as possible before converting back, hugely reducing copies and render passes in some cases.
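To illustrate the idea, here is a toy model that counts conversion round trips under both strategies. It is only a sketch and not PCSX2's actual code: the `(target, hdr)` tuple representation, the function names, and the example frame are all hypothetical, and it assumes that a run of consecutive HDR draws to the same target can share a single round trip.

```python
from itertools import groupby

def round_trips_per_draw(draws):
    """Old behaviour in this toy model: one round trip for every HDR draw."""
    return sum(1 for _target, hdr in draws if hdr)

def round_trips_chained(draws):
    """Chained behaviour: one round trip per run of consecutive HDR draws
    that hit the same target."""
    return sum(1 for (_target, hdr), _run in groupby(draws) if hdr)

# Hypothetical frame: 5 HDR draws to target "A", one normal draw, 3 more HDR draws.
frame = [("A", True)] * 5 + [("A", False)] + [("A", True)] * 3

print(round_trips_per_draw(frame))  # 8
print(round_trips_chained(frame))   # 2
```

The longer the uninterrupted chain of HDR draws, the bigger the saving, which fits the pattern of some games improving hugely while others barely change.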

TL;DR: Makes some games go brr more

Suggested Testing Steps

Test the games listed below (sorry for the names)

Slightly Cherry Picked Reduction Results from Vulkan:

Ace Combat - Squadron Leader_SCES-52424 ['Draw Calls: -1 [620=>619]', 'Render Passes: -1 [44=>43]', 'Copies: -1 [8=>7]']
Big.Mutha.Truckers ['Draw Calls: -352 [903=>551]', 'Render Passes: -423 [440=>17]', 'Barriers: -127 [132=>5]', 'Copies: -352 [355=>3]']
DT_Carnage_SLUS-21793_20230908174547 ['Draw Calls: -29 [936=>907]', 'Render Passes: -32 [344=>312]', 'Copies: -30 [82=>52]']
Echo_Night_-_Beyond_SLUS-20928_20221019163851 ['Draw Calls: -2745 [6965=>4220]', 'Render Passes: -3384 [3456=>72]', 'Barriers: -1104 [1133=>29]', 'Copies: -2745 [2787=>42]']
evolution_colcliphdr ['Draw Calls: -142 [464=>322]', 'Render Passes: -179 [195=>16]', 'Barriers: -46 [50=>4]', 'Copies: -143 [149=>6]']
Full spectrum warrior ['Draw Calls: -3 [189=>186]', 'Render Passes: -3 [23=>20]', 'Copies: -3 [8=>5]']
JonnyMoseleydebug ['Draw Calls: -115 [211=>96]', 'Render Passes: -116 [128=>12]', 'Copies: -116 [118=>2]']
Kunoichi ['Draw Calls: -48 [1156=>1108]', 'Render Passes: -53 [168=>115]', 'Barriers: -1 [1=>0]', 'Copies: -48 [146=>98]']
Malice_SLES-52413_20240617171243 ['Draw Calls: -4 [453=>449]', 'Render Passes: -4 [232=>228]', 'Copies: -4 [23=>19]']
Mercenaries_-_Playground_of_Destruction_SLUS-20932 ['Draw Calls: -22 [780=>758]', 'Render Passes: -22 [86=>64]', 'Copies: -22 [51=>29]']
Nightshade_SLUS-20810_20231001162431 ['Draw Calls: -42 [1117=>1075]', 'Render Passes: -44 [131=>87]', 'Copies: -42 [119=>77]']
Pac-Man World Rally_SLUS-21328_20230127222927 ['Draw Calls: -57 [189=>132]', 'Render Passes: -64 [78=>14]', 'Barriers: -4 [4=>0]', 'Copies: -59 [64=>5]']
Pac-Man_World_2_SLUS-20224_20220816132032 ['Draw Calls: -2 [82=>80]', 'Render Passes: -2 [14=>12]', 'Copies: -2 [3=>1]']
sakura taisen shadows 1 (hw) ['Draw Calls: -96 [349=>253]', 'Render Passes: -111 [138=>27]', 'Barriers: -7 [7=>0]', 'Copies: -96 [116=>20]']
Sly 2 - Band of Thieves_SCUS-97316_20230422201628 ['Draw Calls: -173 [5962=>5789]', 'Render Passes: -152 [214=>62]', 'Barriers: -21 [546=>525]', 'Copies: -173 [178=>5]']
Snoopy vs Red Baron (snoopy) ['Draw Calls: -40 [198=>158]', 'Render Passes: -44 [60=>16]', 'Barriers: -2 [2=>0]', 'Copies: -40 [47=>7]']
Soul_Calibur_III_SLUS-21216_20231007100145 ['Draw Calls: -138 [526=>388]', 'Render Passes: -138 [156=>18]', 'Copies: -138 [143=>5]']
State_of_Emergency_2_SLUS-20966_20230921232225 ['Draw Calls: -2 [484=>482]', 'Render Passes: -2 [28=>26]', 'Copies: -2 [6=>4]']
Superman Returns - The Video Game_SLUS-21434_20221208170947 ['Render Passes: -1 [9=>8]']
Tekken_5_SLUS-21059_20240108195029 ['Draw Calls: -168 [641=>473]', 'Render Passes: -194 [244=>50]', 'Barriers: -45 [45=>0]', 'Copies: -168 [175=>7]']
tom_jerry_clear ['Draw Calls: -2 [93=>91]', 'Render Passes: -2 [13=>11]', 'Copies: -2 [3=>1]']
Virtua Quest _NTSC-U__SLUS-20977 ['Draw Calls: -2 [444=>442]', 'Render Passes: -2 [11=>9]', 'Copies: -2 [4=>2]']
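For anyone who wants the reductions above as percentages, the result lines can be parsed with a short script. This is a minimal sketch that assumes each line keeps the `'Name: delta [before=>after]'` layout shown above; the regular expression and the `parse_stats` helper are illustrative and not part of any PCSX2 tooling.

```python
import re

# Matches entries such as 'Copies: -352 [355=>3]'
PATTERN = re.compile(r"'([^':]+): (-?\d+) \[(\d+)=>(\d+)\]'")

def parse_stats(line):
    """Return {stat: (before, after, percent_change)} for one result line."""
    stats = {}
    for name, _delta, before, after in PATTERN.findall(line):
        before, after = int(before), int(after)
        # Guard against a zero baseline to avoid dividing by zero.
        pct = 100.0 * (after - before) / before if before else 0.0
        stats[name] = (before, after, round(pct, 1))
    return stats

line = ("Big.Mutha.Truckers ['Draw Calls: -352 [903=>551]', "
        "'Render Passes: -423 [440=>17]', 'Barriers: -127 [132=>5]', "
        "'Copies: -352 [355=>3]']")

for name, (before, after, pct) in parse_stats(line).items():
    print(f"{name}: {before} -> {after} ({pct:+.1f}%)")
```

For the Big Mutha Truckers line this shows copies dropping by roughly 99% and render passes by roughly 96%.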

@JordanTheToaster
Member

Some benchmarks: Sly 2 at 12x gains 30.3%, Echo Night Beyond at 6x gains 333.6%, and Big Mutha Truckers at 6x gains 294.7%.

[3 benchmark screenshots]

@crashGG
Contributor

crashGG commented Jan 31, 2025

OS: Win10 22H2
HW: AMD 7840HS with 780M Core Graphics (RDNA 3)
Test game: Soulcalibur III SLUS-21216
Test scenario: Museum - Character Profiles - XINGHUA
A relatively fixed scene, the heaviest load in the entire game, with the most HDR highlights and the lowest frame rate.
SoulCalibur III_SLUS-21216_20250131100422

Use 5x internal resolution to test GPU full load
[benchmark screenshot]

Conclusion: The most significant efficiency improvement in the past two years.

@refractionpcsx2
Member Author

Awesome, thanks for the tests @crashGG !

Unfortunately the performance uplift is limited to a few games, but it's nice where it is :) HDR/COLCLIP has been a bit of a sticking point for performance over the years.

@VGkav

VGkav commented Jan 31, 2025

Nice! By the way, does this affect native resolution? Also, does this affect software rendering?

@lightningterror
Contributor

> Nice! By the way, does this affect native resolution? Also, does this affect software rendering?

It affects all resolutions, only hardware renderers.

@refractionpcsx2
Member Author

It will depend on what you're limited by, however.

If you're GPU limited, it will have the most significant effect; if you're CPU limited (on the GS thread), then you might notice a small performance increase. But as mentioned, it will be limited to the games that are affected, mainly the ones listed. There might be some other games affected; I just don't have GS dumps of them :)

@crashGG
Contributor

crashGG commented Jan 31, 2025

This is why I do not agree with this PR. According to the test data, if you want high-resolution rendering (4K) under limited GPU conditions (core graphics), the best choice for an RDNA3 GPU is still DX11.

@refractionpcsx2
Member Author

It's not really relevant to this PR, so please keep the discussion on-topic.

@TheTechnician27
Contributor

OS: Linux 6.12; KDE 6.2.5; Manjaro
CPU: Ryzen 7 5800X
GPU: GTX 1070
Rendering: Vulkan; Basic blend; Native res; default
Game: Sly 3
Benchmark: First mission from start to finish (not including binocucomm cutscene) taken across two seconds

Master:
0.1%: 36.4
1%: 83.1
97th percentile: 615.1
Average: 170
GPU Load: 30
CPU Load: 19.9
Avg Frame Time: 5.9

PR:
0.1%: 86.5
1%: 108.1
97th percentile: 407.4
Average: 197.8
GPU Load: 32
CPU Load: 21.7
Avg Frame Time: 5.1

NOTE: Seemingly because the PR runs faster than master and because I was using F4, Sly got to the unwinnable boss fight at the end of the level faster, so I had to run around in the same area for longer than on master, potentially skewing the results somewhat. Sly getting to the boss fight so much faster was probably due in large part to the faster framerate (and partly to my getting better at completing the level in fast-forward).
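As a quick cross-check of the numbers above, the relative uplift and the implied frame times can be computed directly. This sketch assumes the 0.1%, 1% and average figures are frames per second and the frame times are milliseconds, which the post does not state explicitly; the dictionary keys and helper names are illustrative.

```python
master = {"0.1% low": 36.4, "1% low": 83.1, "average": 170.0}
pr = {"0.1% low": 86.5, "1% low": 108.1, "average": 197.8}

def uplift(before, after):
    """Percentage change from before to after."""
    return 100.0 * (after - before) / before

def frame_time_ms(fps):
    """Average frame time implied by an average FPS figure."""
    return 1000.0 / fps

for key in master:
    print(f"{key}: {uplift(master[key], pr[key]):+.1f}%")

print(f"master: {frame_time_ms(master['average']):.1f} ms")
print(f"PR: {frame_time_ms(pr['average']):.1f} ms")
```

Under that assumption the average rises by about 16%, the 1% figure by about 30%, and the 0.1% figure by well over 100%. The computed frame times round to the reported 5.9 and 5.1.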

@JordanTheToaster
Member

Some DX11 and OpenGL tests on a 2070 Super.

[6 benchmark screenshots]

refractionpcsx2 force-pushed the gs_hdropt branch 2 times, most recently from efaa566 to 6cdc778 (February 1, 2025 00:38)
refractionpcsx2 force-pushed the gs_hdropt branch 2 times, most recently from 38f48e0 to 7888fb5 (February 1, 2025 03:19)
refractionpcsx2 force-pushed the gs_hdropt branch 2 times, most recently from e0e707f to 0321db3 (February 1, 2025 11:19)
@kamfretoz
Contributor

kamfretoz commented Feb 2, 2025

Moar benchmarks, this time on team red!

CPU: Ryzen 7 5800X
RAM: 32GB
GPU: Radeon RX 6600 XT

Tested with Vulkan at 6x Internal Resolution and Basic Blending

[3 benchmark screenshots: image-3.png, image-4.png, image-5.png]

8 participants