GPU-based Particles running slow on macOS due to transform feedback being implemented on the CPU #54052

Open
djrain opened this issue Oct 21, 2021 · 24 comments

@djrain

djrain commented Oct 21, 2021

Godot version

3.3.2 stable, 3.4 RC1

System information

macOS Big Sur
GLES3

Issue description

I was working in my main project when I noticed some frame drops when I added more than about 20 Particles2D nodes at the same time. This seemed unreasonably slow, so I made a test scene in a new project that instances 500 basic particle systems with only 1 particle each. On my 2020 M1 Mac Mini this scene runs consistently at a ridiculous 10 FPS. I tested this in 3.3.2 stable as well as 3.4 RC1, same results on both.

For comparison, running this same scene in 3.3.2 on my 2015 MacBook Pro gets a solid 30 FPS, which seems reasonable for an older laptop with integrated graphics. Also, some fellow devs testing the same code on PC and Linux had no issues. So it seems this may be an issue specific to M1 machines.

Steps to reproduce

Run the test project, presumably on an M1 Mac.

Minimal reproduction project

particle performance test.zip

djrain changed the title from "Particles2D running incredibly slow on macOS" to "Particles2D running incredibly slow on M1 Mac" on Oct 21, 2021
@floppyhammer
Contributor

Got a similar result on the same machine.

[Screenshot: 20211021-100014]

@clayjohn
Member

I wonder if M1 Macs implement transform feedback on the CPU. That would explain the CPU and GPU times spiking so high together.

After a quick Google search, it looks like that may indeed be the case. I have found a few posts claiming that transform feedback and geometry shaders are implemented on the CPU.

Unfortunately, this may mean that CPUParticles are the only viable option for particles on M1 Macs (in 3.x, that is; in 4.0 the Vulkan/Metal renderer may work much better).

@djrain
Author

djrain commented Oct 21, 2021

I wonder if M1 Macs implement transform feedback on the CPU. That would explain the CPU and GPU times spiking so high together.

Would that suggest I should see similar performance with either particle type? Because that's not the case. I'm seeing that a single Particles2D can handle around 10X as many particles compared to the equivalent CPUParticles2D.

@clayjohn
Member

Would that suggest I should see similar performance with either particle type

If I am right above, it would mean that CPUParticles may even be faster.

Because that's not the case. I'm seeing that a single Particles2D can handle around 10X as many particles compared to the equivalent CPUParticles2D.

Ah, so I guess my guess is wrong. Are CPUParticles also way slower on your M1 than on your MacBook Pro?

@Calinou
Member

Calinou commented Oct 21, 2021

Can you reproduce this after disabling batching in the Project Settings? Also try playing with the buffer orphaning project setting.

@djrain
Author

djrain commented Oct 21, 2021

Can you reproduce this after disabling batching in the Project Settings? Also try playing with the buffer orphaning project setting.

Yep, just tried (batching off, orphan buffers off, both off). Still around 10 fps in any case.

@djrain
Author

djrain commented Oct 21, 2021

Are CPUParticles also way slower on your M1 than on your MacBook Pro?

No, CPUParticles seem capable enough on either machine.

Here are all 4 numbers, for clarity:

MacBook Pro 2015 (integrated graphics)
1000 CPUParticles2D: 60 fps
1000 Particles2D: 13 fps

M1 Mac Mini
1000 CPUParticles2D: 60 fps
1000 Particles2D: 2 fps

So, it looks like Particles2D is actually subpar on both Macs, but especially bad on M1. I also noticed that the version using Particles2D takes noticeably longer to even start up: I get the spinning beach ball of death on the splash screen for a few seconds on the MacBook and up to 20 seconds on M1, whereas the CPUParticles2D scene starts running pretty much instantly.
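
For reference, the four frame rates above can be converted into a rough per-node cost. The sketch below is plain arithmetic on the numbers in this comment, not a measurement. It assumes the whole frame time is spent on the 1000 particle nodes, which overstates the cost, and the 60 fps figures are probably capped by vsync, so they are only upper bounds. The names `frame_time_ms` and `per_node_ms` are made up for this illustration.

```python
# Frame rates reported in this comment, each for 1000 particle nodes.
NODE_COUNT = 1000
reported_fps = {
    ("MacBook Pro 2015", "CPUParticles2D"): 60,
    ("MacBook Pro 2015", "Particles2D"): 13,
    ("M1 Mac Mini", "CPUParticles2D"): 60,
    ("M1 Mac Mini", "Particles2D"): 2,
}


def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def per_node_ms(fps, node_count=NODE_COUNT):
    """Upper bound on the cost of one node per frame (assumes nothing else runs)."""
    return frame_time_ms(fps) / node_count


for (machine, node_type), fps in reported_fps.items():
    print(
        f"{machine:17s} {node_type:15s} "
        f"{frame_time_ms(fps):7.1f} ms/frame  "
        f"<= {per_node_ms(fps):.4f} ms/node"
    )
```

Under those assumptions, each Particles2D node costs up to about 0.5 ms per frame on the M1 versus about 0.08 ms on the 2015 MacBook Pro, roughly a 6.5x difference.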

@Richard74Huang

Having the same issue here on M1 with Godot 3.4 stable. I also tried exporting as release, but it's still laggy.

@clayjohn
Member

I am fairly certain this is caused by Apple's poor support of the OpenGL standard. Specifically, I don't think Apple devices support using transform feedback on the GPU, so the drivers emulate it by passing data back to the CPU. If that is the case, then the solution is to run OpenGL over Metal on Apple devices. There is a draft PR already: #50253

@Richard74Huang

Apple recently joined the Blender dev team. If they really want to promote Apple silicon, helping game development projects like the Godot engine is crucial too.

Anyway, I'll use CPUParticles2D instead on macOS for now. Thanks :)

@Calinou
Member

Calinou commented Nov 23, 2021

I wonder if we should expose the "Convert to CPUParticles" option to be used at run-time, and run it automatically on all GPU-based particle nodes by default on macOS (unless they use a custom shader). This option works fairly well for simple particle setups, and it still allows you to use GPU-based particles on other platforms.

PS: If any of you have an iOS device, does this slowdown also apply on iOS?

Calinou changed the title from "Particles2D running incredibly slow on M1 Mac" to "GPU-based Particles running slow on macOS due to transform feedback being implemented on the CPU" on Nov 23, 2021
@akien-mga
Member

Closed by #55268.

@djrain
Author

djrain commented Jan 13, 2022

Maybe I'm just missing some information, but I'm not quite convinced that we've found the real issue here.

Even on my M1 Mac, as I mentioned previously, a single Particles2D is still far more capable than a single CPUParticles2D. It can do 1 million particles at almost 60 fps. It's only when I instance numerous Particles2D nodes that the performance degrades. And the number of nodes has much more of an impact on the frame rate than the total number of particles. The fact that only 100 Particles2D nodes, emitting one particle each, are taking the fps down to 39... just seems very odd. Is there an explanation?

@clayjohn
Member

Yes, same explanation as above. Transform feedback is implemented on the CPU in Apple's OpenGL driver, so to update each Particles2D node the entire GPU process stalls while the particle data is passed from GPU to CPU and back to GPU.

It is more efficient to transfer a million particles once than it is to send a single particle a thousand times. That is why things like batching can be effective. It is more efficient to send data to and from the GPU in big batches than it is to send a thousand tiny commands.
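
To illustrate the point about batch size, here is a toy cost model in Python. It is only a sketch: it assumes every GPU/CPU round trip pays a fixed stall cost plus a small per-particle cost, and both constants are invented for illustration rather than measured on any driver. The function name `transfer_time_ms` and its parameters are hypothetical.

```python
def transfer_time_ms(
    num_transfers,
    particles_per_transfer,
    fixed_overhead_ms=0.5,      # hypothetical stall cost per round trip
    per_particle_ms=0.00001,    # hypothetical cost per particle moved
):
    """Total time for `num_transfers` round trips under a linear cost model."""
    per_transfer = fixed_overhead_ms + particles_per_transfer * per_particle_ms
    return num_transfers * per_transfer


# One node holding a million particles: a single round trip.
one_big = transfer_time_ms(num_transfers=1, particles_per_transfer=1_000_000)

# A thousand nodes holding one particle each: a thousand round trips.
many_small = transfer_time_ms(num_transfers=1000, particles_per_transfer=1)

print(f"1 transfer x 1,000,000 particles: {one_big:.2f} ms")
print(f"1000 transfers x 1 particle:      {many_small:.2f} ms")
```

With these made-up constants, the single large transfer is dominated by the per-particle cost, while the thousand small transfers are dominated by the fixed overhead, so the second case comes out far slower despite moving a thousand times less data.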

@djrain
Author

djrain commented Jan 13, 2022

I see, so sending from GPU to CPU and back is just that slow... and CPUParticles is much faster because information is only being sent one way (to GPU)?

If batching is a potential solution, how can I make sure that happens? I'm setting all the instances to the same ParticlesMaterial, no texture/material and tiny visibility rect (is this used for overlap test?), but it isn't batching:

items
	joined_item 1 refs, 
			batch D 0-1 PA 
	joined_item 1 refs, 
			batch D 0-1 PA 
	joined_item 1 refs, 
			batch D 0-1 PA 

@clayjohn
Member

I see, so sending from GPU to CPU and back is just that slow... and CPUParticles is much faster because information is only being sent one way (to GPU)?

That is my understanding, yes. I don't know much about how Transform Feedback is implemented on Apple devices (other than that it is in software) so there could be some other things that the driver does that explain the poor performance.

If batching is a potential solution, how can I make sure that happens? I'm setting all the instances to the same ParticlesMaterial, no texture/material and tiny visibility rect (is this used for overlap test?), but it isn't batching:

Particles are not batched. If you only have a single particle in each Particles2D then it will be much more efficient to use a Sprite instead. For most hardware, Particles only become worthwhile once you start having hundreds to thousands of instances within a Particles2D (hundreds on lower end hardware, thousands on higher end hardware).

My guess is that 1000 Sprites will outperform 1000 Particles2D.

You have three alternatives to try out:

  1. Use fewer Particles2D nodes and increase the number of particles in each,
  2. Use CPUParticles where possible (also with a high number of particles in each),
  3. Use regular Sprites (if sharing a material and texture these should batch automatically).

I can't say for sure what will perform the best for you. But it is worth trying a few different approaches to see what works best for your workflow and performance goals.

@djrain
Author

djrain commented Jan 13, 2022

@clayjohn Thanks, good to know! As of now I'm using VisualServer for particles as well as several thousand stars, and it's running like a dream :)

@lekoder
Contributor

lekoder commented Aug 8, 2022

@akien-mga I'm not sure this is a good solution to this issue. Please correct me if I'm wrong, but you chose to solve "particles are working slow on M1" by having "if the developer happens to work on M1, warn them about that".

It doesn't really fix anything for multi-platform games, which are typically not developed on the target platform. Adding the notice when you export the game would at least notify the developer of the problem, but you cannot reasonably expect a developer to remake the entire particle system in their game, which might be hundreds of nodes across hundreds of scenes, just for the sake of targeting an additional platform, which would additionally decrease performance on the unaffected platforms.

A proper solution would be to offer conversion from Particles2D to CPUParticles2D either at runtime or at export time.

@akien-mga
Member

Well, there doesn't seem to be a good solution to that issue indeed, at least until a macOS-focused rendering contributor decides to look into what workarounds could be implemented for that platform.

A conversion to CPUParticles2D sounds interesting, but AFAIK the features don't map 1:1, so I'm not sure how well this would work. Anyway, reopening for further discussion.

akien-mga reopened this on Aug 8, 2022
akien-mga modified the milestones: 3.5, 3.x on Aug 8, 2022
@lekoder
Contributor

lekoder commented Aug 9, 2022

I think at the very least it should present a warning when you target an OSX export, regardless of the platform you are developing on. This would make the developer aware of the problem so they can work around it.

I intend to solve that by making a container for sibling Particles2D and CPUParticles2D placeholders, with a unified API exposed by the container. The ideal solution, though, would probably be full compatibility between CPUParticles2D and Particles2D; that would allow either converting them on export or a run-time option to switch between them.

@Calinou
Member

Calinou commented Aug 9, 2022

I think we should add a project setting that automatically converts GPUParticles3D to CPUParticles3D at run-time on macOS by default. Most built-in particles should be able to convert with similar visuals, but for custom particle shaders, it's better to have broken particles than unplayable performance when the project runs on macOS.

The editor should also warn you before assigning custom particle shaders on macOS, as they can't be converted to CPUParticles.

@GeraldineSullivan

This is happening to me today on an M2 MacBook Pro running Sonoma.

@Calinou
Member

Calinou commented Dec 20, 2023

This is happening to me today on M2 macbook pro running Sonoma

Which Godot version and rendering method are you using?

@GeraldineSullivan

GeraldineSullivan commented Dec 20, 2023

This is happening to me today on M2 macbook pro running Sonoma

Which Godot version and rendering method are you using?

I am using 4.1.2

I fixed it by using the renderer settings from a file posted in the comments. They were all set to gl_compatibility by default.

[Screenshot: 2023-12-20 at 15 21 37]


10 participants