Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve glBufferSubData usage where safe #33527

Merged

Conversation

clayjohn
Copy link
Member

@clayjohn clayjohn commented Nov 10, 2019

Fixes: #23956

Special thanks to @oeleo1 for testing out and identifying the best candidates.

Tagged as ios, but this may benefit other platforms as well as certain drivers handle glBufferData better than glBufferSubData.

There may be other places where this optimization may be appropriate, however, these are the places @oeleo1 identified as being safest (minus a couple that I removed). For more information about the rationale see this excellent comment #23956 (comment)

This PR now also has the changes from #23956 (comment). This uses a glBufferData() orphaning trick to vastly speed up the usage of glBufferSubData on mobile and webgl

@clayjohn clayjohn force-pushed the GLES2-bufferdata_optimization branch from 842134a to 1253a33 Compare November 12, 2019 00:40
@clayjohn clayjohn changed the title Use BufferData instead of BufferSubData where safe Improve glBufferSubData usage where safe Nov 12, 2019
@@ -3099,7 +3099,7 @@ void RasterizerStorageGLES3::_update_material(Material *material) {
}

glBindBuffer(GL_UNIFORM_BUFFER, material->ubo_id);
glBufferSubData(GL_UNIFORM_BUFFER, 0, material->ubo_size, local_ubo);
glBufferData(GL_UNIFORM_BUFFER, material->ubo_size, local_ubo, GL_STATIC_DRAW);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that this is the only call using GL_STATIC_DRAW instead of GL_DYNAMIC_DRAW, is this on purpose or an oversight?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On purpose. The buffer was allocated a few lines above with GL_STATIC_DRAW. We match the access type at the time of creation. This is only a hint anyway.

@reduz
Copy link
Member

reduz commented Nov 19, 2019

This is a strange optimization, but if it works faster I guess its fine. Does it make sense only on GL or would desktop also take advantage of it too?

@clayjohn
Copy link
Member Author

@reduz the benefit only applies to iOS and some Android devices. On desktop there is no known benefit.

I prefer to keep it in the ifdefs because of the risk of this interacting weird with bad drivers.

@akien-mga
Copy link
Member

Let's give it a spin in 3.2 beta 2 today and see whether users report any regression on GLES/WebGL.

@akien-mga akien-mga merged commit 6536105 into godotengine:master Nov 19, 2019
@akien-mga
Copy link
Member

Thanks!

@oeleo1
Copy link

oeleo1 commented Nov 19, 2019

Great. Thanks to you -- this makes a huge difference on mobile. On desktop, the improvement is barely noticeable so I share @clayjohn conservatism - I had to VSync off to go beyong the 60fps mark and I get around 900-950 fps in both cases (Win 10, OpenGL ES 2.0 Renderer: Quadro K2100M/PCIe/SSE2).

@lawnjelly
Copy link
Member

lawnjelly commented Nov 25, 2019

I may be a couple of decades out of date on this, but if you are getting pipeline stalls as a result of trying to write to data that is still being used, an alternative is to create 3 buffers and use them in round robin fashion.

That said, if you mark a buffer as dynamic you would assume it was doing this behind the scenes, but maybe not on some drivers.

https://www.bfilipek.com/2015/01/persistent-mapped-buffers-in-opengl.html

@oeleo1
Copy link

oeleo1 commented Nov 25, 2019

@lawnjelly You are absolutely right. Good pointer - thanks. I think we got 3 problems here in decreasing priority:

  • Problem 1: Godot's status quo is that it uses a single big fixed max size poly buffer preallocated at startup. So it writes on this buffer over and over again at every poly draw request. That mildly sucks and we addressed that problem with the least obtrusive code changes (code readability counts, indeed).
  • Problem 2: Round-robin is cool but requires 3 max size buffers with the status quo.
  • Problem 3: No "optimal" size buffer management, optimal being subjective (needed size, power of 2, a mix of that, or something else).
    I guess great minds will join forces to fix all that in 4.0 and beyond by rewriting the whole buffer allocation stuff...

@lawnjelly
Copy link
Member

It sounds as though glBufferData is avoiding the stall because it is effectively saying 'I don't care about the old contents of the buffer, allocate me a completely new buffer'. So internally it is likely using the same amount of memory internally as if you explicitly allocated these yourself, so it may not be 'saving memory' versus 2 / 3 buffers as much as expected.

Ok, you in theory you can save a bit by only allocating what is needed, but on the other hand this is traded off against the risk of potentially thrashing memory around to achieve this variable size allocation (or an allocation failure worse case!). So deciding on a maximum (or at least only changing it infrequently) would probably be my preference, as it is constant very small time. This is the usual divergence between general app memory allocation schemes and game allocation schemes.

The optimal size to use sounds doable, via the methods you suggest.

@oeleo1
Copy link

oeleo1 commented Nov 25, 2019

Agreed again. We are definitely not 'saving memory' by replacing glBufferSubData with glBufferData -- quite the contrary. And allocating on-demand size is indeed suboptimal versus fixed cap size. The focus here is to find the trade-off between spending more "reasonably" to avoid the stalls.

@clayjohn clayjohn deleted the GLES2-bufferdata_optimization branch January 31, 2020 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Bad 2D performance on iOS 12
5 participants