Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
update Posts about DrawCall
Showing 88 changed files with 189 additions and 4 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
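# Unreal Insights Series | ||
|
||
This series covers the internals of the UE4 engine and records some of the pitfalls hit while implementing custom requirements. | ||
|
||
* [Rendering realistic characters](shading_models/paragon_character_tech.md) | ||
* [Dynamic global illumination with LPV](global_illumination/lpv.md) | ||
* [The UE4 rendering framework](renderer_architect/renderer.md) |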
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
# Kaleido3D Development Diary | ||
|
||
* [Inspiration for the NGFX library](ngfx/ngfx_impl.md) | ||
* [The beginnings of Kaleido3D](ngfx/initial.md) | ||
* [Reworking NGFX shader compilation](ngfx/compiler.md) | ||
* [Cross-platform implementation](posts/cross_platform.md) |
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
# Cross-Platform Implementation Details |
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
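# Checkerboard Rendering in the Decima Engine on PS4 Pro | ||
|
||
* Render 50% of the pixels each frame | ||
* Selectively choose the sample positions each frame | ||
* The following are still rendered at native resolution: | ||
  * Depth buffer | ||
  * Triangle IndexBuffer | ||
  * AlphaTested coverage | ||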
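As a rough illustration of the first two bullets, the sketch below marks which pixels are shaded in a given frame. It assumes a plain parity checkerboard that flips every frame; the source does not spell out Decima's exact pattern, and the names `is_shaded_this_frame` and `count_shaded_pixels` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when pixel (x, y) is shaded in the given frame.
 * Assumption: a parity checkerboard whose phase flips every frame. */
static bool is_shaded_this_frame(int x, int y, int frame_index)
{
    assert(x >= 0 && y >= 0 && frame_index >= 0);
    return ((x + y + frame_index) & 1) == 0;
}

/* Counts the pixels of a width x height target that are shaded in one frame. */
static long count_shaded_pixels(int width, int height, int frame_index)
{
    long shaded = 0;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (is_shaded_this_frame(x, y, frame_index)) {
                ++shaded;
            }
        }
    }
    return shaded;
}
```

With an even pixel count, each frame covers exactly half of the target, and two consecutive frames together touch every pixel once.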
|
||
## Checkerboard rotation | ||
|
||
We can transform this rotated buffer into what we call a ‘tangram’. We call it a tangram because it’s sort of like that so-called puzzle game. | ||
|
||
We can cut the rotated buffer into parts and shuffle them around like so. | ||
The nice thing about that is that it’s completely lossless, and allows the 2160p checkerboard data to be packed into a compact 2160x2160 texture again. And it also still supports bilinear sampling. | ||
And because of the exact way we placed these parts, we can use the built-in texture-wrap hardware to do the unwrapping for us, without any additional logic or shader instructions during sampling. | ||
|
||
The only thing required during sampling is rotating the native-res UV by 45 degrees, and offsetting this by an offset that’s constant per frame. | ||
|
||
```cpp | ||
struct Vertex | ||
{ | ||
    Vec3 mPos; | ||
    Vec2 mUV; | ||
    Vertex(const Vec3& pos, const Vec2& uv) : mPos(pos), mUV(uv) { } | ||
}; | ||
// UV rotation: writes 12 vertices (three quads) into out_vertices. | ||
void GetVerticesForTangramRendering(int native_width, int native_height, bool is_even_frame, Vertex* out_vertices) | ||
{ | ||
    ASSERT(native_width == (native_height * 16) / 9); | ||
    float half_width = 0.5f * (float)native_width; | ||
    float half_height = 0.5f * (float)native_height; | ||
|
||
    // Prepare three 45-degree rotated quads, placed to cover each checkerboard pixel exactly once. | ||
    for (int i = 0; i < 3; ++i) | ||
    { | ||
        float x = (float)native_height * (i == 2 ? 1.0f : 0.0f) + (is_even_frame ? -0.5f : 0.0f); | ||
        float y = (float)native_height * (i == 1 ? -1.0f : 0.0f) + (is_even_frame ? 0.0f : 0.5f); | ||
        out_vertices[4 * i + 0] = Vertex(Vec3(x, y, 1.0f), Vec2(0.0f, 0.0f)); | ||
        out_vertices[4 * i + 1] = Vertex(Vec3(half_width + x, half_width + y, 1.0f), Vec2(1.0f, 0.0f)); | ||
        out_vertices[4 * i + 2] = Vertex(Vec3(half_width - half_height + x, half_width + half_height + y, 1.0f), Vec2(1.0f, 1.0f)); | ||
        out_vertices[4 * i + 3] = Vertex(Vec3(-half_height + x, half_height + y, 1.0f), Vec2(0.0f, 1.0f)); | ||
    } | ||
} | ||
``` | ||
## Tangram assembly and sampling | ||
``` c | ||
// Get the uv for the native-res output pixel, repeating the outer most pixels to prevent blending with different tangram parts/the padding areas. | ||
// The border distance was chosen to allow for a bit of safe neighborhood sampling, but this detail is implementation specific. | ||
int2 native_pos = (int2)(uv * float2(native_width, native_height)); | ||
native_pos.x = clamp(native_pos.x, 1.0, native_width - 3.0); | ||
native_pos.y = min(native_pos.y, native_height - 3.0); | ||
float is_odd_frame = ... // 1 for odd frames, 0 for even frames | ||
// Get the tangram uv, pointing exactly to halfway the nearest two corner samples in the tangram. | ||
float2 tangram_uv = float2(-1.0 + is_odd_frame + native_pos.x - native_pos.y, 2.0 + is_odd_frame + native_pos.x + native_pos.y) * (0.5 / native_height); | ||
// Do a simple resolve | ||
float4 tangram_color = tex2Dlod(tangram_texture, bilinear_sampler, tangram_uv, 0.0); | ||
``` |
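The sampling math in the shader snippet above can be checked on the CPU. The following is a minimal sketch that mirrors the same clamp and the same UV formula in plain C; the type `TangramUV` and the function names are hypothetical and not part of the original code.

```c
#include <assert.h>

typedef struct {
    float u;
    float v;
} TangramUV;

static int clamp_int(int value, int lo, int hi)
{
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}

/* Maps a native-resolution pixel to the tangram UV, following the shader:
 * x is clamped to [1, native_width - 3], y is capped at native_height - 3,
 * and both coordinates are scaled by 0.5 / native_height. */
static TangramUV native_pixel_to_tangram_uv(int px, int py,
                                            int native_width, int native_height,
                                            int is_odd_frame)
{
    assert(native_width > 3 && native_height > 3);
    int x = clamp_int(px, 1, native_width - 3);
    int y = (py < native_height - 3) ? py : native_height - 3;
    float odd = is_odd_frame ? 1.0f : 0.0f;
    float scale = 0.5f / (float)native_height;

    TangramUV uv;
    uv.u = (-1.0f + odd + (float)(x - y)) * scale;
    uv.v = (2.0f + odd + (float)(x + y)) * scale;
    return uv;
}
```

Note that `u` can come out negative (for example when `y` is larger than `x`). That is expected here, since the text above relies on the texture-wrap hardware to unwrap the tangram parts.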
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,102 @@ | ||
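# Some Thoughts on Draw Call Optimization | ||
|
||
When profiling shows that a game's frame rate is bottlenecked on the CPU, look at the share of time spent in GFX API calls; it is often the culprit behind the low frame rate. | ||
There are two directions for optimizing draw calls: | ||
|
||
* Reworking the renderer at the engine level | ||
* Reworking the game assets | ||
|
||
## AZDO (Approaching Zero Driver Overhead) | ||
|
||
Why does the driver incur overhead? | ||
|
||
* Traditional DX11/OGL graphics drivers **check that the call parameters and resources are valid** on every API call, and this validation takes time | ||
* To make API calls more fault-tolerant, the timing of **GPU/CPU memory allocation** is also nondeterministic | ||
* **Binding and synchronizing** GFX objects also costs CPU time | ||
|
||
To reduce these costs, the GFX APIs provide AZDO interfaces for developers. | ||
|
||
Under the traditional DX11 and OGL APIs, the Indirect Drawing interfaces they provide are enough to issue AZDO-style calls. | ||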
|
||
* DrawIndexedInstancedIndirect | ||
* glMultiDrawElementsIndirect | ||
|
||
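> DX11 call | ||
``` c | ||
// Pseudocode: fill one uniform block and one draw command per object. | ||
DrawElementsIndirectCommand* commands = ...; | ||
foreach( object ) // i is the running index of the object | ||
{ | ||
    writeUniformData( object, &uniformData[i] ); | ||
    writeDrawCommand( object, &commands[i] ); | ||
} | ||
// Upload the commands to the GPU argument buffer. | ||
updateCommands(drawArgsBuffer, commands, commandCount); | ||
// Issues one draw from the arguments stored at byte offset 0 of the buffer. | ||
context->DrawIndexedInstancedIndirect(drawArgsBuffer, 0); | ||
``` | ||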
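> OGL call | ||
``` c | ||
// Pseudocode: fill one uniform block and one draw command per object. | ||
DrawElementsIndirectCommand* commands = ...; | ||
foreach( object ) // i is the running index of the object | ||
{ | ||
    writeUniformData( object, &uniformData[i] ); | ||
    writeDrawCommand( object, &commands[i] ); | ||
} | ||
// When a buffer is bound to GL_DRAW_INDIRECT_BUFFER, the third argument | ||
// is a byte offset into that buffer rather than a client pointer. | ||
glMultiDrawElementsIndirect( | ||
    GL_TRIANGLES, | ||
    GL_UNSIGNED_SHORT, | ||
    commands, | ||
    commandCount, | ||
    0 // stride 0: commands are tightly packed | ||
); | ||
``` | ||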
|
||
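* Drawing a batch of models with Indirect Draw reduces CPU draw time, provided that every model in the batch shares the same render state and resource binding types. | ||
* At the resource-binding stage, consider using a TextureArrayObject for textures to reduce the number of binds; buffers can simply be copied. | ||
* Resources can also be bound through the bindless interfaces offered by driver vendors to minimize overhead, at the cost of more complex code. | ||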
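To make the batching idea concrete, here is a minimal CPU-side sketch of packing one indirect command per mesh. The five 32-bit fields follow the argument layout that `glMultiDrawElementsIndirect` and `DrawIndexedInstancedIndirect` read; everything else (`MeshRange`, `build_indirect_commands`, and the choice to store the draw index in `base_instance`) is a hypothetical illustration rather than the author's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One indirect draw record: five tightly packed 32-bit fields. */
typedef struct {
    uint32_t count;          /* number of indices to draw */
    uint32_t instance_count; /* number of instances */
    uint32_t first_index;    /* offset into the shared index buffer */
    int32_t  base_vertex;    /* value added to each index */
    uint32_t base_instance;  /* used here to carry a per-draw ID (assumption) */
} IndirectDrawCommand;

/* Hypothetical description of where one mesh lives in the shared buffers. */
typedef struct {
    uint32_t index_count;
    uint32_t first_index;
    int32_t  base_vertex;
} MeshRange;

/* Writes one command per non-empty mesh and returns how many were written. */
static size_t build_indirect_commands(const MeshRange* meshes, size_t mesh_count,
                                      IndirectDrawCommand* out, size_t capacity)
{
    assert(meshes != NULL && out != NULL);
    size_t written = 0;
    for (size_t i = 0; i < mesh_count && written < capacity; ++i) {
        if (meshes[i].index_count == 0) {
            continue; /* nothing to draw for this mesh */
        }
        IndirectDrawCommand* cmd = &out[written];
        cmd->count = meshes[i].index_count;
        cmd->instance_count = 1;
        cmd->first_index = meshes[i].first_index;
        cmd->base_vertex = meshes[i].base_vertex;
        cmd->base_instance = (uint32_t)i; /* lets a shader look up per-draw data */
        ++written;
    }
    return written;
}
```

The resulting array is what would be uploaded to the argument buffer before issuing the indirect call; all meshes in it must live in the same vertex and index buffers and share one render state.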
|
||
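> Render state includes the shader, RasterState, DepthStencil, VertexLayout, PrimitiveTopology, and so on. | ||
### Shader rework | ||
|
||
Once Indirect Draw is in use, the resource binding code can be reworked around an index that maps each draw ID to its resource IDs. | ||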
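One way to picture such an index is a flat table with one entry per draw, which a shader would read using its draw ID. The sketch below models that lookup on the CPU in C; the struct fields and the function `lookup_draw_resources` are assumptions made for illustration, not an API from any engine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-draw record: indices into shared resource arrays. */
typedef struct {
    uint32_t transform_index; /* slot in a shared transform buffer */
    uint32_t material_index;  /* slot in a shared material buffer */
    uint32_t texture_layer;   /* layer in a texture array */
} DrawResources;

typedef struct {
    const DrawResources* entries;
    uint32_t count;
} DrawResourceTable;

/* Resolves a draw ID to its resource indices, as a shader would do
 * with a structured buffer indexed by the draw ID. */
static DrawResources lookup_draw_resources(const DrawResourceTable* table,
                                           uint32_t draw_id)
{
    assert(table != NULL && draw_id < table->count);
    return table->entries[draw_id];
}
```

Because every draw resolves its resources through the table, the CPU no longer has to rebind textures or constant buffers between the draws of a batch.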
|
||
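## Traditional mesh and texture merging | ||
|
||
* In UE4, static objects in a scene can also be merged through the HLOD system to reduce the number of draw calls | ||
* Android's font/UI rendering library likewise uses atlases and batch rendering to merge draw calls | ||
|
||
### Draw call optimization in Rainbow Six | ||
|
||
* A material-based draw call dispatch system (essentially batched rendering) | ||
* Unified buffer definitions (to simplify resource binding) | ||
  * VertexBuffer | ||
  * IndexBuffer | ||
  * ConstantBuffer | ||
  * StructBuffer holding the draw call parameters | ||
* Automatic shader generation lets us validate new models quickly | ||
* Draw call gathering | ||
  * Each batch corresponds to one IndirectDraw command | ||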
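The material-based dispatch in the list above boils down to grouping draws by a state key so that each group can be issued as one command. Below is a minimal sketch using `qsort` from the C standard library; `DrawItem` and `sort_and_count_batches` are hypothetical names, and a real engine would sort on a fuller state key than a single material ID.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical draw record: the material acts as the batching key. */
typedef struct {
    uint32_t material_id;
    uint32_t mesh_id;
} DrawItem;

static int compare_by_material(const void* a, const void* b)
{
    const DrawItem* lhs = (const DrawItem*)a;
    const DrawItem* rhs = (const DrawItem*)b;
    if (lhs->material_id < rhs->material_id) return -1;
    if (lhs->material_id > rhs->material_id) return 1;
    return 0;
}

/* Sorts the draws by material in place and returns the number of batches,
 * i.e. the number of runs of equal material IDs. */
static size_t sort_and_count_batches(DrawItem* items, size_t count)
{
    assert(items != NULL || count == 0);
    if (count == 0) {
        return 0;
    }
    qsort(items, count, sizeof(DrawItem), compare_by_material);
    size_t batches = 1;
    for (size_t i = 1; i < count; ++i) {
        if (items[i].material_id != items[i - 1].material_id) {
            ++batches;
        }
    }
    return batches;
}
```

Each run of equal keys in the sorted array then maps to a single IndirectDraw command, which is where the reduction in draw call count comes from.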
|
||
Results: | ||
|
||
|Draw calls without batching|Draw calls with batching (VIS + GBuffer + decals)|Draw calls with batching (shadows)|Culling efficiency gain| | ||
|:--:|:--:|:--:|:--:| | ||
|10537|412|64|73%| | ||
|
||
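## Multi-threaded command buffer recording and submission | ||
|
||
If the renderer is migrated to DX12-class APIs (VK/MTL), driver-side improvements can raise draw call submission throughput by roughly an order of magnitude, and binding and submitting GPU commands in parallel gets the most out of the GPU. | ||
|
||
![](images/3d_mark.png) | ||
|
||
As the figure above shows, in the 3DMark test and within the same amount of time, the **draw call count of Vulkan and DX12** can reach up to **13 times that of DX11**, so the gain from the driver is significant. | ||
|
||
* Even with DX12-class APIs, the traditional draw call optimization methods still have their place. | ||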
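For the parallel recording described in this section, each worker thread typically records its own command list over a slice of the frame's draws. The sketch below shows only that partitioning step, assuming a simple even split; `DrawRange` and `worker_draw_range` are hypothetical names, and the actual recording and submission calls depend on the API in use.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [begin, end) of draw indices handled by one worker. */
typedef struct {
    size_t begin;
    size_t end;
} DrawRange;

/* Splits draw_count draws across worker_count workers as evenly as possible.
 * The first (draw_count % worker_count) workers receive one extra draw. */
static DrawRange worker_draw_range(size_t draw_count, size_t worker_count,
                                   size_t worker_index)
{
    assert(worker_count > 0 && worker_index < worker_count);
    size_t base = draw_count / worker_count;
    size_t extra = draw_count % worker_count;
    size_t preceding_extra = (worker_index < extra) ? worker_index : extra;

    DrawRange range;
    range.begin = worker_index * base + preceding_extra;
    range.end = range.begin + base + ((worker_index < extra) ? 1 : 0);
    return range;
}
```

Keeping the ranges contiguous preserves the submission order: the per-worker command lists can be handed to the queue in worker order and the draws still execute in their original sequence.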
|
||
# References | ||
|
||
1. [Approaching Zero Driver Overhead](https://www.slideshare.net/CassEveritt/approaching-zero-driver-overhead) | ||
2. [Rendering Rainbow Six](http://twvideo01.ubm-us.net/o1/vault/gdc2016/Presentations/El_Mansouri_Jalal_Rendering_Rainbow_Six.pdf) | ||
3. [A brief analysis of the Android HWUI hardware acceleration module](https://github.com/TsinStudio/AndroidDev) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,13 @@ | ||
# Game Development Notes | ||
|
||
- [Character technology analysis of Paragon](1.character_tech_in_paragon/paragon_character_tech.md) | ||
- [Unreal Insights series](1.ue4_insights/ue4_insights.md) | ||
- [Character technology analysis of Paragon](1.ue4_insights/shading_models/paragon_character_tech.md) | ||
- [LPV dynamic global illumination](1.ue4_insights/global_illumination/lpv.md) | ||
- [Rendering framework](1.ue4_insights/renderer_architect/renderer.md) | ||
- [Kaleido3D development diary](3.build_next_gen_gfx_lib/ReadMe.md) | ||
- [Building a C++ reflection framework with Clang](2.reflect_cpp_with_clang/reflect_cpp_with_clang.md) | ||
- [SIGGRAPH 2017 game rendering techniques: ocean rendering](5.ocean_rendering/ocean_rendering.md) | ||
- [SIGGRAPH 2017 game rendering techniques: checkerboard rendering in Decima](7.checkboard_rendering/decima.md) | ||
- [Reprojection optimization in Oculus VR](6.oculus_vr_reprojection/oculus_reprojection.md) | ||
- SIGGRAPH 2017 advanced game rendering techniques | ||
- [Ocean rendering](4.siggraph2017_game/ocean_rendering.md) | ||
- [Checkerboard rendering in Decima](5.checkboard_rendering/decima.md) | ||
- [Reprojection optimization in Oculus VR](6.oculus_vr_reprojection/oculus_reprojection.md) | ||
- [Some thoughts on draw call optimization](7.about_drawcall/draw_call.md) |