(Repost) (Official) UE4 -- Graphics Programming ---- Parallel Rendering Overview
2017-12-25 18:00
Parallel Rendering Overview
On this page:
Stage 1
Stage 2
Stage 3
Synchronization
Debugging
In the old days, before parallel rendering was an option, there were just the GameThread and the RenderThread. The GameThread would enqueue RenderThread commands to execute later. These commands would call directly into the RHI layer, which serves as our cross-platform interface to the different graphics APIs. This means that the RenderThread would directly call functions such as Lock, DrawPrimitive, etc. on D3D11. Unfortunately this did not let us parallelize very well, so the new parallel renderer came online in the following stages.
Stage 1
The purpose of Stage 1 was to separate the renderer into a front-end and a back-end. This way the RenderThread no longer makes direct calls through the RHI. Instead, the RenderThread generates a cross-platform command list derived from FRHICommandList. There are multiple types of command list for different purposes, e.g. FRHICommandListImmediate and FRHIAsyncComputeCommandList. So when the RenderThread wants to perform DrawPrimitive, it takes an RHICommandList object and enqueues a DrawPrimitive command. These commands are small classes with an overridden execute function; each one stores the data it needs to execute at the time it is created. So for DrawPrimitive, the command stores off NumPrims, NumInstances, etc., is placed on the end of the command list, and then the RenderThread goes on its way.
These RHI command lists are then 'translated' (i.e. executed) on a separate thread called the RHI Thread. This is now where we make calls into the actual platform-level APIs. There are still some non-command-list operations, like Lock/Unlock, that may need to be executed immediately by the RenderThread. In these cases we either flush the RHI Thread and wait, or copy the data and queue the operation. Which of the two happens depends on the platform implementation, per operation. Submission of commands to the GPU can be controlled by the platform using heuristics (submit multiple times per frame, queue all commands until the end of the frame, etc.). Finally, we added a new RHISubmitCommandsHint command to indicate to the RHI that it should submit now if possible.
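The record-then-translate split described in Stage 1 can be illustrated with a minimal C++ sketch. Everything in it is hypothetical and is not engine code: FakePlatformApi, CommandBase, DrawPrimitiveCommand, SubmitHintCommand, CommandList and RecordThenTranslate are stand-in names, and a plain std::thread plays the role of the RHI Thread. It only shows the shape of the idea: commands are small objects that capture their arguments when recorded, and no platform call happens until the list is translated.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a platform graphics API; it only logs the calls it receives.
struct FakePlatformApi {
    std::vector<std::string> Calls;
    void DrawPrimitive(int NumPrims, int NumInstances) {
        Calls.push_back("Draw " + std::to_string(NumPrims) + "x" + std::to_string(NumInstances));
    }
    void Submit() { Calls.push_back("Submit"); }
};

// A recorded command: a small object with an overridden Execute function.
struct CommandBase {
    virtual ~CommandBase() = default;
    virtual void Execute(FakePlatformApi& Api) const = 0;
};

// Stores the draw arguments at record time and replays them at translate time.
struct DrawPrimitiveCommand final : CommandBase {
    int NumPrims;
    int NumInstances;
    DrawPrimitiveCommand(int InNumPrims, int InNumInstances)
        : NumPrims(InNumPrims), NumInstances(InNumInstances) {}
    void Execute(FakePlatformApi& Api) const override { Api.DrawPrimitive(NumPrims, NumInstances); }
};

// Asks the backend to submit the work queued so far.
struct SubmitHintCommand final : CommandBase {
    void Execute(FakePlatformApi& Api) const override { Api.Submit(); }
};

// Hypothetical command list: recording appends commands, Translate executes them in order.
class CommandList {
public:
    void DrawPrimitive(int NumPrims, int NumInstances) {
        Commands.push_back(std::make_unique<DrawPrimitiveCommand>(NumPrims, NumInstances));
    }
    void SubmitCommandsHint() { Commands.push_back(std::make_unique<SubmitHintCommand>()); }
    void Translate(FakePlatformApi& Api) const {
        for (const auto& Command : Commands) Command->Execute(Api);
    }
    std::size_t Num() const { return Commands.size(); }

private:
    std::vector<std::unique_ptr<CommandBase>> Commands;
};

// Records on the calling thread, then translates on a second thread.
inline std::vector<std::string> RecordThenTranslate() {
    CommandList List;
    List.DrawPrimitive(12, 1);
    List.DrawPrimitive(2, 64);
    List.SubmitCommandsHint();

    FakePlatformApi Api;
    assert(List.Num() == 3);
    assert(Api.Calls.empty());  // recording alone has made no platform calls

    std::thread TranslateThread([&List, &Api] { List.Translate(Api); });
    TranslateThread.join();
    return Api.Calls;
}
```

Calling RecordThenTranslate() returns the logged platform calls in the same order in which the commands were recorded. The sketch joins the translating thread immediately to stay simple, whereas the text above describes the RenderThread carrying on with other work.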
Stage 2
Now that command generation is separated from command execution, we can parallelize either one independently. This means that even on DX11, where parallelization does not work very well on the backend, we can now easily generate command lists in parallel. When we parallelize, it is done as data-parallel rather than task-parallel. In other words, we break each individual pass up into chunks rather than running the different passes concurrently as long tasks. We do this in all the major passes, such as BasePass, DepthPass, VelocityPass, etc.
The mechanism for this is the pure virtual FParallelCommandListSet class. You can find a derivation of this class for every pass that is parallelized, e.g. FBasePassParallelCommandListSet. These classes are responsible for creating an RHICommandList for each thread, submitting the results in the proper order, setting any necessary state at the start of each partial command list, etc. Load balancing is important here to avoid some worker threads having too little or too much work compared to the others. UE4 will automatically do its best to load balance properly. Special submission commands are inserted into the RHICommandList to ensure that GPU submission happens in the correct order, and that translation is finished before submitting.
Once all the worker threads for generating a given pass are kicked off, the rendering thread continues on. It does not wait for the tasks to finish. Thus the renderer is generally no longer allowed to modify state shared with these workers such as the View and Projection Matrices.
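As a rough illustration of the data-parallel approach in Stage 2, the sketch below splits one pass into chunks, records each chunk into its own list on its own thread, and then concatenates the partial lists in chunk order. The names RecordPassInParallel and DrawsOnly, the string-based commands, and the even chunk split are all assumptions made for the example; this is not how FParallelCommandListSet is implemented, and the engine's load balancing is more involved. Unlike the engine, the sketch also waits for its workers before returning.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Hypothetical recorded command; a string is enough to show ordering.
using Command = std::string;
using CommandList = std::vector<Command>;

// Splits one pass over NumDraws draws into per-worker chunks, records each chunk
// into its own command list in parallel, then concatenates the lists in chunk order.
inline CommandList RecordPassInParallel(const std::string& PassName,
                                        std::size_t NumDraws,
                                        std::size_t NumWorkers) {
    assert(NumWorkers > 0);
    std::vector<CommandList> PerWorker(NumWorkers);  // one partial list per worker
    std::vector<std::thread> Workers;
    const std::size_t ChunkSize = (NumDraws + NumWorkers - 1) / NumWorkers;  // naive even split

    for (std::size_t W = 0; W < NumWorkers; ++W) {
        const std::size_t Begin = std::min(W * ChunkSize, NumDraws);
        const std::size_t End = std::min(Begin + ChunkSize, NumDraws);
        Workers.emplace_back([&PerWorker, &PassName, W, Begin, End] {
            CommandList& Out = PerWorker[W];        // each worker writes only to its own list
            Out.push_back(PassName + ":SetState");  // every partial list sets its own state first
            for (std::size_t I = Begin; I < End; ++I) {
                Out.push_back(PassName + ":Draw" + std::to_string(I));
            }
        });
    }
    for (std::thread& Worker : Workers) Worker.join();

    // Submit the partial lists in chunk order, regardless of which worker finished first.
    CommandList Submitted;
    for (const CommandList& Part : PerWorker) {
        Submitted.insert(Submitted.end(), Part.begin(), Part.end());
    }
    return Submitted;
}

// Helper for comparisons: keeps only the draw commands of a list.
inline CommandList DrawsOnly(const CommandList& In) {
    CommandList Out;
    for (const Command& C : In) {
        if (C.find(":Draw") != std::string::npos) Out.push_back(C);
    }
    return Out;
}
```

Comparing DrawsOnly(RecordPassInParallel("BasePass", 5, 4)) with the single-worker result shows the same draws in the same order; only the number of state-setting commands differs, since each partial list has to establish its own state.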
Stage 3
Stage 3 brings support for backend parallelization on platforms that support it, such as consoles, DX12 and Vulkan. In this case we actually do the translation in parallel where we can. Basically, anything generated in parallel on the frontend is translated in parallel on the backend. The main interface used in translation is the IRHICommandContext. There is a derived RHICommandContext for each platform and API. During translation the RHICommandList is given an RHICommandContext to operate on. Each command's execute function calls into the RHICommandContext API. The CommandContext is responsible for state shadowing, validation, and any API-specific details necessary to perform the given operation.
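The role of a command context can be sketched as an abstract interface with one derived class per API. The code below is a heavily reduced, hypothetical version: CommandContext, LoggingContext, SetBlendEnabled and TranslateExample are invented for illustration and do not mirror the real IRHICommandContext. It shows the two responsibilities named above in miniature: state shadowing (skipping a native call when the state is already set) and validation.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily reduced context interface that translated commands call into.
class CommandContext {
public:
    virtual ~CommandContext() = default;
    virtual void SetBlendEnabled(bool bEnabled) = 0;
    virtual void DrawPrimitive(int NumPrims) = 0;
};

// One derived context would exist per API; this one just logs the "native" calls it makes.
class LoggingContext final : public CommandContext {
public:
    void SetBlendEnabled(bool bEnabled) override {
        // State shadowing: remember the last value and skip redundant native calls.
        if (bHasBlendState && bBlendEnabled == bEnabled) return;
        bHasBlendState = true;
        bBlendEnabled = bEnabled;
        NativeCalls.push_back(bEnabled ? "SetBlend(on)" : "SetBlend(off)");
    }
    void DrawPrimitive(int NumPrims) override {
        assert(NumPrims > 0);  // validation before touching the native API
        NativeCalls.push_back("Draw(" + std::to_string(NumPrims) + ")");
    }

    std::vector<std::string> NativeCalls;

private:
    bool bHasBlendState = false;
    bool bBlendEnabled = false;
};

// Stand-in for translating a command list: replay calls against whichever context is given.
inline void TranslateExample(CommandContext& Context) {
    Context.SetBlendEnabled(true);
    Context.DrawPrimitive(3);
    Context.SetBlendEnabled(true);  // same state again, filtered out by the shadow copy
    Context.DrawPrimitive(5);
    Context.SetBlendEnabled(false);
}
```

Because the translating code only sees the abstract interface, the same recorded work can be replayed against a different derived context, and several contexts can be used at once when lists are translated in parallel.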
Synchronization
Synchronization of the renderer between the GameThread, RenderThread, RHI Thread, and the GPU is a complex topic. At the highest level, UE4 is normally configured as a single-frame-behind renderer. Specifically, this means that while the GameThread is processing Frame N+1, the RenderThread may be processing Frame N, or Frame N+1 (as commands come in) if the RenderThread is running faster than the GameThread. The addition of the RHI Thread complicates this slightly, in that we allow the RenderThread to move ahead of the RHI Thread by about half a frame. Specifically, the RenderThread is allowed to complete the visibility calculations for Frame N+1 before waiting for the RHI Thread to complete Frame N. Thus, for a GameThread on Frame N+1, the RenderThread may be processing commands for Frame N or Frame N+1, and the RHI Thread may also be translating commands from Frame N or Frame N+1, depending on execution times. These guarantees are arbitrated by FFrameEndSync and FRHICommandListImmediate::RHIThreadFence.
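The single-frame-behind rule between the GameThread and the RenderThread can be modeled with a small piece of bookkeeping. This is a simplified model written for this page, not the logic of FFrameEndSync: the class name FrameLagTracker, its methods, and the choice to number frames from 1 are all assumptions. It encodes only one constraint from the paragraph above, namely that the GameThread may start a frame as long as the RenderThread is at most one frame behind it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical bookkeeping for a "one frame behind" policy between two threads.
// Frames are numbered from 1; zero means the render side has completed nothing yet.
class FrameLagTracker {
public:
    // The game side may begin Frame only if the render side is working on
    // Frame - 1 or later, i.e. it has completed everything up to Frame - 2.
    bool CanGameThreadBegin(std::int64_t Frame) const {
        return Frame <= RenderCompleted + 2;
    }

    // Called when the render side finishes a frame; frames complete in order.
    void OnRenderThreadCompleted(std::int64_t Frame) {
        assert(Frame == RenderCompleted + 1);
        RenderCompleted = Frame;
    }

private:
    std::int64_t RenderCompleted = 0;
};
```

With a fresh tracker, frames 1 and 2 may begin immediately, while frame 3 has to wait until OnRenderThreadCompleted(1) has been called. A real implementation would block on a fence or event rather than poll a predicate like this.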
Another useful guarantee is that no matter how the parallelization is configured, the order of Submission of commands to the GPU is unchanged from the order the commands would have been submitted in a single-threaded renderer. This is required for correctness and must be maintained during any code refactoring.
Debugging
There are various CVARs to control this behavior. Because many of these stages are orthogonal, they can be independently disabled for testing, and new platforms can be brought up in stages as time allows, e.g.
Command | Description |
---|---|
r.rhicmdusedeferredcontexts | Will control parallelization of the backend. |
r.rhicmduseparallelalgorithms | Will control parallelization of the frontend. |
r.rhithread.enable | Can disable the RHI Thread completely. Command lists will still be generated; they will just be translated directly on the RenderThread at certain points. |
r.rhicmdbypass | Can completely disable command list generation and make the renderer behave like it originally did, bypassing the command list and directly calling the RHI commands on the RenderThread. This only takes effect after you have also disabled the RHI Thread. |
![](https://docs.unrealengine.com/latest/images/Programming/Rendering/ParallelRendering/Parallel_Rendering_00.jpg)
Original: https://docs.unrealengine.com/latest/CHN/Programming/Rendering/ParallelRendering/index.html