Unreal Source Explained (USE) is an analysis of the Unreal source code, based on profilers.
For more information, see the repo on GitHub.
Unreal has several important threads:
- Game thread
- Main thread
- Task Threads
- Render thread (maybe with the additional RHI thread)
- File I/O threads
- Mipmap streaming calculations
- etc.
The following image is the thread overview in the profiler. Threads are sorted by their CPU time, which usually indicates their importance.
We'll briefly discuss some important threads below.
See "[IOSAppDelegate MainAppThread:]" in the thread overview image above.
The game thread's main job is running FEngineLoop (link), including its initialization PreInit() (link) and its per-frame Tick() (link).
Every game runs frame by frame. Within one frame, several submodules are called sequentially. This routine is known as the game loop.
FEngineLoop is Unreal's game loop. Each call to FEngineLoop::Tick() marks the beginning of a new frame.
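As a rough illustration of the game loop idea (not Unreal's actual code), the sketch below runs a few submodules in a fixed order once per frame. `MiniEngineLoop`, `AddSubmodule`, `RunFrames` and the submodule names are all invented for this example.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an engine loop; this is not Unreal's FEngineLoop.
class MiniEngineLoop {
public:
    // Submodules tick in the order in which they were registered.
    void AddSubmodule(std::string name, std::function<void(float)> tick) {
        submodules_.emplace_back(std::move(name), std::move(tick));
    }

    // One call to Tick() is one frame: every submodule runs once, in order.
    void Tick(float deltaSeconds) {
        ++frameCounter_;
        for (auto& entry : submodules_) {
            entry.second(deltaSeconds);
        }
    }

    long FrameCounter() const { return frameCounter_; }

private:
    std::vector<std::pair<std::string, std::function<void(float)>>> submodules_;
    long frameCounter_ = 0;
};

// Runs a fixed number of frames and returns the order in which submodules ran.
std::vector<std::string> RunFrames(int frameCount) {
    std::vector<std::string> log;
    MiniEngineLoop loop;
    loop.AddSubmodule("input", [&log](float) { log.push_back("input"); });
    loop.AddSubmodule("world", [&log](float) { log.push_back("world"); });
    loop.AddSubmodule("ui", [&log](float) { log.push_back("ui"); });
    for (int i = 0; i < frameCount; ++i) {
        loop.Tick(1.0f / 60.0f);  // assumed fixed 60 Hz step, for simplicity
    }
    return log;
}
```

Calling `RunFrames(2)` yields the same three-step sequence twice, which is the property a game loop provides: a predictable per-frame call order.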
Note that in Unreal the game thread is named [IOSAppDelegate MainAppThread:]. It is Unreal's "main thread"; do not confuse it with the process's main thread.
See "Main Thread" in the thread overview image above.
This thread is the iOS process's main thread: it is the first thread created, and the one on which the process entry point is called.
In Unreal, the main thread doesn't carry out heavy work; it just handles native iOS messages, such as touch events.
Unreal has several ways to assign tasks to threads; the threads that run these tasks are called task threads. Task thread management will be discussed in future chapters.
The following threads are implemented as task threads.
See "FRenderingThread::Run()" in the thread overview image above.
The render thread calls FRenderingThread::Run() (link) and takes charge of all rendering tasks, such as updating primitives' transforms, updating particle systems, and drawing Slate UI elements. These rendering tasks usually update and prepare rendering data for the GPU to consume.
The render thread and the game thread are usually the two heaviest threads in most games. You can see that the render thread is actually the heaviest thread in this profiling session.
The render thread is created in FEngineLoop::PreInit() (link). You can observe the thread creation in the Allocation profiler, because each thread creation comes with an allocation of thread stack memory.
Note that the thread creation call stack is reversed: the caller is shown under the callee.
Sometimes the Time Profiler seems to show a second thread running FRenderingThread::Run(). This is because the render thread is recreated when the viewport is resized(link), and the Time Profiler captures both the destroyed and the recreated render thread. There is only one render thread at any given time.
What's more, Unreal can render in parallel with the RHI (Render Hardware Interface) thread, which translates the render thread's render commands into device-specific GPU commands. The RHI thread may improve performance on some platforms.
However, on iOS the RHI thread is disabled, because GRHISupportsRHIThread (link) and bSupportsRHIThread (link) are disabled. Unreal has this comment(link):
/** Parallel execution is available on Mac but not iOS for the moment - it needs to be tested because it isn't cost-free */
You could modify the source code to enable the RHI thread on mobile devices, provided you add a proper device capability test.
See "FAsyncTask<FGenericReadRequestWorker>::DoThreadedWork()" in the thread overview image above.
As you can see in the image above, Unreal is initialized in two main steps: FEngineLoop::PreInit() (link) and FEngineLoop::Init() (link). They are called from FAppEntry::Init() (link) on iOS, and from AndroidMain() (link) on Android.
You can think of PreInit() as the low-level initialization and Init() as the high-level one.
Note that Unreal has two ways to manage submodules: Modules and Plugins. A Module contains only code, while a Plugin can contain assets and/or code.
To name a few of the things initialized in PreInit(), in order:
- load core module(link): LoadCoreModules() (link) for "CoreUObject";
- fundamental modules(link): LoadPreInitModules() (link) for "Engine", "Renderer", "AnimGraphRuntime", "Landscape", "RenderCore";
- "application-like" modules(link): FEngineLoop::AppInit() (link);
- scalability(link): InitScalabilitySystem() (link). Scalability adjusts the quality of various features in order to maintain the best performance for your game on different platforms and hardware;
- game physics(link): InitGamePhys() (link);
- slate application(link): FSlateApplication::Create() (link);
- RHI(link): RHIInit() (link);
- global shader resources(link): CompileGlobalShaderMap() (link);
- render thread (link): StartRenderingThread() (link);
- most UObjects' reflection data(link): ProcessNewlyLoadedUObjects() (link);
- start-up modules (link): LoadStartupCoreModules() (link): "Core", "Networking", "Messaging", "Slate", "UMG";
- load task graph module(link);
- engine and game localization(link);
and Init() initializes these, in order:
- create the high-level game engine objects(link): UGameEngine::Init() (link)
  - UEngine::Init() (link). UEngine is the abstract base class of UGameEngine and UEditorEngine, and is responsible for management of systems critical to editor or game systems;
  - UGameUserSettings::LoadSettings() (link);
  - UGameInstance (link). UGameInstance is the high-level manager object for an instance of the running game
    - create UWorld in UGameInstance::InitializeStandalone() (link). UWorld is the top-level object representing a map or a sandbox in which Actors and Components will exist and be rendered;
  - create UGameViewportClient (link)
- and start the high-level game engine(link): UGameEngine::Start() (link)
Most (nearly all) Z_Construct_UClass_XXX() functions are called only during the initialization stage, via ProcessNewlyLoadedUObjects() (link).
Z_Construct_UClass_XXX() functions construct Unreal's intrinsic "class reflection data". Their code is generated by the macro IMPLEMENT_INTRINSIC_CLASS (link):
#define IMPLEMENT_INTRINSIC_CLASS(TClass, TRequiredAPI, TSuperClass, TSuperRequiredAPI, TPackage, InitCode) \
    IMPLEMENT_CLASS(TClass, 0) \
    TRequiredAPI UClass* Z_Construct_UClass_##TClass(); \
    struct Z_Construct_UClass_##TClass##_Statics \
    { \
        static UClass* Construct() \
        { \
            extern TSuperRequiredAPI UClass* Z_Construct_UClass_##TSuperClass(); \
            UClass* SuperClass = Z_Construct_UClass_##TSuperClass(); \
            UClass* Class = TClass::StaticClass(); \
            UObjectForceRegistration(Class); \
            check(Class->GetSuperClass() == SuperClass); \
            InitCode \
            Class->StaticLink(); \
            return Class; \
        } \
    }; \
    UClass* Z_Construct_UClass_##TClass() \
    { \
        static UClass* Class = NULL; \
        if (!Class) \
        { \
            Class = Z_Construct_UClass_##TClass##_Statics::Construct();\
        } \
        check(Class->GetClass()); \
        return Class; \
    } \
    ...
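The macro above caches the constructed class in a function-local static and constructs the superclass first. A minimal sketch of that pattern, stripped of everything Unreal-specific, might look like the following. `MiniClass`, `ConstructionLog`, `Construct_MiniObject` and `Construct_MiniActor` are hypothetical names used only here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for UClass.
struct MiniClass {
    std::string name;
    const MiniClass* super;
};

// Records construction order so the effect is observable.
std::vector<std::string>& ConstructionLog() {
    static std::vector<std::string> log;
    return log;
}

// Root class: it has no superclass.
MiniClass* Construct_MiniObject() {
    static MiniClass* cls = nullptr;
    if (!cls) {
        static MiniClass storage{"MiniObject", nullptr};
        cls = &storage;
        ConstructionLog().push_back(cls->name);
    }
    return cls;
}

// Derived class: constructs its superclass first, then itself, exactly once.
MiniClass* Construct_MiniActor() {
    static MiniClass* cls = nullptr;
    if (!cls) {
        MiniClass* superClass = Construct_MiniObject();
        static MiniClass storage{"MiniActor", superClass};
        cls = &storage;
        ConstructionLog().push_back(cls->name);
    }
    assert(cls->super != nullptr);
    return cls;
}
```

Calling `Construct_MiniActor()` any number of times returns the same pointer, and the log always shows the superclass before the derived class. The sketch is not thread-safe, which is acceptable only under the assumption that construction happens on one thread during start-up.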
Inside FEngineLoop::Tick() (link), the ticks of many hardcoded submodules are called sequentially. The following image is the tick overview; note that it is sorted by CPU time, not by calling order.
Calling order is important: it is one of the causes of one-frame-off bugs.
The general rule is: if state B depends on state A, then state A should be updated before state B within a frame.
This seems easy, but when there are many states with complicated dependencies, getting the update order right takes a lot of effort.
Luckily, many state dependencies don't need a correct update order at all, because being one frame off usually doesn't result in visually noticeable motion. Other, crucial states (e.g., camera, character) still demand a correct update order.
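To make the one-frame-off effect concrete, here is a small sketch with a camera that copies a character's position each frame. The types and functions (`Character`, `Camera`, `LagAfterFrames`) are invented for illustration and have no counterpart in the engine.

```cpp
#include <cassert>

// Hypothetical state: a character that moves and a camera that copies its position.
struct Character { float x = 0.0f; };
struct Camera { float x = 0.0f; };

void UpdateCharacter(Character& character) { character.x += 1.0f; }
void UpdateCamera(Camera& camera, const Character& character) { camera.x = character.x; }

// Returns camera.x - character.x after `frames` frames.
// cameraFirst == true updates the dependent state (camera) before its dependency.
float LagAfterFrames(int frames, bool cameraFirst) {
    Character character;
    Camera camera;
    for (int i = 0; i < frames; ++i) {
        if (cameraFirst) {
            UpdateCamera(camera, character);  // reads last frame's position
            UpdateCharacter(character);
        } else {
            UpdateCharacter(character);
            UpdateCamera(camera, character);  // reads this frame's position
        }
    }
    return camera.x - character.x;
}
```

With the correct order the camera sits exactly on the character. With the wrong order it trails by one frame's worth of movement, no matter how many frames have run.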
Here are some important calls extracted from FEngineLoop::Tick(), sorted by calling order:
- broadcast the frame begin event(link): FCoreDelegates::OnBeginFrame (link);
- update the time stamp and max tick rate of this frame(link): UEngine::UpdateTimeAndHandleMaxTickRate() (link);
- get the input data from the OS(link): FIOSApplication::PollGameDeviceState() (link);
- update the world of objects!(link): UGameEngine::Tick() (link), this is the most important call of all;
- process slate operations accumulated in the world ticks(link): FEngineLoop::ProcessLocalPlayerSlateOperations() (link);
- rearrange and paint the UI(link): FSlateApplication::Tick() (link);
- call custom registered ticks(link): FTicker::Tick() (link);
- broadcast the frame end event(link): FCoreDelegates::OnEndFrame (link)
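The list above has a recognizable shape: a begin-frame broadcast, the main update, custom registered ticks, then an end-frame broadcast. The sketch below imitates only that shape. `MiniEvent`, `MiniTicker` and `MiniFrame` are hypothetical, and the rule that a ticker callback returning false is dropped is a choice made for this sketch.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical multicast event: every bound callback runs on Broadcast().
class MiniEvent {
public:
    void Add(std::function<void()> callback) { callbacks_.push_back(std::move(callback)); }
    void Broadcast() const {
        for (const auto& callback : callbacks_) callback();
    }
private:
    std::vector<std::function<void()>> callbacks_;
};

// Hypothetical ticker: a callback that returns false is removed after it runs.
class MiniTicker {
public:
    void Add(std::function<bool(float)> callback) { callbacks_.push_back(std::move(callback)); }
    void Tick(float deltaSeconds) {
        std::vector<std::function<bool(float)>> keep;
        for (auto& callback : callbacks_) {
            if (callback(deltaSeconds)) keep.push_back(std::move(callback));
        }
        callbacks_ = std::move(keep);
    }
private:
    std::vector<std::function<bool(float)>> callbacks_;
};

// One frame in a fixed order: begin event, world update, custom ticks, end event.
struct MiniFrame {
    MiniEvent onBeginFrame;
    MiniTicker ticker;
    MiniEvent onEndFrame;
    std::vector<std::string> log;

    void Tick(float deltaSeconds) {
        onBeginFrame.Broadcast();
        log.push_back("world tick");  // placeholder for the engine/world update
        ticker.Tick(deltaSeconds);
        onEndFrame.Broadcast();
    }
};
```

A caller binds lambdas to `onBeginFrame`, `ticker` and `onEndFrame`, then calls `Tick()` once per frame; the log then shows the bound callbacks wrapped around the world update in the same order every frame.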
Task Graph is a thread pool implementation in Unreal. All kinds of tasks are scheduled across a pool of threads.
The following image shows the call stacks filtered by "TGraphTask". You may notice that both the render thread and the game thread use the task graph to accomplish many specific tasks.
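For readers unfamiliar with thread pools, the following is a minimal generic sketch of the idea: worker threads pull tasks from a shared queue. It is not Unreal's task graph and ignores everything beyond plain queuing. `MiniTaskPool` and `CountWithPool` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal thread pool; not Unreal's task graph.
class MiniTaskPool {
public:
    explicit MiniTaskPool(unsigned threadCount) {
        for (unsigned i = 0; i < threadCount; ++i) {
            workers_.emplace_back([this] { WorkerLoop(); });
        }
    }

    // Lets the workers drain all queued tasks, then joins every worker.
    ~MiniTaskPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        wake_.notify_all();
        for (auto& worker : workers_) worker.join();
    }

    void Enqueue(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(std::move(task));
        }
        wake_.notify_one();
    }

private:
    void WorkerLoop() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping_ is set and nothing is left
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // run outside the lock so other workers can dequeue
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;  // declared last: workers use the members above
};

// Schedules `taskCount` increments on `threadCount` workers and returns the total.
int CountWithPool(unsigned threadCount, int taskCount) {
    std::atomic<int> counter{0};
    {
        MiniTaskPool pool(threadCount);
        for (int i = 0; i < taskCount; ++i) {
            pool.Enqueue([&counter] { counter.fetch_add(1); });
        }
    }  // the pool destructor waits for all tasks to finish
    return counter.load();
}
```

On some toolchains this needs to be built with thread support enabled (for example `-pthread`).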
Most heap memory is allocated via FMallocBinned::Malloc() (link).
FMallocBinned is commented as "Optimized virtual memory allocator". It is actually implemented as a memory pool, where an object of a specific size (8B, 16B, ..., 32KB)(link) is allocated from the corresponding pool(link). This helps reduce memory fragmentation to some degree.
Allocation is thread-safe, with a lock taken on the specific pool.(link)
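The size-class idea can be sketched as follows. This is a simplified illustration under several assumptions, not FMallocBinned itself: size classes are assumed to be powers of two from 8 bytes to 32 KB, each bin keeps a free list guarded by its own mutex, and the caller must pass the size back to `Free()`. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>

const std::size_t kMinBlock = 8;          // smallest size class, in bytes
const std::size_t kMaxBlock = 32 * 1024;  // largest size class, in bytes
const int kBinCount = 13;                 // 8, 16, 32, ..., 32768

// Index of the smallest bin whose block size fits `size`.
int BinIndex(std::size_t size) {
    int index = 0;
    std::size_t blockSize = kMinBlock;
    while (blockSize < size) {
        blockSize <<= 1;
        ++index;
    }
    return index;
}

std::size_t BlockSize(int index) { return kMinBlock << index; }

// Hypothetical binned allocator: one free list and one lock per size class.
class MiniBinnedAllocator {
public:
    ~MiniBinnedAllocator() {
        for (Bin& bin : bins_) {
            while (bin.freeList) {
                FreeNode* node = bin.freeList;
                bin.freeList = node->next;
                std::free(node);
            }
        }
    }

    void* Malloc(std::size_t size) {
        if (size > kMaxBlock) return std::malloc(size);  // too large for any bin
        const int index = BinIndex(size);
        Bin& bin = bins_[index];
        std::lock_guard<std::mutex> lock(bin.mutex);     // lock only this bin
        if (bin.freeList) {
            FreeNode* node = bin.freeList;               // reuse a freed block
            bin.freeList = node->next;
            return node;
        }
        return std::malloc(BlockSize(index));
    }

    // Simplification: the caller passes the size so the block returns to its bin.
    void Free(void* ptr, std::size_t size) {
        if (!ptr) return;
        if (size > kMaxBlock) { std::free(ptr); return; }
        Bin& bin = bins_[BinIndex(size)];
        std::lock_guard<std::mutex> lock(bin.mutex);
        FreeNode* node = static_cast<FreeNode*>(ptr);
        node->next = bin.freeList;
        bin.freeList = node;
    }

private:
    struct FreeNode { FreeNode* next; };
    struct Bin {
        std::mutex mutex;
        FreeNode* freeList = nullptr;
    };
    Bin bins_[kBinCount];
};
```

Because a 24-byte and a 30-byte request both round up to the 32-byte class, a block freed by one can be handed straight to the other.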
Some engines (e.g. Unity) overload the global new operator to hook allocations and route them through their own custom memory management.
But in Unreal, only Windows overrides the global operator new() (link), which means that, unlike in Unity, your code's new operator is not managed by the engine and is just the plain C++ new.
#ifdef OVERRIDE_NEW_DELETE
    #if defined(_WIN32) || defined(_WIN64) || defined(WIN32) || defined(WIN64) || defined(__WIN32__) || defined(__WINDOWS__)
        #include <malloc.h>
        void* operator new(size_t size)
        {
            void* p = malloc(size);
            MEMPRO_TRACK_ALLOC(p, size);
            return p;
        }
        void operator delete(void* p)
        {
            MEMPRO_TRACK_FREE(p);
            free(p);
        }
        ...
    #endif
#endif
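For readers who have not seen the technique, here is a minimal standalone sketch of replacing the global `operator new` and `operator delete` so that every allocation passes through custom code. The counters `g_newCalls` and `g_deleteCalls` are hypothetical and merely stand in for whatever tracking or custom allocator an engine would plug in. They are not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical counters for this sketch; a real hook would call a tracker or allocator.
static std::size_t g_newCalls = 0;
static std::size_t g_deleteCalls = 0;

// Replaces the global allocation function for the whole program.
void* operator new(std::size_t size) {
    ++g_newCalls;
    if (size == 0) size = 1;  // a zero-size request must still yield a valid pointer
    void* p = std::malloc(size);
    if (!p) throw std::bad_alloc();
    return p;
}

// Replaces the matching global deallocation function.
void operator delete(void* p) noexcept {
    if (!p) return;
    ++g_deleteCalls;
    std::free(p);
}

// Sized deallocation (used by C++14 and later) forwards to the unsized version.
void operator delete(void* p, std::size_t) noexcept {
    operator delete(p);
}
```

Once these definitions are linked into a program, every `new` expression in it, including those inside the standard library, goes through the replaced functions. The default array forms `new[]` and `delete[]` forward to them as well.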
Blueprints is Unreal's visual scripting system. It is usually used to write high-level logic, such as gameplay and UI.
Blueprints are event driven, and they usually look like this:
The image above shows the event graphs of ActionRPG's BP_PlayerController blueprint.
There are two red-titled nodes, which are event nodes: InputAction Pause and InputAction Inventory. Event nodes are the starting point of a graph; hence, there are two graphs in the image. The top graph handles the logic when the pause event triggers, and the bottom graph handles the inventory event.
Just as Java and C# run on a process virtual machine (or simply virtual machine, VM), Blueprints also run on a virtual machine implemented by Unreal.
The following image shows the native call stacks of all blueprints in ActionRPG, including the BP_PlayerController above:
Important calls are highlighted. You may observe:
- they all start from UObject::ProcessEvent(), and end with FFrame::StepXX();
- FFrame appears as the parameter all the way along the call stacks;
FFrame (link) is the most important class of the blueprints VM. A better name would be FCallStackFrame, to emphasize its relationship with the VM and the call stack frame: each stack frame corresponds to a call to a subroutine which has not yet terminated with a return.
In any case, note that FFrame has nothing to do with a rendering frame.
Here are the key fields and methods of FFrame, with additional comments on its fields:
//
// Information about script execution at one stack level.
//
struct FFrame : public FOutputDevice
{
public:
    // Variables.
    // the function that is executing
    UFunction* Node;
    // the object that is executing ("this")
    UObject* Object;
    uint8* Code;
    uint8* Locals;
    /** Previous frame on the stack */
    FFrame* PreviousFrame;
    /** contains information on any out parameters */
    FOutParmRec* OutParms;
    /** Currently executed native function */
    UFunction* CurrentNativeFunction;
    ...
public:
    // Constructors.
    FFrame( UObject* InObject, UFunction* InNode, void* InLocals, FFrame* InPreviousFrame = NULL, UField* InPropertyChainForCompiledIn = NULL );
    ...
    // Functions.
    COREUOBJECT_API void Step( UObject* Context, RESULT_DECL );
    ...
};
An FFrame holds UObject* Object, the object that is executing, and UFunction* Node, the function that is executing, and uses FFrame* PreviousFrame to link to the previous stack frame.
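To give a feel for what "stepping" through script code means in a VM, here is a toy interpreter in which each step reads one opcode from a code pointer and executes it. The opcodes, `MiniFrame` and `RunProgram` are all invented for this sketch; it does not reproduce Unreal's bytecode format or the real FFrame::Step().

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical opcodes for this sketch; they are not Unreal's bytecode.
enum MiniOpcode : std::uint8_t {
    OP_PushConst = 0,  // followed by one byte: the value to push
    OP_Add = 1,        // pops two values, pushes their sum
    OP_Return = 2      // stops execution
};

struct MiniFrame {
    const std::uint8_t* code;  // next byte to execute, like a program counter
    std::vector<int> values;   // operand stack local to this frame
    bool finished;

    explicit MiniFrame(const std::uint8_t* inCode) : code(inCode), finished(false) {}

    // Executes exactly one instruction and advances `code` past it.
    void Step() {
        const std::uint8_t opcode = *code++;
        switch (opcode) {
        case OP_PushConst:
            values.push_back(*code++);
            break;
        case OP_Add: {
            assert(values.size() >= 2);
            const int rhs = values.back(); values.pop_back();
            const int lhs = values.back(); values.pop_back();
            values.push_back(lhs + rhs);
            break;
        }
        case OP_Return:
        default:
            finished = true;  // unknown opcodes also stop, to keep the sketch safe
            break;
        }
    }
};

// Runs a program until OP_Return and returns the value on top of the stack.
// The program must end with OP_Return; otherwise Step() would read past the end.
int RunProgram(const std::vector<std::uint8_t>& program) {
    MiniFrame frame(program.data());
    while (!frame.finished) frame.Step();
    return frame.values.empty() ? 0 : frame.values.back();
}
```

For example, the byte sequence `{OP_PushConst, 2, OP_PushConst, 40, OP_Add, OP_Return}` pushes two constants, adds them and stops.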
There is actually no concrete stack container of FFrame objects at runtime. The only two places where a new FFrame is created are UObject::ProcessEvent() (link) and ProcessScriptFunction() (link):
/*-----------------------------
Virtual Machine
-----------------------------*/
/** Called by VM to execute a UFunction with a filled in UStruct of parameters */
void UObject::ProcessEvent( UFunction* Function, void* Parms )
{
    ...
    uint8* Frame = NULL;
    ...
    const bool bUsePersistentFrame = (NULL != Frame);
    if (!bUsePersistentFrame)
    {
        Frame = (uint8*)FMemory_Alloca(Function->PropertiesSize);
        // zero the local property memory
        FMemory::Memzero(Frame + Function->ParmsSize, Function->PropertiesSize - Function->ParmsSize);
    }
    // initialize the parameter properties
    FMemory::Memcpy(Frame, Parms, Function->ParmsSize);
    // Create a new local execution stack.
    FFrame NewStack(this, Function, Frame, NULL, Function->Children);
    ...
    // Call native function or UObject::ProcessInternal.
    Function->Invoke(this, NewStack, ReturnValueAddress);
    ...
}
// Helper function to set up a script function, and then execute it using ExecFtor.
// ...
template<typename Exec>
void ProcessScriptFunction(UObject* Context, UFunction* Function, FFrame& Stack, RESULT_DECL, Exec ExecFtor)
{
    ...
    FFrame NewStack(Context, Function, nullptr, &Stack, Function->Children);
    ...
    if( Function->Script.Num() > 0)
    {
        // Execute the code.
        ExecFtor( Context, NewStack, RESULT_PARAM );
    }
    ...
}
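In both functions above, the new frame is a local variable, and nested calls pass a pointer to the caller's frame. The sketch below shows how such frames form a linked list on the native stack without any container. `MiniCallFrame` and the `Call*` functions are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical frame: lives on the native C++ stack and points at its caller's frame.
struct MiniCallFrame {
    const char* functionName;
    const MiniCallFrame* previousFrame;
};

// Walks the previousFrame links to rebuild the script call stack, innermost first.
std::vector<std::string> CaptureCallStack(const MiniCallFrame& top) {
    std::vector<std::string> names;
    for (const MiniCallFrame* frame = &top; frame != nullptr; frame = frame->previousFrame) {
        names.push_back(frame->functionName);
    }
    return names;
}

// Each nested call creates its frame as a local variable linked to the caller's frame.
std::vector<std::string> CallInner(const MiniCallFrame& callerFrame) {
    MiniCallFrame frame{"Inner", &callerFrame};
    return CaptureCallStack(frame);
}

std::vector<std::string> CallMiddle(const MiniCallFrame& callerFrame) {
    MiniCallFrame frame{"Middle", &callerFrame};
    return CallInner(frame);
}

// Entry point: the outermost frame has no previous frame.
std::vector<std::string> CallOuter() {
    MiniCallFrame frame{"Outer", nullptr};
    return CallMiddle(frame);
}
```

`CallOuter()` returns the names from innermost to outermost. Each frame is destroyed automatically when its function returns, which is why no separate stack container is needed.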
To better support massive renderers, GPU-driven pipelines and ray tracing, Epic refactored the renderer and introduced a new mesh drawing pipeline in 4.22.
It's disabled by default on mobile; you can enable it by setting r.Mobile.SupportGPUScene=1 in your project's DefaultEngine.ini.
FMeshBatch contains the vertex buffer and material.
FMeshDrawCommand fully describes a mesh pass draw call, captured just above the RHI.
/**
 * A batch of mesh elements, all with the same material and vertex buffer
 */
struct FMeshBatch
{
    TArray<FMeshBatchElement,TInlineAllocator<1> > Elements;
    ...
    uint32 ReverseCulling : 1;
    uint32 bDisableBackfaceCulling : 1;
    /**
     * Pass feature relevance flags. Allows a proxy to submit fast representations for passes which can take advantage of it,
     * for example separate index buffer for depth-only rendering since vertices can be merged based on position and ignore UV differences.
     */
    uint32 CastShadow : 1; // Whether it can be used in shadow renderpasses.
    uint32 bUseForMaterial : 1; // Whether it can be used in renderpasses requiring material outputs.
    uint32 bUseForDepthPass : 1; // Whether it can be used in depth pass.
    uint32 bUseAsOccluder : 1; // Hint whether this mesh is a good occluder.
    ...
    /** Vertex factory for rendering, required. */
    const FVertexFactory* VertexFactory;
    /** Material proxy for rendering, required. */
    const FMaterialRenderProxy* MaterialRenderProxy;
    ...
};
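The pass relevance flags in the struct above suggest how a batch can be included in some passes and skipped in others. The sketch below illustrates that filtering with a reduced, hypothetical `MiniMeshBatch`. The mapping from flags to the three `MiniPass` values is an assumption made for this example, not the engine's actual pass setup.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, reduced mesh batch: an id plus one-bit pass relevance flags.
struct MiniMeshBatch {
    int id;
    std::uint32_t castShadow : 1;       // usable in shadow passes
    std::uint32_t useForMaterial : 1;   // usable in passes needing material outputs
    std::uint32_t useForDepthPass : 1;  // usable in the depth pass
};

// Assumed set of passes, for illustration only.
enum class MiniPass { Depth, Base, Shadow };

// Decides from the flags whether a batch takes part in a given pass.
bool IsRelevant(const MiniMeshBatch& batch, MiniPass pass) {
    switch (pass) {
    case MiniPass::Depth:  return batch.useForDepthPass != 0;
    case MiniPass::Base:   return batch.useForMaterial != 0;
    case MiniPass::Shadow: return batch.castShadow != 0;
    }
    return false;
}

// Returns the ids of all batches that should be drawn in `pass`, in input order.
std::vector<int> CollectBatchesForPass(const std::vector<MiniMeshBatch>& batches, MiniPass pass) {
    std::vector<int> ids;
    for (const MiniMeshBatch& batch : batches) {
        if (IsRelevant(batch, pass)) ids.push_back(batch.id);
    }
    return ids;
}
```

A proxy could, for instance, submit one batch flagged only for the depth pass and another flagged only for material passes, and each pass would pick up just the batches meant for it.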