Add 'high level / low level' and 'tools' sections to the optimization article.

Ivorforce · Ivorforce · commit a357a9f5dd22 · 2025-10-13T11:22:08.000+02:00
diff --git a/engine/guidelines/optimization.rst b/engine/guidelines/optimization.rst
@@ -20,16 +20,13 @@ Choosing what to optimize
 -------------------------
 
 Predicting which code would benefit from optimization can be difficult without
-using performance analysis tools.
+using performance analysis `tools <#tools-for-optimization>`_.
 
 Oftentimes code that looks slow has no impact on overall performance, and code
 that looks like it should be fast has a huge impact on performance. Further,
 reasoning about why a certain chunk of code is slow is often impossible to do
 without detailed metrics (e.g. from a profiler). 
 
-Instructions on using some common profilers with Godot can be found `here
-<https://docs.godotengine.org/en/stable/engine_details/development/debugging/using_cpp_profilers.html>`_.
-
 As an example, you may optimize a chunk of code by caching intermediate values.
 However, if that code was slow due to memory constraints, caching the values and
 reading them later may be even slower than calculating them from scratch!
@@ -96,6 +93,89 @@ Once you have your baseline profile/benchmark, make your changes and rebuild the
 engine with the exact same build settings you used before. Then profile again
 and compare the results.
 
+High level vs low level optimization
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The approach to optimization differs between "high level" and "low level" code.
+
+"High level" code refers to code that heavily relies on frameworks and functions to
+perform its task. This is most of Godot's code. In high level code, it is often most
+important to avoid doing expensive work entirely, for example by caching values
+rather than making duplicate calls, by avoiding copying data unnecessarily, or by
+replacing calls to expensive functions with calls to cheap functions.
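
As a minimal sketch of the caching idea, the following plain C++ compares a loop that repeats an expensive call with one that fetches the value once. ``expensive_lookup`` and both ``sum_scaled_*`` functions are hypothetical names invented for illustration, not Godot APIs, and the call counter exists only to make the saved work visible.

```cpp
#include <vector>

// Hypothetical stand-in for an expensive engine call; not a Godot API.
static int g_lookup_calls = 0;
static int expensive_lookup() {
    ++g_lookup_calls; // Count calls so the saved work can be observed.
    return 3;
}

// Before: the expensive call is repeated for every element.
int sum_scaled_uncached(const std::vector<int> &values) {
    int total = 0;
    for (int v : values) {
        total += v * expensive_lookup();
    }
    return total;
}

// After: the value is fetched once and reused inside the loop.
int sum_scaled_cached(const std::vector<int> &values) {
    const int scale = expensive_lookup();
    int total = 0;
    for (int v : values) {
        total += v * scale;
    }
    return total;
}
```

Both functions return the same result, but the cached version calls ``expensive_lookup`` once instead of once per element. This only holds if the looked-up value cannot change while the loop runs, which is an assumption of this sketch.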
+
+In contrast, "low level" code refers to code that is working mostly with C++ language
+features, such as primitive types and ``for``-loops. Optimizing low level code, often
+referred to as "micro-optimization", can be more difficult, because it requires intimate
+knowledge about C++ compiler intrinsics, as well as the inner workings of the CPU and RAM.
+Furthermore, improving low level code is often unintuitive, and can reduce its readability
+or robustness. We recommend against attempting to optimize low level code, unless you are
+a very experienced low level C++ programmer.
+
+.. note::
+
+    For micro-optimizations, C++ compilers will often be aware of basic tricks and
+    will already perform them in optimized builds. Therefore, not all changes that
+    look like they should optimize the code will actually make the code faster.
+
+Tools for optimization
+~~~~~~~~~~~~~~~~~~~~~~
+
+Profilers
+^^^^^^^^^
+
+Profilers are the most important tool for anyone optimizing code.
+They show you which parts of the code are responsible for slow execution or heavy CPU load,
+and are therefore excellent for identifying what needs to be optimized. Profilers can
+also be used to identify whether the problem has been resolved, by profiling again after
+making the changes. Godot has a built-in profiler, but it does not provide very detailed
+information. Instead, use dedicated C++ profilers, which are
+`explained in the Godot documentation <https://docs.godotengine.org/en/stable/engine_details/development/debugging/using_cpp_profilers.html>`__.
+
+Benchmarks
+^^^^^^^^^^
+
+Benchmarks can be a great and simple tool to test the impact of your changes
+on an isolated piece of code. However, benchmarks can be deceptive: It's easy to
+accidentally write a benchmark that highlights a way in which performance was
+improved, while ignoring other ways in which it was made worse.
+
+To give one example: One of the most expensive operations on modern CPUs is fetching data
+from RAM that is not in the cache. Benchmarks often test code with values that are already in
+the cache ("hot" execution), but it is often more important to optimize for the case where
+values are not in the cache yet ("cold" execution).
+
+Another common source of confusion is compiler optimization: One might write a benchmark that
+looks like it should test the code faithfully, but the results show no difference between versions.
+This might be indicative of a poorly written benchmark, which the compiler is able to "optimize away"
+by using `constant folding <https://en.wikipedia.org/wiki/Constant_folding>`__ and other tricks.
+For these and other reasons, it is difficult to write good benchmarks. When using benchmarks to
+test the performance of your code, always be aware of their potential caveats, and try to familiarize
+yourself with good benchmark practices.
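
One way to reduce the risk of a benchmark being "optimized away" is to make both its input and its result observable to the compiler. The sketch below uses plain standard C++ rather than Godot types. ``count_char``, ``BenchResult`` and ``run_benchmark`` are hypothetical names invented for illustration, and ``volatile`` is only one of several possible techniques, so treat it as a starting point rather than a guarantee.

```cpp
#include <chrono>
#include <cstdint>
#include <string>

// Hypothetical function under test; stands in for the code being measured.
static std::uint64_t count_char(const std::string &text, char needle) {
    std::uint64_t count = 0;
    for (char c : text) {
        if (c == needle) {
            ++count;
        }
    }
    return count;
}

struct BenchResult {
    std::uint64_t checksum; // Sum of all results, so the work has a visible effect.
    double milliseconds;
};

BenchResult run_benchmark(const std::string &text, int iterations) {
    // Reading the needle through a volatile variable on every iteration keeps
    // the compiler from treating the call as a compile-time constant.
    volatile char needle = 'e';
    // Writing each result to a volatile variable keeps it from being discarded.
    volatile std::uint64_t sink = 0;

    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; i++) {
        sink = sink + count_char(text, needle);
    }
    const auto t1 = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = t1 - t0;
    return { sink, elapsed.count() };
}
```

Returning the checksum alongside the timing also lets you confirm that the version before and after your change compute the same result.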
+
+To start writing benchmarks in Godot, use the following GDScript code template:
+
+.. code-block:: gdscript
+
+    var start = Time.get_ticks_msec()
+    var s := "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";
+    for i in range(10000):
+        s.replace("e", "b")  # Benchmarks the 'replace' function.
+    print(Time.get_ticks_msec() - start, "ms")
+
+Alternatively, you can benchmark right from C++:
+
+.. code-block:: cpp
+
+    String s = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";
+
+    // Requires #include <chrono> and #include <iostream>.
+    auto t0 = std::chrono::high_resolution_clock::now();
+    for (int i = 0; i < 100000; i++) {
+        String s1 = s.replace("e", "b"); // Benchmarks the 'replace' function.
+    }
+    auto t1 = std::chrono::high_resolution_clock::now();
+    std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(t1 - t0).count() << "ms\n";
+
 .. note::
 
     Results will fluctuate, so you'll need to make your test project or
@@ -104,20 +184,29 @@ and compare the results.
     test multiple times, and observe how much the results fluctuate. Fluctuations of up
     to 10% are common and expected. The fastest run is usually the most accurate number.
 
+Assembly viewers
+^^^^^^^^^^^^^^^^
+
+Assembly viewers show the final compiled version of your code in a readable
+format called assembly. Examining the assembly can be an effective way to
+optimize low level code. It is not effective for optimization of high level
+code, and should often be the "last resort", when it is clear that other
+optimization tools are not applicable. Effectively working with assembly to
+optimize code requires an intimate understanding of the cost of individual
+instructions. Agner Fog's `software optimization resources <https://www.agner.org/optimize/>`__
+are invaluable for this, especially his `C++ optimization guide <https://agner.org/optimize/optimizing_cpp.pdf>`__.
+To view assembly, you can either use a desktop assembly viewer program, or write dedicated
+functions in the popular multi-architecture tool `Compiler Explorer <https://godbolt.org>`__.
+
 Pull request requirements
 -------------------------
 
 When making an optimization PR you should:
 
 - Explain why you chose to optimize this code (e.g. include the profiling result, link the issue report, etc.).
 - Show that you improved the code either by profiling again, or running systematic benchmarks.
+  See `tools <#tools-for-optimization>`__ for more info.
 - Test on multiple platforms where appropriate, especially mobile.
-- When micro-optimizing, show assembly before / after where appropriate.
-
-In particular, you should be aware that for micro-optimizations, C++ compilers will often
-be aware of basic tricks and will already perform them in optimized builds. This is why
-showing before / after assembly can be important in these cases.
-(`godbolt <https://godbolt.org/>`_ can be particularly useful for this purpose.)
 
 The most important point to get across in your PR is to highlight the source of
 the performance issues, and have a clear explanation for how your PR fixes that