Add documentation for OrtArenaCfg for CreateAndRegisterAllocator API. (microsoft#5831)

* Add documentation for OrtArenaCfg for CreateAndRegisterAllocator API.
* Address PR comments
* More comments
pranav-prakash · Nov 18, 2020 · c2a993e
1 parent b3a6ed1

Showing 3 changed files with 18 additions and 7 deletions.
diff --git a/docs/C_API.md b/docs/C_API.md
@@ -21,13 +21,22 @@ is as follows
    * Create env using ```CreateEnvWithGlobalThreadPools()```
    * Create session and call ```DisablePerSessionThreads()``` on the session options object
    * Call ```Run()``` as usual
-* **Share allocator(s) between sessions:** Allow multiple sessions in the same process to use the same allocator(s). This
-allocator is first registered in the env and then reused by all sessions that use the same env instance unless a session
-chooses to override this by setting ```session_state.use_env_allocators``` to "0". Usage of this feature is as follows
-   * Register an allocator created by ORT using the ```CreateAndRegisterAllocator``` API.
-   * Set ```session.use_env_allocators``` to "1" for each session that wants to use the env registered allocators.
-   * See test ```TestSharedAllocatorUsingCreateAndRegisterAllocator``` in
+* **Share allocator(s) between sessions:**
+   * *Description*: This feature allows multiple sessions in the same process to use the same allocator(s).
+   * *Scenario*: You have several sessions in the same process and see high memory usage. One reason is that each session creates its own CPU allocator, which is arena based by default. [ORT implements](onnxruntime/core/framework/bfc_arena.h) a simplified version of an arena allocator based on [Doug Lea's best-fit with coalescing algorithm](http://gee.cs.oswego.edu/dl/html/malloc.html). Each arena belongs to a single session. It allocates a large region of memory at initialization and thereafter splits, coalesces, and extends this region as allocations and deallocations demand. Over time, each session's arena ends up holding unused chunks of memory. Moreover, the arena never returns memory to the system: once allocated, memory stays allocated. With multiple sessions, each with its own arena, these factors add up and increase the overall memory consumption of the process. It therefore helps to share one arena allocator between sessions.
+   * *Usage*:
+      * Create and register a shared allocator with the env using the ```CreateAndRegisterAllocator``` API. This allocator can then be reused by all sessions that use the same env instance.
+      * Set the ```session.use_env_allocators``` config entry to "1" for each session that should use the env-registered allocators. A session with this entry set to "0" uses its own allocators instead.
+      * See test ```TestSharedAllocatorUsingCreateAndRegisterAllocator``` in
      onnxruntime/test/shared_lib/test_inference.cc for an example.
+      * Configuring *OrtArenaCfg*:
+         * Default values for these configs can be found in the [BFCArena class](onnxruntime/core/framework/bfc_arena.h).
+         * ```initial_chunk_size_bytes```: The size of the first region the arena allocates. Allocation requests are served with chunks carved from this region. If the logs show the arena being extended much more often than expected, choose a larger initial size.
+         * ```max_mem```: The maximum amount of memory the arena allocates. If an allocation request cannot be serviced by any existing region, the arena extends itself by allocating another region, limited by the available memory (```max_mem``` minus the memory allocated so far). An error is returned if the available memory is less than the requested extension.
+         * ```arena_extend_strategy```: Currently takes one of two values: kNextPowerOfTwo (0, the default) or kSameAsRequested (1). As the names suggest, kNextPowerOfTwo extends the arena by a power of 2, while kSameAsRequested extends it each time by exactly the size of the allocation request. kSameAsRequested suits more advanced configurations where you know the expected memory usage in advance.
+         * ```max_dead_bytes_per_chunk```: Controls whether a chunk is split to service an allocation request. Currently, if the difference between the chunk size and the requested size is less than this value, the chunk is not split. The unused remainder of the chunk (hence "dead bytes") is wasted and increases memory usage until the chunk is returned to the arena.
+
 * **Share initializer(s) between sessions:**
    * *Description*: This feature allows a user to share the same instance of an initializer across
 multiple sessions.
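The `max_mem`, `arena_extend_strategy`, and `max_dead_bytes_per_chunk` descriptions in the *Configuring OrtArenaCfg* list above can be made concrete with a small standalone model. This is a sketch under stated assumptions, not ORT's implementation: all type and function names are hypothetical, and it assumes kNextPowerOfTwo rounds each extension up to the next power of two of the request, which may differ from the exact growth rule in bfc_arena.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the numeric values match the OrtArenaCfg comment
   (0 = kNextPowerOfTwo, 1 = kSameAsRequested). */
typedef enum {
  SKETCH_NEXT_POWER_OF_TWO = 0,
  SKETCH_SAME_AS_REQUESTED = 1
} SketchExtendStrategy;

/* Smallest power of two >= n. Requires 0 < n <= SIZE_MAX / 2 + 1 so the
   shift cannot overflow. */
static size_t next_power_of_two(size_t n) {
  assert(n > 0 && n <= (SIZE_MAX >> 1) + 1);
  size_t p = 1;
  while (p < n) p <<= 1;
  return p;
}

/* Size of the new region added to service a request of `request` bytes.
   Assumption: kNextPowerOfTwo rounds the request up to a power of two. */
static size_t extension_size(SketchExtendStrategy s, size_t request) {
  return s == SKETCH_NEXT_POWER_OF_TWO ? next_power_of_two(request) : request;
}

/* Models the documented max_mem check: the extension must fit in
   (max_mem - allocated_so_far). On success, grows *allocated_so_far and
   returns true; otherwise returns false (the documented error case). */
static bool try_extend(SketchExtendStrategy s, size_t request, size_t max_mem,
                       size_t *allocated_so_far) {
  assert(*allocated_so_far <= max_mem);
  size_t available = max_mem - *allocated_so_far;
  size_t ext = extension_size(s, request);
  if (ext > available) return false;
  *allocated_so_far += ext;
  return true;
}

/* Models max_dead_bytes_per_chunk: split only if the leftover part of the
   chunk is at least the threshold; otherwise the leftover stays attached to
   the allocation as dead bytes. */
static bool should_split(size_t chunk_size, size_t request,
                         size_t max_dead_bytes_per_chunk) {
  assert(chunk_size >= request);
  return chunk_size - request >= max_dead_bytes_per_chunk;
}
```

In this model, with `max_mem` = 4096 and 1024 bytes already allocated, a 1500-byte request extends the arena by 2048 bytes under the power-of-two rule; a second 1500-byte request then fails because only 1024 bytes remain, while a 1000-byte request under kSameAsRequested still fits.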

diff --git a/docs/ONNX_Runtime_Perf_Tuning.md b/docs/ONNX_Runtime_Perf_Tuning.md
@@ -151,6 +151,8 @@ The most widely used environment variables are:
   * ACTIVE does not yield the CPU; instead, it spins in a loop checking whether the next task is ready
   * Use PASSIVE if your CPU usage is already high, and use ACTIVE when you want to trade CPU for lower latency
 
+## Using and configuring a shared arena-based allocator to reduce memory consumption across multiple sessions
+See the `Share allocator(s) between sessions` section in the [C API documentation](C_API.md).
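The memory argument behind sharing an arena (arenas never return memory to the system, so per-session arenas add up) can be illustrated with a deliberately simplified model. It assumes each arena permanently retains its peak usage, ignores fragmentation and unused chunks, and assumes the sessions never run concurrently, so a shared arena only grows to the largest single peak. The function names are hypothetical; this is arithmetic, not a measurement of ORT.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model (assumption): an arena never returns memory, so it retains its
   peak usage for the life of the process. */

/* One arena per session: the process retains the sum of every session's peak. */
static size_t retained_with_per_session_arenas(const size_t *peak_bytes, size_t n) {
  size_t total = 0;
  for (size_t i = 0; i < n; ++i) total += peak_bytes[i];
  return total;
}

/* One shared arena, sessions run one at a time (assumption): chunks freed by
   one session are reused by the next, so the arena only grows to the
   largest peak. */
static size_t retained_with_shared_arena(const size_t *peak_bytes, size_t n) {
  size_t largest = 0;
  for (size_t i = 0; i < n; ++i)
    if (peak_bytes[i] > largest) largest = peak_bytes[i];
  return largest;
}
```

For peaks of 300, 500, and 200 units, the per-session arenas retain 1000 units in this model while the shared arena retains 500. Sessions that do run concurrently would need the shared arena to hold their combined live usage, so the saving shrinks accordingly.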
 
 ## Troubleshooting model performance issues
 The answers below are troubleshooting suggestions based on common previous user-filed issues and questions. This list is by no means exhaustive and there is a lot of case-by-case fluctuation depending on the model and specific usage scenario. Please use this information to guide your troubleshooting, search through previously filed issues for related topics, and/or file a new issue if your problem is still not resolved.

diff --git a/include/onnxruntime/core/session/onnxruntime_c_api.h b/include/onnxruntime/core/session/onnxruntime_c_api.h
@@ -144,7 +144,7 @@ typedef enum OrtErrorCode {
 } OrtErrorCode;
 
 // This configures the arena based allocator used by ORT
-// See ONNX_Runtime_Perf_Tuning.md for details on what these mean and how to choose these values
+// See docs/C_API.md for details on what these mean and how to choose these values
 typedef struct OrtArenaCfg {
   size_t max_mem;                // use 0 to allow ORT to choose the default
   int arena_extend_strategy;     // use -1 to allow ORT to choose the default, 0 = kNextPowerOfTwo, 1 = kSameAsRequested
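The struct comments above describe a sentinel convention: `max_mem` = 0 and `arena_extend_strategy` = -1 mean "let ORT choose". A minimal sketch of resolving such sentinels follows. `ArenaCfgSketch` and `resolve_defaults` are hypothetical, the types and -1 sentinels of `initial_chunk_size_bytes` and `max_dead_bytes_per_chunk` are assumptions, and the default numbers are placeholders (check bfc_arena.h for the real values). The only documented default used here is kNextPowerOfTwo.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of OrtArenaCfg; not the real struct. */
typedef struct {
  size_t max_mem;               /* 0 = choose default */
  int arena_extend_strategy;    /* -1 = default, 0 = kNextPowerOfTwo, 1 = kSameAsRequested */
  int initial_chunk_size_bytes; /* -1 = choose default (assumed sentinel) */
  int max_dead_bytes_per_chunk; /* -1 = choose default (assumed sentinel) */
} ArenaCfgSketch;

/* Placeholder defaults for illustration only. */
#define SKETCH_DEFAULT_MAX_MEM SIZE_MAX        /* assumed: effectively unlimited */
#define SKETCH_DEFAULT_STRATEGY 0              /* kNextPowerOfTwo, the documented default */
#define SKETCH_DEFAULT_INITIAL_CHUNK (1 << 20) /* placeholder: 1 MiB */
#define SKETCH_DEFAULT_MAX_DEAD (128 << 20)    /* placeholder: 128 MiB */

/* Returns a copy of cfg with every sentinel replaced by its default;
   explicit values pass through unchanged. */
static ArenaCfgSketch resolve_defaults(ArenaCfgSketch cfg) {
  assert(cfg.arena_extend_strategy >= -1 && cfg.arena_extend_strategy <= 1);
  if (cfg.max_mem == 0) cfg.max_mem = SKETCH_DEFAULT_MAX_MEM;
  if (cfg.arena_extend_strategy == -1) cfg.arena_extend_strategy = SKETCH_DEFAULT_STRATEGY;
  if (cfg.initial_chunk_size_bytes == -1) cfg.initial_chunk_size_bytes = SKETCH_DEFAULT_INITIAL_CHUNK;
  if (cfg.max_dead_bytes_per_chunk == -1) cfg.max_dead_bytes_per_chunk = SKETCH_DEFAULT_MAX_DEAD;
  return cfg;
}
```

Explicit values, including 0 for the fields that use -1 as their sentinel, pass through unchanged.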