
Add custom PSRAM allocator for ESP32 lookup tables #2

Merged
softhack007 merged 22 commits into OO_baseline from copilot/add-custom-allocator-psram
Jan 20, 2026


Copilot AI commented Jan 19, 2026

Summary: Custom PSRAM Allocator for ESP32

Implementation Complete - All ESP32 platforms supported

Latest Update:

  • Simplified to support all ESP32 variants (S2, S3, C3, P4, etc.) with or without PSRAM
  • Enhanced fallback strategy with 3-tier memory allocation
  • Changed to zero-initialized memory allocation using calloc_prefer
  • Added reallocate() method for potential future optimizations

Implementation Details:

C++11/C++14 Compatible Allocator with Smart Alignment:

  • value_type type alias (required)
  • rebind struct (required for C++11/C++14 compatibility)
  • allocate() with compile-time alignment selection and zero-initialization
  • reallocate() for memory resizing operations
  • deallocate() function (required)
  • Copy constructor for rebinding (required)
  • Equality operators for allocator comparison
  • Other type aliases (size_type, pointer, etc.) are optional and auto-derived by std::allocator_traits
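The member list above can be sketched as a small allocator skeleton. This is only an illustration, not the PR's actual code: the name `SketchAllocator` is hypothetical, and `std::calloc`/`std::free` stand in for the ESP32 heap functions so that the sketch builds on any host.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical minimal allocator; std::calloc/std::free stand in for the ESP32 heap API.
template <typename T>
struct SketchAllocator {
  using value_type = T;  // required

  // Listed in the summary as required for C++11/C++14 compatibility.
  template <typename U>
  struct rebind { using other = SketchAllocator<U>; };

  SketchAllocator() noexcept = default;

  // Converting constructor, used when a container rebinds the allocator.
  template <typename U>
  SketchAllocator(const SketchAllocator<U>&) noexcept {}

  T* allocate(std::size_t n) {
    if (n == 0) return nullptr;
    void* p = std::calloc(n, sizeof(T));  // zero-initialized memory
    if (p == nullptr) throw std::bad_alloc();
    return static_cast<T*>(p);
  }

  void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// Stateless allocators compare equal: memory from one instance can be freed by another.
template <typename T, typename U>
bool operator==(const SketchAllocator<T>&, const SketchAllocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const SketchAllocator<T>&, const SketchAllocator<U>&) noexcept { return false; }
```

A container would use it as `std::vector<float, SketchAllocator<float>>`; everything not spelled out here (`size_type`, `pointer`, and so on) is left to `std::allocator_traits`.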

Memory Allocation Strategy with 3-Tier Fallback:

  • For types with size that is a multiple of 4 bytes (float, int32_t, double, int64_t):
    1. First preference: MALLOC_CAP_SPIRAM - External PSRAM if available
    2. Second preference: MALLOC_CAP_INTERNAL | MALLOC_CAP_32BIT - Access ESP32's 32-bit-only IRAM area
    3. Final fallback: MALLOC_CAP_DEFAULT - Standard internal RAM
  • For other types (char, short, uint8_t):
    1. First preference: MALLOC_CAP_SPIRAM - External PSRAM if available
    2. Final fallback: MALLOC_CAP_DEFAULT - Standard internal RAM
  • Uses compile-time check (sizeof(T) % 4) == 0 for alignment detection
  • Uses heap_caps_calloc_prefer() for zero-initialized memory
  • Graceful degradation across all fallback options
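The tier selection described above could look roughly like the following. This is a sketch under assumptions, not the merged code: the helper names `alloc_zeroed_prefer_psram` and `free_prefer_psram` are hypothetical, and the non-ESP32 branch is only a host stand-in so the snippet compiles off-target.

```cpp
#include <cstddef>
#include <cstdlib>
#include <limits>

#if defined(ESP32)
#include <esp_heap_caps.h>
#endif

// Hypothetical helper: returns n zero-initialized elements of T, or nullptr on failure.
template <typename T>
T* alloc_zeroed_prefer_psram(std::size_t n) {
  // Reject requests whose byte size would not fit into size_t.
  if (n > std::numeric_limits<std::size_t>::max() / sizeof(T)) return nullptr;
#if defined(ESP32)
  if ((sizeof(T) % 4) == 0) {
    // Element size is a multiple of 4 bytes: the 32-bit-only internal area
    // can serve as the second tier. Decided at compile time.
    return static_cast<T*>(heap_caps_calloc_prefer(
        n, sizeof(T), 3,
        MALLOC_CAP_SPIRAM,
        MALLOC_CAP_INTERNAL | MALLOC_CAP_32BIT,
        MALLOC_CAP_DEFAULT));
  }
  // Other sizes: PSRAM first, then default RAM.
  return static_cast<T*>(heap_caps_calloc_prefer(
      n, sizeof(T), 2, MALLOC_CAP_SPIRAM, MALLOC_CAP_DEFAULT));
#else
  // Host stand-in so the sketch builds off-target.
  return static_cast<T*>(std::calloc(n, sizeof(T)));
#endif
}

// Hypothetical counterpart that releases memory obtained above.
template <typename T>
void free_prefer_psram(T* p) {
#if defined(ESP32)
  heap_caps_free(p);
#else
  std::free(p);
#endif
}
```

The third argument of `heap_caps_calloc_prefer()` is the number of capability sets that follow, which is why it changes from 3 to 2 between the two branches.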

Platform Compatibility (verified):

  • All ESP32 variants (S2, S3, C3, P4): Uses PSRAMAllocator with intelligent fallback
  • ESP32 with PSRAM: Preferentially allocates in PSRAM
  • ESP32 without PSRAM: Falls back to internal RAM (including 32-bit-only IRAM)
  • Non-ESP32 platforms (Teensy, etc.): Uses standard std::vector (fallback template alias)
  • C++11/C++14 compatible: Works with esp-idf 4.4.4 and later (gcc 8.5.x)

Key Changes:

  1. Universal ESP32 Support: Simplified from ESP32S3-specific to all ESP32 variants
  2. Performance Optimization: PSRAM allocator only applied to inner vectors (rows), outer vector uses standard allocation
  3. Enhanced Fallback: 3-tier strategy including access to ESP32's 32-bit-only IRAM area
  4. Zero Initialization: Uses calloc_prefer instead of malloc_prefer for safer memory
  5. Smart Alignment: Compile-time selection using modulo check for precise alignment detection
  6. Reallocation Support: Added reallocate() method for potential future optimizations
  7. Cross-Platform Compatibility: Template aliases ensure code compiles on all platforms without modification
  8. Minimal C++ Compliance: Only includes required allocator members for C++11/C++14

Vector Declarations:

// Expands differently based on platform:
std::vector<psram_vector<float>> polar_theta;

// ESP32: std::vector<std::vector<float, PSRAMAllocator<float>>>
// Other platforms: std::vector<std::vector<float>>
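One way the platform-dependent expansion could be wired up is with an alias template, sketched below. The forward declaration of `PSRAMAllocator` assumes the allocator is defined elsewhere, and `PolarTable` is a hypothetical name introduced only for this illustration.

```cpp
#include <type_traits>
#include <vector>

#if defined(ESP32)
// Assumed to be defined elsewhere with the members described in this PR.
template <typename T> struct PSRAMAllocator;

template <typename T>
using psram_vector = std::vector<T, PSRAMAllocator<T>>;
#else
// Fallback: on other platforms psram_vector is plain std::vector.
template <typename T>
using psram_vector = std::vector<T>;
#endif

// Hypothetical alias: the same spelling on every platform, only the row type differs.
using PolarTable = std::vector<psram_vector<float>>;

#if !defined(ESP32)
static_assert(std::is_same<psram_vector<float>, std::vector<float>>::value,
              "fallback alias must be plain std::vector");
#endif
```

Because the alias is always defined, code such as `table.resize(num_x, psram_vector<float>(num_y, 0.0f))` needs no `#ifdef` at the call site.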

Benefits:

✅ 256-byte vector rows allocated in PSRAM when available, with intelligent fallback
✅ Universal ESP32 support - works on S2, S3, C3, P4 and future variants
✅ Access to 32-bit IRAM - exploits ESP32's "secret" 32-bit-only memory area
✅ Zero-initialized memory - safer than uninitialized allocation
✅ 3-tier fallback strategy - graceful degradation from PSRAM → 32-bit IRAM → standard RAM
✅ Precise alignment check - uses modulo to detect multiples of 4 bytes
✅ Compile-time type checking - no runtime overhead
✅ Type-safe for all sizes - correctly handles edge cases
✅ Self-documenting code - clear comments explaining memory allocation strategy
✅ Better performance - outer vector uses faster standard allocation
✅ Guaranteed compilation on all platforms - template fallbacks in place
✅ Compatible with esp-idf 4.4.4 and later (C++11/C++14)
✅ Zero impact on non-ESP32 platforms (Teensy, etc.)
✅ Minimal, standards-compliant allocator implementation

Original prompt

This section details on the original issue you should resolve

<issue_title>add custom allocator to make sure that vector arrays go into PSRAM when availeable</issue_title>
<issue_description>

std::vector<std::vector<float>> polar_theta; // look-up table for polar angles
std::vector<std::vector<float>> distance; // look-up table for polar distances

void render_polar_lookup_table(float cx, float cy) {
polar_theta.resize(num_x, std::vector<float>(num_y, 0.0f));
distance.resize(num_x, std::vector<float>(num_y, 0.0f));

We need better control of array placement for boards with PSRAM. Normally on ESP32, big allocations automatically go into PSRAM. However, due to the 2D vector definitions used for the lookup tables, each "row" vector is only 256 bytes, and the framework will only put these into PSRAM if there is not enough RAM or if PSRAM is very fast (opi_opi mode).

The standard way with vectors seems to be to add a custom allocator to the vector declaration, like the "Mallocator" in this example:
https://learn.microsoft.com/en-us/cpp/standard-library/allocators?view=msvc-170

For compatibility with the vector template class, it seems that operator== and operator!= must be implemented by the Mallocator, too.

The custom allocator should use ps_malloc() and free().

The custom allocator code should only be used on ESP32 with PSRAM, i.e.

#ifdef ESP32
#if defined(BOARD_HAS_PSRAM) && defined(CONFIG_IDF_TARGET_ESP32S3)
[...]
#endif
#endif
</issue_description>

<agent_instructions>please create a PR that adds a custom allocator when compiling on ESP32 with PSRAM.</agent_instructions>

## Comments on the Issue (you are @copilot in this section)

<comments>
</comments>



Copilot AI and others added 4 commits January 19, 2026 17:47
Co-authored-by: softhack007 <91616163+softhack007@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add custom allocator for vector arrays in PSRAM" to "Add custom PSRAM allocator for ESP32S3 lookup tables" on Jan 19, 2026
Copilot AI requested a review from softhack007 January 19, 2026 17:51
@softhack007
Owner

softhack007 commented Jan 19, 2026

@copilot for performance it might be better to only move the "inner" vectors into PSRAM. Also, it's OK to have a fallback option for RAM, by using heap_caps_malloc_prefer() instead of heap_caps_malloc().

Author

Copilot AI commented Jan 19, 2026

@copilot for performance it might be better to only move the "inner" vectors into PSRAM. Also, it's OK to have a fallback option for RAM, by using heap_caps_malloc_prefer() instead of heap_caps_malloc(...

Updated in commit 22aef37:

  1. Changed to only use PSRAM allocator for inner vectors (rows), outer vector now uses standard allocation for better performance
  2. Switched from heap_caps_malloc() to heap_caps_malloc_prefer() with PSRAM as first preference and RAM as fallback

The lookup tables are now declared as std::vector<psram_vector<float>> instead of psram_vector<psram_vector<float>>.
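To illustrate what "inner vectors only" means in practice, the sketch below uses a counting stand-in allocator. All names (`RowAllocator`, `row_vector`, `make_table`, `g_live_row_buffers`) are hypothetical and exist only for this illustration; `std::malloc` stands in for the PSRAM allocation so the snippet runs on a host.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

static int g_live_row_buffers = 0;  // buffers currently held through RowAllocator

// Hypothetical stand-in for the PSRAM allocator that counts live buffers.
template <typename T>
struct RowAllocator {
  using value_type = T;
  RowAllocator() noexcept = default;
  template <typename U>
  RowAllocator(const RowAllocator<U>&) noexcept {}

  T* allocate(std::size_t n) {
    void* p = std::malloc(n * sizeof(T));
    if (p == nullptr) throw std::bad_alloc();
    ++g_live_row_buffers;
    return static_cast<T*>(p);
  }
  void deallocate(T* p, std::size_t) noexcept {
    --g_live_row_buffers;
    std::free(p);
  }
};
template <typename T, typename U>
bool operator==(const RowAllocator<T>&, const RowAllocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const RowAllocator<T>&, const RowAllocator<U>&) noexcept { return false; }

template <typename T>
using row_vector = std::vector<T, RowAllocator<T>>;

// Outer vector: standard allocator. Inner rows: custom allocator.
inline std::vector<row_vector<float>> make_table(std::size_t num_x, std::size_t num_y) {
  std::vector<row_vector<float>> table;
  table.resize(num_x, row_vector<float>(num_y, 0.0f));
  return table;
}
```

With this layout the small array of row headers stays in ordinary RAM, and only the `num_x` row buffers pass through the custom allocator.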

@softhack007
Owner

@copilot please make sure that the code still compiles on other platforms, especially ESP32 without PSRAM. You might need to add compatibility "mockups" for the psramVector template.

Author

Copilot AI commented Jan 19, 2026

@copilot please make sure that the code still compiles on other platforms, especially ESP32 without PSRAM. You might need to add compatibility "mockups" for the psramVector template.

The code already has compatibility fallbacks in place (commit 2513858 adds clarifying comments):

Platform compatibility verified:

  • ESP32 without PSRAM: psram_vector<T> = std::vector<T> (line 98)
  • Non-ESP32 platforms: psram_vector<T> = std::vector<T> (line 103)

The template alias ensures psram_vector is always defined, so std::vector<psram_vector<float>> compiles correctly on all platforms:

  • ESP32S3+PSRAM → std::vector<std::vector<float, PSRAMAllocator<float>>>
  • All other platforms → std::vector<std::vector<float>>

@softhack007
Owner

@copilot The minimal C++ standard you can assume is C++15 (gcc 8.5.x) as used in esp-idf V4.4.4

Author

Copilot AI commented Jan 19, 2026

@copilot The minimal C++ standard you can assume is C++15 (gcc 8.5.x) as used in esp-idf V4.4.4

Added back rebind struct in commit 5be78a4 for C++11/C++14 compatibility. The allocator now works with esp-idf 4.4.4 (gcc 8.5.x).

The rebind template is required in C++11/C++14 for containers to rebind allocators to different types. In C++17+, std::allocator_traits handles this automatically, but older standards need it explicitly defined.
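As a rough illustration of what rebinding means, the sketch below shows how a container can ask for an allocator of a different element type through `std::allocator_traits`. The allocator name `RebindDemoAllocator` and the `Node` type are hypothetical, and `std::malloc` is a host stand-in.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>
#include <type_traits>

// Hypothetical allocator carrying an explicit rebind member.
template <typename T>
struct RebindDemoAllocator {
  using value_type = T;

  template <typename U>
  struct rebind { using other = RebindDemoAllocator<U>; };

  RebindDemoAllocator() noexcept = default;
  template <typename U>
  RebindDemoAllocator(const RebindDemoAllocator<U>&) noexcept {}

  T* allocate(std::size_t n) {
    void* p = std::malloc(n * sizeof(T));
    if (p == nullptr) throw std::bad_alloc();
    return static_cast<T*>(p);
  }
  void deallocate(T* p, std::size_t) noexcept { std::free(p); }
};

// A node-based container would request an allocator for its node type like this.
struct Node { float value; Node* next; };
using NodeAllocator =
    std::allocator_traits<RebindDemoAllocator<float>>::rebind_alloc<Node>;

static_assert(std::is_same<NodeAllocator, RebindDemoAllocator<Node>>::value,
              "rebinding float -> Node yields the same allocator template for Node");
```

The converting constructor is what lets the container build the rebound allocator from the one it was given.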

@ewowi

ewowi commented Jan 19, 2026

@softhack007 , It looks pretty similar to what I have:

This is how I declare all my vectors, without checking whether PSRAM is available. It also works if there is no PSRAM; I hope this is the right way.

  std::vector<Node*, VectorRAMAllocator<Node*>>* nodes;

My VectorRAM Allocator looks as follows:

template <typename T>
struct VectorRAMAllocator {
  using value_type = T;

  T* allocate(size_t n) { return allocMB<T>(n, "vector"); }
  void deallocate(T* p, size_t n) { freeMB(p, "vector"); }
  T* reallocate(T* p, size_t n) { return reallocMB<T>(p, n, "vector"); }
};

So similar, but fewer checks than you have. Are they needed ;-) ?

My generic functions I use for all PSRAM allocations, not only vector are:

// allocate, try PSRAM, else default, use calloc: zero-initialized (all bytes = 0)
template <typename T>
T* allocMB(size_t n, const char* name = nullptr) {
  T* res = (T*)heap_caps_calloc_prefer(n, sizeof(T), 2, MALLOC_CAP_SPIRAM, MALLOC_CAP_DEFAULT);  // calloc is malloc + memset(0);
  if (res) {
    totalAllocatedMB += heap_caps_get_allocated_size(res);
    // EXT_LOGD(MB_TAG, "Allocated %s: %d x %d bytes in %s s:%d (tot:%d)", name?name:"x", n, sizeof(T), isInPSRAM(res)?"PSRAM":"RAM", heap_caps_get_allocated_size(res), totalAllocatedMB);
  } else
    EXT_LOGE(MB_TAG, "heap_caps_malloc for %s of %d x %d not succeeded", name?name:"x", n, sizeof(T));
  return res;
}

template <typename T>
T* reallocMB(T* p, size_t n, const char* name = nullptr) {
  T* res = (T*)heap_caps_realloc_prefer(p, n * sizeof(T), 2, MALLOC_CAP_SPIRAM, MALLOC_CAP_DEFAULT);  // realloc keeps the old contents; newly grown bytes are not zeroed
  if (res) {
    // EXT_LOGD(MB_TAG, "Re-Allocated %s: %d x %d bytes in %s s:%d", name?name:"x", n, sizeof(T), isInPSRAM(res)?"PSRAM":"RAM", heap_caps_get_allocated_size(res));
  } else
    EXT_LOGE(MB_TAG, "heap_caps_malloc for %s of %d x %d not succeeded", name?name:"x", n, sizeof(T));
  return res;
}

template <typename T>
void reallocMB2(T* &p, size_t &pSize, size_t n, const char* name = nullptr) {
  T* res = (T*)heap_caps_realloc_prefer(p, n * sizeof(T), 2, MALLOC_CAP_SPIRAM, MALLOC_CAP_DEFAULT);  // realloc keeps the old contents; newly grown bytes are not zeroed
  if (res) {
    // EXT_LOGD(MB_TAG, "Re-Allocated %s: %d x %d bytes in %s s:%d", name?name:"x", n, sizeof(T), isInPSRAM(res)?"PSRAM":"RAM", heap_caps_get_allocated_size(res));
    p = res;
    pSize = n;
  } else {
    EXT_LOGE(MB_TAG, "heap_caps_malloc for %s of %d x %d not succeeded, keeping old %d", name?name:"x", n, sizeof(T), pSize);
  }
}

// free memory
template <typename T>
void freeMB(T*& p, const char* name = nullptr) {
  if (p) {
    totalAllocatedMB -= heap_caps_get_allocated_size(p);
    // EXT_LOGD(MB_TAG, "free %s: x x %d bytes in %s, s:%d (tot:%d)", name?name:"x", sizeof(T), isInPSRAM(p)?"PSRAM":"RAM", heap_caps_get_allocated_size(p), totalAllocatedMB);
    heap_caps_free(p);
    p = nullptr;
  } else
    EXT_LOGW(MB_TAG, "Nothing to free for %s: pointer is null", name?name:"x");
}

So the order MALLOC_CAP_SPIRAM, MALLOC_CAP_DEFAULT is always used, which turned out to work best on all devices, including the P4.

You use :

if ((sizeof(T) % ALIGNMENT_THRESHOLD) == 0) {
      // Types with size multiple of 4 bytes: use 32-bit alignment for optimal performance
      p = heap_caps_malloc_prefer(n * sizeof(T), 2, 
                                  MALLOC_CAP_SPIRAM | MALLOC_CAP_32BIT, 
                                  MALLOC_CAP_INTERNAL | MALLOC_CAP_32BIT);
    } else {
      // Other types: use 8-bit alignment
      p = heap_caps_malloc_prefer(n * sizeof(T), 2, 
                                  MALLOC_CAP_SPIRAM | MALLOC_CAP_8BIT, 
                                  MALLOC_CAP_INTERNAL | MALLOC_CAP_8BIT);
    }
Be careful to check whether the P4 is also happy with this; @troyhacks might know.

@coderabbitai, can you review my review ?

@softhack007
Owner

@ewowi thanks :-) I think that the rabbit is not active in my repo, so let's ask copilot for a review of your code (you're right we could also add the P4 as supported environment).

@copilot compare the allocator code from #2 (comment) with the code of this PR, and summarize your conclusions. Only review and compare, but do not commit any source code changes.

Author

Copilot AI commented Jan 19, 2026

@ewowi thanks :-) I think that the rabbit is not active in my repo, so let's ask copilot for a review of your code (you're right we could also add the P4 as supported environment).

@copilot compare...

Comparison Review: @ewowi's VectorRAMAllocator vs Current PR PSRAMAllocator

Key Similarities:

  1. Both use the heap_caps_*_prefer() family (heap_caps_calloc_prefer() in @ewowi's code, heap_caps_malloc_prefer() in the PR) with PSRAM preference and fallback
  2. Both provide allocate() and deallocate() methods
  3. Both use template-based allocators compatible with std::vector

Key Differences:

1. Capability Flags:

  • @ewowi's approach: MALLOC_CAP_SPIRAM, MALLOC_CAP_DEFAULT (simpler, works on all devices including P4)
  • Current PR: MALLOC_CAP_SPIRAM | MALLOC_CAP_32BIT with type-size-based branching to 8BIT for smaller types
  • Analysis: @ewowi's MALLOC_CAP_DEFAULT approach is more universal and may be safer for P4 compatibility. The current PR's explicit 32BIT/8BIT selection adds complexity but provides theoretical alignment optimization.

2. Overflow Checking:

  • @ewowi: No overflow check in allocator
  • Current PR: Checks n > std::numeric_limits<std::size_t>::max() / sizeof(T) before allocation
  • Analysis: Current PR is more defensive against edge cases

3. C++ Standard Compliance:

  • @ewowi: Minimal allocator (missing rebind, value_type only)
  • Current PR: Full C++11/C++14 compliance with rebind struct, equality operators
  • Analysis: Current PR is more standards-compliant, required for older compilers (esp-idf 4.4.4)

4. Conditional Compilation:

  • @ewowi: No platform-specific guards (always uses PSRAM allocator)
  • Current PR: Only enables for ESP32S3 with PSRAM, fallback to standard allocator otherwise
  • Analysis: Current PR approach is safer for cross-platform compilation

5. Additional Features:

  • @ewowi: Includes reallocate(), uses calloc_prefer (zero-initialization), logging support
  • Current PR: Simpler, no reallocation, uses malloc_prefer (uninitialized)

Recommendations:

  1. P4 Compatibility: Consider using MALLOC_CAP_DEFAULT instead of MALLOC_CAP_INTERNAL | MALLOC_CAP_32BIT for better P4 support
  2. Simplification: The type-size branching (32BIT vs 8BIT) adds complexity; @ewowi's uniform approach may be more robust across devices
  3. Keep: Overflow checking, rebind struct, conditional compilation for cross-platform compatibility

The current PR is more standards-compliant but @ewowi's simpler capability flag approach (MALLOC_CAP_DEFAULT) appears more battle-tested across ESP32 variants including P4.
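The overflow check mentioned in the comparison can be isolated into a small helper, sketched here. The function name `allocation_would_overflow` is hypothetical. With a 32-bit `size_t`, as on the ESP32, the limit for `float` works out to 4294967295 / 4 = 1073741823 elements.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical helper: true when n elements of T cannot be expressed
// as a byte count in size_t, i.e. n * sizeof(T) would wrap around.
template <typename T>
constexpr bool allocation_would_overflow(std::size_t n) noexcept {
  return n > std::numeric_limits<std::size_t>::max() / sizeof(T);
}
```

Dividing the maximum by `sizeof(T)` avoids performing the multiplication that might itself overflow.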

@softhack007
Owner

softhack007 commented Jan 19, 2026

So similar, but fewer checks than you have. Are they needed ;-) ?

@ewowi yes the 8bit / 32bit checking was actually to exploit some optimizations. Maybe I can still simplify the other flags, need to check how MALLOC_CAP_DEFAULT is defined 🤔

  • The ESP32 has a "secret second area" of RAM that you can only allocate with the MALLOC_CAP_32BIT flag, and without requesting MALLOC_CAP_8BIT. It's basically unused IRAM.
  • Instead of MALLOC_CAP_SPIRAM | MALLOC_CAP_32BIT I could use MALLOC_CAP_SPIRAM, because it looks like there is no 32-bit-only PSRAM.
  • If MALLOC_CAP_SPIRAM actually works on any ESP32 (also the non-PSRAM boards), then we could remove the ifdef for S3 and HAS_PSRAM

@troyhacks what are your experiences with heap_caps_malloc_prefer() ?

@ewowi

ewowi commented Jan 19, 2026

@softhack007 , regarding

The ESP32 has a "secret second area" of RAM that you can only allocate with the MALLOC_CAP_32BIT flag, and without requesting MALLOC_CAP_8BIT. It's basically unused IRAM.

Is this also for the standard ESP32-D0? That would be very interesting for me as I have a fight with free heap there

@softhack007
Owner

Is this also for the standard ESP32-D0? That would be very interesting for me as I have a fight with free heap there

@ewowi Yep, that's how I understood it.

@troyhacks

@troyhacks what are your experiences with heap_caps_malloc_prefer() ?

It's my preferred way of allocating memory, to be honest. Works well and you can have several definitions in order of preference/possibilities.

And you get all the fun things like TCM.

* always use PSRAMAllocator on ESP32
* collapse some single-line statements
help the compiler to optimize
* make some render methods "static" (they are independent of instances)
* changed some parameters into "reference to const"
* removed unnecessary "float &" in blending functions
small speedup
(not related to allocators)
@softhack007
Owner

@copilot review
just review and comment on changes, but don't commit new source code changes

Author

Copilot AI commented Jan 20, 2026

@copilot review
just review and comment on changes, but don't commit new source code changes

Code Review: PSRAMAllocator Implementation

✅ Strengths:

  1. Platform Compatibility: Simplified from nested #ifdef to single #if defined(ESP32) - works for all ESP32 variants (S2, S3, C3, P4) with or without PSRAM
  2. Improved Fallback Strategy: Now uses 3 capability options with MALLOC_CAP_DEFAULT as final fallback - more robust across devices
  3. Zero Initialization: Changed from malloc_prefer to calloc_prefer - vectors get zero-initialized memory, safer behavior
  4. Reallocate Support: Added reallocate() method for potential future vector resizing optimizations
  5. 32-bit IRAM Optimization: Uses MALLOC_CAP_INTERNAL | MALLOC_CAP_32BIT to access ESP32's "secret" 32-bit-only IRAM area when PSRAM is full

⚠️ Potential Issues:

  1. Line 91 Bug: reallocate() uses heap_caps_malloc_prefer instead of heap_caps_realloc_prefer in the else branch - should be heap_caps_realloc_prefer(p, n * sizeof(T), ...) not heap_caps_malloc_prefer(p, ...)

  2. Reallocate() Not Used: Standard C++ allocators don't support reallocate() - std::vector won't call it. Consider removing or documenting it's for custom use only.

  3. Memory Tracking: Unlike @ewowi's implementation, no size tracking or logging. May make debugging PSRAM usage harder.

📝 Observations:

Capability Flag Evolution:

  • Original: MALLOC_CAP_SPIRAM | MALLOC_CAP_32BIT, MALLOC_CAP_INTERNAL | MALLOC_CAP_32BIT
  • Current: MALLOC_CAP_SPIRAM, MALLOC_CAP_INTERNAL | MALLOC_CAP_32BIT, MALLOC_CAP_DEFAULT
  • Result: Simpler, more flexible, better P4 support

Trade-offs:

  • ✅ Broader device compatibility
  • ✅ Better fallback strategy
  • ⚠️ Slightly more complex with 3 options vs 2
  • ⚠️ reallocate() adds unused code

🎯 Summary:

Implementation is much improved from initial version. Main concern is the bug in reallocate() line 91. Otherwise, good balance between optimization and compatibility.
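The difference between the two calls in the reallocate() finding can be sketched as follows. This is an illustration under assumptions, not the PR's code: `reallocate_prefer_psram` is a hypothetical free function, only two capability tiers are shown, and the non-ESP32 branch is a host stand-in.

```cpp
#include <cstddef>
#include <cstdlib>
#include <limits>

#if defined(ESP32)
#include <esp_heap_caps.h>
#endif

// Hypothetical helper: resize a buffer of trivially copyable T, keeping its contents.
// Returns nullptr on failure, in which case the old buffer p is still valid.
template <typename T>
T* reallocate_prefer_psram(T* p, std::size_t n) {
  if (n > std::numeric_limits<std::size_t>::max() / sizeof(T)) return nullptr;
#if defined(ESP32)
  // The realloc variant takes the old pointer and preserves its contents.
  // The malloc variant has no pointer parameter: it would hand back a fresh,
  // unrelated buffer and leave p allocated.
  return static_cast<T*>(heap_caps_realloc_prefer(
      p, n * sizeof(T), 2, MALLOC_CAP_SPIRAM, MALLOC_CAP_DEFAULT));
#else
  return static_cast<T*>(std::realloc(p, n * sizeof(T)));  // host stand-in
#endif
}
```

Passing `nullptr` as `p` behaves like a plain allocation, so the same helper can create the initial buffer.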

Copilot AI changed the title from "Add custom PSRAM allocator for ESP32S3 lookup tables" to "Add custom PSRAM allocator for ESP32 lookup tables" on Jan 20, 2026
@softhack007
Owner

@copilot please re-review

Author

Copilot AI commented Jan 20, 2026

@copilot please re-review

Updated Code Review: PSRAMAllocator Implementation

✅ Fixed Issues from Previous Review:

  1. Line 91 Bug FIXED ✓: Now correctly uses heap_caps_realloc_prefer() instead of heap_caps_malloc_prefer() - commit b84e927
  2. Documentation Added ✓: Line 81 now documents that reallocate() isn't called by std::vector but can be used with ArduinoJson

✅ Current Strengths:

  1. Universal ESP32 Support: Works on all ESP32 variants (S2, S3, C3, P4) without variant-specific guards
  2. Robust 3-Tier Fallback:
    • PSRAM first (if available)
    • 32-bit IRAM second (ESP32's "secret" area)
    • Default RAM last
  3. Zero-Initialized Memory: Uses calloc_prefer for safer behavior
  4. Type-Smart Alignment: Compile-time check for 4-byte multiples
  5. Standards Compliant: Full C++11/C++14 allocator with rebind, operators
  6. Overflow Protection: Checks allocation size before attempting
  7. Cross-Platform: Fallback to std::vector on non-ESP32 platforms

📝 Code Quality Observations:

Memory Strategy:

  • ✅ Simplified from nested conditionals to clean #if defined(ESP32)
  • ✅ Smart use of MALLOC_CAP_DEFAULT as final fallback (P4 compatible)
  • ✅ Exploits 32-bit IRAM area via MALLOC_CAP_INTERNAL | MALLOC_CAP_32BIT

Implementation Details:

  • ✅ Consistent overflow checking in both allocate() and reallocate()
  • ✅ Proper use of heap_caps_realloc_prefer() (fixed from previous bug)
  • ✅ Clear documentation explaining non-standard reallocate() method
  • ✅ Proper const-correctness with noexcept on operators

🎯 Final Assessment:

No critical issues remain. The implementation is production-ready with:

  • Excellent cross-platform compatibility
  • Intelligent memory allocation strategy
  • Proper error handling
  • Clear documentation
  • Standards-compliant design

The allocator successfully balances optimization (32-bit IRAM access) with robustness (3-tier fallback) while maintaining compatibility across all ESP32 variants and non-ESP32 platforms.

softhack007 merged commit 2d70d2d into OO_baseline on Jan 20, 2026
1 check passed
softhack007 deleted the copilot/add-custom-allocator-psram branch on January 20, 2026 12:48
softhack007 pushed a commit that referenced this pull request Jan 21, 2026
Add custom PSRAM allocator for ESP32 lookup tables

Development

Successfully merging this pull request may close these issues.

add custom allocator to make sure that vector arrays go into PSRAM when availeable

5 participants