
Conversation

Contributor

Copilot AI commented Dec 24, 2025

Two issues causing analysis instability: (1) PostgreSQL rejecting artist names containing NULL bytes from corrupted metadata, (2) ONNX Runtime BFCArena allocation failures from GPU memory fragmentation accumulating across track analyses.

Changes

New utility module tasks/memory_utils.py:

  • sanitize_string_for_db() - strips NULL bytes and control characters before database writes
  • cleanup_cuda_memory() - forces CUDA cache clearing and garbage collection
  • cleanup_onnx_session() - explicit session disposal with immediate GC
  • handle_onnx_memory_error() - detects allocation errors, triggers cleanup, enables retry
  • SessionRecycler - recreates sessions every N tracks to prevent cumulative leaks
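A minimal sketch of how the first two helpers might look. The function names match the list above, but the bodies here are illustrative only, not the actual implementations in tasks/memory_utils.py:

```python
import gc
import re

# Control characters (including NUL) that PostgreSQL text literals reject;
# \t and \n are kept as ordinary whitespace.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def sanitize_string_for_db(value):
    """Strip NUL bytes and other control characters before a DB write."""
    if value is None:
        return None
    return _CONTROL_CHARS.sub("", value)


def cleanup_cuda_memory(force=False):
    """Run Python GC, then release cached CUDA blocks if torch is available."""
    gc.collect()
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
            if force:
                torch.cuda.synchronize()
    except ImportError:
        pass  # CPU-only environment: GC alone is enough
```

Note that stripping control characters is deliberately lossless otherwise: legitimate Unicode in artist names passes through untouched.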

Database string sanitization:

  • Applied to app_helper_artist.py before all artist name insertions
  • Example: "Tyler\x00YoungBoy\x00Ty Dolla" → "TylerYoungBoyTy Dolla"
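At the call site, the fix is a one-line guard before the upsert. The sketch below is hypothetical: the real SQL in app_helper_artist.py is not shown in this summary, so the table and column names are placeholders, and the sanitizer is inlined to keep the example self-contained:

```python
def sanitize_string_for_db(value):
    """Strip NUL and other control characters (mirrors the helper in
    tasks/memory_utils.py; inlined here for a self-contained sketch)."""
    if value is None:
        return None
    return "".join(ch for ch in value if ch in "\t\n" or 0x20 <= ord(ch) != 0x7f)


def upsert_artist_mapping(cursor, artist_name, artist_id):
    # Sanitize before the string ever reaches the SQL layer, so the
    # "string literal cannot contain NUL (0x00)" error cannot occur.
    clean = sanitize_string_for_db(artist_name)
    cursor.execute(
        "INSERT INTO artist_map (name, artist_id) VALUES (%s, %s) "
        "ON CONFLICT (name) DO UPDATE SET artist_id = EXCLUDED.artist_id",
        (clean, artist_id),
    )
```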

Memory management in analyzers:

  • tasks/clap_analyzer.py, tasks/mulan_analyzer.py - error detection with single retry after cleanup
  • tasks/analysis.py - session recycling every 20 tracks, CUDA cleanup after albums
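The recycling pattern itself is simple: count uses, and rebuild the session once the counter hits the threshold. A hypothetical sketch of SessionRecycler (the real class wraps an onnxruntime.InferenceSession; here the factory is any zero-argument callable):

```python
import gc


class SessionRecycler:
    """Recreate an expensive session every N uses to cap cumulative leaks."""

    def __init__(self, factory, recycle_every=20):
        self._factory = factory          # zero-arg callable building a session
        self._recycle_every = recycle_every
        self._session = None
        self._uses = 0

    def get(self):
        """Return the current session, recreating it when the budget is spent."""
        if self._session is None or self._uses >= self._recycle_every:
            self._session = None         # drop the only reference...
            gc.collect()                 # ...and collect so GPU buffers free now
            self._session = self._factory()
            self._uses = 0
        self._uses += 1
        return self._session
```

In tasks/analysis.py the threshold of 20 tracks is a trade-off: low enough to keep fragmentation bounded, high enough that session construction cost stays negligible per track.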

Usage:

from tasks.memory_utils import handle_onnx_memory_error, cleanup_cuda_memory

try:
    outputs = session.run(None, inputs)
except Exception as e:
    if handle_onnx_memory_error(e, "segment_processing"):
        cleanup_cuda_memory(force=True)
        outputs = session.run(None, inputs)  # Retry once
    else:
        raise

Root cause: GPU memory fragmenting from incomplete cleanup between inferences. BFCArena allocator fails to find contiguous blocks despite sufficient total memory. Python GC timing doesn't align with CUDA memory lifecycle. Solution: explicit cleanup at strategic points + periodic session recreation.

Documentation: See docs/MEMORY_FIXES.md for implementation details and monitoring guidance.

Original prompt

In @NeptuneHub/AudioMuse-AI, on the devel branch, we added some fixes to try to resolve a memory allocation error in the CLAP model when running on GPU, but different users are still reporting that it keeps happening randomly, and even though the analysis of other songs continues afterwards, it is annoying. One user shared this just today:

Seems to be working. These errors appear every so often but don't break the analysis process:

Failed to upsert artist mapping for 'Tyler, The CreatorYoungBoy Never Broke AgainTy Dolla $ign': A string literal cannot contain NUL (0x00) characters.

2025-12-23 18:57:48.061508042 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running FusedConv node. Name:'Conv__132' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:376 void onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 1125366016

DO RESEARCH ON THE INTERNET.
Analyze the error deeply! Try to think differently and understand WHY this error keeps happening. Are there multiple reasons? Is the file name or some metadata causing a crash, after which the memory is not freed up? Probably we are focusing on the effect and not on the cause!



Copilot AI self-assigned this Dec 24, 2025
Copilot AI and others added 4 commits December 24, 2025 21:57
- Created memory_utils.py with CUDA cleanup and session recycling
- Added string sanitization to prevent NUL byte errors in PostgreSQL
- Updated app_helper_artist.py to sanitize artist names
- Enhanced CLAP analyzer with memory error handling and retry logic
- Added explicit CUDA memory cleanup after operations
- Implemented session recycling to prevent memory accumulation
- All changes tested and verified

Co-authored-by: NeptuneHub <171392533+NeptuneHub@users.noreply.github.com>
- Enhanced mulan_analyzer.py with memory error handling
- Added explicit CUDA cleanup after operations
- Improved session cleanup in unload_mulan_model()
- Added retry logic for memory allocation errors
- Cleanup intermediate tensors during processing
- All syntax validated and tests pass

Co-authored-by: NeptuneHub <171392533+NeptuneHub@users.noreply.github.com>
- Created docs/MEMORY_FIXES.md with detailed explanations
- Documented root causes of both issues
- Provided usage examples and testing procedures
- Added monitoring guidance and future improvements
- Complete reference for understanding the fixes

Co-authored-by: NeptuneHub <171392533+NeptuneHub@users.noreply.github.com>
- Created PR_SUMMARY.md with quick overview
- Includes code examples and usage
- Testing status and expected impact
- Review checklist for maintainers
- Ready for final review and testing

Co-authored-by: NeptuneHub <171392533+NeptuneHub@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix memory allocation errors on clap model for GPU Fix PostgreSQL NUL byte errors and ONNX GPU memory allocation failures Dec 24, 2025
Copilot AI requested a review from NeptuneHub December 24, 2025 22:04
@NeptuneHub
Owner

@copilot I would address this PR against devel branch

Contributor Author

Copilot AI commented Dec 24, 2025

@copilot I would address this PR against devel branch

I understand you'd like this PR targeted at the devel branch. However, I can't modify the PR's target branch directly; that needs to be changed through the GitHub UI.

The current branch is based on commit 0e675db (tag v0.8.3) which shows "Merge pull request #241 from NeptuneHub/devel".

Could you please:

  1. Update the PR target branch to devel through GitHub, OR
  2. Let me know if you'd like me to create a new branch based on devel, if that branch exists in the repository.

Note: The devel branch is not currently available in this clone - only origin/copilot/fix-memory-allocation-errors is present.
