@TheBlackPitcher

Overview

This PR implements critical stability improvements for Moonraker on RAM-constrained systems (e.g., 213 MB total memory). It prevents system-wide crashes, enables automatic recovery, and ensures print reliability even during Moonraker issues.

Problem Statement

On systems with limited RAM, Moonraker crashes can cause:

  • System-wide OOM (Out of Memory) events
  • Print failures and potential hardware damage
  • No automatic recovery, so manual intervention is required
  • Excessive memory usage from unrestricted WebSocket connections

Solution

Auto-Restart Mechanism

  • Automatic recovery from crashes and OOM events
  • Smart crash detection with sliding window (max 5 crashes per 5 minutes)
  • Clean shutdown detection prevents unnecessary restarts
  • 10-second delay between restart attempts
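
The restart policy above can be sketched as a small sliding-window counter. This is an illustrative Python model only: the actual logic lives in moonraker.sh, the `CrashWindow` name is hypothetical, and it assumes "max 5 crashes per 5 minutes" means the sixth crash inside the window stops further restarts.

```python
from collections import deque


class CrashWindow:
    """Allow at most `max_crashes` restarts within `window_s` seconds."""

    def __init__(self, max_crashes=5, window_s=300.0):
        self.max_crashes = max_crashes
        self.window_s = window_s
        self._crashes = deque()  # timestamps of recent crashes

    def should_restart(self, exit_code, now):
        """Return True if the supervisor should start the process again."""
        if exit_code == 0:
            return False  # clean shutdown: never restart
        # Forget crashes that have aged out of the window.
        while self._crashes and now - self._crashes[0] >= self.window_s:
            self._crashes.popleft()
        self._crashes.append(now)
        # Assumption: exceeding the limit means "give up".
        return len(self._crashes) <= self.max_crashes
```

A supervisor loop would call `should_restart(exit_code, time.monotonic())` after each exit and sleep 10 seconds before relaunching when it returns True.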

Graceful Shutdown Handling

  • SIGTERM/SIGINT signal handlers for clean termination
  • 10-second grace period before forced kill
  • Proper resource cleanup
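
The terminate-then-kill sequence can be illustrated from the supervising side. This is a minimal Python sketch, not the shell code in moonraker.sh; `stop_gracefully` is a hypothetical helper and the 10-second default mirrors the grace period listed above.

```python
import subprocess
import sys


def stop_gracefully(proc, grace_s=10.0):
    """Send SIGTERM, wait up to grace_s seconds, then SIGKILL if needed."""
    if proc.poll() is not None:
        return proc.returncode  # already exited
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX: forced kill after the grace period
        return proc.wait()


if __name__ == "__main__":
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit status:", stop_gracefully(child, grace_s=5.0))
```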

Memory Management

  • 80 MB virtual memory limit (ulimit) prevents system-wide OOM
  • Upload size limit: 1024 MB → 30 MB (safe for memory constraints)
  • Moonraker restarts on memory limit, Klipper unaffected
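
For readers unfamiliar with the mechanism, `ulimit -v` sets the process address-space limit (RLIMIT_AS), so allocations beyond the cap fail inside that process instead of triggering the kernel's system-wide OOM killer. The sketch below shows the same limit applied from Python on a POSIX system; `limit_address_space` and `run_limited` are hypothetical helpers, not code from this PR.

```python
import resource
import subprocess
import sys


def limit_address_space(limit_mb):
    """Return a preexec function that caps the child's virtual memory."""
    limit_bytes = limit_mb * 1024 * 1024

    def apply():
        # RLIMIT_AS is the limit that `ulimit -v` sets (ulimit takes KiB).
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, limit_bytes))

    return apply


def run_limited(argv, limit_mb):
    """Run argv with an address-space cap and capture its output."""
    return subprocess.run(
        argv,
        preexec_fn=limit_address_space(limit_mb),
        capture_output=True,
        text=True,
    )
```

With the PR's setting this would be `run_limited([...], 80)`; whether a given interpreter starts under such a tight cap depends on the platform.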

WebSocket Optimization

  • Connection limit: 50 → 5 (~180 MB potential allocation saved)
  • Ping timeout: 30s → 10s (faster connection cleanup)
  • Debug logging disabled

Safety Features

  • Process Isolation: Klipper runs separately, unaffected by Moonraker issues
  • Print Continuity: Active prints continue during Moonraker restart
  • No Data Loss: Only ~10 second UI downtime during restart
  • Comprehensive Logging: All events logged to app-moonraker.log

Testing

Tested on Anycubic Kobra 3 with 213 MB RAM:

  • ✅ Moonraker restart during active print job
  • ✅ Print continued without interruption
  • ✅ Auto-restart after manual kill
  • ✅ Memory limit enforcement
  • ✅ Graceful shutdown on stop command

Expected Impact

  • System load: 11.75 → <3.0
  • Available RAM: 67 MB → 120+ MB
  • Eliminates system-wide crashes
  • Enables stable long-duration browser connections

Files Changed

  • moonraker.sh: Auto-restart, graceful shutdown, memory limit
  • moonraker.conf: WebSocket limits, upload size, debug logging

@TheBlackPitcher TheBlackPitcher marked this pull request as draft November 30, 2025 10:36
@TheBlackPitcher TheBlackPitcher force-pushed the feature/moonraker-stability-v2 branch from ab99d6b to 5c413bd Compare November 30, 2025 10:38

This commit implements critical stability improvements for systems with
limited RAM (e.g., 213 MB total memory), preventing system-wide crashes
and ensuring print reliability.

Auto-Restart Mechanism:
- Automatic recovery from crashes and OOM events
- Crash counter with 5-minute sliding window (max 5 crashes)
- 10-second delay between restart attempts
- Clean shutdown detection (exit code 0) prevents unnecessary restarts
- Comprehensive logging for troubleshooting

Graceful Shutdown Handling:
- SIGTERM/SIGINT signal handlers for clean process termination
- 10-second grace period before forced kill
- Ensures proper cleanup of resources

Memory Management:
- 80 MB virtual memory limit (ulimit) prevents system-wide OOM
- Moonraker restart on memory limit, keeping Klipper unaffected
- Reduced max_upload_size: 1024 MB -> 30 MB (safe for 80 MB limit)

WebSocket Optimization:
- Connection limit: 50 -> 5 (saves ~180 MB potential allocation)
- Ping timeout: 30s -> 10s (faster connection cleanup)
- Debug logging disabled to reduce overhead

Safety Features:
- Klipper runs in separate process, unaffected by Moonraker issues
- Prints continue during Moonraker restart (~10 second UI downtime)
- No data loss or print interruption on Moonraker crashes

Expected Impact:
- System load reduction: 11.75 -> <3.0
- Available RAM increase: 67 MB -> 120+ MB
- Eliminates system-wide crashes from Moonraker memory issues
- Enables stable long-duration browser connections

Tested on Anycubic Kobra 3 with 213 MB RAM during active print job.
Moonraker restart verified to have no impact on ongoing prints.

@TheBlackPitcher TheBlackPitcher force-pushed the feature/moonraker-stability-v2 branch from 5c413bd to e26f4f7 Compare November 30, 2025 10:41
TheBlackPitcher and others added 6 commits November 30, 2025 21:53

Implements comprehensive memory optimizations to prevent unbounded RAM
growth on severely constrained systems (213 MB total). Combines periodic
garbage collection, aggressive GC settings, and intelligent webcam
resource management.

Memory Manager Component (NEW):
- Periodic explicit garbage collection every 5 minutes
- Memory statistics logging (uses /proc/self/status, no psutil needed)
- Prevents Python memory creep over long runtimes
- Configurable via [memory_manager] section in moonraker.conf
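
A rough sketch of the two building blocks named above: an explicit collection and a psutil-free RSS reading from /proc/self/status. The function names are hypothetical and the real memory_manager.py may be structured differently; scheduling the call every 5 minutes is left to the host event loop.

```python
import gc


def parse_status_kib(status_text, field="VmRSS"):
    """Extract a `<field>: <n> kB` value from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        name, _, rest = line.partition(":")
        if name == field:
            return int(rest.split()[0])
    return None  # field not present


def collect_and_report(status_path="/proc/self/status"):
    """Run a full collection and return (unreachable_objects, rss_kib)."""
    freed = gc.collect()
    try:
        with open(status_path) as fh:
            rss = parse_status_kib(fh.read())
    except OSError:
        rss = None  # not on Linux, or /proc is unavailable
    return freed, rss
```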

Aggressive Python GC Settings:
- PYTHONGC=1 environment variable enables GC
- PYTHONGCTHRESHOLD="500,5,5" for more frequent collection
- Collects generations more aggressively than default (700,10,10)
- Reduces memory pressure with minimal CPU overhead
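
To my knowledge, stock CPython does not document PYTHONGC or PYTHONGCTHRESHOLD as interpreter settings, so the sketch below assumes the application reads the variable itself and forwards it to `gc.set_threshold`. `apply_gc_threshold` is a hypothetical helper, not code from this commit.

```python
import gc
import os


def apply_gc_threshold(env=os.environ, var="PYTHONGCTHRESHOLD"):
    """Apply a "gen0,gen1,gen2" threshold string from the environment, if set."""
    raw = env.get(var)
    if not raw:
        return gc.get_threshold()  # variable unset: leave thresholds alone
    parts = tuple(int(p) for p in raw.split(","))
    if len(parts) != 3:
        raise ValueError(f"{var} must hold three comma-separated integers")
    gc.set_threshold(*parts)
    return gc.get_threshold()
```

Lower thresholds make generation-0 collections run after fewer allocations, which trades some CPU time for a smaller peak of uncollected garbage.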

Dual-Mode Webcam Streaming (SMART):
- LOW mode (default): 640x480 @ 10 FPS, quality 70
  - For AI features (spaghetti detection, monitoring)
  - Saves 10-15 MB compared to HD mode
- HIGH mode (on-demand): 1280x720 @ 15 FPS, quality 70
  - Activated when clients connect (detected via lsof)
  - Full quality when user actively watches
- Automatic switching every 10 seconds based on active connections
- No user restrictions - full quality available when needed

Expected Impact:
- Periodic GC: 4-10 MB saved (prevents memory creep)
- Aggressive GC: 2-5 MB additional savings
- Webcam dual-mode: 10-15 MB average savings (when not viewing)
- Total: 16-30 MB RAM savings (7-14% of total)
- Prevents unbounded RAM growth without limiting functionality

Files Modified:
- moonraker.conf: Add [memory_manager] section
- moonraker.sh: Copy memory_manager.py, add GC env vars
- memory_manager.py: NEW Moonraker component
- mjpg_monitor.sh: Implement dual-mode with client detection

Tested on Anycubic Kobra 3 with 213 MB RAM.
Preserves full functionality while preventing memory exhaustion.

Fixed two critical issues preventing mjpg_streamer from starting:

1. Missing APP_LOG variable definition
   - Added APP_LOG=$RINKHALS_LOGS/app-mjpg-streamer.log after APP_ROOT
   - Without this, script failed on first log write attempt

2. Missing v4l2-ctl command handling
   - v4l2-ctl command not available on system
   - Added fallback to use preset resolutions (1920x1080, 1280x720, 640x480)
   - Script now works with or without v4l2-ctl utility
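
The fallback in item 2 can be modelled as follows. This is an illustrative Python sketch of logic that actually lives in mjpg_monitor.sh; `candidate_resolutions` and the `probe` callable (standing in for whatever parses v4l2-ctl output) are hypothetical, and falling back when the probe returns nothing is an added assumption.

```python
import shutil

# Presets listed in the commit message above.
PRESET_RESOLUTIONS = [(1920, 1080), (1280, 720), (640, 480)]


def candidate_resolutions(probe, which=shutil.which):
    """Use probed resolutions when v4l2-ctl exists, else the presets."""
    if which("v4l2-ctl") is None:
        return list(PRESET_RESOLUTIONS)  # utility missing: use presets
    found = probe()
    # Assumption: an empty probe result also falls back to the presets.
    return found or list(PRESET_RESOLUTIONS)
```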

Result:
- mjpg_streamer now starts successfully in LOW mode (640x480@10fps)
- Dual-mode switching ready for testing
- Webcam service responding on port 8080

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Implemented intelligent webcam streaming with automatic quality management:

**Features:**
1. **Auto-upgrade to HIGH mode (1280x720@15fps):**
   - Detects active clients via netstat ESTABLISHED connections
   - Switches within 10 seconds when webcam is accessed

2. **Auto-downgrade after 5 minutes:**
   - HIGH mode automatically downgrades to LOW mode (640x480@10fps) after 5min
   - Prevents permanent HIGH mode if user forgets to close webcam
   - Saves ~10-15 MB RAM

3. **Smart timer reset:**
   - Timer resets when NEW clients connect (page refresh/new tab)
   - Tracks client_count to detect new connections

4. **No printer state blocking:**
   - Removed printer state check - works during printing too
   - Users can watch prints in HIGH quality

**Technical Changes:**
- Replaced lsof with netstat (BusyBox lsof doesn't support -sTCP:ESTABLISHED)
- Fixed local variable scoping in restart_mjpg_streamer()
- Added comprehensive logging for debugging
- Timeout configurable via HIGH_MODE_TIMEOUT variable (default: 300s)
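
A minimal model of the timeout and timer-reset behaviour described above, written in Python for illustration (the real logic is shell). `HighModeTimer` is a hypothetical name, and the sketch models only the 300-second timeout with reset on a rising client count; it deliberately ignores the case where all clients disconnect.

```python
class HighModeTimer:
    """Track how long HIGH mode may stay active."""

    def __init__(self, timeout_s=300.0):
        self.timeout_s = timeout_s  # mirrors HIGH_MODE_TIMEOUT
        self.started_at = None
        self.last_client_count = 0

    def update(self, client_count, now):
        """Return 'HIGH' or 'LOW' for the current periodic check."""
        if client_count > self.last_client_count:
            self.started_at = now  # new client: (re)start the timer
        self.last_client_count = client_count
        if self.started_at is None:
            return "LOW"
        if now - self.started_at >= self.timeout_s:
            self.started_at = None  # timeout reached: downgrade
            return "LOW"
        return "HIGH"
```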

**Memory Impact:**
- LOW mode: 11 MB
- HIGH mode: 22 MB
- Auto-downgrade caps HIGH mode at 5 minutes unless a new client connects and resets the timer

**Testing:**
- Verified client detection works with Mainsail (127.0.0.1 connections)
- Confirmed AUTO upgrade LOW→HIGH on client connect
- Timer system ready for 5-minute timeout test

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Fixed critical restart loop issue where mjpg_streamer constantly switched between HIGH and LOW modes.

**Problem:**
- When switching modes, mjpg_streamer restarts
- During restart, TCP connections briefly drop
- Client check sees "no clients" → switches to LOW
- Clients reconnect → sees clients → switches to HIGH
- Endless loop: HIGH → restart → LOW → restart → HIGH → ...

**Solution - Two anti-flapping mechanisms:**

1. **Grace Period (15 seconds after restart):**
   - Skip all client checks for 15 seconds after any restart
   - Gives clients time to reconnect after mode switch
   - Prevents immediate flip-flop

2. **No-Client-Streak (3 consecutive checks = 30 seconds):**
   - "No clients" must be detected 3 times in a row before downgrade
   - Short connection drops don't trigger mode switch
   - Only genuine "client closed webcam" triggers downgrade

**Technical Changes:**
- Added `last_restart_time` tracking
- Added `no_client_streak` counter
- Added `NO_CLIENT_THRESHOLD=3` (configurable)
- Grace period check: skip if `time_since_restart < 15s`
- Streak logic: increment on no-clients, reset on clients detected
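
The two anti-flapping mechanisms combine into a small state machine. The Python sketch below is only a model of the shell logic: `ModeDebouncer` is a hypothetical name, the attribute names follow the variables listed above, and it assumes every mode switch counts as a restart.

```python
class ModeDebouncer:
    """Debounce LOW/HIGH switching with a grace period and a streak counter."""

    def __init__(self, grace_s=15.0, no_client_threshold=3):
        self.grace_s = grace_s
        self.no_client_threshold = no_client_threshold
        self.last_restart_time = None
        self.no_client_streak = 0
        self.mode = "LOW"

    def check(self, client_count, now):
        """Process one periodic check and return the resulting mode."""
        if (self.last_restart_time is not None
                and now - self.last_restart_time < self.grace_s):
            return self.mode  # grace period: ignore client counts
        if client_count > 0:
            self.no_client_streak = 0
            self._switch("HIGH", now)
        else:
            self.no_client_streak += 1
            if self.no_client_streak >= self.no_client_threshold:
                self._switch("LOW", now)
        return self.mode

    def _switch(self, mode, now):
        if mode != self.mode:
            self.mode = mode
            self.last_restart_time = now  # a mode switch restarts the streamer
            self.no_client_streak = 0
```

With checks every 10 seconds, three consecutive empty checks outside the grace period are needed before the downgrade, which matches the 30-second figure above.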

**Result:**
- Stable HIGH mode when webcam open (no more flapping)
- Clean downgrade to LOW only after 30s with no clients
- Memory stable at ~22 MB in HIGH mode

**Testing:**
- Verified no restarts for 30+ seconds with active webcam
- Process PID remains stable
- 2 ESTABLISHED connections maintained

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

Reverted mjpg_monitor.sh to original state from develop branch.

The smart switching feature caused instability with restart loops.
User requested to remove webcam quality switching entirely.

Memory optimizations (GC, memory_manager component) remain active.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

- Replace Python subprocess for env vars with direct shell echo
- Use /proc filesystem instead of 'ps | grep' for PID lookup
- Make shell() logging optional (disabled by default)
- Use 'is not None' instead of '!= None' (PEP 8)

These changes reduce subprocess spawning which saves memory on the
resource-constrained 213 MB RAM system.
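
As an illustration of the /proc-based PID lookup, the sketch below scans `/proc/<pid>/cmdline` without spawning `ps` or `grep`. `find_pids` and its `proc_root` parameter are hypothetical; the code in this commit may match processes differently.

```python
import os


def find_pids(needle, proc_root="/proc"):
    """Return PIDs whose command line contains `needle`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self"
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                # Arguments are NUL-separated in /proc/<pid>/cmdline.
                cmdline = fh.read().replace(b"\0", b" ").decode(errors="replace")
        except OSError:
            continue  # process exited or is not readable
        if needle in cmdline:
            pids.append(int(entry))
    return sorted(pids)
```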

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

@TheBlackPitcher TheBlackPitcher marked this pull request as ready for review December 1, 2025 10:53

Some Rinkhals builds don't include v4l2-ctl, causing mjpg_monitor.sh
to skip all cameras and fail to start mjpg_streamer.

Add a fallback that uses common default resolutions (1280x720, 640x480)
when v4l2-ctl is not available.

Fixes webcam not working on builds without v4l2-ctl.