
Copyover (Hot-reload) for GoMud #405


Open: wants to merge 6 commits into `master`

@MorquinDevlar (Contributor) commented Jun 20, 2025

Summary

This PR implements a copyover (hot-reload) system for the GoMud server that preserves game state during server restarts.

Changes

  • Added state preservation for 9 game subsystems
  • Implemented event queue serialization using reflection
  • Added JavaScript VM state tracking
  • Created file descriptor preservation for network connections
  • Resolved import cycles by moving logic to a dedicated copyover package

Technical Details

  • State Machine: Implements six states (Idle, Scheduled, Building, Transferring, Recovering, Aborted)
  • File Organization: Centralized integrations in `internal/copyover/integrations.go`
  • State Files: Each subsystem writes a temporary JSON file during copyover
  • Network Handling: Preserves listener file descriptors across the restart through environment variables
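
The six states can be enforced with an explicit transition table, so that an illegal jump (for example, back to Idle in the middle of a build) is rejected instead of silently corrupting the flow. This is a minimal sketch: the state names come from the list above, but the `Machine` type and the set of allowed transitions are assumptions, not the PR's code. A later commit in this PR reduces the machine to five states; the same table-driven approach applies.

```go
package main

import "fmt"

// State is a hypothetical enum for the copyover lifecycle.
type State int

const (
	Idle State = iota
	Scheduled
	Building
	Transferring
	Recovering
	Aborted
)

var stateNames = [...]string{"Idle", "Scheduled", "Building", "Transferring", "Recovering", "Aborted"}

func (s State) String() string {
	if s < 0 || int(s) >= len(stateNames) {
		return fmt.Sprintf("State(%d)", int(s))
	}
	return stateNames[s]
}

// allowed is an assumed transition table; the PR's real rules may differ.
var allowed = map[State][]State{
	// Building: immediate copyover; Recovering: a freshly exec'd process.
	Idle:         {Scheduled, Building, Recovering},
	Scheduled:    {Building, Aborted},
	Building:     {Transferring, Aborted},
	Transferring: {}, // the old process is replaced; no in-process successor
	Recovering:   {Idle},
	Aborted:      {Idle},
}

// Machine tracks the current state and rejects illegal transitions.
type Machine struct{ cur State }

func (m *Machine) Transition(to State) error {
	for _, s := range allowed[m.cur] {
		if s == to {
			m.cur = to
			return nil
		}
	}
	return fmt.Errorf("illegal transition %s -> %s", m.cur, to)
}

func main() {
	m := &Machine{}
	fmt.Println(m.Transition(Scheduled), m.Transition(Building), m.Transition(Idle))
}
```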

Subsystems

  • Combat: Saves aggro states, damage tracking, mob instance counters
  • Rooms: Saves active mutators, visitor lists
  • Events: Serializes pending events with type information
  • Scripts: Tracks VM IDs for rooms, mobs, items, spells, buffs
  • Economy: Saves shop inventory quantities
  • Parties: Saves group memberships
  • Pets: Saves charm relationships
  • Quests: Saves quest-prefixed timers
  • SpellBuff: Saves active spell casts

Module System

  • Modules implement SaveState() and RestoreState() methods
  • Modules are registered through RegisterModule()
  • Added copyover integration to the auction module
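
The PR names `SaveState()`, `RestoreState()`, and `RegisterModule()` but not their signatures. The sketch below assumes a byte-slice-based interface and a name-keyed registry; every signature here is hypothetical, and the toy `auctionModule` only stands in for the real auction module.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// CopyoverModule is a hypothetical interface built around the two method
// names given in the PR.
type CopyoverModule interface {
	Name() string
	SaveState() ([]byte, error)
	RestoreState(data []byte) error
}

var modules = map[string]CopyoverModule{}

// RegisterModule records a module so copyover can visit it.
func RegisterModule(m CopyoverModule) { modules[m.Name()] = m }

// SaveAll collects every registered module's state, keyed by module name.
func SaveAll() (map[string][]byte, error) {
	out := make(map[string][]byte, len(modules))
	for name, m := range modules {
		data, err := m.SaveState()
		if err != nil {
			return nil, fmt.Errorf("%s: %w", name, err)
		}
		out[name] = data
	}
	return out, nil
}

// RestoreAll feeds saved state back in sorted name order for determinism.
func RestoreAll(saved map[string][]byte) error {
	names := make([]string, 0, len(saved))
	for n := range saved {
		names = append(names, n)
	}
	sort.Strings(names)
	for _, name := range names {
		m, ok := modules[name]
		if !ok {
			continue // module no longer present in the new build
		}
		if err := m.RestoreState(saved[name]); err != nil {
			return fmt.Errorf("%s: %w", name, err)
		}
	}
	return nil
}

// auctionModule is a toy stand-in for the auction module.
type auctionModule struct{ Bids map[string]int }

func (a *auctionModule) Name() string                { return "auction" }
func (a *auctionModule) SaveState() ([]byte, error)  { return json.Marshal(a.Bids) }
func (a *auctionModule) RestoreState(d []byte) error { return json.Unmarshal(d, &a.Bids) }

func main() {
	a := &auctionModule{Bids: map[string]int{"sword": 50}}
	RegisterModule(a)
	saved, _ := SaveAll()
	a.Bids = nil // simulate the fresh process after exec
	fmt.Println(RestoreAll(saved), a.Bids)
}
```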

Added documentation for:
- API
- Combat
- Quests
- Economy
- Pets
- Spells
- Modules

…auto connect players back to their state

- Implemented a copyover command that lets admins restart the server without disconnecting players
- Preserved TCP connections across the process restart using file descriptor inheritance
- Added automatic build integration: the server is rebuilt before each copyover
- Saved all player states and positions before the restart
- Created templated messages with ANSI color support for every copyover phase
- Added build number tracking to verify that updates succeeded
- Skipped login commands when recovering from a copyover
- Handled connection recovery with proper user session restoration

The copyover system allows live code updates to be deployed without
disrupting active players. It currently supports Unix-like systems
(Linux, macOS).

- `copyover now` - Immediate restart
- `copyover [seconds]` - Restart with countdown
- `copyover test` - Test system readiness

- Added event queue serialization/deserialization using reflection to preserve pending events
- Implemented JavaScript VM state tracking to preserve scripts across copyovers
- Added active mutator preservation to maintain room states during hot-reload
- Created pet/charm relationship persistence to keep user-mob connections intact
- Implemented quest timer preservation to maintain quest progress across restarts
- Added spell cast state preservation for spells that are mid-cast
- Fixed state machine transitions to properly handle the Idle -> Recovering transition
- Resolved import cycles by centralizing copyover logic in a dedicated package
- Fixed network listener ordering to ensure consistent FD assignment
- Consolidated all subsystem integrations into a single integrations.go file
- Added comprehensive error handling and logging throughout the system
- Cleaned up the codebase by removing redundant code and fixing naming inconsistencies
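
Serializing the event queue requires recording each event's concrete type, because a plain JSON array of events loses it. The following is a minimal sketch of a reflection-based approach, assuming events are stored as non-pointer struct values and identified by their type name; `RoomChange`, `Message`, `registry`, and the `envelope` format are hypothetical stand-ins for GoMud's real event types and storage.

```go
package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// Hypothetical event types.
type RoomChange struct {
	UserID, FromRoom, ToRoom int
}

type Message struct {
	UserID int
	Text   string
}

// registry maps a type name to its reflect.Type so events can be rebuilt.
var registry = map[string]reflect.Type{}

func register(example any) {
	t := reflect.TypeOf(example)
	registry[t.Name()] = t
}

// envelope stores the concrete type name next to the JSON payload.
type envelope struct {
	Type string          `json:"type"`
	Data json.RawMessage `json:"data"`
}

func serialize(events []any) ([]byte, error) {
	envs := make([]envelope, 0, len(events))
	for _, e := range events {
		data, err := json.Marshal(e)
		if err != nil {
			return nil, err
		}
		envs = append(envs, envelope{Type: reflect.TypeOf(e).Name(), Data: data})
	}
	return json.Marshal(envs)
}

func deserialize(b []byte) ([]any, error) {
	var envs []envelope
	if err := json.Unmarshal(b, &envs); err != nil {
		return nil, err
	}
	out := make([]any, 0, len(envs))
	for _, env := range envs {
		t, ok := registry[env.Type]
		if !ok {
			return nil, fmt.Errorf("unknown event type %q", env.Type)
		}
		ptr := reflect.New(t) // a *T ready to unmarshal into
		if err := json.Unmarshal(env.Data, ptr.Interface()); err != nil {
			return nil, err
		}
		out = append(out, ptr.Elem().Interface()) // back to a plain T value
	}
	return out, nil
}

func main() {
	register(RoomChange{})
	register(Message{})
	b, _ := serialize([]any{RoomChange{UserID: 1, FromRoom: 10, ToRoom: 11}, Message{UserID: 1, Text: "hi"}})
	events, _ := deserialize(b)
	fmt.Printf("%#v\n", events[0])
}
```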

The copyover system now preserves all critical game state during
hot-reloads, allowing server updates without disrupting players.

- Fixed zombie user reconnection after copyover by correcting the `userId` variable in `LoginUser`
- Removed special prompt-priority handling that caused prompts to appear before room text
- Added proper telnet negotiation commands for recovered connections (WILL ECHO, WONT LINE_MODE, DO NAWS)
- Improved zombie connection cleanup in `RemoveZombieConnection` to prevent stale mappings
- Increased the worker startup delay to 2 seconds to ensure the event system is ready
- Fixed `copyover_recovery` flag timing so the flag is not cleared prematurely

The copyover system now correctly displays prompts after room
descriptions and allows users to reconnect if they disconnect after a
copyover.

@MorquinDevlar requested a review from Volte6 as a code owner on June 20, 2025, 11:39

Integrate copyover with the global mutex (`util.LockMud`/`UnlockMud`) to
ensure atomic state capture and prevent race conditions during hot
reload.

Key changes:
- Execute copyover outside the event loop to avoid a deadlock with the mutex
- Separate the build phase from critical copyover operations to minimize lock time
- Add a deferred recovery queue to eliminate the 2-second sleep workaround
- Ensure all template messages display correctly for an immediate copyover

The mutex integration provides:
- Guaranteed consistent state capture (no mid-update snapshots)
- Protection against concurrent modifications during copyover
- Proper synchronization with combat and other game systems
- Atomic save operations for all player data

Performance impact is minimal because the build phase (the longest
operation) now runs outside the mutex lock, reducing the game freeze from
~5s to ~2s.

- Reduce the state machine from 10 states to 5 (Idle, Scheduled, Preparing, Executing, Recovering)
- Replace polling-based scheduling with a timer-based approach
- Consolidate multiple entry points into a single `Copyover()` method with options
- Fix a mutex deadlock by using the event system for copyover initiation
- Fix state file creation and persistence issues
- Fix connection recovery by preserving the `isRecovering` flag until completion
- Move the build phase before mutex acquisition to reduce game freeze time
- Switch from full rebuilds to incremental builds during copyover
- Fix executable name resolution (use `go-mud-server` instead of `WillowdaleMUD`)
- Add comprehensive logging for debugging the copyover flow
- Remove duplicate progress tracking mechanisms
- Eliminate the dual-mutex pattern in favor of a single mutex
- Update `world.go` to handle a pending copyover via the event system
- Add the missing `copyover-cancelled.template` files
- Fix recovery completion timing to ensure connections are restored