feat: config hot-reload #106
Conversation
Overall this looks like it's in the right direction. I think the right design is for all reloading due to configuration changes to be outside of ProxyManager. It is instantiated with `New(config *Config)`, and I want to keep the contract that the config comes in from the outside and is never modified inside.
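In other words (a sketch of the intended contract; signatures here follow this comment and the PR's usage, not necessarily the exact code):

```go
// The caller owns the config: load it outside, pass it in, and never mutate
// it inside ProxyManager. A reload means building a brand-new manager.
config, err := proxy.LoadConfig("config.yaml")
if err != nil {
	log.Fatal(err)
}
pm := proxy.New(config) // pm treats config as read-only

// Later, on reload: a fresh config and a fresh manager, never an in-place edit.
if newConfig, err := proxy.LoadConfig("config.yaml"); err == nil {
	pm.Shutdown() // tear down the old manager's processes
	pm = proxy.New(newConfig)
}
```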
Thanks for the review @mostlygeek, I've updated the PR to move all config reload logic out of ProxyManager. This maintains better separation of concerns while preserving all functionality, and it also makes the reload process more robust by properly coordinating the server lifecycle. Let me know if you'd like me to explain any part in more detail or if there are any further changes you'd like made.
It's looking good. I've just pushed a big PR #109 which changes llama-swap.go a bit. I'm planning to land that first. I think there may be some small merge conflicts. Since ProxyManager's contract hasn't changed, that part will be fine :). Maybe something like:

```go
proxyManager := proxy.New(config)
srv := &http.Server{
	Addr:    ":8080",
	Handler: proxyManager,
}

go func() {
	// service connections
	if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
		log.Fatalf("listen: %s\n", err)
	}
}()
```

ProxyManager is an http.Handler. Perhaps it's possible to change srv.Handler gracefully? I haven't looked too much into this.
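One way to do that safely (a sketch, not from this PR) is to point srv.Handler at a small wrapper once and swap the inner handler behind a lock; `handlerSwapper` is a hypothetical name:

```go
package main

import (
	"net/http"
	"sync"
)

// handlerSwapper keeps srv.Handler fixed while the inner handler can be
// replaced at any time; in-flight requests finish on the handler they
// started with.
type handlerSwapper struct {
	mu sync.RWMutex
	h  http.Handler
}

// Swap replaces the inner handler, e.g. with a freshly built ProxyManager.
func (hs *handlerSwapper) Swap(h http.Handler) {
	hs.mu.Lock()
	hs.h = h
	hs.mu.Unlock()
}

func (hs *handlerSwapper) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	hs.mu.RLock()
	h := hs.h
	hs.mu.RUnlock()
	h.ServeHTTP(w, r)
}
```

With this, `srv.Handler` is assigned once to the swapper, and each config reload just calls `Swap(newProxyManager)`.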
Yeah that would be nice actually, good thinking. What I'll do is update this branch first, then once you've merged your PR, I'll pull and fix conflicts etc... I'll mark it as draft until that's done :)
OK, that was a bit more challenging than I anticipated and pushed the limits of my Go skills, so I'm sorry if I've overcomplicated anything, but this is now working well on my server @mostlygeek
That took me a little bit to review. Most of it looks pretty good. I'm not entirely sure what was going on with the test changes, but I left comments. BTW: if this is you, thanks for following through 😄!
I think this would be a good change to ProxyManager:

```go
func NewWithLoggers(
	config Config,
	muxLogger *LogMonitor,
	proxyLogger *LogMonitor,
	upstreamLogger *LogMonitor,
) *ProxyManager {
	...
}
```

This way:
Yeah, they gave me a lot of grief after moving the process handling around. I'll have a read through later today.
Yep, that's me!
OK, I've rebased from main now. I did make a couple of changes to your logging here: with how the process handler works now, the old logging would cause what looks to be an alarming message every time the config reloads. I can revert those back if you wish @mostlygeek
Fixed, but no rush to review :)
Thanks. I'm too tired tonight to review anyways :)
Walkthrough

The changes introduce live configuration reload support, improved server lifecycle management, and enhanced logging.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant Server
    participant ProxyManager
    participant ConfigWatcher
    participant OS
    User->>Server: Start with --watch-config
    Server->>ProxyManager: Initialize with config
    Server->>ConfigWatcher: Start watching config file
    loop On config file change
        ConfigWatcher->>Server: Notify reload channel
        Server->>ProxyManager: Gracefully stop old manager
        Server->>ProxyManager: Initialize new manager with new config
        Server->>Server: Swap in new handler
    end
    OS-->>Server: Send termination signal
    Server->>ProxyManager: Gracefully shutdown
    Server->>Server: Shutdown HTTP server
```
Actionable comments posted: 2
🧹 Nitpick comments (2)
proxy/proxymanager.go (1)

209-226: Add clarification on the difference between StopProcesses and Shutdown.

The new `StopProcesses` method safely stops all running upstream processes concurrently, but the distinction between this method and the existing `Shutdown` method is not explicitly documented. From the implementation, `StopProcesses` calls `processGroup.stopProcesses()` while `Shutdown` calls `processGroup.Shutdown()`, suggesting different semantics.

Add a comment to clarify the difference:
```diff
 // StopProcesses acquires a lock and stops all running upstream processes.
 // This is the public method safe for concurrent calls.
+// Unlike Shutdown, this method only stops the processes but doesn't perform
+// a complete shutdown, allowing for process replacement without full termination.
 func (pm *ProxyManager) StopProcesses() {
```

llama-swap.go (1)
148-166: Consider adding retry logic for transient config loading errors.

Currently, if loading the new configuration fails, the error is logged but no retry is attempted. For transient issues (like a partial write that's quickly corrected), this could result in a missed reload.

Consider adding a simple retry mechanism with backoff for config loading failures:
```diff
 if event.Name == configPath && event.Has(fsnotify.Write) {
 	// Reset or start the debounce timer
 	if debounceTimer != nil {
 		debounceTimer.Stop()
 	}
 	debounceTimer = time.AfterFunc(debounceDuration, func() {
 		log.Printf("Config file modified: %s, reloading...", event.Name)
+		// Try up to 3 times with exponential backoff
+		var newConfig proxy.Config
+		var err error
+		for retries := 0; retries < 3; retries++ {
+			// Load new configuration
+			newConfig, err = proxy.LoadConfig(configPath)
+			if err == nil {
+				break
+			}
+			log.Printf("Error loading new config (attempt %d/3): %v", retries+1, err)
+			if retries < 2 {
+				time.Sleep(time.Duration(1<<retries) * time.Second)
+			}
+		}
+		if err != nil {
+			log.Printf("Failed to load new config after retries: %v", err)
+			return
+		}
-		// Load new configuration
-		newConfig, err := proxy.LoadConfig(configPath)
-		if err != nil {
-			log.Printf("Error loading new config: %v", err)
-			return
-		}
 		// Create new ProxyManager with new config
 		newPM := proxy.New(newConfig)
 		reloadChan <- newPM
 		log.Println("Config reloaded successfully")
 	})
 }
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)

- `go.sum` is excluded by `!**/*.sum`
📒 Files selected for processing (7)

- `README.md` (4 hunks)
- `go.mod` (1 hunks)
- `llama-swap.go` (2 hunks)
- `proxy/logMonitor_test.go` (1 hunks)
- `proxy/process.go` (2 hunks)
- `proxy/proxymanager.go` (4 hunks)
- `proxy/proxymanager_test.go` (15 hunks)
🧰 Additional context used

🧬 Code Graph Analysis (1)

llama-swap.go (2)

proxy/proxymanager.go (2)
- `ProxyManager` (26-38)
- `New` (41-88)

proxy/config.go (1)
- `LoadConfig` (90-97)
🪛 LanguageTool
README.md
[uncategorized] ~252-~252: Loose punctuation mark.
Context: ...efault: config.yaml). - --listen: Address and port to listen on (default:...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~253-~253: Loose punctuation mark.
Context: ...on (default: :8080). - --version: Show version information and exit. ...
(UNLIKELY_OPENING_PUNCTUATION)
[uncategorized] ~254-~254: Loose punctuation mark.
Context: ...rmation and exit. - --watch-config: Automatically reload the configuration ...
(UNLIKELY_OPENING_PUNCTUATION)
🪛 golangci-lint (1.64.8)
proxy/logMonitor_test.go
27-27: Error return value of logMonitor.Write is not checked
(errcheck)
28-28: Error return value of logMonitor.Write is not checked
(errcheck)
29-29: Error return value of logMonitor.Write is not checked
(errcheck)
proxy/process.go
359-359: Error return value of p.cmd.Process.Kill is not checked
(errcheck)
🔇 Additional comments (38)
go.mod (1)
6-6: Good addition of fsnotify dependency for configuration hot-reload support.

The `fsnotify` package is an appropriate choice for implementing file system event monitoring to support the configuration hot-reload feature.

proxy/logMonitor_test.go (3)
24-24: LGTM: Waitgroup count correctly adjusted for two goroutines.

This change properly sets up the test for two separate client goroutines.

26-30: Improved test reliability by writing messages before starting goroutines.

Writing the messages before starting the collection goroutines ensures they won't be missed. Good change.
🧰 Tools

🪛 golangci-lint (1.64.8)

27-27: Error return value of `logMonitor.Write` is not checked (errcheck)

28-28: Error return value of `logMonitor.Write` is not checked (errcheck)

29-29: Error return value of `logMonitor.Write` is not checked (errcheck)
31-32: Well-structured concurrent message processing.

The refactoring to use dedicated goroutines for each client improves test reliability and clarity, aligning with the implementation changes in the proxy manager.
Also applies to: 42-51
README.md (2)

198-217: Improved syntax highlighting with shell-specific code blocks.

Converting generic code blocks to shell-specific code blocks enhances readability by providing proper syntax highlighting.

Also applies to: 234-239, 269-287

249-254: Comprehensive documentation of CLI flags including the new hot-reload functionality.

Good addition of the `--watch-config` flag documentation with a clear explanation of its purpose and behavior. This properly communicates the new feature to users.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~252-~252: Loose punctuation mark.
Context: "...efault: `config.yaml`). - `--listen`: Address and port to listen on (default:..." (UNLIKELY_OPENING_PUNCTUATION)

[uncategorized] ~253-~253: Loose punctuation mark.
Context: "...on (default: `:8080`). - `--version`: Show version information and exit. ..." (UNLIKELY_OPENING_PUNCTUATION)

[uncategorized] ~254-~254: Loose punctuation mark.
Context: "...rmation and exit. - `--watch-config`: Automatically reload the configuration ..." (UNLIKELY_OPENING_PUNCTUATION)
proxy/process.go (3)

263-266: Improved logging clarity during startup phase.

Downgrading connection refused and health check error messages to Debug level and marking them as "normal during startup" reduces log noise and prevents users from being alarmed by expected transient conditions.

348-349: Better logging during config reload process.

Changing the log level for nil command/process conditions to Debug and indicating they're normal during config reload helps prevent users from being concerned about expected shutdown events.

353-354: Refined logging during shutdown sequences.

These changes to log levels properly categorize expected shutdown behavior as debug-level events rather than warnings or errors, which makes logs cleaner and more focused on actual issues.

Also applies to: 358-359, 370-373
proxy/proxymanager_test.go (16)

37-37: Method name correctly updated to match http.Handler interface.

The rename from `HandlerFunc` to `ServeHTTP` properly aligns the test with the implementation changes in `proxymanager.go`, where the handler now implements the standard Go `http.Handler` interface.

75-75: Method name correctly updated to match http.Handler interface.

Similar to the previous instance, this correctly uses the renamed method to match the implementation changes.

117-117: Method name correctly updated to match http.Handler interface.

Consistent update to use `ServeHTTP` instead of the previous `HandlerFunc` method.

160-160: Method name correctly updated to match http.Handler interface.

Another consistent update to use the standard interface method name.

203-203: Method name correctly updated to match http.Handler interface.

Consistent renaming to standard interface method name.

254-288: Good addition of test for startup failure scenario.

This test case properly verifies that the proxy manager handles process startup failures gracefully by:

- Returning an appropriate 502 Bad Gateway status code
- Including a descriptive error message in the response
- Tracking the process in a failed state

This test is particularly valuable for the hot-reload feature to ensure reliability when loading potentially invalid configurations.

304-304: Method name correctly updated to match http.Handler interface.

Consistent update to use standard interface method.

309-309: Method name correctly updated to match http.Handler interface.

Another consistent update to use the standard method name.

345-345: Method name correctly updated to match http.Handler interface.

Consistently updated method name.

363-363: Method name correctly updated to match http.Handler interface.

Consistent update to use `ServeHTTP`.

369-369: Method name correctly updated to match http.Handler interface.

Another consistent update to use the standard method name.

423-423: Method name correctly updated to match http.Handler interface.

Consistently updated to use the standard interface method.

493-493: Method name correctly updated to match http.Handler interface.

Consistently updated method name.

504-565: Good reintroduction of UseModelName test functionality.

This test properly verifies that the `UseModelName` configuration setting correctly overrides the requested model name when proxying requests to upstream services. The test covers both regular JSON requests (`/v1/chat/completions`) and multipart form data requests (`/v1/audio/transcriptions`), ensuring comprehensive coverage of this feature.

580-580: Method name correctly updated to match http.Handler interface.

Final update to use the standard interface method name.

601-601: Method name correctly updated to match http.Handler interface.

Consistent update to use standard interface method.
proxy/proxymanager.go (4)

40-88: Good refactoring of constructor to improve code organization.

The extraction of the Gin HTTP engine setup into a separate `setupGinEngine` method improves the code organization and separation of concerns. This refactoring maintains the same functionality while making the code more maintainable.

90-202: Clean implementation of private setup method.

The extracted `setupGinEngine` method contains all the previously inline setup code with no functional changes, keeping the same routing and middleware configuration. This is a good example of code reorganization that improves readability without changing behavior.

204-207: Correct implementation of standard http.Handler interface.

Renaming the handler method to `ServeHTTP` properly implements the standard Go `http.Handler` interface, which allows the `ProxyManager` to be used with standard Go HTTP server implementations. This is a good practice that makes the code more idiomatic and flexible.

228-245: Simplified Shutdown method maintains parallel operation.

The `Shutdown` method has been simplified to focus on stopping all processes under lock while maintaining parallel operation through goroutines and a wait group. This preserves the previous behavior where shutdown operations occur concurrently for all process groups.
3-18: Good addition of required imports.The added imports support the new functionality:
contextfor shutdown contextlogfor structured loggingpath/filepathfor path handlingtimefor timer operationsfsnotifyfor file watchingAll these imports are directly used and necessary for the new features.
20-24: Good practice to declare version variables.Using package-level variables for version information allows them to be set during build time with
-ldflags, making it easier to track deployed versions.
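For context, that pattern usually looks something like the sketch below (variable names are illustrative, not necessarily the project's):

```go
package main

import "fmt"

// Set at build time, e.g.:
//   go build -ldflags "-X main.version=v1.2.3 -X main.commit=abc1234"
var (
	version = "dev"
	commit  = "none"
)

func main() {
	fmt.Printf("llama-swap %s (%s)\n", version, commit)
}
```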
31-31: Good addition of watch-config flag.

Adding a CLI flag for watch-config functionality follows the existing pattern for configuration options and gives users explicit control over enabling the hot-reload feature.

58-68: Well-structured server initialization with channels.

The code sets up a proper channel structure for coordinating server lifecycle events:

- `reloadChan` for signaling config reloads
- `exitChan` for coordinating graceful shutdown
- `sigChan` for OS signals
- An explicitly created HTTP server with the proxy manager as handler

This structure provides clean separation of concerns and proper coordination of different lifecycle events.
70-77: Appropriate server startup with error handling.

The server is started asynchronously with proper error handling that closes the exit channel on fatal errors, allowing the main goroutine to detect and respond to server failures.

79-105: Robust config reload and signal handling.

The handler goroutine properly manages both config reloads and termination signals:

- For config reloads, it gracefully stops old processes, performs a full shutdown, and updates the server handler
- For termination signals, it gracefully shuts down both the proxy manager and HTTP server with a timeout context

The approach of stopping old processes before updating the handler minimizes potential race conditions during reloads.
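A minimal sketch of such a coordination loop, assuming the channels named above, a `swapper` like the hypothetical handlerSwapper sketched earlier, and a manager with `Shutdown()` (illustrative, not the PR's exact code):

```go
go func() {
	for {
		select {
		case newPM := <-reloadChan:
			pm.Shutdown()       // stop the old manager's processes first
			swapper.Swap(newPM) // then route new requests to the new manager
			pm = newPM
		case <-sigChan:
			pm.Shutdown()
			ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
			if err := srv.Shutdown(ctx); err != nil {
				log.Printf("server shutdown: %v", err)
			}
			cancel()
			close(exitChan)
			return
		}
	}
}()
```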
107-115: Proper conditional file watching setup.

The file watching is only set up if explicitly requested via the flag, and includes robust error handling for path resolution failures.

117-119: Clean main goroutine blocking.

Having the main goroutine wait on the exit channel until signaled provides a clean way to keep the program running until an explicit shutdown is triggered.

121-177: Well-implemented config file watching with debouncing.

The implementation includes several good practices:

- Proper error handling for watcher creation and setup
- Debouncing of rapid file changes to prevent unnecessary reloads
- Clear logging of file watching status and events
- Clean handling of watcher shutdown with deferred Close
- Proper validation of file system events before triggering reloads
- Error handling during config loading to prevent loading invalid configurations

The 2-second debounce duration is a reasonable default that balances responsiveness with stability.
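For reference, a self-contained sketch of this watch-and-debounce pattern with fsnotify (the file name and the 2-second duration mirror the description; everything else is illustrative):

```go
package main

import (
	"log"
	"time"

	"github.com/fsnotify/fsnotify"
)

func main() {
	watcher, err := fsnotify.NewWatcher()
	if err != nil {
		log.Fatal(err)
	}
	defer watcher.Close()

	if err := watcher.Add("config.yaml"); err != nil {
		log.Fatal(err)
	}

	const debounce = 2 * time.Second
	var timer *time.Timer

	for {
		select {
		case event, ok := <-watcher.Events:
			if !ok {
				return
			}
			if event.Has(fsnotify.Write) {
				// Collapse a burst of writes into a single reload.
				if timer != nil {
					timer.Stop()
				}
				timer = time.AfterFunc(debounce, func() {
					log.Println("config changed, reloading...")
					// load the config and signal the reload channel here
				})
			}
		case err, ok := <-watcher.Errors:
			if !ok {
				return
			}
			log.Println("watch error:", err)
		}
	}
}
```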
Actionable comments posted: 0
🧹 Nitpick comments (2)
llama-swap.go (2)
121-189: Well-implemented file watching with debouncing and retry logic.

The `watchConfigFileWithReload` function effectively monitors the config file and handles reloads with several good practices:

- Debouncing to prevent multiple reloads during rapid changes
- Retry logic with exponential backoff for loading the new config
- Proper error handling throughout the process

Consider standardizing the logging approach throughout the codebase.

There's a mix of `fmt.Printf` and `log.Printf`/`Println` being used for logging throughout the file. Consider standardizing on one approach for consistency, preferably using structured logging with appropriate log levels.
98-100: Server shutdown timeout could be configurable.

The 5-second timeout for server shutdown is reasonable, but consider making it configurable for environments with different requirements.

```diff
- ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
+ const serverShutdownTimeout = 5 * time.Second // Move to top-level const or make configurable
+ ctx, cancel := context.WithTimeout(context.Background(), serverShutdownTimeout)
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- `llama-swap.go` (2 hunks)
- `proxy/helpers_test.go` (1 hunks)
- `proxy/logMonitor_test.go` (1 hunks)
- `proxy/process.go` (2 hunks)
- `proxy/process_test.go` (4 hunks)
- `proxy/proxymanager.go` (3 hunks)
- `proxy/proxymanager_test.go` (15 hunks)
✅ Files skipped from review due to trivial changes (2)
- proxy/helpers_test.go
- proxy/proxymanager_test.go
🚧 Files skipped from review as they are similar to previous changes (3)
- proxy/process.go
- proxy/logMonitor_test.go
- proxy/proxymanager.go
🔇 Additional comments (13)
proxy/process_test.go (6)
21-25: Improved test configuration comment clarity.

The comments have been updated to make it clearer that tests are running with error-level logging rather than debug level. This change improves code readability without altering behavior.

31-33: Good use of specific ports for deterministic testing.

Using a fixed port (12901) for the test instance rather than dynamically assigned ports makes the test more deterministic and repeatable. This is a good practice for improving test reliability.

35-35: Updated constructor call to include logger parameters.

The constructor now includes debug loggers as parameters, which aligns with the updated `NewProcess` function signature. This change helps with proper logging during config hot-reloading.

52-58: Added sleep delay and port reassignment to prevent race conditions.

This change addresses a potential race condition by:

- Adding a sleep delay after stopping the process to ensure the port is released
- Using a different specific port (12902) for the second test instance
- Explicitly updating the process config to use the new port

These changes improve test reliability by preventing port conflicts that could occur during rapid restart scenarios, which is essential for testing hot-reload functionality.

67-68: Enhanced error message with contextual information.

The error message now includes the request URL path and the proxy port, which makes test failures much easier to diagnose. This is a good practice for improving test maintainability.

76-78: Reduced port collision likelihood with explicit port assignment.

Using a specific high port (12903) for this test reduces the chance of port conflicts with other tests or processes, improving test stability. The clear comment explains the reasoning behind this change.
llama-swap.go (7)

3-18: Good import organization and new dependency addition.

The new imports are appropriately organized and necessary for the hot-reload functionality. The fsnotify package is a good choice for file system event notifications.

20-24: Version variables properly defined.

Good practice to define version information variables. These variables are typically set during the build process with `-ldflags`.

31-31: Clear flag definition for the new feature.

The `--watch-config` flag is well-documented and properly defaulted to false, making this feature opt-in.

58-77: Well-structured server setup with proper error handling.

The code creates an HTTP server with the proxy manager as its handler and launches it asynchronously. The error handling appropriately closes the exit channel on fatal errors.

79-105: Robust reload and signal handling implementation.

This goroutine effectively manages both config reloads and termination signals. The graceful shutdown sequence for the proxy manager is well-implemented, ensuring in-flight requests complete before swapping handlers.

107-115: Conditional file watching with proper error handling.

Good implementation of the watch-config option with proper error handling for file path resolution.

167-167: Well-implemented exponential backoff.

Good use of exponential backoff for retries when loading the new configuration.
I think I've addressed the remaining issues, but if my lack of Go skills is causing you more grief than it's worth, I totally understand if you'd rather reject this and do it your own way. By the way, that CodeRabbit bot is good!
Actionable comments posted: 0
🧹 Nitpick comments (3)
proxy/process.go (3)
122-126: Consider also allowing `Failed → Starting` to remove an extra state hop

Allowing `Failed → Stopped` definitely un-bricks a process, but the subsequent request has to perform a second transition (`Stopped → Starting`) before the restart actually happens.
If you deem a "retry immediately" path desirable, you could accept `Failed → Starting` as well. It keeps the state-machine simpler (one transition instead of two) and saves one CAS attempt.

```diff
 case StateFailed:
-	return to == StateStopped // Allow resetting a failed process
+	return to == StateStopped || to == StateStarting
```
263-268: Relying on substring matching for "connection refused" is brittle

`strings.Contains(err.Error(), "connection refused")` will miss platform-specific phrasings (ECONNREFUSED, translated messages, wrapped errors, etc.).
Using `errors.Is` (Go 1.20+) with `syscall.ECONNREFUSED` or checking for `*net.OpError` provides a future-proof, locale-independent check.

```diff
-	if strings.Contains(err.Error(), "connection refused") {
+	var opErr *net.OpError
+	if errors.As(err, &opErr) && errors.Is(opErr.Err, syscall.ECONNREFUSED) {
```
349-363: Differentiate "already terminated" from real termination errors

`terminateProcess()` may return errors unrelated to a process that has already exited (e.g. permission issues). Treating every error as "already terminated" hides actionable failures.

```diff
-if err := p.terminateProcess(); err != nil {
-	p.proxyLogger.Debugf("<%s> Process already terminated: %v (normal during shutdown)", p.ID, err)
+if err := p.terminateProcess(); err != nil {
+	if errors.Is(err, os.ErrProcessDone) {
+		p.proxyLogger.Debugf("<%s> Process already terminated (normal during shutdown)", p.ID)
+	} else {
+		p.proxyLogger.Warnf("<%s> Failed to terminate process: %v", p.ID, err)
+	}
 }
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- `proxy/process.go` (4 hunks)
- `proxy/process_test.go` (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- proxy/process_test.go
🔇 Additional comments (1)
proxy/process.go (1)
418-431: Recover-on-request path looks good – just verify `p.cmd` lifecycle

Resetting `Failed → Stopped` on the first incoming request is a neat way to self-heal.
One corner case: if `p.cmd` still references a defunct process, `start()` will overwrite it, but the old `Wait()` goroutine is still sending on `cmdWaitChan`. This should be safe (buffered channel), yet double-check for:

- Potential goroutine leaks (old `Wait()` keeps running forever).
- A second send on `cmdWaitChan` after a new command starts, which would dead-lock because the buffer size is 1.

If either scenario is possible, you might want to nil out `p.cmdWaitChan` (or create a fresh channel) when you reset from `Failed`.
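To make the second bullet concrete, here is a toy illustration of the size-1 buffer hazard (not the project's code): once a stale sender has filled the buffer, the next unconditional send blocks forever.

```go
package main

import "fmt"

func main() {
	waitChan := make(chan error, 1) // buffer of 1, like the cmdWaitChan described above

	waitChan <- fmt.Errorf("old process exited") // stale Wait() goroutine fills the buffer

	select {
	case waitChan <- fmt.Errorf("new process exited"):
		fmt.Println("second send succeeded")
	default:
		// An unconditional send here would block forever; creating a fresh
		// channel per start avoids mixing results from old and new processes.
		fmt.Println("buffer full: a second send would deadlock")
	}
}
```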
Hey @sammcj we're almost there.
The main changes are reverting things that are unrelated, or have an unclear rationale relative to the original intent of the PR. The functional changes to process.go/process_test.go are good examples.
I am getting a bit of review fatigue but the finish line looks close. :)
proxy/process.go
Outdated

```go
// prevent new requests from being made while stopping or irrecoverable
currentState := p.CurrentState()
if currentState == StateFailed || currentState == StateShutdown || currentState == StateStopping {
```
There is a chance here that the upstream OS process is still running/resident in memory. Simply resetting the state to StateStopped so that p.start() can run again is likely not enough. I think we'd need something that additionally makes sure it is no longer there with a SIGTERM, and if that doesn't work, a SIGKILL.
I have been considering a more robust attempted recovery behaviour for failed processes. That enhancement should be in a different PR not this one.
Can you revert this behaviour change? We can also ensure that this PR stays more focused on the original intent.
Yeah, I hit that issue when I was testing my PR's changes (hence my attempt to fix it). I've reverted this now to leave things as they were.
```go
var (
	debugLogger = NewLogMonitorWriter(os.Stdout)
)
```
I don't understand these changes to the tests. I would suggest reverting them along with the behaviour changes in process.go.
- Reverted process_test.go (I can no longer make the tests pass locally without the change, so I'm relying on GitHub Actions feedback now).
- The only changes left in process.go are to reduce the info-level logging noise when handling config hot-reloads, where the processes are expected to be interrupted and stopped.

I couldn't get the process tests to pass locally without making changes; I found that TestProcess_AutomaticallyStartsUpstream had issues with port allocation.
```text
=== RUN   TestProcess_AutomaticallyStartsUpstream
    process_test.go:43:
        Error Trace: /Users/samm/git/llama-swap/proxy/process_test.go:43
        Error:       Not equal:
                     expected: "ready"
                     actual  : "failed"
        Diff:
        --- Expected
        +++ Actual
        @@ -1,2 +1,2 @@
        -(proxy.ProcessState) (len=5) "ready"
        +(proxy.ProcessState) (len=6) "failed"
        Test: TestProcess_AutomaticallyStartsUpstream
    process_test.go:45:
        Error Trace: /Users/samm/git/llama-swap/proxy/process_test.go:45
        Error:       Not equal:
                     expected: 200
                     actual  : 502
        Test:     TestProcess_AutomaticallyStartsUpstream
        Messages: Expected status code 200, got 502
    process_test.go:46:
        Error Trace: /Users/samm/git/llama-swap/proxy/process_test.go:46
        Error: "unable to start process: upstream command exited unexpectedly: exit status 1\n" does not contain "testing91931"
        Test: TestProcess_AutomaticallyStartsUpstream
    process_test.go:59: Expected status code 200, got 503
--- FAIL: TestProcess_AutomaticallyStartsUpstream (0.25s)
```
As an aside, if it helps to explain my reasoning: I passed the PR diff, with and without the process_test changes, to an LLM to challenge my assumptions and got it to explain why the tests were failing without changes:
Q: "Do the changes in my PR require this test to change?"
A: "No, your PR's core features (config reloading) do not fundamentally break the Process type's existing contract in a way that necessitates a change to the test's logic to make it pass if the test were perfectly stable.
However, to achieve stable and reliable tests in the presence of your PR (and generally), TestProcess_AutomaticallyStartsUpstream (and other similar tests like TestProcess_WaitOnMultipleStarts) do benefit significantly from more robust port management (i.e., using distinct, hardcoded or carefully allocated ports for each simple-responder instance within the test). This makes the test less dependent on global state and timing.
You might need to explain to the reviewer that the reverted version of TestProcess_AutomaticallyStartsUpstream is inherently flaky due to its port allocation strategy, and the changes you had previously made (using getTestSimpleResponderConfigPort with distinct ports) were to ensure test reliability, not to work around a bug introduced by the config reloading feature. The PR might be making this flakiness more apparent, but the fix lies in making the test more robust."
Looks good!
Thanks @sammcj! 🥳
This PR adds a new feature: config hot-reload.
It is disabled by default and, as per the additions to the readme, can be enabled by running llama-swap with the `--watch-config` flag. The behaviour is such that it will wait for any in-flight requests to complete before reloading the underlying llama server.
Also added some very minor readme linting fixes while I was in there 😅
Summary by CodeRabbit

- New Features
- Improvements
- Bug Fixes
- Tests