Skip to content

[bug] Race condition with elicitationEventsChannel access #1853

@docker-agent

Description

@docker-agent

🟡 medium - bug

File: pkg/runtime/runtime.go (line 894)

Code

func (r *LocalRuntime) elicitationHandler(ctx context.Context, req *mcp.ElicitParams) (tools.ElicitationResult, error) {
	slog.Debug("Elicitation request received from MCP server", "message", req.Message)

	// Get the current events channel
	r.elicitationEventsChannelMux.RLock()
	eventsChannel := r.elicitationEventsChannel
	r.elicitationEventsChannelMux.RUnlock()

	if eventsChannel == nil {
		return tools.ElicitationResult{}, fmt.Errorf("no events channel available for elicitation")
	}

	slog.Debug("Sending elicitation request event to client", "message", req.Message, "mode", req.Mode, "requested_schema", req.RequestedSchema, "url", req.URL)
	slog.Debug("Elicitation request meta", "meta", req.Meta)

	// Send elicitation request event to the runtime's client
	eventsChannel <- ElicitationRequest(req.Message, req.Mode, req.RequestedSchema, req.URL, req.ElicitationID, req.Meta, r.currentAgent)

	// Wait for response from the client
	select {
	case result := <-r.elicitationRequestCh:
		return tools.ElicitationResult{
			Action:  result.Action,
			Content: result.Content,
		}, nil
	case <-ctx.Done():
		slog.Debug("Context cancelled while waiting for elicitation response")
		return tools.ElicitationResult{}, ctx.Err()
	}
}

Problem

The elicitationEventsChannel is read with a read lock, but then it's used outside the critical section. If clearElicitationEventsChannel or setElicitationEventsChannel is called after the RLock() and RUnlock() but before the eventsChannel <- ... line, eventsChannel could become nil or point to a different channel, leading to potential incorrect behavior (sending to a stale channel).

Although the current implementation is safe from panicking due to the channel never being closed and the local eventsChannel variable holding a reference, it's a subtle race condition that can lead to events being sent to an unintended channel if it were replaced.

Suggested Fix

The read lock should ideally be held for the entire duration of the channel usage, or a mechanism to check the channel's validity (e.g., if it's not nil and is still the "current" channel) should be in place immediately before the send operation.

A more robust fix would involve:

  1. Extending the critical section to include the channel send operation
  2. Re-acquiring a lock before the send if necessary
  3. Using a select with a default case to handle a full channel and avoid blocking

For simply preventing sending to a stale channel, ensuring the channel is valid under a lock is key. A practical improvement is to guard the send operation with a write lock if setElicitationEventsChannel and clearElicitationEventsChannel also use a write lock. This would prevent the channel from being modified while a send is in progress.


Found by nightly codebase scan

Metadata

Metadata

Assignees

No one assigned

    Labels

    automatedIssues created by cagentkind/bugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions