Conversation

@ricky-chaoju
Contributor

Summary

  • Replaced LLM Agent-based auto-tuning with Bayesian Optimization (Optuna TPE)
  • Added MCP Agent Chat for natural language interaction with the platform
  • Extended MCP Server with new tools (benchmark, web search)

Changes

Auto-Tuning

  • New BayesianTuningService using Optuna TPE sampler for efficient hyperparameter search (sketched below)
  • Supports vLLM and SGLang engine comparison
  • Real-time log display during the tuning process
  • Fixed SGLang container startup command
  • Simplified UI (removed manual parameter inputs)

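For reference, a minimal sketch of the kind of TPE-driven search loop a service like BayesianTuningService runs. The parameter ranges and the deploy_and_benchmark helper are illustrative assumptions, not the service's actual API:

import optuna

def objective(trial: optuna.Trial) -> float:
    # Sample a candidate engine configuration (ranges here are illustrative).
    params = {
        "max_num_seqs": trial.suggest_int("max_num_seqs", 64, 512),
        "gpu_memory_utilization": trial.suggest_float("gpu_memory_utilization", 0.70, 0.95),
    }
    # Hypothetical helper: deploy with the sampled params, run a benchmark,
    # and return the throughput metric to maximize.
    return deploy_and_benchmark(params)

study = optuna.create_study(direction="maximize", sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=20)
print("Best params:", study.best_params)
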
Agent Chat & MCP

  • Add Agent Chat panel with streaming support
  • Add MCP client for the backend
  • Add benchmark and web search tools
  • Refactor chat panel components
@ricky-chaoju ricky-chaoju requested a review from Copilot January 27, 2026 14:16
@ricky-chaoju ricky-chaoju merged commit 1ec9cb5 into main Jan 27, 2026
8 checks passed

Copilot AI left a comment


Pull request overview

Adds Bayesian Optimization–based auto-tuning and an MCP-powered “agent chat” experience, plus new MCP server tools (web search, benchmarking) to support more autonomous platform interaction.

Changes:

  • Introduces Optuna TPE-based Bayesian tuning service and wires auto-tuning runner to it.
  • Adds MCP agent chat (SSE streaming; sketched below) with conversation persistence and a new chat panel “Agent/Chat” mode.
  • Extends MCP server with web_search, run_benchmark, and tuning job tools; updates deploy tool args (name, gpu_indexes, backend).
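
A minimal sketch of the SSE streaming shape this involves, assuming FastAPI; the route path, event payloads, and the run_agent helper are illustrative, not the PR's exact API:

import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/agent/chat/stream")
async def chat_stream(message: str):
    async def event_gen():
        # Stream each agent reasoning/tool event as an SSE "data:" frame.
        async for event in run_agent(message):  # hypothetical agent runner
            yield f"data: {json.dumps(event)}\n\n"
        yield "data: [DONE]\n\n"
    return StreamingResponse(event_gen(), media_type="text/event-stream")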

Reviewed changes

Copilot reviewed 29 out of 29 changed files in this pull request and generated 17 comments.

File Description
mcp-server/src/tools/webSearch.ts Adds SearxNG-based web search tool and an LLM-config–focused helper search.
mcp-server/src/tools/benchmark.ts Adds benchmark + tuning job helper functions for MCP tools.
mcp-server/src/index.ts Registers new MCP tools and updates deploy tool schema/handler to use gpu_indexes + backend.
mcp-server/src/formatters.ts Removes emoji-based status indicators in formatted outputs.
mcp-server/src/client.ts Extends client with deploy args changes + auto-tuning/benchmark endpoints.
frontend/src/contexts/ChatPanelContext.tsx Adds global chat panel context to open panel and send agent messages from anywhere.
frontend/src/components/chat/ChatInput.tsx Tweaks placeholder/disabled behavior during streaming.
frontend/src/components/chat-panel/useAgentChat.ts Implements agent chat SSE client with step/tool event tracking and conversation loading.
frontend/src/components/chat-panel/index.ts Exports agent chat view/hook and related types.
frontend/src/components/chat-panel/TuningJobView.tsx Enhances tuning job view with progress, benchmark result display, and better scrolling behavior.
frontend/src/components/chat-panel/ChatPanel.tsx Adds “Agent/Chat” mode switch; integrates agent chat view and context registration.
frontend/src/App.tsx Wraps app in ChatPanelProvider and removes tuning-job storage event handling.
backend/requirements.txt Adds Optuna dependency.
backend/app/services/tuning_agent.py Updates agent rules + progress/status updates; tracks benchmark results; tightens deployment stop behavior.
backend/app/services/mcp/types.py Adds MCP client/JSON-RPC and tool result type definitions.
backend/app/services/mcp/client.py Implements MCP stdio JSON-RPC client + a simple client pool.
backend/app/services/mcp/agent.py Implements MCP-driven agent that streams reasoning/tool execution events.
backend/app/services/mcp/__init__.py Exposes MCP client/agent public API.
backend/app/services/deployer.py Fixes SGLang startup command by explicitly running sglang.launch_server (sketched below the table).
backend/app/services/bayesian_tuner.py Adds Bayesian auto-tuning implementation (Optuna TPE) with deployment+benchmark loop and logging.
backend/app/schemas/tuning.py Extends tuning schemas with tuning_config, richer progress fields, and logs.
backend/app/models/tuning.py Extends TuningJob model with tuning_config, conversation_id, and logs.
backend/app/models/conversation.py Extends Conversation/Message models for agent conversations and tool-call metadata.
backend/app/database.py Adds SQLite migrations for new conversation/message/tuning columns (partial).
backend/app/api/auto_tuning.py Adds log parsing to responses; switches auto-tuning runner to Bayesian tuning.
backend/app/api/agent.py Adds SSE agent chat API + conversation CRUD endpoints.
backend/app/api/__init__.py Registers /agent routes.

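As context for the deployer.py fix listed above: the container now runs the SGLang server module explicitly. A hypothetical sketch of that style of startup command (the exact flags used in the PR may differ):

# Hypothetical container command that invokes sglang.launch_server explicitly
# (model_path and port stand in for the deployment's actual values).
command = [
    "python", "-m", "sglang.launch_server",
    "--model-path", model_path,
    "--host", "0.0.0.0",
    "--port", str(port),
]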

raise HTTPException(status_code=400, detail="Deployment is not running")

worker = deployment.worker
llm_base_url = f"http://{worker.host}:{deployment.port}/v1"

Copilot AI Jan 27, 2026


worker.host is not a field on the Worker model/schema (workers expose address/effective_address). This will raise an AttributeError when using provider="system". Build the LLM base URL from worker.effective_address (or worker.address.split(':')[0]) instead.

Suggested change
- llm_base_url = f"http://{worker.host}:{deployment.port}/v1"
+ effective_address = getattr(worker, "effective_address", None)
+ if effective_address:
+     llm_base_url = f"{effective_address.rstrip('/')}/v1"
+ else:
+     host = (worker.address or "").split(":")[0]
+     llm_base_url = f"http://{host}:{deployment.port}/v1"

Comment on lines +593 to +601
# Count total
count_query = (
    select(Conversation)
    .where(Conversation.user_id == current_user.id)
    .where(Conversation.conversation_type == conversation_type)
)
count_result = await db.execute(count_query)
total = len(count_result.all())


Copilot AI Jan 27, 2026


Total conversation count is computed by fetching all matching rows and doing len(...), which is inefficient for large histories. Use a SQL COUNT(*) (e.g., select(func.count()).select_from(...)) to compute total without loading all rows.
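
A sketch of that COUNT(*) approach, assuming the same Conversation model and filters shown above:

from sqlalchemy import func, select

count_query = (
    select(func.count())
    .select_from(Conversation)
    .where(Conversation.user_id == current_user.id)
    .where(Conversation.conversation_type == conversation_type)
)
# scalar_one() returns the single COUNT value without loading any rows.
total = (await db.execute(count_query)).scalar_one()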

Comment on lines +115 to 116



Copilot AI Jan 27, 2026


The TuningJob model now includes a logs JSON column and the tuner writes to self.job.logs, but the SQLite migration list here never adds a logs column to tuning_jobs. On upgraded databases this will cause runtime SQL errors when tuning tries to persist logs. Add a migration step to ALTER TABLE tuning_jobs ADD COLUMN logs JSON (and any other new tuning_job columns that must exist).

Suggested change
+ # Migration: Add logs to tuning_jobs (for tuning job log persistence)
+ if not await column_exists("tuning_jobs", "logs"):
+     logger.info("Adding 'logs' column to tuning_jobs table...")
+     await conn.execute(text("ALTER TABLE tuning_jobs ADD COLUMN logs JSON"))
+     logger.info("'logs' column added!")

let output = `## Benchmark Results\n\n`;
output += `**Deployment:** ${deployment.name}\n`;
output += `**Model:** ${deployment.model?.name || "Unknown"}\n`;
output += `**Engine:** ${deployment.engine || "vllm"}\n\n`;

Copilot AI Jan 27, 2026


Deployment objects from the backend use the backend field (see DeploymentResponse.backend), but this formatter references deployment.engine. As a result, the displayed engine will be incorrect/always fall back. Use deployment.backend when printing the engine/backend.

Suggested change
- output += `**Engine:** ${deployment.engine || "vllm"}\n\n`;
+ output += `**Engine:** ${deployment.backend || "vllm"}\n\n`;

}

// Send the message (async, but we don't wait)
fn(message, model);

Copilot AI Jan 27, 2026


sendAgentMessage invokes the async fn(message, model) without awaiting or handling rejections. If the send fails, this can surface as an unhandled promise rejection in the browser. Consider void fn(...).catch(...) (and optionally surface an error toast) to ensure failures are handled.

Suggested change
- fn(message, model);
+ // Attach a catch handler to avoid unhandled promise rejections.
+ void fn(message, model).catch((error) => {
+   console.error("ChatPanel: Failed to send agent message", error);
+ });

    if first_token_time is None:
        first_token_time = time.perf_counter()
    token_count += 1
except json.JSONDecodeError:

Copilot AI Jan 27, 2026


'except' clause does nothing but pass and there is no explanatory comment.
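
One way to make the intent explicit (a sketch; the surrounding streaming loop is assumed from the snippet above):

except json.JSONDecodeError:
    # Ignore keep-alive or partial SSE lines that are not valid JSON.
    continue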

"tokens": token_count,
"total_time": end - start,
}
except Exception:

Copilot AI Jan 27, 2026


'except' clause does nothing but pass and there is no explanatory comment.
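
A sketch of one explicit alternative (assumes a module-level logger is available):

except Exception:
    # The benchmark request failed; log it so failed runs are diagnosable.
    logger.exception("Benchmark request failed")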

self._read_task.cancel()
try:
    await self._read_task
except asyncio.CancelledError:

Copilot AI Jan 27, 2026


'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
- except asyncio.CancelledError:
+ except asyncio.CancelledError:
+     # Expected when cancelling the read task during client cleanup; safe to ignore.

    logger.error(f"Error processing response: {e}")

except asyncio.CancelledError:
    pass

Copilot AI Jan 27, 2026


'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
- pass
+ # Expected during normal shutdown when the response reader task is cancelled.
+ logger.debug("Response reader task cancelled")
+ return

Comment on lines +876 to +877
except Exception:
    pass

Copilot AI Jan 27, 2026


'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
- except Exception:
-     pass
+ except Exception as inner_exc:
+     logger.warning(
+         f"Failed to mark deployment {deployment_id} as stopped after stop failure: {inner_exc}"
+     )
