A VS Code extension that integrates with your local LLM (Ollama, LM Studio, vLLM) to provide intelligent code assistance, autonomous file operations, and chat capabilities directly in your editor.
Contributing: See CONTRIBUTING.md for the development guide.
- Local LLM Chat - Chat with your local LLM without sending data to external servers
- Agent Mode Commands - Autonomous file operations:
  - `/read <path>` - Read files from your workspace
  - `/write <path> [prompt]` - Generate content and write to files via LLM
  - `/suggestwrite <path> [prompt]` - LLM suggests changes, you approve before writing
- Fully Configurable - Customize endpoint, model, temperature, max tokens, timeout
- Conversation Context - Maintains chat history for coherent multi-turn conversations
- Quick Access - Open chat with a single click from the status bar
- 100% Private - All processing stays on your machine
- Streaming Support - Real-time token streaming for responsive UX
- Production-Ready - Comprehensive error handling, type safety, test coverage
Chat window showing /git-commit-msg and /git-review commands in action. The interface displays:
- Interactive chat messages with streaming responses
- Git integration commands for autonomous commit message generation and code review
- Light gray text styling for optimal readability in dark themes
- Real-time command execution with status feedback
v1.0.0 - First Stable Release
- 23 commits - Clean, atomic git history showing the full development progression
- 92 tests - 100% passing (36 extension + 33 llmClient + 23 gitClient)
- TypeScript strict mode - 0 type errors, full type safety
- 4 core modules - extension, llmClient, gitClient, webviewContent
- Published to VS Code Marketplace - v1.0.0 stable release
- Production-Ready - Comprehensive error handling and documentation
Features included:
- Chat interface with streaming support
- File operations (`/read`, `/write`, `/suggestwrite`)
- Git integration (`/git-commit-msg`, `/git-review`)
- Performance optimizations (token buffering, DOM batching)
- Monochrome UI with WCAG AA accessibility
- Comprehensive error handling
Ready for:
- Portfolio showcase - professional-grade code
- Production use - tested and optimized
- Extension by others - clear architecture and test coverage
- Interview discussion - full git history and talking points
You need one of:
Ollama (Recommended)
ollama run mistral
# Server at: http://localhost:11434
LM Studio
- Download: https://lmstudio.ai
- Start local server on: http://localhost:8000
vLLM
python -m vllm.entrypoints.openai.api_server \
--model mistral-7b-instruct-v0.2 \
--port 11434
From VS Code Marketplace (Easiest):
code --install-extension odanree.llm-local-assistant
Or search for "LLM Local Assistant" in the VS Code Extensions marketplace: https://marketplace.visualstudio.com/items?itemName=odanree.llm-local-assistant
See docs/INSTALL.md for detailed platform-specific setup, troubleshooting, and development instructions.
- Open VS Code Extensions (`Ctrl+Shift+X`)
- Search for "LLM Local Assistant"
- Click "Install"
- Reload VS Code

Or install from the VSIX:
- Download `llm-local-assistant-1.0.0.vsix` from the Latest Release
- In VS Code, run: `code --install-extension llm-local-assistant-1.0.0.vsix`
- Or open the Command Palette (`Ctrl+Shift+P`) → "Extensions: Install from VSIX"
- Reload VS Code
- Install & Compile
npm install
npm run compile
# Or development watch mode:
npm run watch
- Launch in Debug Mode
- Press `F5` in VS Code to open a debug window with the extension loaded
Open VS Code Settings (`Ctrl+,`) and set:
{
"llm-assistant.endpoint": "http://localhost:11434",
"llm-assistant.model": "mistral",
"llm-assistant.temperature": 0.7,
"llm-assistant.maxTokens": 2048,
"llm-assistant.timeout": 30000
}
For custom ports:
{
"llm-assistant.endpoint": "http://127.0.0.1:9000"
}
Click LLM Assistant in the status bar → Run the "Test Connection" command
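Inside the extension, these settings would typically be read through VS Code's `vscode.workspace.getConfiguration` API. The sketch below is illustrative only; the helper and interface names are assumptions, not the extension's actual code.

```typescript
// Illustrative sketch of reading the llm-assistant settings (helper and
// interface names are assumptions, not the extension's actual code).
import * as vscode from 'vscode';

interface LlmAssistantConfig {
  endpoint: string;
  model: string;
  temperature: number;
  maxTokens: number;
  timeout: number;
}

function loadConfig(): LlmAssistantConfig {
  const cfg = vscode.workspace.getConfiguration('llm-assistant');
  return {
    endpoint: cfg.get<string>('endpoint', 'http://localhost:11434'),
    model: cfg.get<string>('model', 'mistral'),
    temperature: cfg.get<number>('temperature', 0.7),
    maxTokens: cfg.get<number>('maxTokens', 2048),
    timeout: cfg.get<number>('timeout', 30000),
  };
}
```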
Simply type messages and press Enter to chat with your LLM.
- `/read <path>` - Read and display file contents
  Example: `/read src/main.ts`
- `/write <path> [prompt]` - Generate file content via LLM and write to disk
  Example: `/write src/greeting.ts write a TypeScript function that greets users`
  If no prompt is provided, uses: "Generate appropriate content for this file based on its name."
- `/suggestwrite <path> [prompt]` - LLM suggests changes, you review and approve before writing
  Example: `/suggestwrite src/config.ts add validation for the API endpoint`
- `/git-commit-msg` - Generate commit message from staged changes
  Example: `/git-commit-msg`
  Reads all staged diffs, analyzes changes, and generates a conventional commit message following the pattern: `<type>(<scope>): <description>`
- `/git-review` - AI-powered code review of staged changes
  Example: `/git-review`
  Reviews all staged changes, identifies potential issues, suggests improvements, and provides specific feedback.
- `/help` - Show available commands
  Example: `/help`
The extension uses a deliberately simple, regex-based command parser instead of a formal CLI framework. Here's why:
- User-Centric: Commands work anywhere in a message - `/read file.ts` can appear mid-conversation
- Low Overhead: No dependency on heavyweight CLI libraries, keeping bundle size small
- Maintainability: Regex patterns are explicit and easy to audit in code review
- Extensibility: Easy to add new commands (e.g., `/analyze`, `/refactor`) without architecture changes

Trade-off: Less strict argument validation than a formal parser, in exchange for flexible, natural interaction patterns. A minimal sketch of this parsing style is shown below.
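The sketch below illustrates the idea of regex-based command parsing; the patterns, type names, and handler shape are assumptions for illustration, not the actual implementation in `extension.ts`.

```typescript
// Hypothetical sketch of a regex-based command parser (names and patterns
// are illustrative, not the extension's actual implementation).
type ParsedCommand =
  | { kind: 'read'; path: string }
  | { kind: 'write' | 'suggestwrite'; path: string; prompt?: string }
  | { kind: 'chat'; text: string };

const READ_RE = /\/read\s+(\S+)/;
const WRITE_RE = /\/(write|suggestwrite)\s+(\S+)(?:\s+(.+))?/;

export function parseMessage(message: string): ParsedCommand {
  const read = READ_RE.exec(message);
  if (read) {
    return { kind: 'read', path: read[1] };
  }
  const write = WRITE_RE.exec(message);
  if (write) {
    return {
      kind: write[1] as 'write' | 'suggestwrite',
      path: write[2],
      prompt: write[3], // optional free-form prompt after the path
    };
  }
  // No command found: treat the whole message as plain chat input.
  return { kind: 'chat', text: message };
}
```

Because the patterns search the whole message rather than anchoring to the start, a command like `/read file.ts` still parses when it appears mid-sentence.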
The extension supports both streaming and non-streaming responses:
- Streaming (primary): Token-by-token display for real-time feedback
- Non-Streaming (fallback): For servers with streaming limitations (e.g., Ollama on non-standard ports)
Why this matters: Users get responsive, interactive feedback while long responses are being generated. The UI updates continuously instead of waiting for the full response.
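As a rough sketch of the streaming path, the snippet below reads an Ollama-style NDJSON stream token by token. The endpoint path and chunk shape follow Ollama's `/api/generate` API; the function name and the absence of buffering and fallback logic are simplifications of what `llmClient` actually does.

```typescript
// Simplified sketch of streaming tokens from an Ollama-style NDJSON endpoint.
async function streamChat(
  endpoint: string,
  model: string,
  prompt: string,
  onToken: (token: string) => void
): Promise<void> {
  const response = await fetch(`${endpoint}/api/generate`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model, prompt, stream: true }),
  });
  if (!response.ok || !response.body) {
    throw new Error(`LLM server returned ${response.status}`);
  }
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffered = '';
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffered += decoder.decode(value, { stream: true });
    // Each complete line is one JSON object carrying a token chunk.
    const lines = buffered.split('\n');
    buffered = lines.pop() ?? '';
    for (const line of lines) {
      if (!line.trim()) continue;
      const chunk = JSON.parse(line);
      if (chunk.response) onToken(chunk.response);
    }
  }
}
```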
The LLMClient maintains conversation history per-session, not persisted:
private conversationHistory: Array<{ role: string; content: string }> = [];
Why:
- Simpler state management without database/file I/O
- Clear semantics: closing the chat panel resets history (expected behavior)
- Reduces complexity for MVP
- Future enhancement: optional persistence to disk/localStorage
Trade-off: Restarting VS Code or closing the chat panel loses context. This is intentional for simplicity; persistent history is a Phase 2 feature.
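A minimal sketch of this in-memory approach, with illustrative method names that may not match `LLMClient`'s actual API:

```typescript
// Illustrative in-memory conversation state (method names are assumptions).
class ConversationState {
  private history: Array<{ role: string; content: string }> = [];

  record(role: 'user' | 'assistant', content: string): void {
    this.history.push({ role, content });
  }

  // Sent with every request so the LLM sees the full multi-turn context.
  contextForRequest(): Array<{ role: string; content: string }> {
    return [...this.history];
  }

  // Called when the chat panel is disposed: context intentionally resets.
  clear(): void {
    this.history = [];
  }
}
```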
All user-triggered operations follow this pattern:
try {
const result = await llmClient.sendMessage(userInput);
// Display result
} catch (error) {
// Send user-friendly error message to chat
showError(`Error: ${error.message}`);
}
Why: Consistent error propagation, easy to debug, and all errors surface in the chat UI for users to see.
All file operations use VS Code's URI-based workspace.fs API:
const uri = vscode.Uri.joinPath(workspaceFolder, relativePath);
await vscode.workspace.fs.writeFile(uri, encodedContent);
Why:
- Cross-platform path handling (Windows \ vs Unix /)
- Respects workspace folder boundaries
- Works with remote development (SSH, Codespaces)
- Triggers VS Code's file watching automatically
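Putting those pieces together, a hedged sketch of a workspace-relative write could look like the following; the helper name and error message are illustrative.

```typescript
// Sketch of a workspace-relative write using the URI-based filesystem API.
import * as vscode from 'vscode';

async function writeWorkspaceFile(relativePath: string, content: string): Promise<vscode.Uri> {
  const workspaceFolder = vscode.workspace.workspaceFolders?.[0];
  if (!workspaceFolder) {
    throw new Error('Open a workspace folder before writing files.');
  }
  const uri = vscode.Uri.joinPath(workspaceFolder.uri, relativePath);
  // workspace.fs expects raw bytes, so encode the string first.
  await vscode.workspace.fs.writeFile(uri, new TextEncoder().encode(content));
  return uri;
}
```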
- TypeScript strict mode enabled (`strict: true` in tsconfig.json)
- All code passes type checking: 0 errors, 0 warnings
- Explicit types on public APIs
- Specific error detection for HTTP status codes (404 → model not found, 503 → server busy)
- Helpful error messages guide users to settings or configuration
- Timeout handling with AbortController for clean cancellation
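The snippet below is a simplified sketch of how timeout cancellation and status-code mapping can be combined; the exact error messages and request shape in `llmClient` may differ.

```typescript
// Sketch of timeout handling with AbortController plus status-code mapping
// (error text and request shape are illustrative).
async function requestWithTimeout(url: string, body: unknown, timeoutMs: number): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetch(url, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
      signal: controller.signal, // cancels the request cleanly on timeout
    });
    if (response.status === 404) {
      throw new Error('Model not found - check the llm-assistant.model setting.');
    }
    if (response.status === 503) {
      throw new Error('Server busy - try again in a moment.');
    }
    return response;
  } finally {
    clearTimeout(timer);
  }
}
```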
- 52 unit tests covering:
- LLMClient initialization, configuration, API contracts
- Command parsing (regex patterns for /read, /write, /suggestwrite)
- Error scenarios (connection failures, timeouts, invalid endpoints)
- File path validation and resolution
- Message formatting
- Run with: `npm test` (100% pass rate)
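For flavor, here is an illustrative unit test for command parsing, assuming a Vitest-style runner and the hypothetical `parseMessage` helper sketched earlier; the real test files and import paths differ.

```typescript
// Illustrative test, assuming Vitest and the hypothetical parseMessage helper
// from the parser sketch above (the import path is a placeholder).
import { describe, expect, it } from 'vitest';
import { parseMessage } from '../src/commandParser';

describe('command parsing', () => {
  it('extracts the path from /read', () => {
    expect(parseMessage('/read src/main.ts')).toEqual({ kind: 'read', path: 'src/main.ts' });
  });

  it('treats plain text as chat input', () => {
    expect(parseMessage('explain this function')).toEqual({
      kind: 'chat',
      text: 'explain this function',
    });
  });
});
```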
Three clear extension points for Phase 2:
- New LLM Commands: Add a regex pattern + handler in `extension.ts`
- LLM Client Enhancements: Extend the `LLMClient` class with new capabilities
- Webview Features: Enhance the UI in `webviewContent.ts`
See ROADMAP.md for planned enhancements.
| Setting | Type | Default | Description |
|---|---|---|---|
| `llm-assistant.endpoint` | string | `http://localhost:11434` | LLM server endpoint |
| `llm-assistant.model` | string | `mistral` | Model name |
| `llm-assistant.temperature` | number | `0.7` | Response randomness (0-1, higher = creative) |
| `llm-assistant.maxTokens` | number | `2048` | Max response length in tokens |
| `llm-assistant.timeout` | number | `30000` | Request timeout in milliseconds |
npm run compile # Single build
npm run watch # Auto-rebuild on changes
npm run package # Production bundle
npm test # Run all tests
npm run test:coverage # Coverage report
npm run test:ui # Interactive test UI
npm run lint # ESLint validation
Press `F5` in VS Code to launch the extension in debug mode with breakpoints.
See ROADMAP.md for planned features including:
- GitHub Copilot Agent Mode integration
- Persistent conversation history
- Custom system prompts
- Code-aware context injection
- ARCHITECTURE.md - Deep dive into component design
- PROJECT_STATUS.md - Development phase tracking
- QUICK_REFERENCE.md - Developer quick start
- CHANGELOG.md - Version history
- CONTRIBUTING.md - Contribution guidelines
For advanced topics, see the `/docs/` folder.
"Cannot connect to endpoint"
- Verify LLM server is running and accessible
- Check endpoint URL in settings
- Test manually: `curl http://localhost:11434/api/tags`
"Model not found"
- Verify the model exists: `ollama list`
- Download if needed: `ollama pull mistral`
- Update the `llm-assistant.model` setting
"Request timeout"
- Increase `llm-assistant.timeout` (default 30000ms)
- Try shorter prompts or smaller models
- Check server logs for errors
Slow responses?
- Reduce `maxTokens` for shorter responses
- Try a smaller/faster model
- Ensure server has adequate resources
100% Local & Private
- Zero external API calls or cloud dependencies
- Your code and conversations never leave your machine
- Works completely offline after the model is downloaded
- No telemetry or tracking
MIT License - See LICENSE file for details
Local • Private • Offline-First AI Assistant for VS Code
