An LSP server providing AI-powered code completion and editing with flexible
model provider support. This project aims for API compatibility with
copilot-language-server, allowing it to work as a drop-in replacement while
providing flexibility in model provider choice. Supports multiple AI providers
(Google, Anthropic, OpenAI, Ollama, LM Studio) for both inline completions and
next-edit suggestions.
ai-lsp implements the same LSP endpoints as copilot-language-server:
Implemented:
- `textDocument/completion` - Inline completions at cursor position (maps to copilot's `textDocument/inlineCompletion`)
- `textDocument/copilotInlineEdit` - Next-edit suggestions for larger multi-line edits
- Configuration via `initializationOptions`
Not yet implemented:
- `textDocument/didFocus` - Document focus notifications
- `textDocument/didShowCompletion` - Telemetry when completions are shown
- `textDocument/didPartiallyAcceptCompletion` - Telemetry for partial accepts
See docs/json-rpc.md for detailed API documentation.
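For orientation, a request to the completion endpoint might look roughly like the sketch below. The shapes are simplified and follow standard LSP conventions; the authoritative definitions live in docs/json-rpc.md.

```typescript
// Hypothetical sketch of a completion request a client might send.
// Field names follow common LSP conventions; see docs/json-rpc.md for the real shapes.
const completionRequest = {
  jsonrpc: "2.0",
  id: 1,
  method: "textDocument/completion",
  params: {
    textDocument: { uri: "file:///home/user/project/src/main.ts" },
    position: { line: 41, character: 17 }, // zero-based, as in LSP
  },
};

console.log(JSON.stringify(completionRequest, null, 2));
```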
- Inline Completion: AI-powered code completions at cursor position
  - Chat-based completion (default)
  - Fill-in-the-Middle (FIM) format for efficient code completion
- Next Edit: Context-aware code edits based on full document context
  - Prefix/suffix anchoring for precise, localized edits
  - Line number context for better file understanding
- Multiple Providers: Google, Anthropic, OpenAI, Ollama, LM Studio, and OpenAI-compatible APIs
- Flexible Configuration: Per-mode model and prompt customization
- Benchmarking: Compare models and strategies to evaluate performance on your codebase
To install dependencies:
```bash
bun install
```

If you use Nix with flakes enabled:
```bash
# Enter development environment with all dependencies
nix develop

# Run the LSP server directly without cloning
nix run github:tommoa/ai-lsp
```

The flake provides several convenience apps for development:
```bash
nix run .#test       # Run all tests
nix run .#test-unit  # Run unit tests only
nix run .#test-e2e   # Run e2e tests
nix run .#lint       # Check code style
nix run .#typecheck  # Type check with tsc
```

See `nix flake show` for all available apps.
To run the LSP server:
```bash
bun start
```

Or directly:

```bash
bun run src/index.ts
```

The LSP server is configured via `initializationOptions` with the following structure:
```lua
init_options = {
  providers = {
    -- Provider configuration (e.g., API keys, custom settings)
    google = { ... },
    anthropic = { ... },
  },
  -- Global default model used by all modes
  model = "google/gemini-flash-latest",
  -- Mode-specific configuration (optional)
  next_edit = {
    -- Optional: use a different model for next-edit
    -- model = "anthropic/claude-3-5-sonnet-20241022",
    -- Prompt type: "prefix-suffix" (default) or "line-number"
    prompt = "prefix-suffix",
  },
  inline_completion = {
    -- Optional: use a different model for inline completions
    -- model = "google/gemini-flash-latest",
  },
}
```

The `next_edit` mode generates code edits based on the full document context.
- `model` (optional): Override the global model for this mode
- `prompt` (optional): How the LLM receives file context
  - `"prefix-suffix"` (default): The LLM receives compact hints with prefix/suffix anchoring. Use this when you want precise, localized edits (see the sketch below).
  - `"line-number"`: The LLM receives line numbers with full file content. Use this when you want the model to have better context or when prefix/suffix anchoring is unreliable.
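To make the prefix/suffix idea concrete, here is a minimal sketch of how an anchored edit can be applied to a document. It is purely illustrative; the `AnchoredEdit` shape and `applyAnchoredEdit` helper are assumptions for this example, not ai-lsp's internals.

```typescript
// Illustrative only: an edit located by the text immediately before and after it.
interface AnchoredEdit {
  prefix: string;  // text expected just before the edited region
  suffix: string;  // text expected just after the edited region
  newText: string; // replacement for whatever sits between prefix and suffix
}

// Apply the edit by searching for the anchors rather than trusting line numbers.
function applyAnchoredEdit(document: string, edit: AnchoredEdit): string | null {
  const start = document.indexOf(edit.prefix);
  if (start === -1) return null; // prefix not found: anchoring failed
  const afterPrefix = start + edit.prefix.length;
  const suffixAt = document.indexOf(edit.suffix, afterPrefix);
  if (suffixAt === -1) return null; // suffix not found: anchoring failed
  return document.slice(0, afterPrefix) + edit.newText + document.slice(suffixAt);
}

const updated = applyAnchoredEdit("const x = 1;\nreturn x;\n", {
  prefix: "const x = ",
  suffix: ";\nreturn x;",
  newText: "42",
});
console.log(updated); // "const x = 42;\nreturn x;\n"
```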
The `inline_completion` mode generates completions at the cursor position. It can use either Fill-in-the-Middle (FIM) format for efficient code completion or chat-based completion.

- `model` (optional): Override the global model for this mode
- `prompt` (optional): How completions are generated
  - `"chat"` (default): Use chat-based completion (works with any model)
  - `"fim"`: Use Fill-in-the-Middle format (requires FIM-capable models; see the sketch after this list)

  Note: Not all model variants support FIM. Base/pretrained models typically support FIM (e.g., `qwen2.5-coder:3b-base`, `codellama:7b-code`), while instruction-tuned variants often don't (e.g., `qwen2.5-coder:3b`, `codellama:7b-instruct`). If you encounter errors with FIM, try using the base model variant or switch to `"chat"` mode.
- `fim_format` (optional): FIM template to use when `prompt = "fim"`
  - Can be a template name: `"openai"`, `"codellama"`, `"deepseek"`, or `"qwen"`
  - Can be a custom template object with `template` and `stop` properties
  - If not specified, the format will be auto-detected from the model name
  - If auto-detection fails, defaults to OpenAI format
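As a rough illustration of the difference between the two strategies, a FIM prompt is assembled from the text before and after the cursor rather than from a chat conversation. The sketch below uses CodeLlama-style sentinel tokens purely as an example; other model families use different tokens (see the `fim_format` section below).

```typescript
// Illustrative sketch: building a FIM prompt from the cursor position.
// <PRE>/<SUF>/<MID> are CodeLlama's infilling tokens, used here only as an example.
function buildFimPrompt(document: string, cursorOffset: number): string {
  const prefix = document.slice(0, cursorOffset); // code before the cursor
  const suffix = document.slice(cursorOffset);    // code after the cursor
  return `<PRE> ${prefix} <SUF>${suffix} <MID>`;
}

const source = "function add(a: number, b: number) {\n  \n}\n";
const cursor = source.indexOf("\n}"); // cursor inside the empty function body
console.log(buildFimPrompt(source, cursor));
```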
ai-lsp supports local model providers like Ollama and LM Studio out of the box.
- Start Ollama:

  ```bash
  ollama serve
  ```

- Pull a code model:

  ```bash
  ollama pull <model-name>
  ```

- Configure in your LSP client (example: `examples/ollama-init.lua`):

  ```lua
  init_options = {
    providers = {
      ollama = {
        -- Optional: override default endpoint
        baseURL = "http://localhost:11434/v1",
      },
    },
    model = "ollama/<model-name>",
    inline_completion = {
      prompt = "chat", -- or "fim" if your model supports it
    },
  }
  ```
FIM Support by Model Variant:
Not all model variants support FIM. Typically, base/pretrained models support FIM while instruction-tuned variants often don't.
Base/pretrained models that support FIM:
- `codellama:7b-code`
- `deepseek-coder:6.7b-base`
- `qwen2.5-coder:3b-base`
Instruction-tuned variants that typically don't support FIM:
- `codellama:7b-instruct`
- `deepseek-coder:6.7b-instruct`
- `qwen2.5-coder:3b`
If a model doesn't support FIM, the server will automatically fall back to chat-based completion, or you can explicitly set `prompt = "chat"` in your configuration.
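The fallback can be pictured roughly like this; the helpers are hypothetical stand-ins, not the server's actual code paths.

```typescript
// Illustrative sketch of falling back from FIM to chat-based completion.
// completeWithFim and completeWithChat are hypothetical stand-ins for the real calls.
async function completeWithFallback(
  completeWithFim: () => Promise<string>,
  completeWithChat: () => Promise<string>,
): Promise<string> {
  try {
    return await completeWithFim();
  } catch (err) {
    // Instruction-tuned models commonly reject FIM sentinel tokens; retry via chat.
    console.warn("FIM completion failed, falling back to chat:", err);
    return completeWithChat();
  }
}
```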
LM Studio works the same way as Ollama:
```lua
init_options = {
  providers = {
    lmstudio = {
      baseURL = "http://localhost:1234/v1", -- LM Studio default port
    },
  },
  model = "lmstudio/<model-name>",
  inline_completion = {
    prompt = "chat", -- or "fim" if your model supports it
  },
}
```

Note: Model names in LM Studio may vary depending on how you've loaded them. Check the LM Studio UI for the exact model identifier.
The `fim_format` option in `inline_completion` allows you to specify which Fill-in-the-Middle (FIM) format template to use. This is useful when auto-detection doesn't work or when you want to use a custom format.
```lua
init_options = {
  model = "ollama/codellama",
  inline_completion = {
    prompt = "fim",
    -- Explicitly specify the CodeLlama format
    fim_format = "codellama",
  },
}
```

If `fim_format` is not specified, the system will auto-detect based on the model name:
- Models with `codellama` → CodeLlama format
- Models with `deepseek` → DeepSeek format
- Models with `qwen` → Qwen format
- Everything else → OpenAI format (default)
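A minimal sketch of what substring-based detection like this can look like (a hypothetical helper, not ai-lsp's actual implementation):

```typescript
// Illustrative sketch: picking a FIM template family from the model name.
type FimFormatName = "openai" | "codellama" | "deepseek" | "qwen";

function detectFimFormat(model: string): FimFormatName {
  const name = model.toLowerCase();
  if (name.includes("codellama")) return "codellama";
  if (name.includes("deepseek")) return "deepseek";
  if (name.includes("qwen")) return "qwen";
  return "openai"; // default when nothing matches
}

console.log(detectFimFormat("ollama/codellama")); // "codellama"
console.log(detectFimFormat("ollama/codegemma")); // "openai" (falls back to the default)
```

In practice you can simply omit `fim_format` and rely on this detection, as in the following example: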
```lua
init_options = {
  model = "ollama/codegemma",
  inline_completion = {
    prompt = "fim",
    -- No fim_format specified - will auto-detect or default to OpenAI
  },
}
```

For models with non-standard FIM formats, you can define a custom template.
The template uses `${prefix}` and `${suffix}` placeholders that will be replaced with the code before and after the cursor:
```lua
init_options = {
  model = "custom-provider/custom-model",
  inline_completion = {
    prompt = "fim",
    fim_format = {
      -- Template with ${prefix} and ${suffix} placeholders
      template = "<|start|>${prefix}<|hole|>${suffix}<|end|>",
      -- Stop sequences to halt generation
      stop = {"<|hole|>", "<|start|>", "\n\n"},
      -- Optional: name for debugging
      name = "Custom Format",
    },
  },
}
```

The `fim_format` custom template object has the following properties:
- `template` (required): String with `${prefix}` and `${suffix}` placeholders
- `stop` (required): Array of stop sequences to halt generation
- `name` (optional): Human-readable name for debugging
- `defaults` (optional): Default values for additional placeholders (e.g., Qwen uses `${repo_name}` and `${file_path}`)
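To make the template contract concrete, here is a small sketch of how such a template could be filled in. The interface mirrors the properties above; the rendering helper itself is an assumption for illustration, not the server's code.

```typescript
// Illustrative sketch: filling a custom FIM template's placeholders.
interface CustomFimFormat {
  template: string;                  // uses ${prefix} and ${suffix} placeholders
  stop: string[];                    // stop sequences passed to the model
  name?: string;                     // optional label for debugging
  defaults?: Record<string, string>; // extra placeholders, e.g. ${repo_name}
}

function renderFimTemplate(
  format: CustomFimFormat,
  values: { prefix: string; suffix: string } & Record<string, string>,
): string {
  const all: Record<string, string> = { ...format.defaults, ...values };
  // Replace each ${key} with its value; unknown keys are left untouched.
  return format.template.replace(/\$\{(\w+)\}/g, (match, key) => all[key] ?? match);
}

const custom: CustomFimFormat = {
  template: "<|start|>${prefix}<|hole|>${suffix}<|end|>",
  stop: ["<|hole|>", "<|start|>", "\n\n"],
  name: "Custom Format",
};

console.log(renderFimTemplate(custom, { prefix: "const x = ", suffix: ";" }));
// <|start|>const x = <|hole|>;<|end|>
```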
ai-lsp includes comprehensive benchmarking tools to compare models, prompt strategies, and measure performance metrics.
Benchmark `next_edit` generation to compare models and prompt strategies (prefix/suffix vs line-number).
```bash
# Test both prompt strategies
bun run scripts/benchmark.ts \
  --file tests/fixtures/small/simple-refactor.ts \
  --models google/gemini-flash-latest

# Test a specific strategy
bun run scripts/benchmark.ts \
  --file tests/fixtures/small/simple-refactor.ts \
  --models google/gemini-flash-latest \
  --approach prefix-suffix

# Compare multiple models
bun run scripts/benchmark.ts \
  --file tests/fixtures/small/simple-refactor.ts \
  --models google/gemini-flash-latest,anthropic/claude-3-5-sonnet-20241022 \
  --approach both

# Show colorized diffs with critic scoring
bun run scripts/benchmark.ts \
  --file tests/fixtures/small/simple-refactor.ts \
  --models google/gemini-flash-latest \
  --approach both \
  --preview --critic

# Export results for analysis
bun run scripts/benchmark.ts \
  --file tests/fixtures/small/simple-refactor.ts \
  --models google/gemini-flash-latest \
  --runs 10 \
  --export-json next-edit-results.json
```

- `--file <path>` - Input file to benchmark (required)
- `--models <m1,m2>` - Comma-separated models to test (required)
- `--approach <prefix-suffix|line-number|both>` - Prompt strategy (default: both)
- `--runs N` - Number of runs per model/approach (default: 3)
- `--concurrency N` - Parallel workers (default: 2)
- `--preview` - Show colorized diffs of changes
- `--context N` - Diff context lines (default: 3, only with `--preview`)
- `--no-color` - Disable colored diff output
- `--critic` - Enable critic scoring for quality assessment
- `--critic-model <model>` - Model to use for critic (default: first model)
- `--export-json <path>` - Export results to JSON file
Benchmark inline completions to compare models and completion strategies (chat vs FIM).
```bash
# Test both completion strategies
bun run scripts/inline-benchmark.ts \
  --test-cases tests/fixtures/inline-completion-cases.json \
  --models google/gemini-flash-latest

# Test a specific strategy
bun run scripts/inline-benchmark.ts \
  --test-cases tests/fixtures/inline-completion-cases.json \
  --models ollama/codegemma \
  --approach fim

# Compare multiple models
bun run scripts/inline-benchmark.ts \
  --test-cases tests/fixtures/inline-completion-cases.json \
  --models ollama/codegemma,google/gemini-flash-latest \
  --approach all

# Show completion previews with critic scoring
bun run scripts/inline-benchmark.ts \
  --test-cases tests/fixtures/inline-completion-cases.json \
  --models google/gemini-flash-latest \
  --approach all \
  --preview --critic

# Export results for analysis
bun run scripts/inline-benchmark.ts \
  --test-cases tests/fixtures/inline-completion-cases.json \
  --models ollama/codegemma \
  --runs 10 \
  --export-json inline-results.json
```

- `--test-cases <path>` - JSON file with test cases (required)
- `--models <m1,m2>` - Comma-separated models to test (required)
- `--approach <chat|fim|all>` - Completion strategy (default: all)
- `--runs N` - Number of runs per model/approach (default: 3)
- `--concurrency N` - Parallel workers (default: 2)
- `--preview` - Show completion previews
- `--no-color` - Disable colored output
- `--critic` - Enable critic scoring for quality assessment
- `--critic-model <model>` - Model to use for critic (default: first model)
- `--export-json <path>` - Export results to JSON file
Both benchmark scripts support local providers like Ollama and LM Studio:
```bash
# Next-edit with Ollama
bun run scripts/benchmark.ts \
  --file tests/fixtures/small/simple-refactor.ts \
  --models ollama/codegemma \
  --runs 3

# Inline completion with Ollama (FIM)
bun run scripts/inline-benchmark.ts \
  --test-cases tests/fixtures/inline-completion-cases.json \
  --models ollama/codegemma \
  --approach fim \
  --runs 5
```

Note: Provider configuration (baseURL, apiKey) currently works through LSP `initializationOptions` only. CLI flags for provider config will be added in a future update.
After running benchmarks with `--export-json`, analyze the results:

```bash
bun run scripts/analyze-ab-results.ts --results results.json
```

This provides statistical analysis including:
- Latency metrics (mean, median, p95)
- Token usage and costs
- Quality scores (if critic enabled)
- Side-by-side comparison tables
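If you want to post-process the exported JSON yourself, the latency statistics reduce to a few lines. The sketch below assumes a flat array of runs with a `latencyMs` field, which is a guess for illustration; inspect your exported file for the actual structure.

```typescript
// Illustrative sketch: computing mean/median/p95 latency from exported results.
// The flat-array shape and `latencyMs` field name are assumptions for this example.
import { readFileSync } from "node:fs";

interface BenchmarkRun {
  latencyMs: number;
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sorted: number[], p: number): number {
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.min(sorted.length - 1, Math.max(0, idx))];
}

const runs: BenchmarkRun[] = JSON.parse(readFileSync("next-edit-results.json", "utf8"));
const latencies = runs.map((r) => r.latencyMs).sort((a, b) => a - b);

const mean = latencies.reduce((sum, v) => sum + v, 0) / latencies.length;
console.log({ mean, median: percentile(latencies, 50), p95: percentile(latencies, 95) });
```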
Nix users: All commands below can be run via `nix run .#<app>` for reproducible execution. See Installation with Nix for details.
- `bun start` or `bun run src/index.ts` - Run the LSP server
- `bun test` - Run all tests
- `bun test tests/*.test.ts` - Run unit tests
- `bun test tests/e2e/**/*.test.ts` - Run end-to-end tests
- `bun test tests/benchmark-*.test.ts` - Run benchmark tests
- `bun run lint` - Check code style
- `bun run lint:fix` - Auto-fix linting issues
- `bun run format` - Format code with Prettier
- `bunx tsc --noEmit` - Type-check code
To verify TypeScript types:
```bash
bunx tsc --noEmit
```

I actually prefer writing things in Rust and C++, so those would have been more natural languages for me to pick. But there are a couple of reasons why this ended up being written in TypeScript.
- I wanted to learn TypeScript - all of my normal work is in more low-level languages.
- It seems (to me) to be relatively easy to arbitrarily import modules, which is helpful in the fast-moving AI space.