LocalLm

An API to query local language models using different backends with a unified interface. LocalLm supports multiple inference engines and provides a consistent way to interact with various local LLM providers.

Name             | Description                                         | Doc
@locallm/types   | The shared data types                               | API doc - Readme
@locallm/api     | Run local language models using different backends  | API doc - Readme
@locallm/browser | Run quantized language models inside the browser    | API doc - Readme

Why Use LocalLm?

LocalLm provides a unified interface for multiple local language model backends, allowing you to:

  • Switch between different inference engines without changing your code
  • Access advanced features like streaming, tool calling, and multimodal support
  • Work with consistent APIs across different providers
  • Get detailed statistics and progress tracking
  • Leverage TypeScript support for better development experience

Supported Backends

  • Llama.cpp - High-performance inference with C/C++ backend
  • Koboldcpp - Feature-rich inference with GPU support
  • Ollama - Easy-to-use local model management
  • Wllama - In-browser inference using WebAssembly
  • Any OpenAI-compatible endpoint - Connect to custom or cloud services that expose an OpenAI-compatible API (each backend maps to a providerType, as shown in the sketch below)
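
Each backend maps to a providerType value when constructing the Lm client, so switching engines only means changing the provider configuration. A minimal sketch, using the server URLs that appear elsewhere in this README (adjust them to your setup; this is an illustration, not a prescribed layout):

import { Lm } from "@locallm/api";

// Provider configurations for the supported backends (the "browser" provider
// from @locallm/browser is omitted here since it runs in the browser, not Node)
const providers = {
  llamacpp: { providerType: "llamacpp", serverUrl: "http://localhost:8080" },
  koboldcpp: { providerType: "koboldcpp", serverUrl: "http://localhost:5001" },
  ollama: { providerType: "ollama", serverUrl: "http://localhost:11434" },
  openai: { providerType: "openai", serverUrl: "http://localhost:8080/v1" },
};

const lm = new Lm({
  ...providers.koboldcpp, // switch backends by picking a different entry
  onToken: (t) => process.stdout.write(t),
});

const result = await lm.infer("Hello!", { stream: true, max_tokens: 50 });
console.log("\n", result.stats);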

Features

  • Multiple Backend Support: Seamlessly switch between different inference engines
  • Streaming Responses: Real-time token streaming for interactive applications
  • Tool/Function Calling: Execute functions and tools during inference
  • Multimodal Support: Process both text and images (where supported)
  • Progress Tracking: Monitor model loading and inference progress
  • Detailed Statistics: Get comprehensive performance metrics
  • TypeScript Support: Full type definitions for better development
  • Error Handling: Robust error handling and recovery mechanisms

Quickstart

Prerequisites

  • Node.js 18 or higher
  • One of the supported backends running locally or accessible via network

Installation

# Install the API package
npm install @locallm/api

# Install types package (if needed separately)
npm install @locallm/types

Basic Usage

Example with Koboldcpp Provider

import { Lm } from "@locallm/api";

const lm = new Lm({
  providerType: "koboldcpp",
  serverUrl: "http://localhost:5001",
  onToken: (t) => process.stdout.write(t),
});

const template = "<s>[INST] {prompt} [/INST]";
const prompt = template.replace("{prompt}", "List the planets in our solar system");

// Run the inference query
const result = await lm.infer(prompt, {
  stream: true,
  temperature: 0.2,
  max_tokens: 200,
});

console.log("\nResult:", result.text);
console.log("Stats:", result.stats);

Example with Llama.cpp (OpenAI Compatible)

import { Lm } from "@locallm/api";

const lm = new Lm({
  providerType: "openai",
  serverUrl: "http://localhost:8080/v1",
  onToken: (t) => process.stdout.write(t),
});

// Handle graceful shutdown
process.on('SIGINT', () => {
  lm.abort().then(() => process.exit());
});

const prompt = "Explain quantum computing in simple terms";

const result = await lm.infer(prompt, {
  stream: true,
  temperature: 0.7,
  max_tokens: 300,
});

console.log("\nFull response:", result.text);

Advanced Usage

Loading Models

// Load a specific model with context size
await lm.loadModel("llama3:8b", 8192);

// Check loaded model info
console.log("Current model:", lm.model);
console.log("Available models:", lm.models);

Using Templates

// Using built-in templates
const prompt = lm.template.prompt("What is the capital of France?");

// Using custom templates
const customTemplate = "You are a helpful assistant. User: {prompt} Assistant:";
const formattedPrompt = customTemplate.replace("{prompt}", "Explain photosynthesis");

Tool/Function Calling

const weatherTool = {
  name: "getWeather",
  description: "Get current weather for a location",
  arguments: {
    location: {
      description: "The city and state, e.g. San Francisco, CA",
      required: true
    }
  }
};

const result = await lm.infer("What's the weather in London?", {
  stream: true,
  tools: [weatherTool]
});

// Handle tool calls
if (result.toolCalls) {
  for (const toolCall of result.toolCalls) {
    console.log("Tool called:", toolCall.name);
    console.log("Arguments:", toolCall.arguments);
  }
}
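
The toolCalls entries only describe what the model wants to run; executing them is up to your code. A minimal sketch of dispatching a tool call to a local implementation (getWeather below is a hypothetical stand-in, not part of the LocalLm API):

// Hypothetical local implementations, keyed by tool name
const toolHandlers = {
  getWeather: async ({ location }) => {
    // Replace with a real weather lookup
    return { location, forecast: "sunny", temperature: 21 };
  },
};

if (result.toolCalls) {
  for (const toolCall of result.toolCalls) {
    const handler = toolHandlers[toolCall.name];
    if (!handler) {
      console.warn("No handler for tool:", toolCall.name);
      continue;
    }
    const toolResult = await handler(toolCall.arguments);
    console.log(`${toolCall.name} result:`, toolResult);
  }
}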

Multimodal Support (Ollama)

import { convertImageUrlToBase64 } from "@locallm/api";

// Convert image to base64
const imageBase64 = await convertImageUrlToBase64("https://example.com/image.jpg");

const result = await lm.infer("Describe this image", {
  stream: true,
  images: [imageBase64],
  max_tokens: 300
});

Working with History (OpenAI API)

const history = [
  { user: "Hello", assistant: "Hi there!" },
  { user: "How are you?", assistant: "I'm doing well, thanks!" }
];

const result = await lm.infer(
  "What's your name?",
  { stream: true },
  { history: history }
);

Configuration Options

Provider Parameters

const lm = new Lm({
  providerType: "ollama", // "llamacpp" | "koboldcpp" | "ollama" | "openai" | "browser"
  serverUrl: "http://localhost:11434",
  apiKey: "your-api-key-if-required", // Optional for most providers
  onToken: (token) => process.stdout.write(token), // Optional: streaming callback
  onStartEmit: (stats) => console.log("Started:", stats), // Optional: start callback
  onEndEmit: (result) => console.log("Completed:", result), // Optional: completion callback
  onError: (error) => console.error("Error:", error), // Optional: error callback
});

Inference Parameters

const params = {
  stream: true, // Stream response token by token
  model: { name: "llama3:8b", ctx: 8192 }, // Model configuration
  template: "chatml", // Template name (if supported)
  max_tokens: 500, // Maximum tokens to generate
  temperature: 0.7, // Randomness (0.0-1.0)
  top_p: 0.9, // Nucleus sampling threshold
  top_k: 50, // Limit to top K tokens
  repeat_penalty: 1.1, // Penalty for repeating tokens
  stop: ["</s>", "###"], // Stop sequences
  grammar: 'root ::= "hello world"', // GBNF grammar for constrained generation
  images: ["base64-image-data"], // For multimodal models
  extra: { custom: "parameters" } // Provider-specific parameters
};

Examples

The examples directory contains comprehensive examples for each provider:

Example                | Description                | Provider
basic.js               | Basic text generation      | All providers
streaming.js           | Streaming responses        | All providers
ollama.js              | Ollama-specific features   | Ollama
ollama_img.js          | Image input with Ollama    | Ollama
ollama_tools.js        | Tool calling with Ollama   | Ollama
llamacpp.js            | Llama.cpp basic usage      | Llama.cpp
llamacpp_gnbf.js       | Grammar-based generation   | Llama.cpp
koboldcpp.js           | Koboldcpp basic usage      | Koboldcpp
openai_api.js          | OpenAI compatible endpoint | OpenAI
openai_api_toolcall.js | Tool calling with OpenAI   | OpenAI
openrouter.js          | Using OpenRouter service   | OpenAI

Running Examples

# Clone the repository
git clone https://github.com/synw/locallm
cd locallm

# Install dependencies
npm install

# Build the API package
cd packages/api
npm run build
cd ../..

# Install example dependencies
cd examples
npm install

# Run an example (make sure your LLM server is running)
node llamacpp.js

Provider-Specific Notes

Ollama

  • Use await lm.modelsInfo() to list available models (combined with model loading in the sketch after this list)
  • Models are loaded using await lm.loadModel(modelName, contextSize)
  • Supports multimodal models with the images parameter
  • Use raw: true in extra parameters for raw prompt mode
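
A minimal sketch combining these calls, assuming an Ollama server at http://localhost:11434 with a llama3:8b model already pulled (the model name is illustrative):

import { Lm } from "@locallm/api";

const lm = new Lm({
  providerType: "ollama",
  serverUrl: "http://localhost:11434",
  onToken: (t) => process.stdout.write(t),
});

// List the models known to the server, then load one with an 8192-token context
await lm.modelsInfo();
console.log("Available models:", lm.models);
await lm.loadModel("llama3:8b", 8192);

// Run a query; pass extra: { raw: true } instead to use raw prompt mode
const result = await lm.infer("List the planets in our solar system", {
  stream: true,
  max_tokens: 200,
});
console.log("\n", result.stats);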

Llama.cpp

  • The llama.cpp server exposes an OpenAI-compatible endpoint, so it can also be used with the openai provider (see the Quickstart example)
  • Use the grammar parameter for constrained (GBNF) generation
  • Stop sequences can be specified with the stop parameter
  • Server info is available via await lm.info() (see the sketch after this list)
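
A minimal sketch, assuming the llamacpp provider and a llama.cpp server listening on http://localhost:8080 (the grammar string is just an illustration; adjust the URL and provider to your setup):

import { Lm } from "@locallm/api";

const lm = new Lm({
  providerType: "llamacpp",
  serverUrl: "http://localhost:8080",
  onToken: (t) => process.stdout.write(t),
});

// Query server info (assumed here to resolve with the server's info payload)
const info = await lm.info();
console.log("Server info:", info);

// Constrain the output with a GBNF grammar and stop sequences
const result = await lm.infer("Say hello world", {
  stream: true,
  grammar: 'root ::= "hello world"',
  stop: ["</s>"],
  max_tokens: 50,
});
console.log("\n", result.text);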

Koboldcpp

  • Template support with {prompt} placeholder
  • Uses /api/extra/generate/stream endpoint
  • Supports various inference parameters
  • Auto-retrieves model info on inference

OpenAI Compatible

  • Works with any OpenAI-compatible endpoint
  • Full support for tool/function calling
  • System messages via system parameter
  • History management for conversations

Error Handling

try {
  const result = await lm.infer(prompt, params);
  console.log("Success:", result.text);
} catch (error) {
  console.error("Inference failed:", error.message);
  // Handle specific error types
  if (error.message.includes("connection")) {
    // Handle connection errors
  } else if (error.message.includes("model")) {
    // Handle model-related errors
  }
}

Performance Monitoring

// Access detailed statistics
const result = await lm.infer(prompt, { stream: true });

console.log("Inference Statistics:");
console.log("- Total time:", result.stats.totalTime, "ms");
console.log("- Inference time:", result.stats.inferenceTime, "ms");
console.log("- Tokens per second:", result.stats.tokensPerSecond);
console.log("- Total tokens:", result.stats.totalTokens);
console.log("- Server stats:", result.serverStats);

Troubleshooting

Common Issues

Connection Errors

  • Ensure your LLM server is running and accessible
  • Check the server URL and port
  • Verify network connectivity

Model Loading Issues

  • Confirm the model name is correct
  • Check if the model is available on the server
  • Verify sufficient system resources

Performance Issues

  • Adjust temperature and top_p parameters
  • Consider reducing max_tokens for faster responses
  • Check system resources (CPU, memory, GPU)

Debug Mode

Enable debug output for troubleshooting:

const result = await lm.infer(
  prompt,
  { stream: true },
  { debug: true }
);

FAQ

Q: Can I use LocalLm with cloud providers?
A: Yes, the OpenAI-compatible provider works with many cloud services that provide OpenAI-compatible APIs.

Q: How do I add a new provider?
A: See the packages/api/src/providers directory for examples of implementing new providers.

Q: What's the difference between the packages?
A: @locallm/api provides the main interface, @locallm/types contains shared type definitions, and @locallm/browser is for browser-based inference.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • Llama.cpp for the excellent C++ implementation
  • Koboldcpp for the feature-rich inference server
  • Ollama for easy local model management
  • OpenAI for the API standard that many providers follow
