Note
The commands in this guide are intended for Ubuntu Linux. If you are using a different platform (e.g., Windows or macOS), please refer to the official documentation of the tool for platform-specific instructions.
Further Reading: Ollama vs. vLLM: Choosing the Best Tool for AI Model Workflows
- Monitoring with Prometheus and Grafana
Tip
Check out `prometheus_grafana` for more details.
- Grafana [Docs]
- Prometheus [Docs]
- OpenAI & other LLM API Pricing Calculator - Calculate the cost of using OpenAI and other Large Language Models (LLMs) APIs
First, clone the repository:
```bash
git clone --recurse-submodules https://github.com/xxrjun/local-inference.git
```
Then, create a new Conda environment and install the required dependencies:
```bash
conda create -n local-inference python=3.12
conda activate local-inference

# Install Python dependencies
pip install -r requirements.txt

# Install Ollama on Linux
curl -fsSL https://ollama.com/install.sh | sh
```
It is recommended to use tmux so that each long-running service gets its own session (detach with `Ctrl-b d`, reattach with `tmux attach -t <session-name>`).
```bash
tmux new -s ollama-serve
./examples/ollama_serve.sh

tmux new -s ollama-run
./examples/ollama_run.sh

tmux new -s vllm-serve
./examples/vllm_serve.sh

tmux new -s open-webui
./examples/open_webui.sh
```
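Once the sessions are up, you can sanity-check that the Ollama server is reachable. The sketch below assumes Ollama's default port `11434`; if `examples/ollama_serve.sh` binds a different host or port, adjust the URL accordingly.

```python
import json
import urllib.request

# List the locally available models via Ollama's REST API.
# Assumes the default endpoint http://localhost:11434; adjust if
# examples/ollama_serve.sh configures a different host or port.
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    models = json.load(resp)

for model in models.get("models", []):
    print(model["name"])
```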
Copy `.env.example` to `.env`:
```bash
cp .env.example .env
```
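The exact keys to fill in are defined in `.env.example`. As a purely hypothetical illustration (the variable names below are assumptions, not necessarily the repository's actual keys), an OpenAI-compatible setup usually needs a base URL and an API key:

```dotenv
# Hypothetical values for illustration; use the keys from .env.example.
OPENAI_BASE_URL=http://localhost:11434/v1  # Ollama's OpenAI-compatible endpoint
OPENAI_API_KEY=ollama                      # local servers accept any non-empty string
```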
Edit `.env` with the correct values, then run the test script:
```bash
python scripts/test_openai_client.py
```
If the API is working correctly, the output should resemble the following:
```text
ChatCompletionMessage(content='Hello! How can I help you today? If you have any questions or need assistance, feel free to ask.', refusal=None, role='assistant', audio=None, function_call=None, tool_calls=[], reasoning_content=None)
```
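If you want to adapt the test to your own code, it boils down to a standard OpenAI-client call. The sketch below is an assumption of what such a check looks like, not the repository's exact `scripts/test_openai_client.py`; the model name `llama3.2` is a placeholder.

```python
import os

from openai import OpenAI

# The OpenAI SDK reads OPENAI_API_KEY and OPENAI_BASE_URL from the
# environment by default; they are passed explicitly here for clarity.
client = OpenAI(
    base_url=os.environ.get("OPENAI_BASE_URL", "http://localhost:11434/v1"),
    api_key=os.environ.get("OPENAI_API_KEY", "ollama"),
)

# "llama3.2" is a placeholder; use a model your server actually serves.
response = client.chat.completions.create(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message)
```

Pointing `base_url` at `http://localhost:11434/v1` targets Ollama's OpenAI-compatible endpoint; vLLM's server exposes the same interface on its own port (typically 8000).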
Refer to My Immersive Translate Setup Guide or the Official Docs.
What is TTS?
TTS (text-to-speech) converts written text into synthesized speech. Refer to My TTS Setup Guide for more details.