Source code for the RunPod template: text-generation-webui-aio
This template runs text-generation-webui on RunPod.
It was inspired by TheBloke's deprecated DockerLLM project.
You can use this template to experiment with nearly any LLM hosted on Hugging Face.
⚠️ Disclaimer: This Dockerfile is intended for experimental use only and is not suitable for production workloads. I take no responsibility for any data loss or security issues. Review the code and proceed at your own risk.
Building the Dockerfile:

```bash
cd text-generation-webui-docker/
docker build -t text-generation-webui-aio .
```
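If you want to sanity-check the image locally before deploying, a run command could look like the sketch below. This assumes you have an NVIDIA GPU with the NVIDIA Container Toolkit installed and that the container honors the same `GRADIO_USERNAME`, `GRADIO_PASSWORD`, and `MY_OPENAI_KEY` environment variables that the RunPod secrets (described below) provide; adjust as needed.

```bash
# Hypothetical local smoke test; ports and env var names mirror the RunPod setup.
docker run --rm --gpus all \
  -p 7860:7860 -p 5000:5000 \
  -e GRADIO_USERNAME=admin \
  -e GRADIO_PASSWORD=changeme \
  -e MY_OPENAI_KEY=sk-local-test \
  text-generation-webui-aio
```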
- Instructions
- Set Up a RunPod Account
- Setting Up SSH Key (Optional)
- Create Network Volume (Optional)
- Create RunPod Secrets
- Creating Your Pod
- Accessing text-generation-webui
- Downloading and Loading a Model
- Calling the API
- Loading an EXL2 model on multiple GPUs
- Loading a GGUF model on multiple GPUs
- Connecting to Open WebUI
- Connecting to SillyTavern
RunPod is a paid cloud GPU provider. Go to https://www.runpod.io/, create an account, and add funds.
For this example, we’ll use an RTX A5000, which costs around $0.21/hour + storage.
If you want SSH access to your pod, add your SSH public key in your RunPod account settings.
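If you don't have a key pair yet, one way to create one (assuming OpenSSH is installed on your machine) is:

```bash
# Create an ed25519 key pair, then paste the contents of the .pub file
# into your RunPod account's SSH settings.
ssh-keygen -t ed25519 -C "runpod"
cat ~/.ssh/id_ed25519.pub
```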
To avoid re-downloading models every time, you can create a persistent network volume. Make sure the volume is in the same region as your pod.
Note: Storage isn't free — 200 GB costs about $15/month.
Create these secrets in RunPod for authentication:
- `GRADIO_USERNAME` and `GRADIO_PASSWORD`: Used to log into the web UI.
- `MY_OPENAI_KEY`: Used for calling the OpenAI-compatible API.
Your pod is the GPU-powered container that runs this template.
- Go to Pods and select a GPU like RTX A5000. Make sure “Secure Cloud” is selected.
- Click Change Template, search for text-generation-webui-aio, and select the one from mattipaivike321/runpod-text-generation-webui.
The correct template:
- Click Edit Template.
- If you don’t need SSH, remove port 22 from Expose TCP Ports.
- You can also adjust the “Volume Disk” size. This is local storage for downloading models. For this guide, use at least 50 GB. Keep in mind that storage also incurs costs (~$0.10/GB/month).
More info: https://docs.runpod.io/pods/storage/types
- Click Set Overrides and then Deploy On-Demand.
Deployment may take several minutes depending on your region.
Once the pod is ready:
- Click Connect, then choose HTTP Service 7860.
(Port 5000 is reserved for API access.)
- Log in using your `GRADIO_USERNAME` and `GRADIO_PASSWORD`.
You can run any model that fits into your GPU’s VRAM.
For this guide, we’ll use an EXL2 model:
MikeRoz/mistralai_Mistral-Small-24B-Instruct-2501-6.0bpw-h6-exl2
Make sure the total size of the .safetensors files fits within your GPU’s VRAM (the RTX A5000 has 24 GB). Some VRAM also needs to be left free for the context.
The same applies to GGUF models. Use this tool to check required VRAM for your model and context window:
LLM VRAM Calculator
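As a very rough sanity check (an approximation only; the calculator above is more accurate), you can compare the size of the weight files against your card's VRAM and leave some headroom for the KV cache. The headroom figure below is a guess, not a measurement:

```bash
# Rough back-of-the-envelope check before downloading a model.
WEIGHTS_GB=18    # total size of the .safetensors files (a 6.0bpw EXL2 quant of a 24B model is ~18 GB)
HEADROOM_GB=4    # hypothetical allowance for the KV cache and overhead at a large context
VRAM_GB=24       # RTX A5000
echo "Fits: $(( WEIGHTS_GB + HEADROOM_GB <= VRAM_GB ))"   # prints 1 if it fits, 0 if not
```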
- Copy the model path from Hugging Face:
- Go to the Model tab, paste the path, and click Download. For GGUF models, you must also enter the exact `.gguf` filename, since most repos contain multiple quant versions.
- Refresh the model list and select your model from the dropdown.
- Wait a few seconds; the loader will be auto-selected (e.g., `ExLlamav2_HF` for EXL2 models, a different one for GGUF).
- Increase the context size (e.g., 32768) if desired, then click Load. Note that the context also consumes VRAM, so there is a limit!
Loading may take a while depending on model size.
Once loaded:
Go to the Chat tab and try it out:
Use chat-instruct or instruct modes — they provide the correct prompt format. Avoid chat mode due to bugs. For more info, see the official documentation.
Working example:
✅ Don’t forget to terminate your pod when you're done to avoid extra charges!
There is a separate example Python script and README.md file in the api-call-example/ folder of this repository. It explains how to call the text-generation-webui API programmatically.
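For a quick test without the script, a minimal request against the OpenAI-compatible endpoint could look like the sketch below. The proxy URL format and the `<POD_ID>` placeholder are assumptions about a typical RunPod setup; `MY_OPENAI_KEY` is the secret you created earlier.

```bash
# Minimal chat completion request; replace <POD_ID> with your pod's ID.
curl "https://<POD_ID>-5000.proxy.runpod.net/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MY_OPENAI_KEY" \
  -d '{
        "messages": [{"role": "user", "content": "Hello! Who are you?"}],
        "max_tokens": 200
      }'
```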
See this instruction on how to load a large (70+ GB) EXL2 model on multiple GPUs.
See this instruction on how to load a large (70+ GB) GGUF model on multiple GPUs.
See openwebui_example.md for instructions on how to connect text-generation-webui backend to Open WebUI.
See sillytavern_example.md for instructions on how to connect text-generation-webui backend to SillyTavern.