The goal of this package is to provide an AI assistant for the world of Pokémon.
It consists of a stack of services orchestrated by Kubernetes.
In a nutshell, it comprises a UI and an inference service. A custom agentic proxy intercepts the requests between these services, processes them and, when relevant, augments them with information from a vector DB.
The models have been selected for their small size, performance and multilingual capabilities.
The project has been set up so that French is the preferred language of the AI assistant.
This project can also be seen as a natural language processing exercise with relatively limited resources, i.e. a gaming computer. It requires an NVIDIA GPU and is designed for a GNU/Linux server.
To make use of the latter, the NVIDIA Container Toolkit is needed.
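To quickly check that the toolkit works (assuming Docker is also installed; the CUDA image tag below is only an example), run a test container:

# the container should see the GPU through the NVIDIA Container Toolkit
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi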
Start by cloning the repo:
git clone https://github.com/almarch/pokedex.git
cd pokedex

The project is designed to run with k3s, a lightweight distribution of Kubernetes.
# install brew
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
echo 'eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"' >> ~/.bashrc
eval "$(/home/linuxbrew/.linuxbrew/bin/brew shellenv)"
brew install kubectl k9s helm
# install & start k3s
curl -sfL https://get.k3s.io | \
K3S_KUBECONFIG_MODE=644 \
INSTALL_K3S_EXEC="--disable traefik" \
sh -
sudo systemctl stop k3s
sudo systemctl start k3s

To point kubectl, k9s & helm at the k3s cluster:
export KUBECONFIG=/etc/rancher/k3s/k3s.yaml

Generate all secrets:
echo "WEBUI_SECRET_KEY=$(cat /dev/urandom | tr -dc 'A-Za-z0-9' | fold -w 32 | head -n 1)" > .env
kubectl create secret generic all-secrets \
--from-env-file=.env \
--dry-run=client -o yaml > k8s/secrets.yaml
kubectl apply -f k8s/secrets.yaml

Install ingress-nginx and cert-manager:
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx \
--namespace ingress-nginx \
--create-namespace \
--set controller.kind=DaemonSet \
--set controller.hostNetwork=true \
--set controller.hostPort.enabled=true \
--set controller.dnsPolicy=ClusterFirstWithHostNet \
--set controller.service.type=ClusterIP
helm install cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--set crds.enabled=true
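Optionally, check that both controllers are running before moving on:

kubectl get pods -n ingress-nginx
kubectl get pods -n cert-manager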
Then set up the NVIDIA device plugin:

kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.5/nvidia-device-plugin.yml
kubectl patch daemonset -n kube-system nvidia-device-plugin-daemonset \
--type merge \
-p '{"spec":{"template":{"spec":{"runtimeClassName":"nvidia"}}}}'
kubectl rollout restart daemonset/nvidia-device-plugin-daemonset -n kube-system
kubectl describe node | grep -i nvidia

Build the custom images and provide them to k3s:
docker build -t poke-agent:latest -f dockerfile.agent .
docker build -t poke-notebook:latest -f dockerfile.notebook .
docker save poke-agent:latest | sudo k3s ctr images import -
docker save poke-notebook:latest | sudo k3s ctr images import -
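The imported images can be listed from the k3s containerd to check that they are available:

sudo k3s ctr images ls | grep poke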
Mount the log & notebook volumes:

sudo mkdir -p /mnt/k3s/logs
sudo mkdir -p /mnt/k3s/notebook
sudo mount --bind "$(pwd)/logs" /mnt/k3s/logs
sudo mount --bind "$(pwd)/notebook" /mnt/k3s/notebook
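These bind mounts do not survive a reboot. If persistence is wanted, equivalent entries can be added to /etc/fstab; this is only a sketch, assuming the repository is cloned under /home/userA/pokedex:

# adapt the source paths to the actual clone location
/home/userA/pokedex/logs      /mnt/k3s/logs      none  bind  0  0
/home/userA/pokedex/notebook  /mnt/k3s/notebook  none  bind  0  0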
K3s automatically picks up the imported :latest images. Load and deploy all services:

kubectl apply -R -f k8s/

Check the installation status:
k9s

The services need to be exposed on localhost, either for local use or to tunnel them to a VPS. For instance, to expose the notebook, Ollama and Qdrant:
screen
trap "kill 0" SIGINT
kubectl port-forward svc/notebook 8888:8888 &
kubectl port-forward svc/ollama 11434:11434 &
kubectl port-forward svc/qdrant 6333:6333 &
waitThen Ctrl+A+D to leave the port-forward screen. The webui should not be port-forwarded as its access is managed by ingress.
An Ollama inference service is included in the stack.
kubectl get pods

Pull the models from the Ollama pod:
kubectl exec -it <pod-name> -- ollama pull mistral-nemo:12b-instruct-2407-q4_0
kubectl exec -it <pod-name> -- ollama pull embeddinggemma:300m
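Once pulled, the LLM can be sanity-checked through the forwarded Ollama port; the prompt is only an example:

curl http://localhost:11434/api/generate \
  -d '{"model": "mistral-nemo:12b-instruct-2407-q4_0", "prompt": "Qui est Pikachu ?", "stream": false}'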
A Qdrant vector DB is included in the stack. It must be populated using the Jupyter Notebook service, accessible at https://localhost:8888/lab/workspaces/auto-n/tree/pokemons.ipynb.
Pokémon data come from this repo.
This figure shows how the Pokémon are laid out on a 2D plane derived from the embedding space.
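With the Qdrant port forwarded (see above), the collections can be listed through the REST API to check that the upload went through:

curl http://localhost:6333/collections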
Open-WebUI is included in the stack.
Reach https://localhost and configure the interface. Deactivate the encoder model, and make the LLM accessible to all users. If needed, create accounts for the family & friends you would like to share the app with.
This framework can readily be adapted to other RAG/agentic projects:
- The database should be filled with relevant collections (see the example below).
- The custom agentic logic is centralised in myAgent/myAgent/Agent.py.
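For instance, a collection for another corpus could be created directly through the Qdrant REST API; the collection name below is a placeholder, and the vector size must match the output dimension of the embedding model (768 for embeddinggemma by default):

# "my_docs" is a hypothetical collection name
curl -X PUT http://localhost:6333/collections/my_docs \
  -H 'Content-Type: application/json' \
  -d '{"vectors": {"size": 768, "distance": "Cosine"}}'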
Say we need to tunnel the server through a VPS. In other words, we want some services of the GPU server, let's call it A, to be accessible from anywhere, including from machine C. In between, B is the VPS used as a tunnel.
| Name | A | B | C |
|---|---|---|---|
| Description | GPU server | VPS | Client |
| Role | Host the services | Host the tunnel | Use the Pokédex |
| User | userA | root | doesn't matter |
| IP | doesn't matter | 11.22.33.44 | doesn't matter |
The services we need are:
- The web UI, available on ports 80/443. These ports will be exposed to the web.
- The notebook, available at port 8888. This port will remain available for private use only.
- An SSH endpoint. Port 22 of the gaming machine (A) will be exposed through port 2222 of the VPS (B).
The VPS must allow gateway ports. In /etc/ssh/sshd_config:
AllowTcpForwarding yes
GatewayPorts yes
PermitRootLogin yes
Then:
sudo systemctl restart ssh

To bind ports 80 and 443, the VPS user must be root. If no root password is set yet, create one from the VPS:
sudo passwd root

The ports are then pushed to the VPS from the GPU server:
screen
sudo ssh -N -R 80:localhost:80 -R 443:localhost:443 -R 8888:localhost:8888 -R 2222:localhost:22 root@11.22.33.44

The VPS firewall then has to be configured:
sudo ufw allow 2222
sudo ufw allow 443
sudo ufw allow 80
sudo ufw reload

The UI is now available worldwide at https://11.22.33.44, using self-signed certificates.
The jupyter notebook is pulled from the VPS:
ssh -N -L 8888:localhost:8888 root@11.22.33.44

The notebook is now available for the client at https://localhost:8888.
And the VPS is a direct tunnel to the gaming machine A:
ssh -p 2222 userA@11.22.33.44

This work is licensed under GPL-2.0.


