
🧠 brain

ROS 2 bridge/wrapper for llama.cpp with models like GPT-OSS-20B for reasoning-based motor control.

Features

  • ROS 2 node exposing /llama_input and /llama_output topics
  • Uses llama.cpp backend
  • Supports GPU (CUDA) and CPU-only inference
  • Builds and handles large models in GGUF format

Dependencies

  • Ubuntu 24.04
  • ROS 2 (Jazzy recommended)
  • Python 3.12 (or compatible)
  • Models: Hugging Face (tested with GPT-OSS-20B)

Environment Setup

Set up ROS 2 if it is not already installed

sudo apt install software-properties-common curl
sudo add-apt-repository universe
curl -sSL https://raw.githubusercontent.com/ros/rosdistro/master/ros.key | sudo tee /usr/share/keyrings/ros-archive-keyring.gpg >/dev/null
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/ros-archive-keyring.gpg] http://packages.ros.org/ros2/ubuntu $(. /etc/os-release && echo $UBUNTU_CODENAME) main" | sudo tee /etc/apt/sources.list.d/ros2.list >/dev/null
sudo apt update

sudo apt install ros-jazzy-desktop
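
A quick optional sanity check that the install worked:

source /opt/ros/jazzy/setup.bash
ros2 doctor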

Clone into your ROS 2 workspace

mkdir -p ~/ros2_ws/src # Or whatever you'd like to call it
cd ~/ros2_ws/src

git clone https://github.com/MaidReality/brain.git

Source venv

cd ..

# Make a Python venv outside of src if you don't have one yet
python3 -m venv venv

source venv/bin/activate
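
With the venv active, python should now resolve inside the venv:

which python # should print something like ~/ros2_ws/venv/bin/python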

Install dependencies

python -m pip install --upgrade pip wheel setuptools
cd src/brain
pip install -r requirements.txt

Caution

There might be a version mismatch; if the install fails, pin compatible versions like:
pip install "setuptools>=68,<70" "setuptools-scm<8"

Export the venv's site-packages so ROS 2 can see them

export PYTHONPATH=$HOME/ros2_ws/venv/lib/python3.12/site-packages:$PYTHONPATH
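
To make this survive new shells (assuming bash and the venv path used above), append it to ~/.bashrc:

echo 'export PYTHONPATH=$HOME/ros2_ws/venv/lib/python3.12/site-packages:$PYTHONPATH' >> ~/.bashrc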

Clone all the submodules

# Skip downloading the large LFS files for now; the real weights are downloaded in Model Setup below
GIT_LFS_SKIP_SMUDGE=1 git submodule update --init --recursive

Warning

Since the submodules are externally sourced, add a COLCON_IGNORE marker to each so colcon does not attempt to build them, and mark the venv as well:

touch src/brain/llama.cpp/COLCON_IGNORE
touch src/brain/gpt-oss-20b/COLCON_IGNORE
touch venv/COLCON_IGNORE
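
From the workspace root, colcon should now report only the brain package:

cd ~/ros2_ws
colcon list # the submodules and the venv should no longer appear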


Model Setup

Install llama.cpp

Important

To enable GPU support:
First check that your NVIDIA driver is working: nvidia-smi
Then install the CUDA toolkit if it is missing: sudo apt install nvidia-cuda-toolkit
Then run cmake with the additional flag -DGGML_CUDA=ON
If you're only doing CPU inference from RAM, skip the flag and run the following as-is:

Warning

If running on WSL or Ubuntu, make sure the build tools are installed:

sudo apt-get install ninja-build build-essential

cd llama.cpp

cmake -S . -B build -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLAMA_BUILD_TESTS=OFF \
  -DLLAMA_BUILD_EXAMPLES=ON \
  -DLLAMA_BUILD_SERVER=ON
# Add -DGGML_CUDA=ON to the line above for GPU support

cmake --build build --config Release
sudo cmake --install build --config Release

# Refresh cache of shared libraries
sudo ldconfig

# Test
llama-cli --help

Download the model (split into 3 parts) and the tokenizer.json -> here

cd gpt-oss-20b

Replace the LFS pointer files with the downloaded files (cut & paste)
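
Leftover pointer stubs are easy to miss; they are small text files starting with a git-lfs version line. A quick way to list any that remain (run inside gpt-oss-20b/):

grep -rl "git-lfs.github.com/spec" . # any file listed is still a pointer stub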

Install dependencies

cd .. # you should be in src/brain
python -m pip install --upgrade -r llama.cpp/requirements/requirements-convert_hf_to_gguf.txt
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --no-cache-dir # For CUDA (NVIDIA GPU) support

Convert the model to gguf

python llama.cpp/convert_hf_to_gguf.py gpt-oss-20b/ --outfile models/gpt-oss-20b.gguf
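
If the models/ directory does not exist yet, create it first with mkdir -p models (an assumption; the script may not create it for you). Afterwards, confirm the output exists:

ls -lh models/gpt-oss-20b.gguf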

ROS 2 Usage Option 1: Self Host and Run

Run

Build workspace and then launch

cd ~/ros2_ws
colcon build --symlink-install
source install/setup.bash

ros2 launch brain self_llama.launch.py

Test

In another terminal publish a test prompt:

source /opt/ros/jazzy/setup.bash
ros2 topic pub /llama_input std_msgs/String "data: 'hello llama-chan'" -1

In another terminal echo the output:

ros2 topic echo /llama_output

ROS 2 Usage Option 2: Server Connection

Run Server

Follow the guide below for Web Server (GUI) to start the llama server.

Run Bridge

cd ~/ros2_ws
colcon build --symlink-install
source install/setup.bash

ros2 launch brain llama_bridge.launch.py

Test

In another terminal publish a test prompt:

source /opt/ros/jazzy/setup.bash
ros2 topic pub /llama_input std_msgs/String "data: 'hello llama-chan'" -1

In another terminal echo the output:

ros2 topic echo /llama_output --truncate-length 0

CLI / Web Testing (optional)

In CLI

Replace # with the number of MoE layers to keep on the CPU (the rest will be offloaded to the GPU)

cd ..
llama-cli -m models/gpt-oss-20b.gguf --jinja -ngl 99 -fa --n-cpu-moe #

Web Server (GUI)

Replace # with the number of MoE layers to keep on the CPU (the rest will be offloaded to the GPU)

cd ..
llama-server -m models/gpt-oss-20b.gguf --jinja -ngl 99 -fa --n-cpu-moe #

http://127.0.0.1:8080

Tested on 6 GB of VRAM: # = 16 is optimal for speed, giving around 30 tokens/s
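
For example, the full invocation for that 6 GB setup would be:

llama-server -m models/gpt-oss-20b.gguf --jinja -ngl 99 -fa --n-cpu-moe 16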


Web Server API Endpoints

POST http://127.0.0.1:8080/completion
More information: https://github.com/ggml-org/llama.cpp/tree/master/tools/server#post-completion-given-a-prompt-it-returns-the-predicted-completion

POST http://localhost:8080/v1/chat/completions (OpenAI-compatible)
More information: https://github.com/ggml-org/llama.cpp/tree/master/tools/server#post-v1chatcompletions-openai-compatible-chat-completions-api
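
A minimal curl smoke test for each endpoint, assuming the server above is running on its default port:

# Native completion endpoint
curl http://127.0.0.1:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "hello llama-chan", "n_predict": 32}'

# OpenAI-compatible chat endpoint
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "hello llama-chan"}]}'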

Credits

https://huggingface.co/openai/gpt-oss-20b
https://github.com/ggml-org/llama.cpp
https://blog.steelph0enix.dev/posts/llama-cpp-guide/
ggml-org/llama.cpp#15396
