
home-nlp


A ROS 2 package for speech-driven human-robot interaction, combining:

  • mic_node: captures live microphone audio
  • asr_node: performs real-time transcription with Whisper (via faster-whisper)
  • llm_node: processes transcribed text with an LLM and generates robotic behavior trees

This pipeline enables a home robot to understand spoken input and respond with context-aware actions.

┌─────────────┐    /audio_data    ┌─────────────┐    /transcription    ┌─────────────┐
│  mic_node   │ ──────────────→   │  asr_node   │ ──────────────────→  │  llm_node   │
│             │                   │             │     /llm_input       │             │
│ Microphone  │                   │   Whisper   │                      │ LLM + BT    │
│   Audio     │                   │ Streaming   │                      │ Generation  │
│  Capture    │                   │     ASR     │                      │             │
└─────────────┘                   └─────────────┘                      └─────────────┘
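
The pipeline's output can be consumed by any downstream ROS 2 node. As a minimal sketch (the message type carried on /transcription is not documented here; std_msgs/msg/String is assumed purely for illustration):

import rclpy
from rclpy.node import Node
from std_msgs.msg import String


class TranscriptionConsumer(Node):
    def __init__(self):
        super().__init__('transcription_consumer')
        # Subscribe to the ASR output topic shown in the diagram above.
        self.create_subscription(String, '/transcription', self.on_text, 10)

    def on_text(self, msg):
        self.get_logger().info(f'Heard: {msg.data}')


def main():
    rclpy.init()
    rclpy.spin(TranscriptionConsumer())
    rclpy.shutdown()


if __name__ == '__main__':
    main()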

Quickstart

Using Launch File (Recommended)

Launch all nodes together with default parameters:

ros2 launch home_nlp launch.py

Customize parameters:

ros2 launch home_nlp launch.py \
  sample_rate:=48000 \
  device:="USB Composite Device" \
  language:="en" \
  asr_model:="large-v2" \
  llm_model:="google/gemma-3-4b-it"

View all available parameters:

ros2 launch home_nlp launch.py --show-args
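
To embed the pipeline in a larger system, the launch file can also be included from another launch file. A hypothetical sketch (it assumes launch.py is installed at the root of the home_nlp share directory; adjust the path to the actual install layout):

from launch import LaunchDescription
from launch.actions import IncludeLaunchDescription
from launch.launch_description_sources import PythonLaunchDescriptionSource
from launch.substitutions import PathJoinSubstitution
from launch_ros.substitutions import FindPackageShare


def generate_launch_description():
    # Include home_nlp's launch.py and override two of its arguments.
    return LaunchDescription([
        IncludeLaunchDescription(
            PythonLaunchDescriptionSource(
                PathJoinSubstitution([FindPackageShare('home_nlp'), 'launch.py'])
            ),
            launch_arguments={
                'sample_rate': '48000',
                'language': 'en',
            }.items(),
        ),
    ])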

Running Individual Nodes

Launch the mic_node:

ros2 run home_nlp mic_node --ros-args \
  -p sample_rate:=48000 \
  -p block_duration:=1.0 \
  -p num_channel:=1 \
  -p device:="USB Composite Device"
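
The implementation of mic_node is not documented here, but its parameters map naturally onto a block-based capture loop. A purely illustrative sketch, assuming the sounddevice library and std_msgs/msg/Float32MultiArray as a stand-in for the actual /audio_data message type:

import rclpy
from rclpy.node import Node
from std_msgs.msg import Float32MultiArray
import sounddevice as sd


class MicSketch(Node):
    def __init__(self):
        super().__init__('mic_sketch')
        self.pub = self.create_publisher(Float32MultiArray, '/audio_data', 10)
        sample_rate = 48000   # sample_rate parameter
        block_duration = 1.0  # block_duration parameter (seconds per block)
        # The callback fires once per captured block of audio.
        self.stream = sd.InputStream(
            samplerate=sample_rate,
            channels=1,  # num_channel parameter
            blocksize=int(sample_rate * block_duration),
            callback=self.on_block,
        )
        self.stream.start()

    def on_block(self, indata, frames, time, status):
        msg = Float32MultiArray()
        msg.data = indata[:, 0].tolist()  # publish the mono channel
        self.pub.publish(msg)


def main():
    rclpy.init()
    rclpy.spin(MicSketch())


if __name__ == '__main__':
    main()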

Launch the asr_node:

ros2 run home_nlp asr_node --ros-args \
  -p language:="zh" \
  -p model:="large-v2" \
  -p sample_rate:=48000 \
  -p block_duration:=1.0 \
  -p period:=1.0 \
  -p max_empty_count:=0
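
The model parameter selects the Whisper checkpoint loaded through faster-whisper. Stripped of the streaming logic, the underlying call looks roughly like this file-based, non-streaming sketch:

from faster_whisper import WhisperModel

# "large-v2" matches the model parameter above; device/compute_type are
# typical CUDA choices, not necessarily what asr_node itself uses.
model = WhisperModel("large-v2", device="cuda", compute_type="float16")

segments, info = model.transcribe("speech.wav", language="zh")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")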

Launch the llm_node:

ros2 run home_nlp llm_node --ros-args \
  -p period:=1.0 \
  -p model:="google/gemma-3-1b-it"
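
How llm_node invokes the model is not documented here, but the Hugging Face model IDs (and the HF_TOKEN used in the Docker section below) suggest the transformers library. A hypothetical sketch of the kind of call involved; the actual prompt and behavior-tree schema are unknown:

from transformers import pipeline

# google/gemma-3-* models are gated on Hugging Face; authenticate first
# (e.g. via the HF_TOKEN environment variable or `huggingface-cli login`).
generator = pipeline("text-generation", model="google/gemma-3-1b-it")

prompt = "Convert this instruction into a behavior tree in XML: bring me a cup of water."
result = generator(prompt, max_new_tokens=256)
print(result[0]["generated_text"])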

Running in Docker Containers

Build the image:

docker build -t lnfu/home_nlp .

Run individual nodes:

docker run --rm lnfu/home_nlp ros2 run home_nlp mic_node
docker run --rm lnfu/home_nlp ros2 run home_nlp asr_node
docker run --rm -e HF_TOKEN="${HF_TOKEN}" lnfu/home_nlp ros2 run home_nlp llm_node
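
Note that the containers may need access to host hardware: mic_node needs the host's sound devices, and the ASR/LLM nodes typically need a GPU (the --gpus flag requires the NVIDIA Container Toolkit). For example:

docker run --rm --device /dev/snd lnfu/home_nlp ros2 run home_nlp mic_node
docker run --rm --gpus all lnfu/home_nlp ros2 run home_nlp asr_node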

LLM Model Comparison

Valid XML indicates the percentage of runs (out of 100) producing syntactically valid XML.
Note: This does not verify semantic correctness of the behavior tree.

Model                 Loading Time (s)   Response Time (s)   VRAM Usage (MB)   RAM Usage (MB)   Valid XML (%)
gemma-3-1b-it         4.19               3.22                2482              2363             66
gemma-3-4b-it         7.25               4.34                9480              5455             99
deepseek 6.7b-it      15.64              3.03                14096             10094            76
Phi-4-mini-it         8.26               3.34                8735              5251             60
Mistral-7B-it-v0.3    5.68               2.02                14135             5344             98
LLama-3.1-8B-it       6.08               1.80                15623             5350             99
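
A syntactic check of this kind can be reproduced with the Python standard library; the exact validator behind these numbers is not documented:

import xml.etree.ElementTree as ET

def is_valid_xml(text):
    """Return True if text parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False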

Whisper Streaming

This project integrates ideas and components from
ufal/whisper_streaming,
which provides the foundation for real-time Whisper transcription.

Reference

@inproceedings{machacek-etal-2023-turning,
    title = "Turning Whisper into Real-Time Transcription System",
    author = "Mach{\'a}{\v{c}}ek, Dominik  and
      Dabre, Raj  and
      Bojar, Ond{\v{r}}ej",
    editor = "Saha, Sriparna  and
      Sujaini, Herry",
    booktitle = "Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics: System Demonstrations",
    month = nov,
    year = "2023",
    address = "Bali, Indonesia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.ijcnlp-demo.3",
    pages = "17--24",
}
