
jobot-ai-elephant

myCobot280 RISCV Smart Retail Scene System

Install the Code

  • Clone with git:
git clone https://github.com/elephantrobotics/jobot-ai-elephant.git

Environment Setup

sudo apt install -y \
    spacemit-ollama-toolkit \
    portaudio19-dev \
    python3-dev \
    libopenblas-dev \
    ffmpeg \
    python3-venv \
    python3-spacemit-ort \
    libceres-dev \
    libopencv-dev

Large Model Dependency Installation

cd ~/jobot-ai-elephant/spacemit_audio
bash ollama.sh

Python Dependency Installation

cd ~/jobot-ai-elephant
python3 -m venv ~/asr_env
source ~/asr_env/bin/activate
pip install -r requirements.txt

Add User to Audio Group

sudo usermod -aG audio $USER

Using the Code

Check Recording Devices

The recording device is detected automatically. If automatic detection fails, set the device manually:

arecord -l

Sample output:

Devices with "Camera" in the name are camera microphones and should not be selected; in this example, card 3 is the usable one.

**** List of CAPTURE Hardware Devices ****
card 1: Camera [USB Camera], device 0: USB Audio [USB Audio]
    Subdevices: 1/1
    Subdevice #0: subdevice #0
card 2: Camera_1 [USB 2.0 Camera], device 0: USB Audio [USB Audio]
    Subdevices: 1/1
    Subdevice #0: subdevice #0
card 3: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
    Subdevices: 1/1
    Subdevice #0: subdevice #0

Modify the recording device index to 3 in the smart_main_asr.py file:

...
record_device = 3  # Recording device index; set to the card number from arecord -l
rec_audio = RecAudioThreadPipeLine(vad_mode=1, sld=2, max_time=2, channels=1, rate=48000, device_index=record_device)
...
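Device selection can also be automated by parsing the `arecord -l` output and skipping camera microphones. The sketch below is illustrative only; the helper name and regex are not part of the project:

```python
import re

# Sample `arecord -l` output (from the example above).
SAMPLE = """\
**** List of CAPTURE Hardware Devices ****
card 1: Camera [USB Camera], device 0: USB Audio [USB Audio]
card 2: Camera_1 [USB 2.0 Camera], device 0: USB Audio [USB Audio]
card 3: Device [USB PnP Sound Device], device 0: USB Audio [USB Audio]
"""

def pick_record_card(arecord_output):
    """Return the first capture card whose name does not mention a camera."""
    for line in arecord_output.splitlines():
        m = re.match(r"card (\d+): (\S+) \[([^\]]+)\]", line)
        if m and "camera" not in m.group(3).lower():
            return int(m.group(1))
    return None

print(pick_record_card(SAMPLE))  # → 3
```

In practice you would feed this the output of `subprocess.run(["arecord", "-l"], capture_output=True, text=True).stdout` instead of the sample string.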

Control Maximum Recording Duration

rec_audio.max_time_record = 3  # Maximum recording time in seconds

Recording runs in non-blocking mode by default. For most applications, serial execution is more common—use join() to wait for recording to finish:

# Start recording user audio
rec_audio.max_time_record = 3
rec_audio.frame_is_append = True
rec_audio.start_recording()
rec_audio.thread.join()  # Wait for recording to complete
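The non-blocking-then-join pattern above can be sketched with the standard threading module; the worker function here is a hypothetical stand-in for RecAudioThreadPipeLine's capture loop:

```python
import threading
import time

def record(seconds, frames):
    # Hypothetical stand-in for the real audio capture loop.
    time.sleep(seconds)
    frames.append(b"...audio...")

frames = []
t = threading.Thread(target=record, args=(0.1, frames))
t.start()           # non-blocking: returns immediately while recording runs
t.join()            # serial execution: block until recording finishes
print(len(frames))  # → 1
```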

Check Playback Devices

The playback device is detected automatically. If automatic detection fails, set the device manually:

aplay -l

Sample output:

(asr_env) jobot-ai-pipeline git:(main) aplay -l
card 0: sndes8326 [snd-es8326], device 0: i2s-dai0-ES8326 HiFi ES8326 HiFi-0 []
    Subdevices: 1/1
    Subdevice #0: subdevice #0
card 2: Device [USB Audio Device], device 0: USB Audio [USB Audio]
    Subdevices: 1/1
    Subdevice #0: subdevice #0

The USB speaker corresponds to card 2. Therefore, set:
play_device = 'plughw:2,0'

Update the following files accordingly:

# smart_main_asr.py
play_device = 'plughw:2,0'  # Playback device (card 2, device 0)

Run the Code

cd ~/jobot-ai-elephant
source ~/asr_env/bin/activate  # Run within the virtual environment
python smart_main_asr.py

Press Enter without typing anything to start recording (3 seconds by default).

Examples of commands handled by fuzzy matching:

"Grab the orange", "Grab the apple", "Checkout", etc.

Commands supported by the large language model:

  1. "Give me an apple", "And an orange" ...
  2. The large model can recognize object names.
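Fuzzy matching of the kind shown above can be sketched with the standard library's difflib; the command list and cutoff here are illustrative, not the project's actual implementation (which lives in test_match.py and the main scripts):

```python
import difflib

# Illustrative command templates, not the project's real list.
COMMANDS = ["grab the orange", "grab the apple", "checkout"]

def match_command(heard):
    """Map a possibly misheard phrase to the closest known command."""
    hits = difflib.get_close_matches(heard.lower(), COMMANDS, n=1, cutoff=0.6)
    return hits[0] if hits else None

print(match_command("Grab the ornge"))  # → grab the orange
```

Tolerating small ASR errors this way ("ornge" still resolves to "orange") is what lets the fixed command set work without invoking the large model.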

Startup Script Description

smart_main_asr.py: Chinese voice input; covers the full pipeline of speech-to-text, LLM, object detection, grasping, QR code recognition, and OCR text recognition

smart_main.py: English text input; covers the full pipeline of LLM, object detection, grasping, QR code recognition, and OCR text recognition

smart_simple_asr.py: Chinese voice input; covers only speech-to-text, LLM, object detection, and grasping, for quick demonstration

smart_simple.py: English text input; covers only LLM, object detection, and grasping, for quick demonstration

Project Directory Structure

├── spacemit_audio          # Audio module: recording, playback, ASR
├── spacemit_cv             # Computer vision module
├── spacemit_llm            # Large language model module
├── spacemit_orc            # OCR module
├── tools                   # Utilities
├── feedback_wav            # Feedback audio clips
├── cv_robot_arm_demo.py
├── ocr_demo.py             # Standalone OCR test
├── README_EN.md            # English documentation
├── README.md               # Chinese documentation
├── smart_main_asr.py       # Main retail program (voice interaction)
├── smart_main.py           # Main retail program (text input, no voice)
├── smart_simple_asr.py     # Simple recognition and grasping example (voice interaction)
├── smart_simple.py         # Simple recognition and grasping example (text input)
├── test_asr.py             # Test recording separately
├── test_llm.py             # Test the large model separately
├── test_match.py           # Test function matching separately
├── test_play.py            # Test playback separately
└── to_zero.py              # Return the robotic arm to its recognition zero point
