Meta-Llama-3-8B-Instruct-GGUF - Streaming response by sentences #12633
Unanswered. eric-patton-bam asked this question in Q&A.
Replies: 0
I'm incredibly new to running local models, and this may be a basic question, but I can't find a solution for it. I tried writing a shell script around llama-cli that listens to the response coming back from a prompt and does something (generates some audio) each time it finds the end of a sentence. I can't get it to work this way, though: llama-cli goes into an interactive mode and won't do anything until I press Ctrl+C.
Is there any documentation for how to accomplish something like this? I'm used to doing this kind of thing in C#, but I want to run this on Linux on an Orange Pi 5 and squeeze out as much performance as I can, so I'm going with llama.cpp and trying to learn how to use it.
Here's what my last attempt looked like (I removed some of it that isn't relevant, as it is getting a bit long):
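Roughly, what I'm aiming for is something like the sketch below: read the model's output one character at a time and emit a chunk whenever a sentence terminator (`.`, `!`, `?`) appears. This is only a sketch, not my working script. `speak` is a placeholder for my audio step, and the llama-cli flags in the usage comment (`-no-cnv` to disable conversation mode, `--no-display-prompt`) may differ between llama.cpp versions:

```shell
#!/usr/bin/env bash
# Sketch: split a streaming token output into sentences.
# Reads stdin one character at a time, buffers it, and prints the
# buffer as one line whenever a sentence terminator is seen.

split_sentences() {
    local buf="" ch
    while IFS= read -r -n1 ch; do
        buf+="$ch"
        case "$ch" in
            [.!?])
                # Sentence ended: emit it (minus a leading space) and reset.
                printf '%s\n' "${buf# }"
                buf=""
                ;;
        esac
    done
    # Flush any trailing text that never hit a terminator.
    [ -n "$buf" ] && printf '%s\n' "${buf# }"
}

# Intended usage (untested; flags vary by llama.cpp version, and
# stdbuf from coreutils avoids pipe block-buffering delaying tokens):
#   stdbuf -o0 ./llama-cli -m model.gguf -p "Tell me a story." \
#       -no-cnv --no-display-prompt 2>/dev/null |
#   split_sentences |
#   while IFS= read -r sentence; do
#       speak "$sentence"   # hypothetical TTS/audio command
#   done
```

The idea is that the sentence splitter is a plain stdin filter, so it can be tested with `echo`/`printf` before wiring the actual model output into it.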