/parallel often produces truncated outputs #5601

Closed
@k-gyuhak

I encountered unexpected behavior when running the following command:

./parallel -m ./models/llama_7b/llama-2-7b/ggml-model-f16.gguf -t 1 -ngl 100 -c 4096 -b 512 -s 1 -np 8 -ns 128 -n 100 -cb

This follows the instructions for serving multiple clients with parallel decoding and continuous batching (#3749 (comment)).

The model truncates the output for some prompts. For instance, it stopped generating after "... for gettting started:", as shown in the image below:
Screenshot 2024-02-19 at 4 30 03 PM

Similar behavior occurs with mixtral-8x7b-instruct when using the following command:

./parallel -m ./models--mistralai--Mixtral-8x7B-Instruct-v0.1/ggml-model-Q4_K_M.gguf -c 8192 -ngl 100 -f ~/data1_10.txt -n 2000 --temp 0.5 --top-p 0.9 --color --in-prefix "[INST]" --in-suffix "[/INST]" -b 8192 -np 2 -ns 10 -cb -t 1

As shown in the image below, the model truncates the output after "... The transcript '".

Screenshot 2024-02-19 at 4 40 20 PM

This behavior disappears when I provide the same prompt to ./main.

I am using 4 A100s.
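
To get a rough count of how many sequences are affected, something like the following could be run over the generated texts. This is only a heuristic sketch, not part of llama.cpp: the function name `looks_truncated` and its parameters are hypothetical, and it assumes the generated text and the number of generated tokens for each sequence have already been collected by hand from the ./parallel output. It flags outputs that stopped before the -n limit and do not end in sentence-final punctuation, so outputs that legitimately end in code or a list item would be false positives.

```python
# Heuristic sketch (hypothetical helper, not a llama.cpp API).
# Sentence-final characters that suggest the generation finished naturally.
TERMINAL_CHARS = (".", "!", "?")


def looks_truncated(text: str, n_generated: int, n_predict: int) -> bool:
    """Return True if the output looks cut off before the token budget.

    text        -- generated text for one sequence
    n_generated -- number of tokens actually generated (assumed to be known)
    n_predict   -- the value passed via -n
    """
    stripped = text.rstrip()
    if not stripped:
        # An empty generation is treated as truncated.
        return True
    if n_generated >= n_predict:
        # Hitting the -n limit is an expected cut-off, not the issue here.
        return False
    # Stopped early and the text does not end like a finished sentence.
    return not stripped.endswith(TERMINAL_CHARS)


if __name__ == "__main__":
    # Token counts below are made-up illustration values.
    samples = [
        ("... for gettting started:", 40, 100),
        ("... The transcript '", 500, 2000),
        ("This answer is complete.", 20, 100),
    ]
    for text, n_gen, n_pred in samples:
        print(repr(text), looks_truncated(text, n_gen, n_pred))
```

The check against `n_predict` is there so that sequences which simply ran out of their -n budget are not counted as instances of this problem.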
