Closed
Description
Expected Behavior
When using ./main with the --grammar flag, llama.cpp successfully generates output that conforms to the grammar string. It is expected that this behavior carries over to ./parallel as well.
Current Behavior
./parallel <args> ... --grammar <grammar_string> doesn't respect the grammar; llama.cpp generates free-form text instead.
Environment and Context
MacBook Pro, M1 Pro chip, macOS Sonoma
- Operating System, e.g. for Linux:
$ uname -a
Darwin <my_username>.local 23.0.0 Darwin Kernel Version 23.0.0: Fri Sep 15 14:41:43 PDT 2023; root:xnu-10002.1.13~1/RELEASE_ARM64_T6000 arm64
- SDK version, e.g. for Linux:
$ python3 --version
$ make --version
GNU Make 3.81
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
This program built for i386-apple-darwin11.3.0
$ g++ --version
Apple clang version 15.0.0 (clang-1500.0.40.1)
Target: arm64-apple-darwin23.0.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
Failure Information (for bugs)
I'm not sure if it's related, but I noticed that parallel decoding treats each line of the prompt as a separate prompt (for separate sequences). Also, parallel decoding seems to take place in a chat setting, not a completion setting.
Steps to Reproduce
For example, try this:
./parallel --prompt "What's your favorite number?" --in-prefix '' --in-suffix '' --model <model_path> --ctx-size 8192 --color --n-predict 128 --keep 0 --temp 0.8 --repeat-penalty 1.1 --repeat-last-n 64 --grammar '# `root` specifies the pattern for the overall output
root ::= (
value
)
value ::= "1" | "2" | "3"
' --parallel 1 --sequences 1 --threads 10 --n-gpu-layers 128 --main-gpu 0
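As a side note, an inline multi-line grammar is easy to mangle with shell quoting (the prompt above also contains an apostrophe). A more robust sketch, assuming ./parallel accepts the --grammar-file flag that llama.cpp's common argument parser provides, writes the grammar to a file first (digits.gbnf is a hypothetical filename):

```shell
# Write the grammar from the report to a standalone GBNF file so the
# multi-line rules survive shell quoting intact.
cat > digits.gbnf <<'EOF'
# `root` specifies the pattern for the overall output
root ::= (
value
)
value ::= "1" | "2" | "3"
EOF

# Then pass the file instead of an inline string, e.g.:
#   ./parallel --prompt "What's your favorite number?" ... --grammar-file digits.gbnf
```

The heredoc with a quoted delimiter ('EOF') prevents any expansion inside the grammar, so the rules reach llama.cpp byte-for-byte.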