When using `outlines.models.llamacpp` and making repeated calls to an instance of `outlines.generate.choice`, only the first call returns a result. This can be worked around by re-instantiating the generator for every call, but that is not an ideal solution.
The model I use in the example code is taken directly from the Cookbook CoT example, but the issue arose with multiple other models I tried earlier.
The example code produces the following output when I run it:

```
result: clothing
result:
result:
```
I am running this on both an M2 and an M3 MacBook.
Steps/code to reproduce the bug:
```python
import llama_cpp
from outlines import generate, models
from textwrap import dedent

llama_tokenizer = llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(
    "NousResearch/Hermes-2-Pro-Llama-3-8B"
)
tokenizer = llama_tokenizer.hf_tokenizer

model = models.llamacpp(
    "NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF",
    "Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf",
    tokenizer=llama_tokenizer,
    n_gpu_layers=-1,
    flash_attn=True,
    n_ctx=8192,
    verbose=False,
)

complaint_data = [
    {
        "message": "Hi, my name is Olivia Brown. I recently ordered a knife set from your wellness range, and it arrived earlier this week. Unfortunately, my satisfaction with the product has been less than ideal. My order was A123456",
        "order_number": "A12-3456",
        "department": "kitchen",
    },
    {
        "message": "Hi, my name is John Smith. I recently ordered a dress for an upcoming event, which was alleged to meet my expectations both in fit and style. However, upon arrival, it became apparent that the fabric was of subpar quality, leading to a less than satisfactory appearance. The order number is A12-3456",
        "order_number": "A12-3456",
        "department": "clothing",
    },
    {
        "message": "Hi, my name is Sarah Johnson. I recently ordered the ultimate ChefMaster 8 Drawer Cooktop. However, upon delivery, I discovered that one of the burners is malfunctioning. My order was A458739",
        "order_number": "A45-8739",
        "department": "kitchen",
    },
]

departments = ["clothing", "electronics", "kitchen", "automotive"]


def create_prompt(complaint):
    prompt_messages = [
        {
            "role": "system",
            "content": "You are an agent designed to help label complaints.",
        },
        {
            "role": "user",
            "content": dedent("""
                I'm going to provide you with a consumer complaint to analyze.
                The complaint is going to be regarding a product from one of our
                departments. Here is the list of departments:
                - "clothing"
                - "electronics"
                - "kitchen"
                - "automotive"
                Please reply with *only* the name of the department.
                """),
        },
        {
            "role": "assistant",
            "content": "I understand and will only answer with the department name",
        },
        {
            "role": "user",
            "content": f"Great! Here is the complaint: {complaint['message']}",
        },
    ]
    prompt = tokenizer.apply_chat_template(prompt_messages, tokenize=False)
    return prompt


if __name__ == "__main__":
    generator_struct = generate.choice(model, departments)
    for complaint in complaint_data:
        prompt = create_prompt(complaint)
        result = generator_struct(prompt)
        print(f"result: {result}")
```
This issue arose while putting together an Outlines workshop for ODSC. I had originally hoped to use llama_cpp for the workshop, but this (and another soon-to-be-posted bug) were blockers; I ended up using transformers instead.
I had the same issue in a different application, but I figured it was mostly inexperience. I believe I ended up recreating the generator each time, which is a temporary workaround for anyone who stumbles on this issue (see the sketch below).
Note that this will be slow and, I think, requires rebuilding the FSM each time.
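For anyone who needs it in the meantime, here is a minimal sketch of that workaround, assuming the `model`, `departments`, `complaint_data`, and `create_prompt` names from the repro script above:

```python
# Workaround sketch: create a fresh choice generator on every call
# instead of reusing one instance. This avoids the empty results, but
# it repeats the generator setup (including, I believe, the FSM build)
# on every iteration, so expect it to be noticeably slower.
for complaint in complaint_data:
    generator_struct = generate.choice(model, departments)  # fresh instance per call
    prompt = create_prompt(complaint)
    result = generator_struct(prompt)
    print(f"result: {result}")
```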