
llama_cpp - Multiple calls to 'choice' generator do not return results. #1109

Closed
willkurt opened this issue Aug 21, 2024 · 2 comments · Fixed by #1160

@willkurt
Contributor

Describe the issue as clearly as possible:

When using outlines.models.llama_cpp and making repeated calls to an instance of outlines.generate.choice, only the first call returns a result. This can be worked around by re-instantiating the generator for every call, but that is not an ideal solution.

The model I use in the example code is taken directly from the Cookbook CoT example, but this issue also arose with several other models I tried earlier.

The example code will produce the following output when I run it:

result: clothing
result: 
result: 

I am running this on an M2 Mac and an M3 MacBook.

Steps/code to reproduce the bug:

import llama_cpp
from outlines import generate, models
from textwrap import dedent

llama_tokenizer = llama_cpp.llama_tokenizer.LlamaHFTokenizer.from_pretrained(
    "NousResearch/Hermes-2-Pro-Llama-3-8B"
)
tokenizer = llama_tokenizer.hf_tokenizer

model = models.llamacpp(
    "NousResearch/Hermes-2-Pro-Llama-3-8B-GGUF",
    "Hermes-2-Pro-Llama-3-8B-Q4_K_M.gguf",
    tokenizer=llama_tokenizer,
    n_gpu_layers=-1,
    flash_attn=True,
    n_ctx=8192,
    verbose=False,
)

complaint_data = [{'message': 'Hi, my name is Olivia Brown.I recently ordered a knife set from your wellness range, and it arrived earlier this week. Unfortunately, my satisfaction with the product has been less than ideal.My order was A123456',
  'order_number': 'A12-3456',
  'department': 'kitchen'},
 {'message': 'Hi, my name is John Smith.I recently ordered a dress for an upcoming event, which was alleged to meet my expectations both in fit and style. However, upon arrival, it became apparent that the fabric was of subpar quality, leading to a less than satisfactory appearance.The order number is A12-3456',
  'order_number': 'A12-3456',
  'department': 'clothing'},
 {'message': 'Hi, my name is Sarah Johnson.I recently ordered the ultimate ChefMaster 8 Drawer Cooktop. However, upon delivery, I discovered that one of the burners is malfunctioning.My order was A458739',
  'order_number': 'A45-8739',
  'department': 'kitchen'}]

departments = ["clothing","electronics","kitchen","automotive"]

def create_prompt(complaint):
    prompt_messages = [
        {
            "role": "system",
            "content": "You are an agent designed to help label complaints."
        },
        {
            "role": "user",
            "content": dedent("""
        I'm going to provide you with a consumer complaint to analyze.
        The complaint is going to be regarding a product from one of our
        departments. Here is the list of departments:
            - "clothing"
            - "electronics"
            - "kitchen"
            - "automotive"
        Please reply with *only* the name of the department.
        """)
        },
        {
            "role": "assistant",
            "content": "I understand and will only answer with the department name"
        },
        {
            "role": "user",
            "content": f"Great! Here is the complaint: {complaint['message']}"
        }
    ]
    prompt = tokenizer.apply_chat_template(prompt_messages, tokenize=False)
    return prompt


if __name__ == "__main__":
    generator_struct = generate.choice(model,departments)
    for complaint in complaint_data:
        prompt = create_prompt(complaint)
        result = generator_struct(prompt)
        print(f"result: {result}")

Expected result:

result: clothing
result: clothing
result: electronics

Error message:

No response

Outlines/Python version information:

Version information

0.0.46 Python 3.11.0 (main, Jul 6 2024, 12:54:41) [Clang 15.0.0 (clang-1500.3.9.4)] aiohappyeyeballs==2.4.0 aiohttp==3.10.5 aiosignal==1.3.1 annotated-types==0.7.0 attrs==24.2.0 certifi==2024.7.4 charset-normalizer==3.3.2 cloudpickle==3.0.0 datasets==2.21.0 dill==0.3.8 diskcache==5.6.3 filelock==3.15.4 frozenlist==1.4.1 fsspec==2024.6.1 huggingface-hub==0.24.6 idna==3.7 interegular==0.3.3 Jinja2==3.1.4 jsonschema==4.23.0 jsonschema-specifications==2023.12.1 lark==1.2.2 llama_cpp_python==0.2.89 llvmlite==0.43.0 MarkupSafe==2.1.5 mpmath==1.3.0 multidict==6.0.5 multiprocess==0.70.16 nest-asyncio==1.6.0 networkx==3.3 numba==0.60.0 numpy==1.26.4 outlines==0.0.46 packaging==24.1 pandas==2.2.2 pyairports==2.1.1 pyarrow==17.0.0 pycountry==24.6.1 pydantic==2.8.2 pydantic_core==2.20.1 python-dateutil==2.9.0.post0 pytz==2024.1 PyYAML==6.0.2 referencing==0.35.1 regex==2024.7.24 requests==2.32.3 rpds-py==0.20.0 safetensors==0.4.4 six==1.16.0 sympy==1.13.2 tokenizers==0.19.1 torch==2.4.0 tqdm==4.66.5 transformers==4.44.1 typing_extensions==4.12.2 tzdata==2024.1 urllib3==2.2.2 xxhash==3.5.0 yarl==1.9.4

Context for the issue:

This issue arose while putting together an Outlines workshop for ODSC. I had originally hoped to use llama_cpp for the workshop, but this (and another soon-to-be-posted bug) were blockers, so I ended up using transformers instead.

@willkurt willkurt added the bug label Aug 21, 2024
@cpfiffer
Contributor

I had the same issue on a different application, but I figured it was mostly inexperience. I believe I ended up recreating the generator each time, which is a temporary workaround for people who stumble on the issue.

Note that this will be slow and (I think) requires rebuilding the FSM each time.
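For anyone who hits this before a fix lands, here is a minimal sketch of that workaround, reusing model, departments, complaint_data, and create_prompt from the reproduction above. Note that the generator is rebuilt inside the loop, so its guiding FSM is recompiled on every iteration:

# Workaround sketch: re-create the choice generator for every prompt.
# Slow, because the generator (and its underlying FSM) is rebuilt on each call.
for complaint in complaint_data:
    generator_struct = generate.choice(model, departments)  # fresh generator per call
    prompt = create_prompt(complaint)
    result = generator_struct(prompt)
    print(f"result: {result}")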

@lapp0
Contributor

lapp0 commented Sep 16, 2024

The SequenceGeneratorAdapter should be creating a new logits processor each run, but it isn't.

Should be an easy fix.
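To make the failure mode concrete, here is a rough, hypothetical sketch of the direction described above. The attribute and method names (logits_processor, copy()) are assumptions for illustration only, not the actual patch in #1160:

# Hypothetical sketch only -- not the real SequenceGeneratorAdapter implementation.
# Idea: treat the compiled logits processor as a template and clone it per call,
# so every generation starts from the FSM's initial state instead of the terminal
# state left over from the previous run (which is what produces empty results).
class SequenceGeneratorAdapterSketch:
    def __init__(self, model, logits_processor, sampler):
        self.model = model
        self.logits_processor = logits_processor  # compiled once, kept as a template
        self.sampler = sampler

    def __call__(self, prompt, **kwargs):
        fresh_processor = self.logits_processor.copy()  # assumed copy/reset hook
        return self.model.generate(prompt, fresh_processor, self.sampler, **kwargs)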
