
Add TransformersChat code to figure out correct role start and end tokens #791

Open
wants to merge 1 commit into main

Conversation

ilmarinen

No description provided.

@codecov-commenter commented May 2, 2024

Codecov Report

Attention: Patch coverage is 26.31579%, with 14 lines in your changes missing coverage. Please review.

Project coverage is 62.13%. Comparing base (acb38d1) to head (815b465).

| Files | Patch % | Lines |
| --- | --- | --- |
| guidance/models/transformers/_transformers.py | 26.31% | 14 Missing ⚠️ |


Additional details and impacted files

```
@@            Coverage Diff             @@
##             main     #791      +/-   ##
==========================================
+ Coverage   55.37%   62.13%   +6.75%     
==========================================
  Files          55       55              
  Lines        4074     4088      +14     
==========================================
+ Hits         2256     2540     +284     
+ Misses       1818     1548     -270     
```

☔ View full report in Codecov by Sentry.

```
@@ -1,5 +1,7 @@
import os
import re
import uuid
import jinja2
```
Collaborator
Do we really need jinja2?

Collaborator
I think we do -- the chat templates are written in jinja2 as a standard
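
For context, HuggingFace tokenizers ship their chat template as a jinja2 string in `tokenizer_config.json`. A minimal sketch of rendering one directly -- the template string and messages below are illustrative, not a real model's:

```python
import jinja2

# Illustrative chat-template string in the style HuggingFace tokenizers
# store in tokenizer_config.json (real templates also handle system
# prompts, bos/eos tokens, etc.).
template_str = (
    "{% for message in messages %}"
    "{{ '<|' + message['role'] + '|>' + message['content'] }}"
    "{% endfor %}"
)

template = jinja2.Environment().from_string(template_str)

messages = [
    {"role": "user", "content": "Hello"},
    {"role": "assistant", "content": "Hi there"},
]
print(template.render(messages=messages))  # <|user|>Hello<|assistant|>Hi there
```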

@Harsha-Nori (Collaborator) commented May 3, 2024

Thanks @ilmarinen, I think this looks good. I'd like to use this as a basis for eventually minimizing TransformersChat subclasses (which are hard to discover in the codebase). Ultimately I see a hierarchy:

  1. Built-in cache on TransformersChat object for role tag assignments on popular models (llama 2/3, mistral, phi, etc.)

  2. Fall back on this style of auto-loader based on the built-in jinja2 chat template. This may not be perfect but can get us closer.

  3. If we can't parse the chat template, or if it doesn't exist, fall back to ChatML style and throw a warning to the user?
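
A minimal sketch of that hierarchy -- everything below (`ROLE_TAG_CACHE`, `CHATML_ROLE_TAGS`, the parser hook) is a hypothetical illustration, not guidance's actual API:

```python
import warnings

# Hypothetical stand-ins; guidance's real cache/parser would replace these.
ROLE_TAG_CACHE = {
    "llama-2": {"user": ("[INST] ", " [/INST]"), "assistant": (" ", " ")},
}
CHATML_ROLE_TAGS = {
    role: (f"<|im_start|>{role}\n", "<|im_end|>\n")
    for role in ("system", "user", "assistant")
}

def resolve_role_tags(model_name, jinja2_chat_template, parse_template=None):
    """Hypothetical fallback chain for choosing role start/end tags."""
    # 1. Built-in cache for popular models (llama 2/3, mistral, phi, ...).
    if model_name in ROLE_TAG_CACHE:
        return ROLE_TAG_CACHE[model_name]

    # 2. Try to auto-derive tags from the model's built-in jinja2 chat template.
    if jinja2_chat_template is not None and parse_template is not None:
        try:
            return parse_template(jinja2_chat_template)
        except ValueError:
            pass  # template exists but we couldn't parse it

    # 3. Fall back to ChatML style and warn the user.
    warnings.warn("Could not determine role tags; falling back to ChatML syntax.")
    return CHATML_ROLE_TAGS
```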

@marcotcr @slundberg

@slundberg (Collaborator)

Sounds good to me @Harsha-Nori. Do you want to merge this before making more changes? If so, we should add a basic unit test first.

@Harsha-Nori (Collaborator)

I think we can skip merging this and get an initial release out first, and then I'll pull this in as part of my broader proposal of changes for the subsequent release. We'll get some tests added in that phase.

paulbkoch added a commit that referenced this pull request May 15, 2024
…emplate API, fix Anthropic API (#820)

This PR should significantly reduce the number of user-facing classes we
have in Guidance, and reduce subtle bugs introduced by using a
mis-specified Chat Template (models currently silently default to the
ChatML syntax, which many of the latest models don't adhere to). It
should also make it easier for users to add new models to guidance,
either via PR or in their own codebases.

Before:

```python
from guidance.models.transformers import Transformers, TransformersChat

class Llama(Transformers):
    pass

# Users have to do this for most chat models in guidance
class LlamaChat(TransformersChat, Llama):
    def get_role_start(self, role_name, **kwargs):
        if role_name == "system":
            return self._system_prefex + "<<SYS>>\n"
        elif role_name == "user":
            if str(self).endswith("\n<</SYS>>\n\n"):
                return ""  
            else:
                return "[INST] "
        else:
            return " "

    def get_role_end(self, role_name=None):
        if role_name == "system":
            return "\n<</SYS>>\n\n"
        elif role_name == "user":
            return " [/INST]"
        else:
            return " "

lm = LlamaChat(path_to_llama)
```

After:
```python
from guidance.models import Transformers

lm = Transformers(path_to_llama) # automagically works
```

If you're using a rare model and the automatic detection doesn't
work...

After pt2:
```python
# users can copy paste from https://huggingface.co/meta-llama/Llama-2-7b-chat-hf/blob/main/tokenizer_config.json#L12 
llama2_template = "{% if messages[0]['role'] == 'system' %}{% set loop_messages = messages[1:] %}{% set system_message = messages[0]['content'] %}{% else %}{% set loop_messages = messages %}{% set system_message = false %}{% endif %}{% for message in loop_messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if loop.index0 == 0 and system_message != false %}{% set content = '<<SYS>>\\n' + system_message + '\\n<</SYS>>\\n\\n' + message['content'] %}{% else %}{% set content = message['content'] %}{% endif %}{% if message['role'] == 'user' %}{{ bos_token + '[INST] ' + content.strip() + ' [/INST]' }}{% elif message['role'] == 'assistant' %}{{ ' '  + content.strip() + ' ' + eos_token }}{% endif %}{% endfor %}"

lm = Transformers(path_to_llama, chat_template=llama2_template)
```

or, in the worst case for maximal robustness and customizability:
```python
from guidance._chat import ChatTemplate, UnsupportedRoleException

class Llama2ChatTemplate(ChatTemplate):
    template_str = llama2_template

    def get_role_start(self, role_name):
        if role_name == "system":
            return "[INST] <<SYS>>\n"
        elif role_name == "user":
            return "<s>[INST]"
        elif role_name == "assistant":
            return " "
        else:
            raise UnsupportedRoleException(role_name, self)
        
    def get_role_end(self, role_name=None):
        if role_name == "system":
            return "\n<</SYS>"
        elif role_name == "user":
            return " [/INST]"
        elif role_name == "assistant":
            return "</s>"
        else:
            raise UnsupportedRoleException(role_name, self)

lm = Transformers(path_to_llama, chat_template=Llama2ChatTemplate)
```

The first big change is the removal of the `Chat` and `Instruct` mixins,
and the introduction of a new `guidance._chat.ChatTemplate` class, which
handles the same responsibilities as those mixins used to.

Users can construct a subclass of `ChatTemplate` and pass it to models
with a new `chat_template` argument (defaulted to None).

The way this works for local models is to leverage the `chat_template`
property in `huggingface transformers` and `llamacpp`'s GGUF files. When
a user tries to load a model, guidance now follows this order of
operations:

1. See if the user passed in a `ChatTemplate` -- if so, use that
directly.
2. If the user passed a string as `chat_template`, set that as the
`template_str`. If the user did not pass in anything, set the
`template_str` based on metadata from the huggingface.AutoTokenizer or
gguf metadata fields.
3. Check the `template_str` against a local cache in guidance which
maintains template converters for the most popular models on
huggingface/in llama.cpp. We index this cache based on the actual
chat_template string, so any model that uses one of these chat templates
-- even if it isn't explicitly listed in the documentation -- will
automatically load the right guidance class.
4. If we don't have anything in the cache, try to automatically convert
the jinja2 template into the new guidance._chat.ChatTemplate syntax.
Warn the user if we attempt this. [NOTE: This is not yet implemented and
may come in a future PR.]
5. Default to the `ChatML` syntax, with a warning to the user.
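
A minimal sketch of this resolution flow -- `ChatTemplate` is real (introduced by this PR), but the cache, function name, and default below are illustrative assumptions, not the exact internals:

```python
import inspect
import warnings

from guidance._chat import ChatTemplate  # introduced by this PR

# Illustrative cache: keyed on the raw template string, mapping to a
# guidance ChatTemplate subclass. The real cache lives inside guidance.
CHAT_TEMPLATE_CACHE = {}

def resolve_chat_template(chat_template=None, tokenizer=None, default=None):
    # 1. User passed a ChatTemplate subclass: use it directly.
    if inspect.isclass(chat_template) and issubclass(chat_template, ChatTemplate):
        return chat_template

    # 2. User passed a template string, else pull one from model metadata
    #    (huggingface AutoTokenizer's .chat_template, or GGUF fields).
    if isinstance(chat_template, str):
        template_str = chat_template
    else:
        template_str = getattr(tokenizer, "chat_template", None)

    # 3. Look the string itself up in the cache: any model shipping a known
    #    template resolves correctly, even if it isn't listed by name.
    if template_str in CHAT_TEMPLATE_CACHE:
        return CHAT_TEMPLATE_CACHE[template_str]

    # 4. (Not yet implemented) Attempt automatic jinja2 -> ChatTemplate
    #    conversion here, with a warning.

    # 5. Default to ChatML syntax, with a warning to the user.
    warnings.warn("No chat template recognized; defaulting to ChatML syntax.")
    return default  # e.g. guidance's ChatML-style template class
```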

Currently this PR updates the following user facing `guidance.Models`
classes:

- Transformers (removing TransformersChat and the Llama/Mistral
subclasses)
- LlamaCpp (removing LlamaCppChat and Mistral subclasses)
- Anthropic

For now, `Anthropic` should be representative of how grammarless classes
will work. I wanted to start with OpenAI, but many other guidance.models
classes inherit from OpenAI, so I'm saving that for later. Also while I
was at it I upgraded the `Anthropic` class to use their latest SDK, so
`guidance.models.Anthropic` should now work with the latest Claude3
models.
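
For instance, a usage sketch -- the model identifier is an assumption following Anthropic's Claude 3 naming at the time, not something this PR pins down:

```python
from guidance.models import Anthropic

# Assumed model identifier; any Claude 3 model name from Anthropic's SDK
# should work here.
lm = Anthropic("claude-3-haiku-20240307")
```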

# TODO

A decent amount left to do here. In no particular order...

1. Unrelated to this change, but guidance cannot properly handle
`llama3` or `phi-3`'s true tokenizers/chat templates. We need to fix
that independently.
2. Add better warnings to users when we fall back to using the ChatML
template.
3. Extend support through the rest of the guidance.models classes
(mostly just remote ones left, starting with OpenAI).
4. Write out more templates for popular models in the ChatTemplateCache,
and also add an alias system so that we can look up models in the cache
by common names (e.g. "llama3").
5. Add a deprecation warning to people trying to use very old models on
`Anthropic`.
6. Much more testing and documentation. We should, for example, add
documentation on how to import/initialize a new ChatTemplate and use it
for your own models.
7. Write the auto-converter from huggingface `jinja2` to guidance
ChatTemplate. A battery of unit tests here that compare against the
original `transformers.apply_chat_template` method would make this more
robust (a minimal sketch of one such test follows this list). Can be in
a future PR as this is complex logic. A start to this was attempted in
#791 by @ilmarinen, and we could eventually pull this in and expand its
coverage.
8. Probably get rid of the folders in `guidance.models.llama_cpp` and
`guidance.models.transformers` because we don't need to maintain a bunch
of subclasses for them anymore.
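
For item 7, a minimal sketch of such a parity test: `apply_chat_template` is the real `transformers` API, while the two guidance-side helpers are hypothetical stand-ins for the not-yet-written converter:

```python
from transformers import AutoTokenizer

def test_converted_template_matches_transformers():
    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
    messages = [
        {"role": "user", "content": "Hello"},
        {"role": "assistant", "content": "Hi there"},
    ]

    # Ground truth: transformers renders the conversation with its own
    # jinja2 chat template.
    expected = tokenizer.apply_chat_template(messages, tokenize=False)

    # Hypothetical: run the same conversation through the auto-converted
    # guidance ChatTemplate and compare the raw strings.
    converted = convert_jinja2_to_chat_template(tokenizer.chat_template)
    actual = render_with_chat_template(converted, messages)

    assert actual == expected
```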

Would appreciate any and all feedback, particularly on the logical flow
and new user facing (simpler) API. @marcotcr @paulbkoch @slundberg
@riedgar-ms @hudson-ai

---------

Co-authored-by: Richard Edgar (Microsoft) <riedgar@microsoft.com>
Co-authored-by: Paul Koch <code@koch.ninja>