Remove Instruct/Chat versions of models & introduce a new ChatTemplate API, fix Anthropic API #820
Conversation
…methods. Add some basic tests on caching functionality.
…ort all local models now.
…ew guidance ChatTemplate API to Anthropic models.
-for (
-    byte
-) in (
-    node.keys()
-): # we update all the children since the parser knows the full mask
+# we update all the children since the parser knows the full mask
+for byte in node.keys():
some of these changes are just undoing particularly egregious dissections that black did... e.g. it's wild that this loop iterator was split into 5 lines!
realizing now that it makes this particular PR much harder to parse though, apologies for that!
I'm guessing that that was done by the long trailing comment. That seems to be something that black is not overly keen on.
Did you just shift the comment and re-run black?
Yeah I ended up manually moving trailing comments up a line, but it'd be good to see if there's an automatic way to make black handle it.
class Llama2ChatTemplate(ChatTemplate):
    # available_roles = ["system", "user", "assistant"]
    template_str = llama2_template

    def get_role_start(self, role_name):
        if role_name == "system":
            return "[INST] <<SYS>>\n"
        elif role_name == "user":
            return "<s>[INST]"
        elif role_name == "assistant":
            return " "
        else:
            raise UnsupportedRoleException(role_name, self)

    def get_role_end(self, role_name=None):
        if role_name == "system":
            return "\n<</SYS>"
        elif role_name == "user":
            return " [/INST]"
        elif role_name == "assistant":
            return "</s>"
        else:
            raise UnsupportedRoleException(role_name, self)
this template is likely an oversimplification of the hf template string, I'll need to debug more and extend this.
Don't forget to add the tests you develop during your debugging.
Hmm, test failures are related to needing to authenticate on huggingface to load some model tokenizers. I added these new tests in to check the ChatTemplateCache. We could disable these tests, or (better) figure out a way to pass a huggingface auth token here....
@riedgar-ms thoughts here?
        self._cache = {}

    def __getitem__(self, key):
        key_compact = key.replace(" ", "")
Minor point: will this collapse multiple consecutive spaces?
nah it's a good catch, I'm actually not sure we need to be doing this. I didn't want minor differences in jinja formats (which I believe are whitespace agnostic for parts of them) to cause different mappings in the cache, but maybe there are places where an extra space is actually a meaningful difference?
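To make the distinction concrete, a small illustration of the two behaviors under discussion (the jinja fragment below is invented, not an actual template key):

```python
import re

key = "{{ messages }}  {% if add_generation_prompt %}"
key.replace(" ", "")      # '{{messages}}{%ifadd_generation_prompt%}' -- every space removed, runs included
re.sub(r"\s+", " ", key)  # '{{ messages }} {% if add_generation_prompt %}' -- runs collapsed to one space
```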
That was a question I asked before I caught sight of the actual keys you were using....
Given the changes to Anthropic, does that mean we now have access to API keys which we can use in the builds?
I just made a personal account and paid for $20 of credits out of pocket to test it... not sure it's wise to depend on that for our CI/CD (especially given how frequently they run :| ).
I'm quite happy to spend your money.....
You can add the HF auth token to the repo as a secret (like we did for the AzureAI studio secrets), and then access it as an environment variable within the test. I believe I have an
Nice thanks, I changed it to use the env variable and your helper function.
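For reference, the pattern being described looks roughly like this (the env-var name and model id are illustrative; the repo's actual helper function may differ):

```python
import os
import pytest
import transformers

def test_gated_tokenizer_loads():
    # skip cleanly when no HuggingFace token is available in the environment
    hf_token = os.getenv("HF_TOKEN")
    if not hf_token:
        pytest.skip("No HuggingFace auth token available")
    tokenizer = transformers.AutoTokenizer.from_pretrained(
        "meta-llama/Llama-2-7b-chat-hf", token=hf_token
    )
    assert tokenizer.chat_template is not None
```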
Have suggested a way of getting some of those Phi3 tests running in the gates
lm += lm.get_role_start(role_name, **kwargs)

# TODO [HN]: Temporary change while I instrument chat_template in transformers only.
Is this still the case (and below)?
tokenizer = transformers.AutoTokenizer.from_pretrained(model_id, token=hf_token)
model_chat_template = tokenizer.chat_template
if should_pass:
Would separating things out and using pytest's XFAIL be better here?
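i.e. something along these lines, where the expected-to-fail cases are marked instead of branching on a should_pass flag (model ids below are placeholders):

```python
import pytest

@pytest.mark.parametrize(
    "model_id",
    [
        "model-with-known-template",
        pytest.param("model-with-unknown-template", marks=pytest.mark.xfail),
    ],
)
def test_chat_template_lookup(model_id):
    # body would load the tokenizer's chat_template and look it up in the cache
    ...
```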
api_key="blah", | ||
) | ||
assert isinstance(initialized_model, model_class) | ||
# This is all redundant with the class unification |
Better deleted than commented out
tests/models/test_transformers.py
# directly passing in newlines next to special tokens for a tokenizer that does rstrip on those tokens
# (like phi-3) will cause a tokenization mismatch issue.
# We're leaving this test in so that we can reliably reproduce and debug this in the future.
with pytest.raises(Exception):
Please can you assert something about the exception?
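For example, something like this, assuming the repro raises a specific error type with a recognizable message (both are placeholders here, not the actual exception raised):

```python
import pytest

def trigger_tokenization_mismatch():
    # stand-in for the phi-3 newline/special-token repro referenced above
    raise ValueError("token mismatch between prompt and re-tokenized text")

# assert on the type and a message fragment, rather than bare Exception
with pytest.raises(ValueError, match="mismatch"):
    trigger_tokenization_mismatch()
```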
tests/models/test_transformers.py
assert "5" in lm["five"] | ||
|
||
|
||
@pytest.mark.skip("Don't overload the build machines") |
I'm getting nervous about the number of tests added which are also skipped. Can you do something like:
guidance/tests/models/test_llama_cpp.py
Lines 10 to 15 in 13270bf
@pytest.fixture(scope="module")
def llamacpp_model(selected_model, selected_model_name):
    if selected_model_name in ["hfllama7b", "hfllama_7b_gpu"]:
        return selected_model
    else:
        pytest.skip("Requires Llama-Cpp model")
So that these tests will run whenever there's a Phi3 model active in selected_model?
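Presumably something like the following, mirroring the fixture above (the selected_model_name value for Phi3 is a guess, not the actual name used in conftest):

```python
import pytest

@pytest.fixture(scope="module")
def phi3_model(selected_model, selected_model_name):
    # hypothetical Phi3 analogue of llamacpp_model; adjust the name(s) to match conftest
    if selected_model_name in ["transformers_phi3_mini"]:
        return selected_model
    else:
        pytest.skip("Requires Phi3 model")
```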
Codecov Report
Attention: Patch coverage is …

@@            Coverage Diff            @@
##             main     #820     +/-  ##
=========================================
+ Coverage   57.02%   57.34%   +0.32%
=========================================
  Files          57       56       -1
  Lines        4193     4194       +1
=========================================
+ Hits         2391     2405      +14
+ Misses       1802     1789      -13

View full report in Codecov by Sentry.
@paulbkoch I think that everything is going to pass eventually on this. Per the discussion thread, I'll have to see about converting some more models over to the new paradigm.
…emplate on remote models
This PR should significantly reduce the number of user-facing classes we have in Guidance, and reduce subtle bugs introduced by using a mis-specified Chat Template (models currently silently default to the ChatML syntax, which many of the latest models don't adhere to). It should also make it easier for users to add new models to guidance, either via PR or in their own codebases.
Before:
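For example (a minimal sketch; the class name reflects the pre-PR chat-specific variants and the model id is just illustrative):

```python
from guidance import models

# old pattern: pick the chat-specific class variant yourself
lm = models.TransformersChat("microsoft/Phi-3-mini-4k-instruct")
```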
After:
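Something like this (again a sketch, with an illustrative model id):

```python
from guidance import models

# new pattern: one class; the chat template is detected from the model's own metadata
lm = models.Transformers("microsoft/Phi-3-mini-4k-instruct")
```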
If you're using a rare model and the auto import doesn't automatically work...
After pt2:
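e.g. (sketch; the exact import path for the built-in templates is an assumption based on this PR):

```python
from guidance import models
from guidance._chat import Llama2ChatTemplate  # assumed location of the built-in templates

# explicitly pick one of guidance's predefined templates
lm = models.LlamaCpp("path/to/rare-llama2-finetune.gguf", chat_template=Llama2ChatTemplate)
```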
or, in the worst case for maximal robustness and customizability:
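A sketch of what that could look like (the role markers below are placeholders, not any particular model's, and the import path is an assumption):

```python
from guidance import models
from guidance._chat import ChatTemplate, UnsupportedRoleException  # assumed exports

class CustomChatTemplate(ChatTemplate):
    template_str = "..."  # the model's raw jinja2 chat template, if available

    def get_role_start(self, role_name):
        if role_name == "user":
            return "<|user|>\n"
        elif role_name == "assistant":
            return "<|assistant|>\n"
        raise UnsupportedRoleException(role_name, self)

    def get_role_end(self, role_name=None):
        return "<|end|>\n"

lm = models.Transformers("some/rare-model", chat_template=CustomChatTemplate)
```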
The first big change is the removal of the Chat and Instruct mixins, and the introduction of a new guidance._chat.ChatTemplate class, which handles the same responsibilities as those mixins used to. Users can construct a subclass of ChatTemplate and pass it to models with a new chat_template argument (defaulted to None).

The way this works for local models is to leverage the chat_template property in huggingface transformers and in llamacpp's GGUF files. When a user tries to load a model, guidance now follows this order of operations:

1. If the user passed in their own ChatTemplate subclass, use that directly.
2. Otherwise, check the model's template_str against a local cache in guidance which maintains template converters for the most popular models on huggingface/in llama.cpp. We index this cache based on the actual chat_template string, so any model that uses one of these chat templates -- even if it isn't explicitly listed in the documentation -- will automatically load the right guidance class.
3. If there is no match, fall back to the ChatML syntax, with a warning to the user.
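Roughly, that lookup reads like the following sketch (the logic only; names like CHAT_TEMPLATE_CACHE and the stand-in ChatMLTemplate are illustrative, not the actual implementation):

```python
import warnings

class ChatMLTemplate:  # stand-in for guidance's default ChatML template class
    pass

CHAT_TEMPLATE_CACHE = {}  # illustrative: maps raw chat_template strings -> template classes

def resolve_chat_template(chat_template, model_template_str):
    # 1. a user-supplied ChatTemplate subclass always wins
    if chat_template is not None:
        return chat_template
    # 2. otherwise, look the model's own template string up in the cache of known
    #    converters (keyed by the chat_template string from HF / GGUF metadata)
    if model_template_str in CHAT_TEMPLATE_CACHE:
        return CHAT_TEMPLATE_CACHE[model_template_str]
    # 3. no match: fall back to ChatML syntax and warn the user
    warnings.warn("Unrecognized chat template; defaulting to ChatML syntax.")
    return ChatMLTemplate
```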
Currently this PR updates the following user-facing guidance.models classes:

For now, Anthropic should be representative of how grammarless classes will work. I wanted to start with OpenAI, but many other guidance.models classes inherit from OpenAI, so I'm saving that for later. Also, while I was at it, I upgraded the Anthropic class to use their latest SDK, so guidance.models.Anthropic should now work with the latest Claude3 models.

TODO
A decent amount left to do here. In no particular order...
- Anthropic.
- Converting jinja2 templates to guidance ChatTemplate. A battery of unit tests here that compare against the original transformers.apply_chat_template method would make this more robust. Can be in a future PR as this is complex logic. A start to this was attempted in Add TransformersChat code to figure out correct role start and end tokens #791 by @ilmarinen, and we could eventually pull this in and expand its coverage.
- Simplifying guidance.models.llama_cpp and guidance.models.transformers, because we don't need to maintain a bunch of subclasses for them anymore.

Would appreciate any and all feedback, particularly on the logical flow and the new user-facing (simpler) API. @marcotcr @paulbkoch @slundberg @riedgar-ms @hudson-ai