Fix and test cases for unstable tokenization schemes like phi3 #830
Conversation
lm += "\n" + gen(name="five", max_tokens=1)
lm += "\n" + gen(name="six", max_tokens=1)

assert True
I assume that without this fix, something above will throw an exception?
Yes, the newlines cause special issues with phi-3's tokenizer.
Can you add a comment to that effect?
I think this is great. It is clearly a patch/hack, but it is really fixing a bug with phi-3 so I don't think there is a nice way around it looking a bit hacky.
tests/models/test_transformers.py (Outdated)
with assistant():
    lm += gen(name="five", max_tokens=10)

assert "5" in lm["five"]
What is it about this test which is specific to Llama3? Shouldn't all models be able to pass?
Nothing in particular, I was just using it to debug llama3. Removed the test.
Technically all the phi-3 tests should run on any model too; phi-3 is just what they were designed around, since it's the most likely to surface errors due to its tokenizer choices. The llama3 test was identical to the phi tests, so we should just generalize those in future PRs.
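(For illustration, one way that generalization could eventually look is a parametrized pytest fixture. This is only a sketch, and the model list, fixture name, and test body are hypothetical rather than the repo's actual fixtures.)

import pytest
from guidance import assistant, gen, models, user


# Hypothetical parametrization: run the same tokenizer-stability checks against
# several chat models instead of hard-coding phi-3.
@pytest.fixture(params=["microsoft/Phi-3-mini-4k-instruct", "meta-llama/Meta-Llama-3-8B-Instruct"])
def chat_model(request):
    return models.Transformers(request.param)


def test_counting_generalized(chat_model):
    lm = chat_model
    with user():
        lm += "You are a counting bot. Just keep counting numbers."
    with assistant():
        lm += "1,2,3,4,"
        lm += gen(name="five", max_tokens=10)
    assert "5" in lm["five"]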
tests/models/test_transformers.py (Outdated)
# Bad user tokens, but we should still generate /something/
lm += f"""<|use\n\nYou are a counting bot. Just keep counting numbers.<|end|><|assistant|>1,2,3,4,"""
lm += gen("five", max_tokens=10)
assert len(str(lm)) > 0
Surely you mean len(lm["five"]) > 0?
updated
tests/models/test_transformers.py (Outdated)
def test_phi3_chat_unrolled(phi3_model: models.Model):
    lm = phi3_model
    # Manually convert the chat format into completions style
    lm += f"""<|user|>\nYou are a counting bot. Just keep counting numbers.<|end|><|assistant|>1,2,3,4,"""
I think you need \n<|assistant|>\n here. Here's the template for reference:

{{ bos_token }}{% for message in messages %}{% if (message['role'] == 'user') %}{{'<|user|>' + '\n' + message['content'] + '<|end|>' + '\n' + '<|assistant|>' + '\n'}}{% elif (message['role'] == 'assistant') %}{{message['content'] + '<|end|>' + '\n'}}{% endif %}{% endfor %}
Phi-3 just has a really weird tokenizer. While the chat template inserts newlines, the tokenizer immediately strips them due to an rstrip they enabled on the special tokens, so functionally the two forms are identical. The rest of the tests, with newlines/whitespace everywhere, are there to check for this (and the instability it leads to). That said, this test should pass either way, but as written it is the closest thing to what actually gets passed to the model.
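(A quick way to probe that stripping behavior, as a minimal sketch: it assumes the public microsoft/Phi-3-mini-4k-instruct tokenizer loaded via transformers, and it prints the comparison rather than asserting, since the exact result depends on how the checkpoint configures its added tokens.)

from transformers import AutoTokenizer

# Assumption: the public phi-3 instruct checkpoint; any phi-3 tokenizer with
# rstrip enabled on its special tokens should behave the same way.
tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")

with_newlines = "<|user|>\nCount to five.<|end|>\n<|assistant|>\n"
without_newlines = "<|user|>Count to five.<|end|><|assistant|>"

ids_with = tok.encode(with_newlines, add_special_tokens=False)
ids_without = tok.encode(without_newlines, add_special_tokens=False)

# If rstrip on the special tokens swallows the following newline, the two
# encodings collapse to the same ids, and decoding never brings the "\n" back.
print(ids_with == ids_without)
print(repr(tok.decode(ids_with)))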
updated to be safe!
Test failures appear unrelated.
Some models like phi-3 have unstable tokenizers that aren't reversible in certain circumstances. This can cause stability issues with guidance's attempt to stay on distribution by retokenizing prompt inputs. I don't believe we have a choice but to skip this step for malformed tokenizers.
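(For concreteness, "reversible" here means that decoding a token sequence and re-encoding it reproduces the same ids. A minimal check might look like the sketch below; it is not code from this PR, and the model name is only an example of a tokenizer known to misbehave around special tokens and newlines.)

from transformers import AutoTokenizer


def roundtrips(tokenizer, text: str) -> bool:
    """Return True if encode -> decode -> encode reproduces the same token ids."""
    ids = tokenizer.encode(text, add_special_tokens=False)
    recoded = tokenizer.encode(tokenizer.decode(ids), add_special_tokens=False)
    return recoded == ids


# Prompts with newlines around special tokens are where phi-3-style tokenizers
# tend to lose information, so that's a useful probe.
tok = AutoTokenizer.from_pretrained("microsoft/Phi-3-mini-4k-instruct")
print(roundtrips(tok, "<|user|>\nYou are a counting bot.<|end|>\n<|assistant|>\n1,2,3,4,"))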
This isn't really in a merge-ready state; it's a band-aid fix plus some reproducible tests to start a discussion. If we expect this to be a trend, we should formalize the exception process, but if it's a small number of models, the hack in the TransformersEngine and Engine classes may be the minimal code change we need.
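(To make the shape of that exception concrete, the escape hatch could look roughly like the sketch below. This is purely illustrative: the attribute and method names are hypothetical and not the actual TransformersEngine/Engine changes in this PR.)

# Hypothetical sketch of an opt-out for unstable tokenizers; the real
# Engine/TransformersEngine code in guidance is structured differently.
class Engine:
    def __init__(self, tokenizer, retokenize: bool = True):
        self.tokenizer = tokenizer
        # Turned off for models whose tokenizers are not reversible.
        self.enable_retokenize = retokenize

    def _recode(self, token_ids):
        """Re-tokenize the prompt to stay on distribution, unless disabled."""
        if not self.enable_retokenize:
            return token_ids  # malformed tokenizer: pass the ids through untouched
        text = self.tokenizer.decode(token_ids)
        return self.tokenizer.encode(text)


class TransformersEngine(Engine):
    # Known-unstable tokenizers for which the recode step is skipped.
    UNSTABLE_MODELS = {"microsoft/Phi-3-mini-4k-instruct"}

    def __init__(self, model_name, tokenizer):
        super().__init__(tokenizer, retokenize=model_name not in self.UNSTABLE_MODELS)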
(Following a discussion with @slundberg and @paulbkoch -- tagging for your awareness)