Detokenizer fixes #8039
Commits on Jun 20, 2024
- jaime-m-p committed Jun 20, 2024 (eea8dfa)
- Using llama_tokenize() in tests
  jaime-m-p committed Jun 20, 2024 (d779bab)
- Using llama_tokenize() in tests
  jaime-m-p committed Jun 20, 2024 (40a6660)
- jaime-m-p committed Jun 20, 2024 (16a7503)
- minor: confusing hexadecimal codepoint
  jaime-m-p committed Jun 20, 2024 (03dbcc8)
- Clean old known problematic codepoints
  jaime-m-p committed Jun 20, 2024 (071bf42)
- Update bruteforce random tests
  - Add detokenizer checks
  - New generator: ascii_lr_strip
  - New generator: apostrophe
  - Add more vocab files
  jaime-m-p committed Jun 20, 2024 (064b35e)
- Fix add_space_prefix, set false by default
  jaime-m-p committed Jun 20, 2024 (503b753)
- jaime-m-p committed Jun 20, 2024 (0cc6593)
- jaime-m-p committed Jun 20, 2024 (6d233bc)
Commits on Jun 21, 2024
- Add tokenizer flag: clean_up_tokenization_spaces
  jaime-m-p committed Jun 21, 2024 (b452e82)
Commits on Jun 23, 2024
- tests: unexpected vocab type as test fail instead of error
  Useful when automating tests:
  - If you don't know the vocab type in advance.
  - Differentiate other loading errors.
  jaime-m-p committed Jun 23, 2024 (9af762c)
- tests: gracefully exit threads
  Using exit() is throwing random exceptions
  jaime-m-p committed Jun 23, 2024 (0cf2989)
- tests: skip unicode surrogates and undefined
  jaime-m-p committed Jun 23, 2024 (38d54b3)
- UNKNOWN and CONTROL are 'special pieces'. Remove space after UNKNOWN and CONTROL. Refactor llama_token_to_piece().
  jaime-m-p committed Jun 23, 2024 (44c8648)
Commits on Jun 24, 2024
- Do not remove space when decoding special tokens
  jaime-m-p committed Jun 24, 2024 (9eb0fca)
- style: remove trailing whitespace
  jaime-m-p committed Jun 24, 2024 (12e2c31)
- Bugfix: custom regexes split undefined unicode codepoints
  jaime-m-p committed Jun 24, 2024 (95a0df5)
- Detokenize special tokens. Replace errors with '\uFFFD' when detokenizing to 'utf-8'. More edge cases. Better detokenization results check.
  jaime-m-p committed Jun 24, 2024 (4a28063)
Commits on Jun 25, 2024
- Symmetric params for llama_tokenize() and llama_detokenize()
  jaime-m-p committed Jun 25, 2024 (9854a9c)
- jaime-m-p committed Jun 25, 2024 (107923c)
- jaime-m-p committed Jun 25, 2024 (68220fe)
Commits on Jul 4, 2024
- jaime-m-p committed Jul 4, 2024 (98fc182)
- Merge commit 'f8c4c073' into detokenizer
  jaime-m-p committed Jul 4, 2024 (8072089)
- 'viking' detokenizer clean spaces
  jaime-m-p committed Jul 4, 2024 (8f5e1e0)
- jaime-m-p committed Jul 4, 2024 (2f15019)
- Update bruteforce test: header files location
  jaime-m-p committed Jul 4, 2024 (11ac641)
- Update bruteforce test: add more models
  jaime-m-p committed Jul 4, 2024 (4db8c0d)
- jaime-m-p committed Jul 4, 2024 (906476f)
Commits on Jul 5, 2024
- Co-authored-by: Georgi Gerganov <ggerganov@gmail.com> (0137683)