
Commit

[bug] Use `PreTrainedTokenizerFast.from_pretrained` instead of `PreTrainedTokenizerFast.__init__` (ServiceNow#75)
tscholak authored Dec 2, 2024
1 parent 484e2f6 commit 7d6c2cc
Showing 1 changed file with 3 additions and 1 deletion.
fast_llm/data/tokenizer.py — 3 additions, 1 deletion

@@ -11,7 +11,9 @@ class Tokenizer:

     def __init__(self, config: TokenizerConfig):
         log_main_rank(f"> loading tokenizer from {config.path} ...")
-        self.tokenizer = PreTrainedTokenizerFast(tokenizer_file=config.path, errors="replace", max_len=None)
+        self.tokenizer = PreTrainedTokenizerFast.from_pretrained(
+            pretrained_model_name_or_path=config.path, errors="replace", max_len=None
+        )
         if self.tokenizer.eos_token_id is None:
             raise ValueError("Tokenizer does not have an EOS token.")
         self.eod_id = self.tokenizer.eos_token_id
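The likely rationale for this fix: `PreTrainedTokenizerFast(tokenizer_file=...)` reads only the raw `tokenizers`-library JSON, so special-token metadata stored alongside it (e.g. in `tokenizer_config.json`) is never applied and `eos_token_id` can come back as `None`, tripping the check in the diff. `from_pretrained` resolves the full tokenizer directory and restores that configuration. A minimal sketch of the difference, assuming the `transformers` and `tokenizers` libraries; the tiny word-level vocabulary is purely illustrative:

```python
import os
import tempfile

from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from transformers import PreTrainedTokenizerFast

# Build a toy tokenizer and save only the raw tokenizers-library JSON.
raw = Tokenizer(WordLevel({"hello": 0, "</s>": 1}, unk_token="</s>"))
tmp = tempfile.mkdtemp()
raw_path = os.path.join(tmp, "tokenizer.json")
raw.save(raw_path)

# Old code path: the constructor sees only the raw file, so no EOS token
# is configured and eos_token_id is None.
old = PreTrainedTokenizerFast(tokenizer_file=raw_path)
print(old.eos_token_id)  # None

# Save a wrapped tokenizer together with its special-token config, then
# reload via from_pretrained, which restores the EOS token.
wrapped = PreTrainedTokenizerFast(tokenizer_file=raw_path, eos_token="</s>")
wrapped.save_pretrained(tmp)
new = PreTrainedTokenizerFast.from_pretrained(tmp)
print(new.eos_token_id)  # 1
```

With the old constructor call, the `eos_token_id is None` guard in `Tokenizer.__init__` would raise even for a correctly prepared tokenizer directory; routing through `from_pretrained` lets the saved configuration supply the EOS token.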

0 comments on commit 7d6c2cc
