diff --git a/articles/.images/tiktoken/gpt-4o.png b/articles/.images/tiktoken/french.gpt4o.png
similarity index 100%
rename from articles/.images/tiktoken/gpt-4o.png
rename to articles/.images/tiktoken/french.gpt4o.png
diff --git a/articles/tokun.md b/articles/tokun.md
index 9788c07..e0033f7 100644
--- a/articles/tokun.md
+++ b/articles/tokun.md
@@ -55,7 +55,7 @@ This process has several stages: encoding, tokenization and embedding.
 
 For now, consider the [end result from the tokenizer `o200k`][tiktokenizer-o200k] (used in `GPT-4o`):
 
-
+
 
 The sentence is split into chunks called "tokens", which have a 1:1 match with an ID.
 Each tokenizer has its own vocabulary and `o200k` contains 200k identified tokens.
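
For reference, a minimal sketch of the behaviour described in the changed paragraph: using the `tiktoken` library's `o200k_base` encoding (the vocabulary behind `GPT-4o`) to split a sentence into tokens and their matching IDs. The example sentence is illustrative, not the one used in the article.

```python
import tiktoken

# load the o200k vocabulary used by GPT-4o
encoding = tiktoken.get_encoding("o200k_base")

text = "This process has several stages: encoding, tokenization and embedding."

ids = encoding.encode(text)                    # sentence -> list of integer token IDs
tokens = [encoding.decode([i]) for i in ids]   # each ID maps back to exactly one chunk of text

print(ids)               # the token IDs
print(tokens)            # the "chunks" the sentence was split into
print(encoding.n_vocab)  # size of the vocabulary, roughly 200k entries
```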