From 40d8695e2a99b921a45a231b4487fb8b12ee0413 Mon Sep 17 00:00:00 2001
From: apehex
Date: Mon, 2 Sep 2024 20:38:09 +0200
Subject: [PATCH] Rename the screenshots to differentiate from future images

---
 .../tiktoken/{gpt-4o.png => french.gpt4o.png} | Bin
 articles/tokun.md                             |   2 +-
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename articles/.images/tiktoken/{gpt-4o.png => french.gpt4o.png} (100%)

diff --git a/articles/.images/tiktoken/gpt-4o.png b/articles/.images/tiktoken/french.gpt4o.png
similarity index 100%
rename from articles/.images/tiktoken/gpt-4o.png
rename to articles/.images/tiktoken/french.gpt4o.png
diff --git a/articles/tokun.md b/articles/tokun.md
index 9788c07..e0033f7 100644
--- a/articles/tokun.md
+++ b/articles/tokun.md
@@ -55,7 +55,7 @@ This process has several stages: encoding, tokenization and embedding.
 
 For now, consider the [end result from the tokenizer `o200k`][tiktokenizer-o200k] (used in `GPT-4o`):
 
-<img src=".images/tiktoken/gpt-4o.png"/>
+<img src=".images/tiktoken/french.gpt4o.png"/>
 
 The sentence is split into chunks called "tokens", which have a 1:1 match with an ID.
 Each tokenizer has its own vocabulary and `o200k` contains 200k identified tokens.
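
The token-to-ID mapping described in the patched article can be reproduced with the `tiktoken` library. A minimal sketch, assuming `tiktoken` is installed and that `o200k_base` is the `tiktoken` name for the `o200k` encoding used by `GPT-4o`; the French sample sentence is illustrative, not the one in the renamed screenshot:

```python
import tiktoken

# load the o200k encoding (used by GPT-4o)
encoding = tiktoken.get_encoding("o200k_base")

# illustrative French input, not taken from the patch
sentence = "Une phrase en français."

# each chunk of the sentence maps 1:1 to an integer ID
token_ids = encoding.encode(sentence)
tokens = [encoding.decode([i]) for i in token_ids]
print(list(zip(tokens, token_ids)))

# the o200k vocabulary contains roughly 200k tokens
print(encoding.n_vocab)
```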