Rename the screenshots to differentiate from future images
apehex committed Sep 2, 2024
1 parent e6400a7 commit 40d8695
Showing 2 changed files with 1 addition and 1 deletion.
File renamed without changes
2 changes: 1 addition & 1 deletion articles/tokun.md
@@ -55,7 +55,7 @@ This process has several stages: encoding, tokenization and embedding.

For now, consider the [end result from the tokenizer `o200k`][tiktokenizer-o200k] (used in `GPT-4o`):

- <img src=".images/tiktoken/gpt-4o.png" width="75%" style="margin: auto;"/>
+ <img src=".images/tiktoken/french.gpt4o.png" width="75%" style="margin: auto;"/>

The sentence is split into chunks called "tokens", each of which maps 1:1 to an integer ID.
Each tokenizer has its own vocabulary; as its name suggests, `o200k` holds roughly 200k distinct tokens.
