Remove tiktoken from the codebase #224
PR Reviewer Guide 🔍 (Review updated until commit c5dbcbf)
Here are some key observations to aid the review process:
PR Code Suggestions ✨
Explore these optional code suggestions:
Certainly! Let's optimize the function.
- The major cost here is calculating `int(0.75 * len(s))`.
- Multiplying floating-point numbers and casting to int is a minor but measurable cost in a tight loop.
- Instead, use integer multiplication and floor division to avoid float arithmetic. (i.e., `len(s) * 3 // 4`)
- Slicing cost is minimal and can't be improved.
- No need to further optimize or use extra imports as slicing is already a C-level operation.
**Optimized Code:**
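A minimal sketch of the change this suggestion describes, assuming the function under review is the `encode_str` helper and that it returns the leading slice of its input. The snippet itself is not shown in this conversation, so the name, signature, and docstring here are assumptions:

```python
def encode_str(s: str) -> str:
    """Return roughly the first 75% of ``s`` (assumed behavior of the helper)."""
    # Integer arithmetic replaces int(0.75 * len(s)); both expressions give
    # the same cut-off index for non-negative lengths of realistic size.
    return s[: len(s) * 3 // 4]
```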
This approach completely eliminates the float multiplication and the int cast, making it faster, especially when called many times. The semantics are unchanged for all practical string lengths.
Here’s an optimized version of your program. The current code repeatedly computes `int(0.75 * len(s))` and slices the string, which is already fast but can be micro-optimized.
- Avoid the float multiplication and casting by directly calculating `(len(s) * 3) // 4`, which is faster and avoids possible floating-point artifacts.
Rewritten code.
This version removes the float operation and is slightly faster, especially for large strings. The return value is exactly the same as before.
Persistent review updated to latest commit c5dbcbf
PR Code Suggestions ✨
No code suggestions found for the PR.
the logic to encode_str into a half length str and then do the Write a function
PR Type
Bug fix, Enhancement
Description
Added encode_str stub for token calculation
Removed tiktoken import and usage
Updated token counts to use encode_str
Ensures code runs without tiktoken dependency
Changes walkthrough 📝
code_utils.py: Add encode_str stub function (codeflash/code_utils/code_utils.py)
- `encode_str` stub returning half-length string
code_context_extractor.py: Replace tiktoken calls with encode_str (codeflash/context/code_context_extractor.py)
- `tiktoken` import and tokenizer setup
- `tokenizer.encode` calls with `encode_str`
- `encode_str`
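To illustrate how token counts could be derived without tiktoken, here is a minimal sketch under stated assumptions. The half-length behavior of `encode_str` follows the walkthrough above; the `estimate_token_count` helper is hypothetical and is not claimed to exist in the PR:

```python
def encode_str(s: str) -> str:
    """Stub stand-in for a tokenizer: return the first half of the string."""
    return s[: len(s) // 2]


def estimate_token_count(text: str) -> int:
    """Hypothetical helper: approximate a token count via the stub's output length."""
    # Where code previously took the length of tokenizer.encode(text),
    # the length of the stub's output serves as a rough estimate instead.
    return len(encode_str(text))
```

Under this stub, an 8-character string is counted as 4 "tokens". The estimate is only a coarse proxy for a real tokenizer, but it keeps the code runnable without the tiktoken dependency.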