Add max_context_length to TextEncode node for LLM max tokens - experimental use #289

Open

fblissjr wants to merge 2 commits into main
Conversation


fblissjr commented on Jan 19, 2025

This PR adds max_context_length as a parameter (default of 256, as before) so that additional tokens can be used for the LLM input prompt + prompt template + special tokens.

  • max_context_length is now a node input with a default of 256; you can set it in the DownloadAndLoadHyVideoTextEncoder node.
  • Better handling of a short max_context_length:
    • Added some debug output and a warning if you try to set max_context_length less than or equal to the crop_start value (a rough sketch of the wiring is below).
  • Debug logging to sanity-check the input prompt, context length, tensor shapes, attention mask, etc.
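For reference, here is a minimal sketch of how such an input and warning could be wired into a ComfyUI node. The class, method, and return-type names plus the crop_start default are illustrative placeholders, not the exact code in this PR:

```python
import logging

log = logging.getLogger(__name__)

class HyVideoTextEncodeSketch:
    """Illustrative sketch only; the real TextEncode node differs in detail."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "prompt": ("STRING", {"multiline": True}),
                # New experimental input: default 256 (the previous hard-coded
                # value), capped at the model's max context window of 8192.
                "max_context_length": ("INT", {"default": 256, "min": 1, "max": 8192}),
            }
        }

    RETURN_TYPES = ("CONDITIONING",)  # placeholder; the real node returns its own embeds type
    FUNCTION = "process"
    CATEGORY = "HunyuanVideoWrapper"

    def process(self, prompt, max_context_length, crop_start=95):
        # crop_start=95 is a placeholder for the length of the prompt-template
        # prefix that gets cropped off before the embeddings are used.
        if max_context_length <= crop_start:
            log.warning(
                "max_context_length (%d) <= crop_start (%d): almost the entire "
                "prompt will be truncated away.",
                max_context_length, crop_start,
            )
        log.debug("prompt chars=%d, max_context_length=%d",
                  len(prompt), max_context_length)
        # ... tokenize and encode with max_length=max_context_length ...
```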

Testing Performed:

  • max_context_length = 256, long prompt: works as it did before; the prompt gets truncated, no errors.
  • max_context_length = 5, long prompt: used to error out; now it runs with a warning and truncates.
  • max_context_length = 512, long prompt of more than 512 tokens: works; the prompt gets truncated (with a log message).
  • max_context_length = 2048, long prompt of ~1700 tokens: works; the prompt goes through in full. (The truncation behaviour can be reproduced at the tokenizer level, as sketched below.)
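The snippet below is an illustrative way to check the truncation behaviour in isolation: the tokenizer path is a placeholder, and the actual node applies the prompt template and crop_start on top of this.

```python
from transformers import AutoTokenizer

# Placeholder path; the wrapper loads its own LLM tokenizer.
tokenizer = AutoTokenizer.from_pretrained("path/to/llm-text-encoder")

# A prompt comfortably over 256 tokens.
long_prompt = "a cinematic shot of a fox running through fresh snow, " * 100

for max_context_length in (5, 256, 512, 2048):
    enc = tokenizer(
        long_prompt,
        truncation=True,
        max_length=max_context_length,
        return_tensors="pt",
        return_attention_mask=True,
    )
    # input_ids.shape[1] shows how many tokens actually survive truncation.
    print(max_context_length, enc["input_ids"].shape, int(enc["attention_mask"].sum()))
```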

In short: this makes the text encoder more flexible and easier to experiment with for longer input prompts, and it opens the door to few-shot examples. The model was almost certainly trained with shorter inputs, likely right around the previous 256 default. That said, autoregressive LLMs are always surprising, and a longer prompt with examples may well land in much the same embedding space as a shorter one would.

Reminder: changing this from the 256 default is experimental. The value is capped at the max context window from the model config (8192) to avoid accidentally setting it way too high (a minimal version of that cap is sketched below), and it may be unstable at larger values, so test with caution. VRAM usage will increase if you raise it, only for a short burst, but if you're right on the edge of maxing out your VRAM, leave a little headroom and increase incrementally. Hopefully we'll see some cool possibilities from this. Thanks to Kijai, as always, for making a fantastic project for us all to experiment with.
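A minimal sketch of that cap, assuming the ceiling comes from the text encoder's config; the function name and the max_position_embeddings reference are assumptions for illustration, not code from this PR:

```python
import logging

log = logging.getLogger(__name__)

def clamp_context_length(requested: int, model_max: int = 8192) -> int:
    """Clamp the requested max_context_length to the model's context window.

    model_max defaults to 8192 here; in practice it would be read from the
    loaded text encoder's config (e.g. max_position_embeddings).
    """
    if requested > model_max:
        log.warning("max_context_length %d exceeds model max %d; clamping.",
                    requested, model_max)
        return model_max
    return requested
```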

edit: removed token counting due to time constraints / complexity, plus a bug with custom prompt templates.
