Add max_context_length to TextEncode node for LLM max tokens - experimental use #289

Open

fblissjr wants to merge 2 commits into main
Conversation


fblissjr commented on Jan 19, 2025

This PR adds max_context_length as a parameter (default of 256, as before) so that additional tokens can be used for the LLM input prompt + prompt template + special tokens.

  • max_context_length is now a node input with a default of 256; you can set it in the DownloadAndLoadHyVideoTextEncoder node.
  • Better handling of a short max_context_length:
    • Added some debug output and a warning if you try to set max_context_length less than or equal to the crop_start value (a rough sketch of the wiring is below).
  • Debug logging to sanity-check the input prompt, context length, tensor shapes, attention mask, etc.
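For reference, here is a minimal sketch of how such an input and warning could be wired into a ComfyUI node. The class, method, and return-type names plus the crop_start default are illustrative placeholders, not the exact code in this PR:

```python
import logging

log = logging.getLogger(__name__)

class HyVideoTextEncodeSketch:
    """Illustrative sketch only; the real TextEncode node differs in detail."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "prompt": ("STRING", {"multiline": True}),
                # New experimental input: default 256 (the previous hard-coded
                # value), capped at the model's max context window of 8192.
                "max_context_length": ("INT", {"default": 256, "min": 1, "max": 8192}),
            }
        }

    RETURN_TYPES = ("CONDITIONING",)  # placeholder; the real node returns its own embeds type
    FUNCTION = "process"
    CATEGORY = "HunyuanVideoWrapper"

    def process(self, prompt, max_context_length, crop_start=95):
        # crop_start=95 is a placeholder for the length of the prompt-template
        # prefix that gets cropped off before the embeddings are used.
        if max_context_length <= crop_start:
            log.warning(
                "max_context_length (%d) <= crop_start (%d): almost the entire "
                "prompt will be truncated away.",
                max_context_length, crop_start,
            )
        log.debug("prompt chars=%d, max_context_length=%d",
                  len(prompt), max_context_length)
        # ... tokenize and encode with max_length=max_context_length ...
```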

Testing Performed:

  • max_context_length = 256, long prompt: works as it did before; the prompt gets truncated, no errors.
  • max_context_length = 5, long prompt: used to error out; now it runs with a warning and truncates.
  • max_context_length = 512, long prompt of more than 512 tokens: works; the prompt gets truncated (with a log message).
  • max_context_length = 2048, long prompt of ~1700 tokens: works; the prompt goes through in full. (The truncation behaviour can be reproduced at the tokenizer level, as sketched below.)
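The snippet below is an illustrative way to check the truncation behaviour in isolation: the tokenizer path is a placeholder, and the actual node applies the prompt template and crop_start on top of this.

```python
from transformers import AutoTokenizer

# Placeholder path; the wrapper loads its own LLM tokenizer.
tokenizer = AutoTokenizer.from_pretrained("path/to/llm-text-encoder")

# A prompt comfortably over 256 tokens.
long_prompt = "a cinematic shot of a fox running through fresh snow, " * 100

for max_context_length in (5, 256, 512, 2048):
    enc = tokenizer(
        long_prompt,
        truncation=True,
        max_length=max_context_length,
        return_tensors="pt",
        return_attention_mask=True,
    )
    # input_ids.shape[1] shows how many tokens actually survive truncation.
    print(max_context_length, enc["input_ids"].shape, int(enc["attention_mask"].sum()))
```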

In short: this makes the text encoder more flexible and easier to experiment with for longer input prompts, and it opens the door to few-shot examples. The model was almost certainly trained with shorter inputs, likely right around the previous 256 default. That said, autoregressive LLMs are always surprising, and a longer prompt with examples may well land in much the same embedding space as a shorter one would.

Reminder: changing this from the 256 default is experimental. The value is capped at the max context window from the model config (8192) to avoid accidentally setting it way too high (a minimal version of that cap is sketched below), and it may be unstable at larger values, so test with caution. VRAM usage will increase if you raise it, only for a short burst, but if you're right on the edge of maxing out your VRAM, leave a little headroom and increase incrementally. Hopefully we'll see some cool possibilities from this. Thanks to Kijai, as always, for making a fantastic project for us all to experiment with.
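A minimal sketch of that cap, assuming the ceiling comes from the text encoder's config; the function name and the max_position_embeddings reference are assumptions for illustration, not code from this PR:

```python
import logging

log = logging.getLogger(__name__)

def clamp_context_length(requested: int, model_max: int = 8192) -> int:
    """Clamp the requested max_context_length to the model's context window.

    model_max defaults to 8192 here; in practice it would be read from the
    loaded text encoder's config (e.g. max_position_embeddings).
    """
    if requested > model_max:
        log.warning("max_context_length %d exceeds model max %d; clamping.",
                    requested, model_max)
        return model_max
    return requested
```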

edit: removed token counting due to time constraints / complexity, plus a bug with custom prompt templates.
