Closed
Labels
Bug: Something isn't working
Description
I/O tensors are allocated in the InitNetwork function and never deallocated, so they effectively have an infinite lifetime. Since I/O tensor dimensions are known at compile time, we should both allocate and deallocate them.
As a first step, we should display them in the memory allocation visualization and make sure to raise an error if we go above the memory capacity limit. Then, we should perform static memory allocation for them.
This is especially an issue for Llama, where the KV cache is treated as an input and then as an output.
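To make the proposal concrete, here is a minimal sketch of lifetime-aware static allocation with a capacity check. It is not this project's allocator: `TensorLifetime`, `plan_static_allocation`, the greedy first-fit strategy, and the tensor sizes and lifetimes in the example are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorLifetime:
    """Hypothetical description of one tensor, known at compile time."""
    name: str
    size: int       # bytes
    first_use: int  # index of the first op that touches the tensor
    last_use: int   # index of the last op that touches it (inclusive)


def overlaps(a: TensorLifetime, b: TensorLifetime) -> bool:
    """True if the two tensors are live at the same time."""
    return a.first_use <= b.last_use and b.first_use <= a.last_use


def plan_static_allocation(tensors, capacity):
    """Greedy first-fit: assign each tensor a fixed byte offset.

    Tensors whose lifetimes do not overlap may share the same addresses.
    Raises MemoryError if a tensor cannot be placed within `capacity`.
    """
    placed = []   # (tensor, offset) pairs already assigned
    offsets = {}
    # Place the largest tensors first to reduce fragmentation.
    for t in sorted(tensors, key=lambda t: (-t.size, t.first_use)):
        # Address ranges held by tensors that are live at the same time as t.
        busy = sorted(
            (off, off + p.size) for p, off in placed if overlaps(t, p)
        )
        offset = 0
        for start, end in busy:
            if offset + t.size <= start:
                break  # t fits in the gap before this range
            offset = max(offset, end)
        if offset + t.size > capacity:
            raise MemoryError(
                f"{t.name}: needs bytes up to {offset + t.size}, "
                f"but capacity is {capacity}"
            )
        placed.append((t, offset))
        offsets[t.name] = offset
    return offsets


# Illustrative values only: the KV cache input is dead once the output exists.
example = [
    TensorLifetime("kv_cache_in", 64, 0, 1),
    TensorLifetime("kv_cache_out", 64, 2, 3),
    TensorLifetime("activations", 32, 0, 3),
]
plan = plan_static_allocation(example, capacity=96)
```

With finite lifetimes, the two KV cache tensors in this example can share one region. If every tensor instead spans the whole run, as with the current infinite lifetimes, the same plan no longer fits in the 96-byte budget and the function raises an error instead of silently exceeding the limit.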