
Reduce memory overhead of CUDA graphs #205

Closed
@pommedeterresautee

Description


The current CUDA graph wrapper creates a static input and a static output tensor for each call to the model.
In a decoder, where the sequence length changes from step to step, this can create many tensors. We may want to limit these allocations and recycle the tensors instead.
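One possible way to recycle them is to allocate a single static buffer sized for the longest sequence and give each captured graph a view of it, so growing decoder lengths reuse one allocation. This is a sketch under stated assumptions, not the wrapper's actual design. The `SharedStaticInput` class and its methods are hypothetical. A `bytearray`/`memoryview` pair stands in for a preallocated GPU tensor so the sketch runs without CUDA. In PyTorch, slicing a tensor (e.g. `buf[:, :seq_len]`) likewise returns a view that shares storage.

```python
from typing import Dict


class SharedStaticInput:
    """Hypothetical helper: one max-size buffer, one cached view per length.

    The bytearray stands in for a preallocated GPU tensor; each memoryview
    slice shares its storage instead of allocating new memory.
    """

    def __init__(self, max_len: int) -> None:
        self.max_len = max_len
        self._storage = bytearray(max_len)  # the only real allocation
        self._views: Dict[int, memoryview] = {}

    def view_for(self, seq_len: int) -> memoryview:
        if not 0 < seq_len <= self.max_len:
            raise ValueError(f"seq_len must be in 1..{self.max_len}")
        # Cache the view so a graph captured for this length sees the
        # same address on every replay.
        if seq_len not in self._views:
            self._views[seq_len] = memoryview(self._storage)[:seq_len]
        return self._views[seq_len]

    def copy_in(self, data: bytes) -> memoryview:
        """Copy fresh input into the shared buffer just before a replay."""
        view = self.view_for(len(data))
        view[:] = data
        return view


if __name__ == "__main__":
    buf = SharedStaticInput(max_len=8)
    a = buf.copy_in(b"abc")   # "graph" for length 3
    b = buf.copy_in(b"wxyz")  # "graph" for length 4 reuses the same storage
    print(bytes(a), bytes(b))
```

The trade-off is that every view aliases one storage, so only the most recently copied input is valid. That is acceptable for inputs copied in immediately before each graph replay, but static outputs that must outlive the next replay would still need their own buffers (or an explicit copy).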

Labels: performance (make things faster, always)
