@maleksan85

For cards with limited memory and models with large GEMMs, there might not be enough extra memory to convert outputs from fp16 to f32, for instance. Every conversion happens as a copy, so it triggers OOM on Navi. This change fixes the problem.
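
To illustrate why an out-of-place conversion can run out of memory, here is a rough back-of-the-envelope sketch. It is not the tuner's code: the helper name `conversion_peak_bytes` and the example shape are hypothetical, and it only assumes 2 bytes per fp16 element and 4 bytes per f32 element, with the fp16 output and its f32 copy alive at the same time.

```python
# Bytes per element for the two dtypes mentioned in the description.
BYTES_FP16 = 2
BYTES_FP32 = 4


def conversion_peak_bytes(rows: int, cols: int) -> dict:
    """Estimate memory held while an fp16 GEMM output is copied to f32.

    Hypothetical helper for illustration only. It assumes the conversion
    allocates a new f32 buffer while the fp16 source is still alive.
    """
    elements = rows * cols
    fp16_output = elements * BYTES_FP16
    fp32_copy = elements * BYTES_FP32
    return {
        "fp16_output": fp16_output,
        "fp32_copy": fp32_copy,
        "peak": fp16_output + fp32_copy,
    }


if __name__ == "__main__":
    # Hypothetical output shape, chosen only to show the order of magnitude.
    sizes = conversion_peak_bytes(8192, 28672)
    for name, value in sizes.items():
        print(f"{name}: {value / 2**20:.0f} MiB")
```

Under these assumptions the converted copy alone is twice the size of the fp16 output, so the peak is three times the original output size, which may not fit on a card with little free memory.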

@maleksan85 maleksan85 requested a review from gshtras August 14, 2024 18:06
@maleksan85 maleksan85 self-assigned this Aug 14, 2024
@maleksan85 maleksan85 merged commit 4132cbe into main Aug 14, 2024
@maleksan85 maleksan85 deleted the gemm_tunner_memory_usage_umprovement branch August 16, 2024 00:10