
[MPT-30B] OutOfMemoryError: CUDA out of memory #372

Closed
@mspronesti

Description


Hi vLLM dev team,
is vLLM supposed to work with MPT-30B? I tried loading it on AWS SageMaker using an ml.g5.12xlarge and even an ml.g5.48xlarge instance.

from vllm import LLM, SamplingParams

llm = LLM(model="mosaicml/mpt-30b")

However, in both cases I run into this error:

OutOfMemoryError: CUDA out of memory. Tried to allocate 294.00 MiB (GPU 0; 22.19 GiB total capacity; 21.35 GiB already allocated; 46.50 MiB free; 21.35 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
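
If the problem is simply that the weights don't fit on a single GPU, that would explain both failures: MPT-30B's fp16 weights alone are roughly 60 GiB, while each A10G GPU on a g5 instance has 24 GiB, and vLLM places the whole model on one GPU unless told otherwise. A minimal sketch of a possible workaround (assuming the 4 GPUs of an ml.g5.12xlarge, and using vLLM's tensor_parallel_size argument to shard the model across them):

from vllm import LLM

# Shard MPT-30B across the 4 A10G GPUs of a g5.12xlarge instead of
# loading all ~60 GiB of fp16 weights onto a single 24 GiB GPU.
llm = LLM(model="mosaicml/mpt-30b", tensor_parallel_size=4)

outputs = llm.generate(["Hello, my name is"])

On an ml.g5.48xlarge (8 GPUs), tensor_parallel_size=8 would be the analogous setting.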

Metadata

Labels: bug (Something isn't working)
