[MPT-30B] OutOfMemoryError: CUDA out of memory #372
Comments
Does vLLM support PyTorch DataParallel?
@mspronesti Can you try distributed inference following this guide? https://vllm.readthedocs.io/en/latest/serving/distributed_serving.html
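For reference, a minimal sketch of what the linked guide describes: loading the model with tensor parallelism so the weights are sharded across GPUs instead of placed on one device. The model name, GPU count, and sampling settings below are assumptions for illustration, not taken from this thread.

```python
# Sketch (not from the thread): tensor-parallel inference with vLLM.
# Assumes a 4-GPU instance (e.g. g5.12xlarge) and that Ray is installed.
from vllm import LLM, SamplingParams

llm = LLM(
    model="mosaicml/mpt-30b",   # illustrative; use your checkpoint
    tensor_parallel_size=4,     # shard the weights across the 4 GPUs
    trust_remote_code=True,     # MPT checkpoints ship custom modeling code
)

outputs = llm.generate(
    ["What is tensor parallelism?"],
    SamplingParams(temperature=0.8, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```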
I ran into the same problem and fixed it by using LLM(model="", tokenizer_mode="slow").
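A short sketch of that workaround, forcing the slow (Python) tokenizer; the model name is a placeholder, since the comment leaves it blank.

```python
# Sketch of the workaround above: force the slow tokenizer implementation.
# The model name is illustrative, not taken from the comment.
from vllm import LLM

llm = LLM(model="mosaicml/mpt-30b", tokenizer_mode="slow", trust_remote_code=True)
```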
@zhuohan123 thanks for your quick reply. Installing ray and setting up distributed inference as suggested produces the following error:
RayActorError: The actor died because of an error raised in its creation task, ray::Worker.__init__() (pid=22291, ip=172.16.94.76, actor_id=9773ec3febd4775b0cb0bed101000000, repr=<vllm.worker.worker.Worker object at 0x7f4511af7e80>)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/vllm/worker/worker.py", line 40, in __init__
    _init_distributed_environment(parallel_config, rank,
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/vllm/worker/worker.py", line 307, in _init_distributed_environment
    torch.distributed.all_reduce(torch.zeros(1).cuda())
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1451, in wrapper
    return func(*args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/pytorch_p310/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1700, in all_reduce
    work = default_pg.allreduce([tensor], opts)
torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1275, internal error, NCCL version 2.14.3
ncclInternalError: Internal check failed.
Last error:
Cuda failure 'peer access is not supported between these two devices'
@Joejoequ setting tokenizer_mode="slow" raises ValueError: Tokenizer class GPTNeoXTokenizer does not exist or is not currently imported.
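Not from the thread, but one way to check whether the "peer access is not supported" failure matches the hardware is to query PyTorch's P2P capability between each pair of GPUs. This is only a diagnostic sketch.

```python
# Diagnostic sketch (not from the thread): report which GPU pairs support
# peer-to-peer access, which the NCCL error above complains about.
import torch

n = torch.cuda.device_count()
for i in range(n):
    for j in range(n):
        if i != j:
            ok = torch.cuda.can_device_access_peer(i, j)
            print(f"GPU {i} -> GPU {j}: peer access {'supported' if ok else 'NOT supported'}")
```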
Same for me. Actually, setting …
Also got the same error.
Same here. I am able to load LLaMA-65B locally using this notebook: https://twitter.com/m_ryabinin/status/1679217067310960645?s=20 but I am unable to run it on vLLM.
@mspronesti, setting os.environ["NCCL_IGNORE_DISABLED_P2P"] = '1' should resolve the issue.
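A hedged sketch of how that workaround is typically applied: the environment variable has to be set before NCCL is initialized, i.e. before the engine is constructed. The model name and tensor_parallel_size are illustrative.

```python
# Sketch of the workaround above: tell NCCL to tolerate disabled P2P before
# vLLM initializes the distributed backend.
import os
os.environ["NCCL_IGNORE_DISABLED_P2P"] = "1"  # must be set before NCCL init

from vllm import LLM

llm = LLM(model="mosaicml/mpt-30b", tensor_parallel_size=4, trust_remote_code=True)
```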
Closing as stale. The original issue was due to insufficient GPU memory on a single GPU and should be solvable using tensor parallelism, as mentioned by @zhuohan123.
Original issue description:
Hi vLLM dev team,
Is vLLM supposed to work with MPT-30B? I tried loading it on AWS SageMaker using a ml.g5.12xlarge and even a ml.g5.48xlarge instance. However, in both cases I run into the OutOfMemoryError: CUDA out of memory error from the title.
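For context, a back-of-envelope estimate (an assumption of this rewrite, not from the thread) of why this OOMs regardless of instance size: MPT-30B's fp16 weights alone are roughly 60 GB, while each A10G GPU on the g5 instances has 24 GB, so loading onto a single GPU fails no matter how many GPUs the instance has; hence the tensor-parallel suggestion above.

```python
# Back-of-envelope estimate (assumption, not from the thread): why MPT-30B
# cannot fit on a single 24 GB GPU in fp16.
params = 30e9            # ~30B parameters
bytes_per_param = 2      # fp16/bf16 weights
weights_gb = params * bytes_per_param / 1e9
print(f"weights alone: ~{weights_gb:.0f} GB")  # ~60 GB, before the KV cache
print("single A10G on a g5 instance: 24 GB -> OOM without tensor parallelism")
```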