Error when training #44
Comments
The easiest route would be to use the WSL Ubuntu distro for Windows. You can do that by using PowerShell.
USER root
RUN apt update && apt install -y --no-install-recommends
RUN python3 -m pip install git+https://github.com/enhuiz/vall-e.git
VOLUME /data/models
ENTRYPOINT ["/bin/bash", "--login", "-c"]
Has anyone tried this on an M1 Mac? I had an error similar to the one described by the OP. I will try running this Docker container later this week or next and post the result here.
I cannot get this to work on Windows.
I am running the following command:
python -m vall_e.train yaml=config/test/ar.yml
At first, I was getting this error:
RuntimeError: Distributed package doesn't have NCCL built in
It seems the NCCL backend of PyTorch's distributed package does not work on Windows.
I found a workaround, which is to use the gloo backend, and added the following code in data.py:
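The code added to data.py is not shown here. As a rough sketch of this kind of workaround (not the actual change), the backend can be chosen by platform before the process group is created. `pick_backend` and `init_distributed` are hypothetical names, and the sketch assumes the process group is created with `torch.distributed.init_process_group` and that the usual rendezvous environment variables are set; the project itself may initialize distributed training through a different entry point.

```python
import sys


def pick_backend(platform: str = sys.platform, cuda_available: bool = False) -> str:
    """Return a torch.distributed backend name for the given platform.

    NCCL is not built into PyTorch on Windows, so fall back to gloo there,
    and also anywhere CUDA is unavailable.
    """
    if platform.startswith("win") or not cuda_available:
        return "gloo"
    return "nccl"


def init_distributed() -> None:
    """Hypothetical helper: create the default process group."""
    # Imported lazily so pick_backend() can be used without torch installed.
    import torch
    import torch.distributed as dist

    backend = pick_backend(cuda_available=torch.cuda.is_available())
    # Assumes MASTER_ADDR, MASTER_PORT, RANK and WORLD_SIZE are set
    # in the environment (the default env:// rendezvous).
    dist.init_process_group(backend=backend)
```

Keeping the choice in a small pure function makes it easy to check on any machine, for example `pick_backend("win32", True)` gives `"gloo"`.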
Then it returns the following error:
RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
This is where I hit a brick wall.
Platform: Windows 11
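For context, PyTorch raises this RuntimeError whenever a leaf tensor with `requires_grad=True` is modified in place while autograd is tracking it. Where that happens in this training run is not known from the post, so the sketch below only reproduces the error in isolation and shows the usual pattern of doing the in-place update under `torch.no_grad()`. The tensor `w` is purely illustrative.

```python
import torch

# A leaf tensor: created directly by the user and tracked by autograd.
w = torch.zeros(3, requires_grad=True)

try:
    w.add_(1.0)  # in-place update on a tracked leaf tensor
    raised = False
except RuntimeError as err:
    raised = True
    print(err)

# Suspending gradient tracking makes the same in-place update legal.
with torch.no_grad():
    w.add_(1.0)
```

After the `no_grad` block, `w` still has `requires_grad=True`; only the update itself is hidden from autograd.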
Python: 3.10.9
torch: 1.11.0+cu113