
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu) #131

Open
kosmels opened this issue Jun 6, 2024 · 3 comments


kosmels commented Jun 6, 2024

Hello,

I am trying to train on a custom dataset (I have already prepared 1-to-1 image pairs, and my seeds.json looks like this: [["0000000", ["0"]], ["0000001", ["1"]], ...) with 3x NVIDIA TITAN RTX 24GB. Initialization of all the models works fine, but during the validation sanity check I get this error:

...
[rank0]:   File "/code/instruct-pix2pix/./stable_diffusion/ldm/models/diffusion/ddpm_edit.py", line 892, in forward
[rank0]:     return self.p_losses(x, c, t, *args, **kwargs)
[rank0]:   File "/code/instruct-pix2pix/./stable_diffusion/ldm/models/diffusion/ddpm_edit.py", line 1043, in p_losses
[rank0]:     logvar_t = self.logvar[t].to(self.device)
[rank0]: RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)

Do you know where this could come from? I did not change anything in the source code; I just prepared the data and updated the paths in the train config.

Thanks in advance!
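
For context, the error itself is just PyTorch refusing to index a CPU tensor with CUDA indices, which a minimal sketch reproduces (assuming a CUDA device is available; the names below are illustrative, not taken from the repository):

import torch

logvar = torch.zeros(1000)                        # stays on the CPU, like self.logvar
t = torch.randint(0, 1000, (4,), device="cuda")   # timesteps sampled on the GPU

# Raises: RuntimeError: indices should be either on cpu or on the same
# device as the indexed tensor (cpu)
logvar_t = logvar[t]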


kosmels commented Jun 7, 2024

UPDATE: Solved here CompVis/stable-diffusion#851

After a few steps of debugging I found that self.logvar is on device==cpu (it is initialized here: https://github.com/timothybrooks/instruct-pix2pix/blob/main/stable_diffusion/ldm/models/diffusion/ddpm_edit.py#L123), while t is on device==cuda.

As a small workaround I moved t to the CPU for this indexing:

logvar_t = self.logvar[t.to(self.logvar.device)].to(self.device)

but I am not sure if this is correct. If it is, self.logvar should probably be moved to cuda somewhere, because it seems that self.device is still cpu during initialization.
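
An alternative that avoids the per-call move (a sketch, assuming the initialization at the linked line; this is not how the repository currently does it) is to register logvar as a buffer, so that .to(device) and Lightning move it together with the module:

import torch
import torch.nn as nn

class LogvarDemo(nn.Module):
    # Hypothetical stand-in for the DDPM module; only the logvar handling is shown.
    def __init__(self, num_timesteps: int = 1000, logvar_init: float = 0.0):
        super().__init__()
        # register_buffer (instead of a plain tensor attribute) makes logvar
        # follow the module across devices, so indexing with a CUDA t is safe.
        self.register_buffer("logvar", torch.full((num_timesteps,), logvar_init))

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        return self.logvar[t]

if torch.cuda.is_available():
    model = LogvarDemo().cuda()            # the buffer moves to the GPU with the module
    t = torch.randint(0, 1000, (4,), device="cuda")
    print(model(t).device)                 # cuda:0, no device mismatch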

Another question, of course, is why I am getting this error at all. Did you not hit this issue during development?


Evangade commented Jul 4, 2024

Same problem here, thank you very much!

LitaoLiu01 commented

Actually, I may know the reason: I think this error may not happen in certain CUDA environments, such as cu113. I reproduced ip2p two months ago on cu113 and there was no such error, but recently I used H100 GPUs to train, which require cu118+, and the error appeared.
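
For anyone comparing environments, the installed PyTorch build and its CUDA toolkit can be printed directly (a minimal check; nothing here is specific to instruct-pix2pix):

import torch

# Report the PyTorch version and the CUDA toolkit it was built against;
# torch.version.cuda is None for CPU-only builds.
print("torch:", torch.__version__)
print("cuda:", torch.version.cuda)
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))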
