Possibility of running text encoder on CPU? #23

Closed
@congdm

Description

From inference.py I can see that the T5 encoder is loaded onto the GPU in float16:
t5_encoder = T5TextEmbedder().to(pipe.device, dtype=torch.float16)
During the inference step, the output embeddings from the T5 encoder are then cast to the same device and dtype as the SD pipeline:
prompt_embeds = t5_encoder(prompt, max_length=128).to(pipe.device, pipe.dtype)

So, to save VRAM, I experimented with keeping the T5 model on the CPU by changing the model-loading line:
t5_encoder = T5TextEmbedder()

It ran fine, but the result was totally different and the prompt no longer worked well. So it turns out that running the model in FP32 and then converting the embeddings to FP16 is not the same as running the model directly in FP16.
Likewise, when I tried loading the pipeline in BF16 while keeping the text encoder in FP16, the result was different as well.
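One variant I have not tried yet would be to keep the encoder on the CPU but cast its weights to FP16, so its numerics match the original setup. This is only a sketch reusing T5TextEmbedder, pipe and prompt from inference.py, and I am not sure every T5 op has a float16 CPU kernel in PyTorch, so it might need BF16 or FP32 as a fallback:

import torch

# T5TextEmbedder, pipe and prompt are set up as in inference.py
t5_encoder = T5TextEmbedder().to("cpu", dtype=torch.float16)  # FP16 weights, but kept on CPU

with torch.no_grad():
    prompt_embeds = t5_encoder(prompt, max_length=128)        # encode on CPU in FP16
prompt_embeds = prompt_embeds.to(pipe.device, pipe.dtype)     # move only the embeddings to the GPU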

So, to use this ella-sd1.5-tsc-t5xl model properly, both the SD model and the T5 encoder must be in FP16; am I understanding that correctly?
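If that is right, the drift between the two paths should show up directly in the embeddings. A quick check might look like this (rough sketch, same assumptions about T5TextEmbedder and pipe as above):

import torch
import torch.nn.functional as F

prompt = "a photo of a corgi wearing a red scarf"  # any example prompt

t5_fp32_cpu = T5TextEmbedder()                                        # default: FP32 on CPU
t5_fp16_gpu = T5TextEmbedder().to(pipe.device, dtype=torch.float16)   # original setup

with torch.no_grad():
    emb_cpu = t5_fp32_cpu(prompt, max_length=128).to(pipe.device, pipe.dtype)
    emb_gpu = t5_fp16_gpu(prompt, max_length=128).to(pipe.device, pipe.dtype)

# compare the two embedding tensors
print("max abs diff:", (emb_cpu.float() - emb_gpu.float()).abs().max().item())
print("cosine sim:  ", F.cosine_similarity(emb_cpu.float().flatten(),
                                            emb_gpu.float().flatten(), dim=0).item())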
