Describe the bug
(venv) C:\ai1\LTX-Video>python inference.py
Traceback (most recent call last):
File "C:\ai1\LTX-Video\inference.py", line 23, in <module>
text_encoder = T5EncoderModel.from_pretrained(
File "C:\ai1\LTX-Video\venv\lib\site-packages\transformers\modeling_utils.py", line 3779, in from_pretrained
raise EnvironmentError(
OSError: Error no file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory Lightricks/LTX-Video.
Reproduction
Install diffusers from source and run the example code from the LTX-Video pipeline docs:
https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx_video
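For completeness, installing diffusers from source is typically done with the standard command from the diffusers installation docs:

```shell
pip install git+https://github.com/huggingface/diffusers
```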
Logs
C:\ai1\LTX-Video\Lightricks>tree /F
Folder PATH listing for volume Windows-SSD
Volume serial number is CE9F-A6AE
C:.
└───LTX-Video
│ ltx-video-2b-v0.9.1.safetensors
│ model_index.json
│
├───text_encoder
│ config.json
│ model-00001-of-00004.safetensors
│ model-00002-of-00004.safetensors
│ model-00003-of-00004.safetensors
│ model-00004-of-00004.safetensors
│
├───tokenizer
│ added_tokens.json
│ special_tokens_map.json
│ spiece.model
│ tokenizer_config.json
│
├───transformer
│ config.json
│ diffusion_pytorch_model-00001-of-00002.safetensors
│ diffusion_pytorch_model-00002-of-00002.safetensors
│ diffusion_pytorch_model.safetensors.index.json
│
└───vae
config.json
diffusion_pytorch_model.safetensors
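A possible root cause (my assumption, not confirmed): the `text_encoder` folder contains only the sharded `model-0000x-of-00004.safetensors` files with no `model.safetensors.index.json`, and `from_pretrained` only recognizes sharded checkpoints through that index file, so it reports that no weight file was found. A minimal sketch to check which recognized weight file a local model directory contains (the helper name and candidate list are illustrative, not the exact list transformers uses):

```python
import os

# Weight files transformers' from_pretrained looks for in a local model
# directory (single-file weights or a shard index; illustrative subset).
WEIGHT_CANDIDATES = [
    "model.safetensors",
    "model.safetensors.index.json",  # needed to load sharded *.safetensors
    "pytorch_model.bin",
    "pytorch_model.bin.index.json",
]

def find_weight_entry(model_dir: str):
    """Return the first recognized weight/index file in model_dir, or None."""
    for name in WEIGHT_CANDIDATES:
        if os.path.isfile(os.path.join(model_dir, name)):
            return name
    return None
```

Against the tree above, `find_weight_entry("Lightricks/LTX-Video/text_encoder")` would return `None`, which matches the `OSError` in the traceback.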
System Info
Windows 11 / Python 3.10.11
(venv) C:\ai1\LTX-Video>pip list
Package Version
------------------ ------------
accelerate 1.2.1
certifi 2024.12.14
charset-normalizer 3.4.0
colorama 0.4.6
diffusers 0.32.0.dev0
einops 0.8.0
filelock 3.16.1
fsspec 2024.12.0
gguf 0.13.0
huggingface-hub 0.25.2
idna 3.10
importlib_metadata 8.5.0
Jinja2 3.1.4
MarkupSafe 3.0.2
mpmath 1.3.0
networkx 3.4.2
numpy 2.2.0
packaging 24.2
pillow 11.0.0
pip 23.0.1
psutil 6.1.1
PyYAML 6.0.2
regex 2024.11.6
requests 2.32.3
safetensors 0.4.5
sentencepiece 0.2.0
setuptools 65.5.0
sympy 1.13.1
tokenizers 0.21.0
torch 2.5.1+cu124
torchvision 0.20.1+cu124
tqdm 4.67.1
transformers 4.47.1
typing_extensions 4.12.2
urllib3 2.2.3
wheel 0.45.1
zipp 3.21.0
Who can help?
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video
from transformers import T5EncoderModel, T5Tokenizer

single_file_url = "Lightricks/LTX-Video/ltx-video-2b-v0.9.1.safetensors"
text_encoder = T5EncoderModel.from_pretrained(
    "Lightricks/LTX-Video", subfolder="text_encoder", torch_dtype=torch.bfloat16
)
tokenizer = T5Tokenizer.from_pretrained(
    "Lightricks/LTX-Video", subfolder="tokenizer"
)
pipe = LTXPipeline.from_single_file(
    single_file_url, text_encoder=text_encoder, tokenizer=tokenizer, torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

prompt = "A woman with long brown hair and light skin smiles at another woman with long blonde hair. The woman with brown hair wears a black jacket and has a small, barely noticeable mole on her right cheek. The camera angle is a close-up, focused on the woman with brown hair's face. The lighting is warm and natural, likely from the setting sun, casting a soft glow on the scene. The scene appears to be real-life footage"
negative_prompt = "worst quality, inconsistent motion, blurry, jittery, distorted"
video = pipe(
    prompt=prompt,
    negative_prompt=negative_prompt,
    width=704,
    height=480,
    num_frames=161,
    num_inference_steps=50,
).frames[0]
export_to_video(video, "output_ltx.mp4", fps=24)