CogVideoX1.5-5B-I2V in diffusers: brightness mismatch and blurry later frames (#605)

### System Info

diffusers                 0.32.0.dev0
torch                     2.4.1+cu121
python                   3.10.14

### Information

- [X] The official example scripts
- [ ] My own modified scripts

### Reproduction


I ran both CogVideoX1.5-5B-I2V and CogVideoX-5B-I2V through `CogVideoXImageToVideoPipeline` from diffusers with the following settings:

- CogVideoX-5B-I2V: `width=720`, `height=480`, `num_frames=49`, `num_inference_steps=50`
- CogVideoX1.5-5B-I2V: `width=1360`, `height=768`, `num_frames=77`, `num_inference_steps=50`
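For anyone trying to reproduce this, the two runs might look roughly like the sketch below. It is not the exact script used for this report: the Hub repo IDs, the `bfloat16` dtype, the memory-saving calls, the `fps` value, and the helper names `call_kwargs` and `generate` are all assumptions, and the prompt is not given in the issue, so it has to be supplied by the caller.

```python
# Settings reported in this issue. The repo IDs are assumed, not stated in the report.
SETTINGS = {
    "CogVideoX-5B-I2V": {
        "repo_id": "THUDM/CogVideoX-5b-I2V",  # assumed Hub ID
        "width": 720,
        "height": 480,
        "num_frames": 49,
        "num_inference_steps": 50,
    },
    "CogVideoX1.5-5B-I2V": {
        "repo_id": "THUDM/CogVideoX1.5-5B-I2V",  # assumed Hub ID
        "width": 1360,
        "height": 768,
        "num_frames": 77,
        "num_inference_steps": 50,
    },
}


def call_kwargs(model_name):
    """Return the pipeline call arguments reported in the issue for one model."""
    cfg = SETTINGS[model_name]
    return {key: value for key, value in cfg.items() if key != "repo_id"}


def generate(model_name, image_path, prompt, out_path, fps=8):
    """Run one image-to-video generation and save it (fps is an assumed value)."""
    # Heavy imports stay local so the settings can be inspected without a GPU.
    import torch
    from diffusers import CogVideoXImageToVideoPipeline
    from diffusers.utils import export_to_video, load_image

    pipe = CogVideoXImageToVideoPipeline.from_pretrained(
        SETTINGS[model_name]["repo_id"], torch_dtype=torch.bfloat16  # assumed dtype
    )
    # Optional memory savers; not mentioned in the issue.
    pipe.enable_model_cpu_offload()
    pipe.vae.enable_tiling()

    image = load_image(image_path)
    frames = pipe(image=image, prompt=prompt, **call_kwargs(model_name)).frames[0]
    export_to_video(frames, out_path, fps=fps)
    return frames
```

Calling `generate("CogVideoX1.5-5B-I2V", "input.jpg", "<your prompt>", "out_1.5.mp4")` and the same for `"CogVideoX-5B-I2V"` would give the two videos to compare (file names here are placeholders).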

The videos generated by CogVideoX-5B-I2V look good.
In the videos generated by CogVideoX1.5-5B-I2V, **the brightness** of the first few frames does not match the input image, and the later part of the video shows **blurriness and temporal inconsistency**.
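The brightness mismatch can be put into numbers with a small check like the one below. This is only a sketch: it assumes RGB frames (PIL images or `(H, W, 3)` arrays) and uses mean Rec. 601 luma as a simple stand-in for perceived brightness; `mean_luma` and `brightness_drift` are hypothetical helper names, not part of diffusers.

```python
import numpy as np

# Rec. 601 luma weights for R, G, B.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])


def mean_luma(frame):
    """Mean luma of one RGB frame given as a PIL image or (H, W, 3) array."""
    rgb = np.asarray(frame, dtype=np.float64)
    return float((rgb @ LUMA_WEIGHTS).mean())


def brightness_drift(reference, frames):
    """Per-frame mean luma minus the mean luma of the reference image."""
    ref = mean_luma(reference)
    return [mean_luma(frame) - ref for frame in frames]
```

Passing the input image as `reference` and the first few generated frames as `frames` gives one offset per frame; values far from zero in the early frames would correspond to the mismatch described above.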

The input image:
![5281642-hd_1920_1080_30fps](https://github.com/user-attachments/assets/a490c609-8d00-4c31-8e5b-5055d893b58f)


The result of CogVideoX-5B-I2V:

https://github.com/user-attachments/assets/563b9d69-a706-47a1-9cc5-a6d6608456a6

The result of CogVideoX1.5-5B-I2V:

https://github.com/user-attachments/assets/77195766-75a4-4144-a867-53ad44c24e78




### Expected behavior

The brightness of videos generated by CogVideoX1.5-5B-I2V should be consistent with the input image.
