Merged
6 changes: 3 additions & 3 deletions .github/ISSUE_TEMPLATE/bug_report.yaml
Original file line number Diff line number Diff line change
@@ -30,14 +30,14 @@ body:
If you have code snippets, error messages, or stack traces, please provide them here as well.
Please format your code correctly using code tags. See https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
Do not use screenshots, as they are difficult to read and (more importantly) do not allow others to copy and paste your code.

请提供能重现您遇到的问题的代码示例,最好是最小复现单元。
如果您有代码片段、错误信息、堆栈跟踪,也请在此提供。
请使用代码标签正确格式化您的代码。请参见 https://help.github.com/en/github/writing-on-github/creating-and-highlighting-code-blocks#syntax-highlighting
请勿使用截图,因为截图难以阅读,而且(更重要的是)不允许他人复制粘贴您的代码。
placeholder: |
Steps to reproduce the behavior/复现Bug的步骤:

1.
2.
3.
@@ -48,4 +48,4 @@ body:
required: true
attributes:
label: Expected behavior / 期待表现
description: "A clear and concise description of what you would expect to happen. /简单描述您期望发生的事情。"
4 changes: 2 additions & 2 deletions .github/ISSUE_TEMPLATE/feature-request.yaml
@@ -29,6 +29,6 @@ body:
attributes:
label: Your contribution / 您的贡献
description: |

Your PR link or any other link you can help with.
您的PR链接或者其他您能提供帮助的链接。
28 changes: 28 additions & 0 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -0,0 +1,28 @@
# Contribution Guide

We welcome your contributions to this repository. To ensure elegant code style and better code quality, we have prepared the following contribution guidelines.

## What We Accept

+ This PR fixes a typo or improves the documentation (if this is the case, you may skip the other checks).
+ This PR fixes a specific issue — please reference the issue number in the PR description. Make sure your code strictly follows the coding standards below.
+ This PR introduces a new feature — please clearly explain the necessity and implementation of the feature. Make sure your code strictly follows the coding standards below.

## Code Style Guide

Good code style is an art. We have prepared a `pyproject.toml` and a `pre-commit` hook to enforce consistent code formatting across the project. You can clean up your code following the steps below:

1. Install the required dependencies:
```shell
pip install ruff pre-commit
```
2. Then, run the following command:
```shell
pre-commit run --all-files
```
If your code complies with the standards, you should not see any errors.

## Naming Conventions

- Please use **English** for naming; do not use Pinyin or other languages. All comments should also be in English.
- Follow **PEP8** naming conventions strictly, and use underscores to separate words. Avoid meaningless names such as `a`, `b`, `c`.
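As a quick illustration of these conventions, a hypothetical helper might look like the following; the function and variable names are invented for this example only:

```python
def count_valid_samples(sample_labels, ignore_label=-1):
    """Count labels that are not the ignore marker.

    Descriptive snake_case names follow PEP8; avoid opaque
    names such as `a`, `b`, `c`.
    """
    return sum(1 for label in sample_labels if label != ignore_label)


# Three of the four labels are valid.
print(count_valid_samples([0, 1, -1, 2]))
```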
34 changes: 0 additions & 34 deletions .github/PULL_REQUEST_TEMPLATE/pr_template.md

This file was deleted.

27 changes: 27 additions & 0 deletions .github/workflows/python-lint.yml
@@ -0,0 +1,27 @@
name: Python Linting

on:
push:
branches: [main]
pull_request:
branches: [main]

jobs:
lint:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: '3.10'
cache: 'pip'

- name: Install dependencies
run: |
python -m pip install --upgrade pip
pip install pre-commit

- name: Run pre-commit
run: pre-commit run --all-files
2 changes: 1 addition & 1 deletion .gitignore
@@ -8,4 +8,4 @@ logs/
.idea
output*
test*
img
19 changes: 19 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,19 @@
repos:
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.4.5
hooks:
- id: ruff
args: [--fix, --respect-gitignore, --config=pyproject.toml]
- id: ruff-format
args: [--config=pyproject.toml]

- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.5.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
- id: check-toml
- id: check-case-conflict
- id: check-merge-conflict
- id: debug-statements
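The `trailing-whitespace` and `end-of-file-fixer` hooks above normalize whitespace automatically. A rough sketch of the transformation they apply (a simplification for illustration, not the hooks' actual code):

```python
def normalize_text(text: str) -> str:
    # Strip trailing spaces/tabs from every line (like trailing-whitespace).
    lines = [line.rstrip() for line in text.split("\n")]
    # Drop trailing blank lines and end the file with exactly
    # one newline (like end-of-file-fixer).
    return "\n".join(lines).rstrip("\n") + "\n"


print(repr(normalize_text("output*  \ntest*\t\n\n")))
```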
49 changes: 18 additions & 31 deletions README.md
@@ -8,8 +8,8 @@
</div>

<p align="center">
<a href="https://huggingface.co/spaces/THUDM-HF-SPACE/CogView4" target="_blank"> 🤗 HuggingFace Space</a>
<a href="https://modelscope.cn/studios/ZhipuAI/CogView4" target="_blank"> 🤖ModelScope Space</a>
<a href="https://zhipuaishengchan.datasink.sensorsdata.cn/t/4z" target="_blank"> 🛠️ZhipuAI MaaS(Faster)</a>
<br>
<a href="resources/WECHAT.md" target="_blank"> 👋 WeChat Community</a> <a href="https://arxiv.org/abs/2403.05121" target="_blank">📚 CogView3 Paper</a>
@@ -19,7 +19,8 @@

## Project Updates

- 🔥🔥 ```2025/03/04```: We've adapted and open-sourced the [diffusers](https://github.com/huggingface/diffusers) version
- 🔥🔥 ```2025/03/24```: We are launching [CogKit](https://github.com/THUDM/CogKit), a powerful toolkit for fine-tuning and inference of the **CogView4** and **CogVideoX** series, allowing you to fully explore our multimodal generation models.
- ```2025/03/04```: We've adapted and open-sourced the [diffusers](https://github.com/huggingface/diffusers) version
of the **CogView-4** model, which has 6B parameters and supports native Chinese input and Chinese text-to-image generation.
You can try it [online](https://huggingface.co/spaces/THUDM-HF-SPACE/CogView4).
- ```2024/10/13```: We've adapted and open-sourced the [diffusers](https://github.com/huggingface/diffusers) version of
@@ -31,9 +32,9 @@

## Project Plan

- [X] Diffusers workflow adaptation
- [ ] Cog series fine-tuning kits (coming soon)
- [ ] ControlNet models and training code
- [X] Diffusers workflow adaptation
- [X] Cog series fine-tuning kits (available via [CogKit](https://github.com/THUDM/CogKit))
- [ ] ControlNet models and training code

## Community Contributions

@@ -160,7 +161,7 @@ python prompt_optimize.py --api_key "Zhipu AI API Key" --prompt {your prompt} --

### Inference Model

Run the model with `BF16` precision:
Run the model `CogView4-6B` with `BF16` precision:

```python
from diffusers import CogView4Pipeline
@@ -185,37 +186,23 @@ image = pipe(

image.save("cogview4.png")
```

For more inference code, please check:

1. To load the `text encoder` with `BNB int4`, with fully annotated inference code, see [here](inference/cli_demo_cogview4.py).
2. To load the `text encoder & transformer` with `TorchAO int8 or int4`, with fully annotated inference code, see [here](inference/cli_demo_cogview4_int8.py).
3. To set up a `gradio` GUI demo, see [here](inference/gradio_web_demo.py).
## Installation

```shell
git clone https://github.com/THUDM/CogView4
cd CogView4
git clone https://huggingface.co/THUDM/CogView4-6B
pip install -r inference/requirements.txt
```
## Quickstart

12G VRAM:
```shell
MODE=1 python inference/gradio_web_demo.py
```

24G VRAM, 32G RAM:
```shell
MODE=2 python inference/gradio_web_demo.py
```

24G VRAM, 64G RAM:
```shell
MODE=3 python inference/gradio_web_demo.py
```

48G VRAM, 64G RAM:
```shell
MODE=4 python inference/gradio_web_demo.py
```


## Fine-tuning

This repository does not contain fine-tuning code, but you can fine-tune the model, with both LoRA and SFT supported, using the following approaches:

1. [CogKit](https://github.com/THUDM/CogKit), our officially maintained system-level fine-tuning framework that supports CogView4 and CogVideoX.
2. [finetrainers](https://github.com/a-r-r-o-w/finetrainers), a low-memory solution that enables fine-tuning on a single RTX 4090.
3. If you want to train ControlNet models directly, you can refer to the [training code](https://github.com/huggingface/diffusers/tree/main/examples/cogview4-control) and train your own models.

## License

69 changes: 27 additions & 42 deletions README_ja.md
@@ -8,8 +8,8 @@

</div>
<p align="center">
<a href="https://huggingface.co/spaces/THUDM-HF-SPACE/CogView4" target="_blank"> 🤗 HuggingFace Space</a>
<a href="https://modelscope.cn/studios/ZhipuAI/CogView4" target="_blank"> 🤖ModelScope Space</a>
<a href="https://zhipuaishengchan.datasink.sensorsdata.cn/t/4z" target="_blank"> 🛠️ZhipuAI MaaS(Faster)</a>
<br>
<a href="resources/WECHAT.md" target="_blank"> 👋 WeChat Community</a> <a href="https://arxiv.org/abs/2403.05121" target="_blank">📚 CogView3 Paper</a>
@@ -20,7 +20,9 @@

## プロジェクトの更新

- 🔥🔥 ```2025/03/04```: [diffusers](https://github.com/huggingface/diffusers) バージョンの **CogView-4**
- 🔥🔥 ```2025/03/24```: [CogView4-6B-Control](https://huggingface.co/THUDM/CogView4-6B-Control) モデルをリリースしました![トレーニングコード](https://github.com/huggingface/diffusers/tree/main/examples/cogview4-control) を使用して、自身でトレーニングすることも可能です。
さらに、**CogView4** および **CogVideoX** シリーズのファインチューニングと推論を簡単に行えるツールキット [CogKit](https://github.com/THUDM/CogKit) も公開しました。私たちのマルチモーダル生成モデルを存分に活用してください!
- ```2025/03/04```: [diffusers](https://github.com/huggingface/diffusers) バージョンの **CogView-4**
モデルを適応し、オープンソース化しました。このモデルは6Bのパラメータを持ち、ネイティブの中国語入力と中国語のテキストから画像生成をサポートしています。オンラインで試すことができます [こちら](https://huggingface.co/spaces/THUDM-HF-SPACE/CogView4)。
- ```2024/10/13```: [diffusers](https://github.com/huggingface/diffusers) バージョンの **CogView-3Plus-3B**
モデルを適応し、オープンソース化しました。オンラインで試すことができます [こちら](https://huggingface.co/spaces/THUDM-HF-SPACE/CogView3-Plus-3B-Space)。
@@ -31,7 +33,7 @@
## プロジェクト計画

- [X] Diffusers ワークフローの適応
- [ ] Cogシリーズのファインチューニングスイート (近日公開)
- [X] Cogシリーズのファインチューニングスイート ([CogKit](https://github.com/THUDM/CogKit) で公開済み)
- [ ] ControlNetモデルとトレーニングコード

## コミュニティの取り組み
@@ -85,12 +87,12 @@

DITモデルは `BF16` 精度と `batchsize=4` でテストされ、結果は以下の表に示されています:

| 解像度 | enable_model_cpu_offload OFF | enable_model_cpu_offload ON | enable_model_cpu_offload ON </br> Text Encoder 4bit |
|-------------|------------------------------|-----------------------------|-----------------------------------------------------|
| 512 * 512 | 33GB | 20GB | 13G |
| 1280 * 720 | 35GB | 20GB | 13G |
| 1024 * 1024 | 35GB | 20GB | 13G |
| 1920 * 1280 | 39GB | 20GB | 14G |

さらに、プロセスが強制終了されないようにするために、少なくとも`32GB`のRAMを持つデバイスを推奨します。

@@ -157,7 +159,7 @@ python prompt_optimize.py --api_key "Zhipu AI API Key" --prompt {your prompt} --

### 推論モデル

`BF16` 精度でモデルを実行します
`BF16` の精度で `CogView4-6B` モデルを実行する

```python
from diffusers import CogView4Pipeline
@@ -182,37 +184,20 @@ image = pipe(

image.save("cogview4.png")
```
For more inference code, please check:

1. To load the `text encoder` with `BNB int4`, with fully annotated inference code, see [here](inference/cli_demo_cogview4.py).
2. To load the `text encoder & transformer` with `TorchAO int8 or int4`, with fully annotated inference code, see [here](inference/cli_demo_cogview4_int8.py).
3. To set up a `gradio` GUI demo, see [here](inference/gradio_web_demo.py).
## Installation

```shell
git clone https://github.com/THUDM/CogView4
cd CogView4
git clone https://huggingface.co/THUDM/CogView4-6B
pip install -r inference/requirements.txt
```
## Quickstart

12G VRAM:
```shell
MODE=1 python inference/gradio_web_demo.py
```

24G VRAM, 32G RAM:
```shell
MODE=2 python inference/gradio_web_demo.py
```

24G VRAM, 64G RAM:
```shell
MODE=3 python inference/gradio_web_demo.py
```

48G VRAM, 64G RAM:
```shell
MODE=4 python inference/gradio_web_demo.py
```

より詳しい推論コードについては、以下をご確認ください:

1. `BNB int4` を使用して `text encoder` をロードし、完全な推論コードの注釈を確認するには、[こちら](inference/cli_demo_cogview4.py) をご覧ください。
2. `TorchAO int8 または int4` を使用して `text encoder & transformer` をロードし、完全な推論コードの注釈を確認するには、[こちら](inference/cli_demo_cogview4_int8.py) をご覧ください。
3. `gradio` GUI デモをセットアップするには、[こちら](inference/gradio_web_demo.py) をご覧ください。

## ファインチューニング(微調整)

このリポジトリにはファインチューニング用のコードは含まれていませんが、LoRA および SFT を含む以下の方法でファインチューニングが可能です:

1. [CogKit](https://github.com/THUDM/CogKit):CogView4 および CogVideoX のファインチューニングをサポートする、公式で保守されているシステムレベルのファインチューニングフレームワークです。
2. [finetrainers](https://github.com/a-r-r-o-w/finetrainers):低メモリ環境向けのソリューションで、RTX 4090 でのファインチューニングが可能です。
3. ControlNet モデルを直接訓練したい場合は、[トレーニングコード](https://github.com/huggingface/diffusers/tree/main/examples/cogview4-control) を参考にして自前で訓練することができます。

## ライセンス
