Flux.1 #1331

KimBioInfoStudio · 2024-09-14T03:20:46Z

What does this PR do?

Adaptation of diffusers.pipelines.FluxPipeline.

Env:

IMG="vault.habana.ai/gaudi-docker/1.17.0/ubuntu22.04/habanalabs/pytorch-installer-2.3.1:latest"
docker run -dit --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --net=host --ipc=host --name flux  ${IMG} /bin/bash
docker exec -it flux python -m pip install git+https://github.com/kimbioinfostudio/optimum-habana.git@kim/flux
docker exec -it flux python -m pip install git+https://github.com/HabanaAI/DeepSpeed.git@1.17.0
docker exec -w /root -it flux bash -c "git clone -b kim/flux https://github.com/kimbioinfostudio/optimum-habana.git"
docker exec -w /root/optimum-habana/examples/stable-diffusion -it flux bash

Performance:

Device	Mode	Steps	FPS
G2H	Eager	28	0.399
G2H	Eager	4	2.121
G2H	Lazy	28	0.002
G2H	Graph	28	0.086
G2H	Graph	4	0.587

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

KimBioInfoStudio · 2024-09-18T08:59:01Z

Lazy mode without HPU graphs:

python text_to_image_generation.py \
    --model_name_or_path black-forest-labs/FLUX.1-schnell \
    --prompts "A cat holding a sign that says hello world" \
    --num_images_per_prompt 1 \
    --batch_size 1 \
    --num_inference_steps 28 \
    --image_save_dir /tmp/flux_1_images \
    --scheduler flow_match_euler_discrete \
    --use_habana \
    --gaudi_config Habana/stable-diffusion \
    --bf16

Output:

[INFO|pipeline_flux.py:339] 2024-09-27 07:12:01,106 >> 1 prompt(s) received, 1 generation(s) per prompt, 1 sample(s) per batch, 1 total batch(es).
[WARNING|pipeline_flux.py:344] 2024-09-27 07:12:01,106 >> The first two iterations are slower so it is recommended to feed more batches.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28/28 [06:50<00:00, 14.85s/it]
[INFO|pipeline_flux.py:416] 2024-09-27 07:19:06,461 >> Speed metrics: {'generation_runtime': 425.355, 'generation_samples_per_second': 0.002, 'generation_steps_per_second': 0.067}
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28/28 [07:05<00:00, 15.19s/it]
09/27/2024 07:19:19 - INFO - __main__ - Saving images in /tmp/flux_1_images...
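
For comparing runs across modes, the "Speed metrics" entry can be pulled out of a log like the one above with a few lines of Python. This is only a sketch: `parse_speed_metrics` is a hypothetical helper (not part of this PR), and it assumes the entry keeps the `Speed metrics: {...}` dict format shown in the log.

```python
import ast
import re

# Matches the dict literal that follows "Speed metrics:" in a pipeline log line.
SPEED_METRICS_RE = re.compile(r"Speed metrics: (\{.*?\})")


def parse_speed_metrics(log_text):
    """Return one dict per "Speed metrics: {...}" entry found in log_text."""
    return [ast.literal_eval(m.group(1)) for m in SPEED_METRICS_RE.finditer(log_text)]


# Log line copied from the lazy-mode run above.
log_line = (
    "[INFO|pipeline_flux.py:416] 2024-09-27 07:19:06,461 >> Speed metrics: "
    "{'generation_runtime': 425.355, 'generation_samples_per_second': 0.002, "
    "'generation_steps_per_second': 0.067}"
)
metrics = parse_speed_metrics(log_line)
print(metrics[0]["generation_samples_per_second"])
```

`ast.literal_eval` is used rather than `eval` so that only plain Python literals in the log are accepted.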

Output image:

KimBioInfoStudio · 2024-09-27T05:47:33Z

graph mode:

python text_to_image_generation.py \
    --model_name_or_path black-forest-labs/FLUX.1-schnell \
    --prompts "A cat holding a sign that says hello world" \
    --num_images_per_prompt 1 \
    --batch_size 1 \
    --num_inference_steps 28 \
    --image_save_dir /tmp/flux_1_images \
    --scheduler flow_match_euler_discrete \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16

output:

[INFO|pipeline_flux.py:339] 2024-09-27 06:18:43,177 >> 1 prompt(s) received, 1 generation(s) per prompt, 1 sample(s) per batch, 1 total batch(es).
[WARNING|pipeline_flux.py:344] 2024-09-27 06:18:43,177 >> The first two iterations are slower so it is recommended to feed more batches.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28/28 [00:35<00:00,  9.50it/s]
[INFO|pipeline_flux.py:416] 2024-09-27 06:19:27,857 >> Speed metrics: {'generation_runtime': 44.6799, 'generation_samples_per_second': 0.086, 'generation_steps_per_second': 2.413}
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28/28 [00:44<00:00,  1.60s/it]
09/27/2024 06:19:40 - INFO - __main__ - Saving images in /tmp/flux_1_images...

python text_to_image_generation.py \
    --model_name_or_path black-forest-labs/FLUX.1-schnell \
    --prompts "A cat holding a sign that says hello world" \
    --num_images_per_prompt 1 \
    --batch_size 1 \
    --num_inference_steps 4 \
    --image_save_dir /tmp/flux_1_images \
    --scheduler flow_match_euler_discrete \
    --use_habana \
    --use_hpu_graphs \
    --gaudi_config Habana/stable-diffusion \
    --bf16

output:

[INFO|pipeline_flux.py:339] 2024-09-27 06:14:42,741 >> 1 prompt(s) received, 1 generation(s) per prompt, 1 sample(s) per batch, 1 total batch(es).
[WARNING|pipeline_flux.py:344] 2024-09-27 06:14:42,741 >> The first two iterations are slower so it is recommended to feed more batches.
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:33<00:00,  6.27s/it]
[INFO|pipeline_flux.py:416] 2024-09-27 06:15:16,976 >> Speed metrics: {'generation_runtime': 34.2343, 'generation_samples_per_second': 0.587, 'generation_steps_per_second': 2.35}
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:34<00:00,  8.56s/it]
09/27/2024 06:15:29 - INFO - __main__ - Saving images in /tmp/flux_1_images...

KimBioInfoStudio · 2024-09-27T07:25:05Z

eager:

PT_HPU_LAZY_MODE=0 \
python text_to_image_generation.py \
    --model_name_or_path black-forest-labs/FLUX.1-schnell \
    --prompts "A cat holding a sign that says hello world" \
    --num_images_per_prompt 1 \
    --batch_size 1 \
    --num_inference_steps 28 \
    --image_save_dir /tmp/flux_1_images \
    --scheduler flow_match_euler_discrete \
    --use_habana \
    --gaudi_config Habana/stable-diffusion \
    --bf16

output:

[INFO|pipeline_flux.py:339] 2024-09-27 07:27:16,601 >> 1 prompt(s) received, 1 generation(s) per prompt, 1 sample(s) per batch, 1 total batch(es).
[WARNING|pipeline_flux.py:344] 2024-09-27 07:27:16,601 >> The first two iterations are slower so it is recommended to feed more batches.
  4%|██████▏                                                                                                                                                                      | 1/28 [00:01<00:41,  1.53s/it]
09/27/2024 07:27:18 - WARNING - habana_frameworks.torch.utils.internal - Calling mark_step function does not have any effect. It's lazy mode only functionality. (warning logged once)
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28/28 [00:03<00:00, 11.58it/s]
[INFO|pipeline_flux.py:416] 2024-09-27 07:27:20,589 >> Speed metrics: {'generation_runtime': 3.9884, 'generation_samples_per_second': 0.399, 'generation_steps_per_second': 11.162}
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 28/28 [00:03<00:00,  7.02it/s]
09/27/2024 07:27:59 - INFO - __main__ - Saving images in /tmp/flux_1_images...

PT_HPU_LAZY_MODE=0 \
python text_to_image_generation.py \
    --model_name_or_path black-forest-labs/FLUX.1-schnell \
    --prompts "A cat holding a sign that says hello world" \
    --num_images_per_prompt 1 \
    --batch_size 1 \
    --num_inference_steps 4 \
    --image_save_dir /tmp/flux_1_images \
    --scheduler flow_match_euler_discrete \
    --use_habana \
    --gaudi_config Habana/stable-diffusion \
    --bf16

[INFO|pipeline_flux.py:339] 2024-09-27 07:29:50,265 >> 1 prompt(s) received, 1 generation(s) per prompt, 1 sample(s) per batch, 1 total batch(es).
[WARNING|pipeline_flux.py:344] 2024-09-27 07:29:50,265 >> The first two iterations are slower so it is recommended to feed more batches.
 25%|███████████████████████████████████████████▌                                                                                                                                  | 1/4 [00:01<00:04,  1.53s/it]
09/27/2024 07:29:51 - WARNING - habana_frameworks.torch.utils.internal - Calling mark_step function does not have any effect. It's lazy mode only functionality. (warning logged once)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00,  2.94it/s]
[INFO|pipeline_flux.py:416] 2024-09-27 07:29:52,107 >> Speed metrics: {'generation_runtime': 1.8415, 'generation_samples_per_second': 2.121, 'generation_steps_per_second': 8.482}
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00,  2.17it/s]
09/27/2024 07:29:56 - INFO - __main__ - Saving images in /tmp/flux_1_images...
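
The three modes benchmarked in this thread differ only in one flag or one environment variable: graph mode adds `--use_hpu_graphs`, and eager mode sets `PT_HPU_LAZY_MODE=0`. A rough sketch of a launcher that encodes this is below. `build_run` and `BASE_ARGS` are hypothetical names used for illustration, not something this PR adds; the flags themselves are copied from the commands above.

```python
import shlex

# Arguments shared by every run in this thread.
BASE_ARGS = [
    "python", "text_to_image_generation.py",
    "--model_name_or_path", "black-forest-labs/FLUX.1-schnell",
    "--prompts", "A cat holding a sign that says hello world",
    "--num_images_per_prompt", "1",
    "--batch_size", "1",
    "--image_save_dir", "/tmp/flux_1_images",
    "--scheduler", "flow_match_euler_discrete",
    "--use_habana",
    "--gaudi_config", "Habana/stable-diffusion",
    "--bf16",
]


def build_run(mode, steps):
    """Return (argv, env_overrides) for mode in {"lazy", "graph", "eager"}."""
    if mode not in ("lazy", "graph", "eager"):
        raise ValueError(f"unknown mode: {mode}")
    argv = BASE_ARGS + ["--num_inference_steps", str(steps)]
    env = {}
    if mode == "graph":
        argv.append("--use_hpu_graphs")  # same as the lazy run, plus HPU graphs
    elif mode == "eager":
        env["PT_HPU_LAZY_MODE"] = "0"  # the eager runs above set this variable
    return argv, env


argv, env = build_run("eager", 4)
print(" ".join(f"{k}={v}" for k, v in env.items()), shlex.join(argv))
```

On a Gaudi host the result could be launched with `subprocess.run(argv, env={**os.environ, **env}, check=True)`.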

huijuanzh · 2024-10-08T03:06:26Z

@regisss please help review this PR.
Tested with diffusers 0.31.0.dev0.
4 inference steps:
Nvidia A800 throughput (BF16): 1.24 it/s
Eager Gaudi2 throughput (BF16): 8.484 it/s
Graph Gaudi2 throughput (BF16): 2.348 it/s

28 inference steps:
Nvidia A800 throughput (BF16): 1.71 it/s
Eager Gaudi2 throughput (BF16): 11.172 it/s
Graph Gaudi2 throughput (BF16): 2.408 it/s
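
To put the numbers above on a common scale, each Gaudi2 throughput can be divided by the A800 throughput at the same step count. The snippet is just that arithmetic; the helper name `speedup_vs_baseline` and the dictionary keys are made up for illustration, and the values are copied from this comment.

```python
# Throughput in it/s, copied from the comment above.
throughput = {
    4: {"A800": 1.24, "Gaudi2 eager": 8.484, "Gaudi2 graph": 2.348},
    28: {"A800": 1.71, "Gaudi2 eager": 11.172, "Gaudi2 graph": 2.408},
}


def speedup_vs_baseline(results, baseline="A800"):
    """Divide every throughput by the baseline's throughput at the same step count."""
    return {
        steps: {
            name: value / row[baseline]
            for name, value in row.items()
            if name != baseline
        }
        for steps, row in results.items()
    }


for steps, ratios in speedup_vs_baseline(throughput).items():
    for name, ratio in ratios.items():
        print(f"{steps} steps: {name} = {ratio:.2f}x A800")
```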

ssarkar2

Please delete measure_all_500, measure_all, etc. Binary files such as .npz needn't be uploaded.

KimBioInfoStudio · 2024-10-14T06:44:32Z

Performance With Batching Enabled:

Device	Mode	Prompts	Images per Prompt	BS	Steps	FPS
G2H	Graph	1	4	4	28	0.113
G2H	Graph	5	1	5	28	0.113
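
Both rows above work out to a single batch, which can be checked with a quick calculation. This is a sketch under an assumption: that the script simply groups `prompts * images_per_prompt` images into chunks of `batch_size`, with a partial last chunk counted as one batch. How the pipeline actually handles a partial final batch is not stated in this thread, and `count_batches` is a hypothetical helper.

```python
import math


def count_batches(num_prompts, images_per_prompt, batch_size):
    """Return (total_images, num_batches); a partial last batch counts as one (assumption)."""
    total_images = num_prompts * images_per_prompt
    return total_images, math.ceil(total_images / batch_size)


# The two rows of the batching table above.
print(count_batches(1, 4, 4))
print(count_batches(5, 1, 5))
```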

KimBioInfoStudio · 2024-10-14T07:35:36Z

Please delete measure_all_500, measure_all, etc. Binary files such as .npz needn't be uploaded.

@ssarkar2 Removed; please review again.

upgrade diffusers

9991f09

KimBioInfoStudio requested a review from regisss as a code owner September 14, 2024 03:20

KimBioInfoStudio marked this pull request as draft September 14, 2024 03:20

KimBioInfoStudio added 3 commits September 18, 2024 10:06

replace schduler

9bbcc1b

update wkld entrypoint

cb2aaf0

rem demo wkld entrypoint

4fcf181

KimBioInfoStudio marked this pull request as ready for review September 18, 2024 08:59

KimBioInfoStudio changed the title ~~Flux~~ Flux。1 Sep 18, 2024

KimBioInfoStudio changed the title ~~Flux。1~~ Flux.1 Sep 18, 2024

KimBioInfoStudio and others added 9 commits September 23, 2024 14:03

add warp in hpu graph

8e0a02f

upgrade diffusers

154101e

replace schduler

8759264

update wkld entrypoint

16848b3

rem demo wkld entrypoint

073b6d0

add warp in hpu graph

60c5de3

Add fp8 to flux and fix timing

e187930

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

Enable batching for flux inference

66098ff

Signed-off-by: Daniel Socek <daniel.socek@intel.com> Co-authored-by: Deepak Narayana <deepak.narayana@intel.com>

update diffusers to adopt rope changes

0615ce1

KimBioInfoStudio added 4 commits September 27, 2024 13:48

fix import error

97a6dd5

fix readme conflict

f3f469c

fix time clac drift

9aefc5f

fix import error in lazy mode

1bc593a

dsocek and others added 4 commits September 27, 2024 18:22

Add hybrid fp8 and bf16 denoising to flux

3a53c95

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

use default scheduler from upstream diffusers

0b80a6a

fix import error

f267958

Fix timing issue with batching

9bcc65f

Signed-off-by: Daniel Socek <daniel.socek@intel.com>

KimBioInfoStudio and others added 6 commits October 8, 2024 13:58

fix conflicts

971ca4d

Add FusedSDPA

1144815

Merge branch 'dsocek/flux' into kim/flux

8145d20

use latest attn rope

479dc96

fix scheduler

18c5960

add OenFLUX.1

eee48c7

ssarkar2 reviewed Oct 11, 2024

KimBioInfoStudio added 4 commits October 14, 2024 11:34

fix errors

c299caa

rem quant files

c0d391e

rem tmp tests files

b5aee78

rem text_ids image_ids from split into batches

44a48c7

KimBioInfoStudio requested a review from ssarkar2 October 14, 2024 07:35
