sdxl-training: fixed ci, changed gated dataset, fixes for non-square datasets #1038
LGTM. Diffusers has a similar update as well. The switch to the fp16-fix VAE is also needed.
--gaudi_config_name Habana/stable-diffusion \
--throughput_warmup_steps 3 \
--dataloader_num_workers 8 \
--bf16 \
--use_hpu_graphs_for_training \
--use_hpu_graphs_for_inference \
--validation_prompt="a robotic cat with wings" \
--validation_prompt="a cute Sundar Pichai creature" \
Hmm, should we put a more neutral prompt here, for example "a cute dragon creature"? Not sure if Google will get mad here.
Fixed: 36ecac0
@regisss @ssarkar2 This change seems to work fine for square data (the command in README.md) and for non-square data such as:
python train_text_to_image_sdxl.py --pretrained_model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 --pretrained_vae_model_name_or_path madebyollin/sdxl-vae-fp16-fix --dataset_name linoyts/Tuxemon --resolution 256 --crop_resolution 256 --center_crop --random_flip --proportion_empty_prompts=0.2 --train_batch_size 16 --max_train_steps 2500 --learning_rate 1e-05 --max_grad_norm 1 --lr_scheduler constant --lr_warmup_steps 0 --output_dir sdxl_model_output --gaudi_config_name Habana/stable-diffusion --throughput_warmup_steps 3 --dataloader_num_workers 8 --bf16 --use_hpu_graphs_for_training --use_hpu_graphs_for_inference --validation_prompt="a cute Sundar Pichai creature" --validation_epochs 48 --checkpointing_steps 2500 --logging_step 10 --adjust_throughput --caption_column prompt
.
.
[WARNING|pipeline_stable_diffusion_xl.py:542] 2024-06-04 20:56:58,408 >> The first two iterations are slower so it is recommended to feed more batches.
[INFO|pipeline_stable_diffusion_xl.py:825] 2024-06-04 20:57:01,461 >> Speed metrics: {'generation_runtime': 3.0052, 'generation_samples_per_second': 0.35, 'generation_steps_per_second': 0.35}
[INFO|pipeline_stable_diffusion_xl.py:537] 2024-06-04 20:57:01,588 >> 1 prompt(s) received, 1 generation(s) per prompt, 1 sample(s) per batch, 1 total batch(es).
[WARNING|pipeline_stable_diffusion_xl.py:542] 2024-06-04 20:57:01,588 >> The first two iterations are slower so it is recommended to feed more batches.
[INFO|pipeline_stable_diffusion_xl.py:825] 2024-06-04 20:57:04,276 >> Speed metrics: {'generation_runtime': 2.642, 'generation_samples_per_second': 0.381, 'generation_steps_per_second': 0.381}
[INFO|pipeline_stable_diffusion_xl.py:537] 2024-06-04 20:57:04,398 >> 1 prompt(s) received, 1 generation(s) per prompt, 1 sample(s) per batch, 1 total batch(es).
[WARNING|pipeline_stable_diffusion_xl.py:542] 2024-06-04 20:57:04,398 >> The first two iterations are slower so it is recommended to feed more batches.
[INFO|pipeline_stable_diffusion_xl.py:825] 2024-06-04 20:57:07,083 >> Speed metrics: {'generation_runtime': 2.6403, 'generation_samples_per_second': 0.381, 'generation_steps_per_second': 0.381}
[INFO|pipeline_stable_diffusion_xl.py:537] 2024-06-04 20:57:07,205 >> 1 prompt(s) received, 1 generation(s) per prompt, 1 sample(s) per batch, 1 total batch(es).
[WARNING|pipeline_stable_diffusion_xl.py:542] 2024-06-04 20:57:07,205 >> The first two iterations are slower so it is recommended to feed more batches.
[INFO|pipeline_stable_diffusion_xl.py:825] 2024-06-04 20:57:09,888 >> Speed metrics: {'generation_runtime': 2.6384, 'generation_samples_per_second': 0.381, 'generation_steps_per_second': 0.381}
Steps: 100%|██████████| 2500/2500 [39:21<00:00, 1.63it/s, lr=1e-5, mem_used=61.7, step_loss=0.144]
06/04/2024 20:58:47 - INFO - accelerate.accelerator - Saving current state to sdxl_model_output/checkpoint-2500
Configuration saved in sdxl_model_output/checkpoint-2500/unet/config.json
Model weights saved in sdxl_model_output/checkpoint-2500/unet/diffusion_pytorch_model.safetensors
[INFO|configuration_utils.py:358] 2024-06-04 21:00:27,103 >> GaudiConfig {
"autocast_bf16_ops": null,
"autocast_fp32_ops": null,
"optimum_version": "1.20.0",
"transformers_version": "4.40.2",
"use_dynamic_shapes": false,
"use_fused_adam": true,
"use_fused_clip_norm": true,
"use_torch_autocast": true
}
[WARNING|pipeline_utils.py:149] 2024-06-04 21:00:27,103 >> `use_torch_autocast` is True in the given Gaudi configuration but `torch_dtype=torch.bfloat16` was given. Disabling mixed precision and continuing in bf16 only.
Configuration saved in sdxl_model_output/vae/config.json
Model weights saved in sdxl_model_output/vae/diffusion_pytorch_model.safetensors
Configuration saved in sdxl_model_output/unet/config.json
Model weights saved in sdxl_model_output/unet/diffusion_pytorch_model.safetensors
Configuration saved in sdxl_model_output/scheduler/scheduler_config.json
Configuration saved in sdxl_model_output/model_index.json
[INFO|configuration_utils.py:113] 2024-06-04 21:00:35,647 >> Configuration saved in sdxl_model_output/gaudi_config.json
[INFO|pipeline_stable_diffusion_xl.py:537] 2024-06-04 21:00:35,804 >> 1 prompt(s) received, 1 generation(s) per prompt, 1 sample(s) per batch, 1 total batch(es).
[WARNING|pipeline_stable_diffusion_xl.py:542] 2024-06-04 21:00:35,804 >> The first two iterations are slower so it is recommended to feed more batches.
100%|██████████| 1/1 [01:14<00:00, 74.22s/it]
[INFO|pipeline_stable_diffusion_xl.py:825] 2024-06-04 21:01:50,062 >> Speed metrics: {'generation_runtime': 74.2221, 'generation_samples_per_second': 0.143, 'generation_steps_per_second': 0.143}
I've confirmed the same test above with BS=1 and BS=2 as well. Another test was done with:
python train_text_to_image_sdxl.py --pretrained_model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 --pretrained_vae_model_name_or_path madebyollin/sdxl-vae-fp16-fix --dataset_name poloclub/diffusiondb --resolution 512 --crop_resolution 512 --center_crop --random_flip --proportion_empty_prompts=0.2 --train_batch_size 2 --max_train_steps 1000 --learning_rate 1e-05 --max_grad_norm 1 --lr_scheduler constant --lr_warmup_steps 0 --output_dir sdxl_model_output --gaudi_config_name Habana/stable-diffusion --throughput_warmup_steps 3 --dataloader_num_workers 8 --bf16 --use_hpu_graphs_for_training --use_hpu_graphs_for_inference --validation_prompt="a cute Sundar Pichai creature" --validation_epochs 48 --checkpointing_steps 1000 --logging_step 10 --adjust_throughput --caption_column prompt
.
.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'thresholding', 'variance_type', 'dynamic_thresholding_ratio', 'rescale_betas_zero_snr', 'clip_sample_range'} was not found in config. Values will be initialized to default values.
============================= HABANA PT BRIDGE CONFIGURATION ===========================
PT_HPU_LAZY_MODE = 1
PT_RECIPE_CACHE_PATH =
PT_CACHE_FOLDER_DELETE = 0
PT_HPU_RECIPE_CACHE_CONFIG =
PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
PT_HPU_LAZY_ACC_PAR_MODE = 1
PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 160
CPU RAM : 1056375284 KB
------------------------------------------------------------------------------
{'reverse_transformer_layers_per_block', 'dropout', 'attention_type'} was not found in config. Values will be initialized to default values.
/usr/local/lib/python3.10/dist-packages/datasets/load.py:1491: FutureWarning: The repository for poloclub/diffusiondb contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/poloclub/diffusiondb
You can avoid this message in future by passing the argument `trust_remote_code=True`.
Passing `trust_remote_code=True` will be mandatory to load this dataset from the next major release of `datasets`.
warnings.warn(
Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:29<00:00, 33.35 examples/s]
06/05/2024 02:19:08 - INFO - __main__ - ***** Running training *****
06/05/2024 02:19:08 - INFO - __main__ - Num examples = 1000
06/05/2024 02:19:08 - INFO - __main__ - Num Epochs = 2
06/05/2024 02:19:08 - INFO - __main__ - Instantaneous batch size per device = 2
06/05/2024 02:19:08 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 2
06/05/2024 02:19:08 - INFO - __main__ - Gradient Accumulation steps = 1
06/05/2024 02:19:08 - INFO - __main__ - Total optimization steps = 1000
Steps: 100%|██████████| 1000/1000 [17:58<00:00, 3.62it/s, lr=1e-5, mem_used=31.5, step_loss=0.341]
06/05/2024 02:37:06 - INFO - accelerate.accelerator - Saving current state to sdxl_model_output/checkpoint-1000
Configuration saved in sdxl_model_output/checkpoint-1000/unet/config.json
Model weights saved in sdxl_model_output/checkpoint-1000/unet/diffusion_pytorch_model.safetensors
06/05/2024 02:38:45 - INFO - accelerate.checkpointing - Optimizer state saved in sdxl_model_output/checkpoint-1000/optimizer.bin
06/05/2024 02:38:45 - INFO - accelerate.checkpointing - Scheduler state saved in sdxl_model_output/checkpoint-1000/scheduler.bin
06/05/2024 02:38:45 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in sdxl_model_output/checkpoint-1000/sampler.bin
06/05/2024 02:38:45 - INFO - accelerate.checkpointing - Random states saved in sdxl_model_output/checkpoint-1000/random_states_0.pkl
06/05/2024 02:38:45 - INFO - __main__ - Saved state to sdxl_model_output/checkpoint-1000
Steps: 100%|██████████| 1000/1000 [19:37<00:00, 3.62it/s, lr=1e-5, mem_used=31.5, step_loss=0.0351]
06/05/2024 02:38:45 - INFO - __main__ - Throughput = 9.616218221859341 samples/s
06/05/2024 02:38:45 - INFO - __main__ - Train runtime = 207.35802308097482 seconds
06/05/2024 02:38:45 - INFO - __main__ - Total Train runtime = 1177.2033570841886 seconds
{'image_encoder', 'gaudi_config', 'bf16_full_eval', 'feature_extractor', 'use_habana', 'use_hpu_graphs'} was not found in config. Values will be initialized to default values.
Loaded tokenizer_2 as CLIPTokenizer from `tokenizer_2` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
Loaded tokenizer as CLIPTokenizer from `tokenizer` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
Loaded text_encoder as CLIPTextModel from `text_encoder` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
Loaded text_encoder_2 as CLIPTextModelWithProjection from `text_encoder_2` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
Loading pipeline components...: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 9.74it/s]
[INFO|pipeline_utils.py:130] 2024-06-05 02:38:47,002 >> Enabled HPU graphs.
[INFO|configuration_utils.py:305] 2024-06-05 02:38:47,094 >> loading configuration file gaudi_config.json from cache at /root/.cache/huggingface/hub/models--Habana--stable-diffusion/snapshots/60ee357057ec90d2b183de22d0327ddd5d5a6db9/gaudi_config.json
[INFO|configuration_utils.py:358] 2024-06-05 02:38:47,094 >> GaudiConfig {
"autocast_bf16_ops": null,
"autocast_fp32_ops": null,
"optimum_version": "1.20.0",
"transformers_version": "4.40.2",
"use_dynamic_shapes": false,
"use_fused_adam": true,
"use_fused_clip_norm": true,
"use_torch_autocast": true
}
[WARNING|pipeline_utils.py:149] 2024-06-05 02:38:47,094 >> `use_torch_autocast` is True in the given Gaudi configuration but `torch_dtype=torch.bfloat16` was given. Disabling mixed precision and continuing in bf16 only.
Configuration saved in sdxl_model_output/vae/config.json
Model weights saved in sdxl_model_output/vae/diffusion_pytorch_model.safetensors
Configuration saved in sdxl_model_output/unet/config.json
Model weights saved in sdxl_model_output/unet/diffusion_pytorch_model.safetensors
Configuration saved in sdxl_model_output/scheduler/scheduler_config.json
Configuration saved in sdxl_model_output/model_index.json
[INFO|configuration_utils.py:113] 2024-06-05 02:38:55,719 >> Configuration saved in sdxl_model_output/gaudi_config.json
[INFO|pipeline_stable_diffusion_xl.py:537] 2024-06-05 02:38:55,905 >> 1 prompt(s) received, 1 generation(s) per prompt, 1 sample(s) per batch, 1 total batch(es).
[WARNING|pipeline_stable_diffusion_xl.py:542] 2024-06-05 02:38:55,905 >> The first two iterations are slower so it is recommended to feed more batches.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [01:18<00:00, 78.79s/it]
[INFO|pipeline_stable_diffusion_xl.py:825] 2024-06-05 02:40:14,729 >> Speed metrics: {'generation_runtime': 78.7885, 'generation_samples_per_second': 0.139, 'generation_steps_per_second': 0.139}
[INFO|pipeline_stable_diffusion_xl.py:537] 2024-06-05 02:40:26,698 >> 1 prompt(s) received, 1 generation(s) per prompt, 1 sample(s) per batch, 1 total batch(es).
[WARNING|pipeline_stable_diffusion_xl.py:542] 2024-06-05 02:40:26,698 >> The first two iterations are slower so it is recommended to feed more batches.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:06<00:00, 6.18s/it]
[INFO|pipeline_stable_diffusion_xl.py:825] 2024-06-05 02:40:32,921 >> Speed metrics: {'generation_runtime': 6.1851, 'generation_samples_per_second': 0.249, 'generation_steps_per_second': 0.249}
[INFO|pipeline_stable_diffusion_xl.py:537] 2024-06-05 02:40:33,069 >> 1 prompt(s) received, 1 generation(s) per prompt, 1 sample(s) per batch, 1 total batch(es).
[WARNING|pipeline_stable_diffusion_xl.py:542] 2024-06-05 02:40:33,069 >> The first two iterations are slower so it is recommended to feed more batches.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00, 2.63s/it]
[INFO|pipeline_stable_diffusion_xl.py:825] 2024-06-05 02:40:35,746 >> Speed metrics: {'generation_runtime': 2.6331, 'generation_samples_per_second': 0.383, 'generation_steps_per_second': 0.383}
[INFO|pipeline_stable_diffusion_xl.py:537] 2024-06-05 02:40:35,847 >> 1 prompt(s) received, 1 generation(s) per prompt, 1 sample(s) per batch, 1 total batch(es).
[WARNING|pipeline_stable_diffusion_xl.py:542] 2024-06-05 02:40:35,848 >> The first two iterations are slower so it is recommended to feed more batches.
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00, 2.62s/it]
[INFO|pipeline_stable_diffusion_xl.py:825] 2024-06-05 02:40:38,507 >> Speed metrics: {'generation_runtime': 2.6246, 'generation_samples_per_second': 0.383, 'generation_steps_per_second': 0.383}
06/05/2024 02:40:38 - INFO - __main__ - Saving images in /root/optimum-habana/examples/stable-diffusion/training/stable-diffusion-generated-images...
Steps: 100%|██████████| 1000/1000 [21:34<00:00, 1.29s/it, lr=1e-5, mem_used=31.5, step_loss=0.0351]
-> above test with
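For readers following the non-square case, here is a minimal sketch of what `--resolution`, `--crop_resolution` and `--center_crop` are assumed to do to a single image: resize so that the shorter side equals `resolution`, then take a centered window of `crop_resolution`. The helper name `resize_then_center_crop` is hypothetical and the logic is an assumption about the usual resize-then-crop pipeline, not code taken from `train_text_to_image_sdxl.py`.

```python
def resize_then_center_crop(width, height, resolution, crop_resolution):
    """Return the resized (w, h) and the centered crop box (left, top, right, bottom).

    Hypothetical helper: assumes the shorter side is scaled to `resolution`
    and the crop window is centered, which is how resize + center crop
    usually behave. The real script may differ in rounding details.
    """
    # Scale so the shorter side matches `resolution`, preserving aspect ratio.
    scale = resolution / min(width, height)
    new_w = max(resolution, round(width * scale))
    new_h = max(resolution, round(height * scale))

    # Center the crop window and clamp it to the resized image bounds.
    left = max(0, (new_w - crop_resolution) // 2)
    top = max(0, (new_h - crop_resolution) // 2)
    right = min(new_w, left + crop_resolution)
    bottom = min(new_h, top + crop_resolution)
    return (new_w, new_h), (left, top, right, bottom)


if __name__ == "__main__":
    # A square image keeps the full frame; a 4:3 image loses its left/right margins.
    print(resize_then_center_crop(512, 512, 256, 256))
    print(resize_then_center_crop(640, 480, 256, 256))
```

With a square input the crop box covers the whole resized image, while a non-square input gets a non-zero `left` or `top` offset, which is the case the Tuxemon and diffusiondb runs are meant to exercise.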
@regisss I would appreciate a review on this when you have a chance. Thank you.
The Gaudi1 command in the README fails with
on Synapse 1.16.
@regisss
python train_text_to_image_sdxl.py --pretrained_model_name_or_path stabilityai/stable-diffusion-xl-base-1.0 --pretrained_vae_model_name_or_path madebyollin/sdxl-vae-fp16-fix --dataset_name lambdalabs/naruto-blip-captions --resolution 256 --center_crop --random_flip --proportion_empty_prompts=0.2 --train_batch_size 1 --gradient_accumulation_steps 4 --max_train_steps 3000 --learning_rate 1e-05 --max_grad_norm 1 --lr_scheduler constant --lr_warmup_steps 0 --output_dir sdxl_model_output --gaudi_config_name Habana/stable-diffusion --throughput_warmup_steps 3 --use_hpu_graphs_for_training --use_hpu_graphs_for_inference --bf16
With regard to graph compilation: this will be improved in future releases.
@regisss I ran this on the updated g1 example and it completed.
06/13/2024 01:30:47 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: hpu
Mixed precision type: bf16
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'dynamic_thresholding_ratio', 'clip_sample_range', 'variance_type', 'rescale_betas_zero_snr', 'thresholding'} was not found in config. Values will be initialized to default values.
============================= HABANA PT BRIDGE CONFIGURATION ===========================
PT_HPU_LAZY_MODE = 1
PT_RECIPE_CACHE_PATH =
PT_CACHE_FOLDER_DELETE = 0
PT_HPU_RECIPE_CACHE_CONFIG =
PT_HPU_MAX_COMPOUND_OP_SIZE = 9223372036854775807
PT_HPU_LAZY_ACC_PAR_MODE = 1
PT_HPU_ENABLE_REFINE_DYNAMIC_SHAPES = 0
---------------------------: System Configuration :---------------------------
Num CPU Cores : 144
CPU RAM : 1056455416 KB
------------------------------------------------------------------------------
{'reverse_transformer_layers_per_block', 'attention_type', 'dropout'} was not found in config. Values will be initialized to default values.
Repo card metadata block was not found. Setting CardData to empty.
06/13/2024 01:30:52 - WARNING - huggingface_hub.repocard - Repo card metadata block was not found. Setting CardData to empty.
Map: 100%|██████████| 1221/1221 [01:56<00:00, 10.45 examples/s]
/usr/local/lib/python3.10/dist-packages/habana_frameworks/torch/hpu/__init__.py:158: UserWarning: torch.hpu.setDeterministic is deprecated and will be removed in next release. Please use torch.use_deterministic_algorithms instead.
warnings.warn(
[2024-06-13 01:32:54,220] [INFO] [real_accelerator.py:178:get_accelerator] Setting ds_accelerator to hpu (auto detect)
06/13/2024 01:32:54 - INFO - __main__ - ***** Running training *****
06/13/2024 01:32:54 - INFO - __main__ - Num examples = 1221
06/13/2024 01:32:54 - INFO - __main__ - Num Epochs = 10
06/13/2024 01:32:54 - INFO - __main__ - Instantaneous batch size per device = 1
06/13/2024 01:32:54 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 4
06/13/2024 01:32:54 - INFO - __main__ - Gradient Accumulation steps = 4
06/13/2024 01:32:54 - INFO - __main__ - Total optimization steps = 3000
Steps: 100%|██████████| 3000/3000 [1:00:03<00:00, 1.12it/s, lr=1e-5, mem_used=25.1, step_loss=0.00386]
06/13/2024 02:32:58 - INFO - accelerate.accelerator - Saving current state to sdxl_model_output/checkpoint-3000
Configuration saved in sdxl_model_output/checkpoint-3000/unet/config.json
Model weights saved in sdxl_model_output/checkpoint-3000/unet/diffusion_pytorch_model.safetensors
06/13/2024 02:34:44 - INFO - accelerate.checkpointing - Optimizer state saved in sdxl_model_output/checkpoint-3000/optimizer.bin
06/13/2024 02:34:44 - INFO - accelerate.checkpointing - Scheduler state saved in sdxl_model_output/checkpoint-3000/scheduler.bin
06/13/2024 02:34:44 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in sdxl_model_output/checkpoint-3000/sampler.bin
06/13/2024 02:34:44 - INFO - accelerate.checkpointing - Random states saved in sdxl_model_output/checkpoint-3000/random_states_0.pkl
06/13/2024 02:34:44 - INFO - __main__ - Saved state to sdxl_model_output/checkpoint-3000
Steps: 100%|██████████| 3000/3000 [1:01:49<00:00, 1.12it/s, lr=1e-5, mem_used=25.1, step_loss=0.091]
06/13/2024 02:34:44 - INFO - __main__ - Throughput = 4.167025074619193 samples/s
06/13/2024 02:34:44 - INFO - __main__ - Train runtime = 2876.8725374410024 seconds
06/13/2024 02:34:44 - INFO - __main__ - Total Train runtime = 3709.792234335 seconds
Fetching 14 files: 100%|██████████| 14/14 [00:11<00:00, 1.25it/s]
{'feature_extractor', 'bf16_full_eval', 'image_encoder', 'use_habana', 'gaudi_config', 'use_hpu_graphs'} was not found in config. Values will be initialized to default values.
Loaded text_encoder_2 as CLIPTextModelWithProjection from `text_encoder_2` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
Loading pipeline components...: 0%| | 0/7 [00:00<?, ?it/s]
Loaded tokenizer as CLIPTokenizer from `tokenizer` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
Loaded tokenizer_2 as CLIPTokenizer from `tokenizer_2` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
Loaded text_encoder as CLIPTextModel from `text_encoder` subfolder of stabilityai/stable-diffusion-xl-base-1.0.
Loading pipeline components...: 100%|██████████| 7/7 [00:00<00:00, 7.93it/s]
[INFO|pipeline_utils.py:130] 2024-06-13 02:34:56,910 >> Enabled HPU graphs.
[INFO|configuration_utils.py:305] 2024-06-13 02:34:57,000 >> loading configuration file gaudi_config.json from cache at /root/.cache/huggingface/hub/models--Habana--stable-diffusion/snapshots/60ee357057ec90d2b183de22d0327ddd5d5a6db9/gaudi_config.json
[INFO|configuration_utils.py:358] 2024-06-13 02:34:57,000 >> GaudiConfig {
"autocast_bf16_ops": null,
"autocast_fp32_ops": null,
"optimum_version": "1.20.0",
"transformers_version": "4.40.2",
"use_dynamic_shapes": false,
"use_fused_adam": true,
"use_fused_clip_norm": true,
"use_torch_autocast": true
}
[WARNING|pipeline_utils.py:149] 2024-06-13 02:34:57,000 >> `use_torch_autocast` is True in the given Gaudi configuration but `torch_dtype=torch.bfloat16` was given. Disabling mixed precision and continuing in bf16 only.
Configuration saved in sdxl_model_output/vae/config.json
Model weights saved in sdxl_model_output/vae/diffusion_pytorch_model.safetensors
Configuration saved in sdxl_model_output/unet/config.json
Model weights saved in sdxl_model_output/unet/diffusion_pytorch_model.safetensors
Configuration saved in sdxl_model_output/scheduler/scheduler_config.json
Configuration saved in sdxl_model_output/model_index.json
[INFO|configuration_utils.py:113] 2024-06-13 02:35:45,348 >> Configuration saved in sdxl_model_output/gaudi_config.json
Steps: 100%|██████████| 3000/3000 [1:02:50<00:00, 1.26s/it, lr=1e-5, mem_used=25.1, step_loss=0.091]
The note about the first 2 steps is in the README as well. Could you please review this again?
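As a sanity check on the numbers above, the reported `Throughput` values appear to match (steps after warmup) × (total batch size) / (`Train runtime`). This formula is inferred from the logged values in this thread, not taken from the script, so treat it as an assumption; the function name `adjusted_throughput` is hypothetical.

```python
def adjusted_throughput(max_train_steps, warmup_steps, total_batch_size, train_runtime_s):
    """Samples per second over the steps that follow the throughput warmup.

    Assumption inferred from the logs: warmup steps are excluded from the
    sample count, and `train_runtime_s` is the logged "Train runtime"
    (not "Total Train runtime").
    """
    measured_steps = max_train_steps - warmup_steps
    return measured_steps * total_batch_size / train_runtime_s


if __name__ == "__main__":
    # diffusiondb run: 1000 steps, --throughput_warmup_steps 3, total batch size 2.
    # The log reports Throughput = 9.616218221859341 samples/s.
    print(adjusted_throughput(1000, 3, 2, 207.35802308097482))

    # Gaudi1 naruto run: 3000 steps, warmup 3, total batch size 4 (1 x 4 accumulation).
    # The log reports Throughput = 4.167025074619193 samples/s.
    print(adjusted_throughput(3000, 3, 4, 2876.8725374410024))
```

If the inferred formula holds, the gap between `Train runtime` and `Total Train runtime` in both runs is time that `--adjust_throughput` leaves out of the measurement, such as checkpointing and logging.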
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
LGTM!
Sure, and done. Ran the
------------------------------------------------------------------------------
{'attention_type', 'reverse_transformer_layers_per_block', 'dropout'} was not found in config. Values will be initialized to default values.
Repo card metadata block was not found. Setting CardData to empty.
06/13/2024 15:46:06 - WARNING - huggingface_hub.repocard - Repo card metadata block was not found. Setting CardData to empty.
Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1221/1221 [01:06<00:00, 18.32 examples/s]
06/13/2024 15:47:20 - INFO - __main__ - ***** Running training *****
06/13/2024 15:47:20 - INFO - __main__ - Num examples = 1221
06/13/2024 15:47:20 - INFO - __main__ - Num Epochs = 1
06/13/2024 15:47:20 - INFO - __main__ - Instantaneous batch size per device = 16
06/13/2024 15:47:20 - INFO - __main__ - Total train batch size (w. parallel, distributed & accumulation) = 16
06/13/2024 15:47:20 - INFO - __main__ - Gradient Accumulation steps = 1
06/13/2024 15:47:20 - INFO - __main__ - Total optimization steps = 2
Steps: 100%|██████████| 2/2 [09:41<00:00, 306.10s/it, lr=1e-5, mem_used=27.7, step_loss=0.34]
06/13/2024 15:57:01 - INFO - accelerate.accelerator - Saving current state to /tmp/tmp6h_7me7t/checkpoint-2
Configuration saved in /tmp/tmp6h_7me7t/checkpoint-2/unet/config.json
Model weights saved in /tmp/tmp6h_7me7t/checkpoint-2/unet/diffusion_pytorch_model.safetensors
06/13/2024 16:04:24 - INFO - accelerate.checkpointing - Optimizer state saved in /tmp/tmp6h_7me7t/checkpoint-2/optimizer.bin
06/13/2024 16:04:24 - INFO - accelerate.checkpointing - Scheduler state saved in /tmp/tmp6h_7me7t/checkpoint-2/scheduler.bin
06/13/2024 16:04:24 - INFO - accelerate.checkpointing - Sampler state for dataloader 0 saved in /tmp/tmp6h_7me7t/checkpoint-2/sampler.bin
06/13/2024 16:04:24 - INFO - accelerate.checkpointing - Random states saved in /tmp/tmp6h_7me7t/checkpoint-2/random_states_0.pkl
06/13/2024 16:04:24 - INFO - __main__ - Saved state to /tmp/tmp6h_7me7t/checkpoint-2
Steps: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [17:04<00:00, 512.21s/it, lr=1e-5, mem_used=27.7, step_loss=0.503]
PASSED
========================================================================================================== warnings summary ===========================================================================================================
../../../../../../usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:485
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:485: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
../../../../../../usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:342
/usr/local/lib/python3.10/dist-packages/transformers/utils/generic.py:342: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
../../../../../../usr/local/lib/python3.10/dist-packages/diffusers/utils/outputs.py:63
../../../../../../usr/local/lib/python3.10/dist-packages/diffusers/utils/outputs.py:63
/usr/local/lib/python3.10/dist-packages/diffusers/utils/outputs.py:63: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
torch.utils._pytree._register_pytree_node(
../../../../../../usr/local/lib/python3.10/dist-packages/lightning_utilities/core/imports.py:14
/usr/local/lib/python3.10/dist-packages/lightning_utilities/core/imports.py:14: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
import pkg_resources
../../../../../../usr/local/lib/python3.10/dist-packages/pkg_resources/__init__.py:2832
/usr/local/lib/python3.10/dist-packages/pkg_resources/__init__.py:2832: DeprecationWarning: Deprecated call to `pkg_resources.declare_namespace('google')`.
Implementing implicit namespace packages (as specified in PEP 420) is preferred to `pkg_resources.declare_namespace`. See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages
declare_namespace(pkg)
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
==================== 2 passed, 60 deselected, 6 warnings in 1217.80s (0:20:17) ====================
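The `Num Epochs` values in the three training logs in this thread are consistent with the usual derivation from dataset size, batch size, gradient accumulation and `--max_train_steps`. The sketch below reproduces that arithmetic; the function name `num_epochs` is hypothetical and the formula is an assumption based on how such training scripts typically compute it, checked only against the logged values here.

```python
import math


def num_epochs(num_examples, per_device_batch_size, grad_accum_steps, max_train_steps, num_devices=1):
    """Epochs needed to reach `max_train_steps` optimizer updates.

    Assumed derivation: batches per epoch -> optimizer updates per epoch
    (after gradient accumulation) -> epochs, rounding up at each stage.
    """
    batches_per_epoch = math.ceil(num_examples / (per_device_batch_size * num_devices))
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accum_steps)
    return math.ceil(max_train_steps / updates_per_epoch)


if __name__ == "__main__":
    # diffusiondb run: 1000 examples, batch size 2, 1000 steps (log: Num Epochs = 2).
    print(num_epochs(1000, 2, 1, 1000))
    # Gaudi1 naruto run: 1221 examples, batch size 1, accumulation 4, 3000 steps (log: Num Epochs = 10).
    print(num_epochs(1221, 1, 4, 3000))
    # CI run: 1221 examples, batch size 16, 2 steps (log: Num Epochs = 1).
    print(num_epochs(1221, 16, 1, 2))
```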
What does this PR do?
Mirrors the changes in huggingface/diffusers@8edaf3b
@regisss @dsocek @libinta
Hi team,
Opening this PR to fix some SDXL training issues related to the gated dataset. Tests have been completed and the results are provided below.
Fixes # (issue)
Gated dataset for SDXL training
Before submitting
Tests
1x HPU
8x HPU
CI