PoC: Rewrite fine_tune.py as train_native.py #1950
base: sd3
Referring to Issue #1947 and PR #1359.

- fine_tune.py can be merged with the concepts in train_network.py, and becomes train_native.py.
- Included #1985 (implied #1409, #837 but not #1468). From discussion in the GUI repo, there are more memory-efficient optimizers available than the built-in ones (see the optimizer sketch after the command below).
- New options (--skip_until_initial_step, --validation_split) have been added.
- Added profiler support (a sketch follows after the command below).
- Meanwhile, fixed the checking of --mem_eff_attn and --xformers, which applies more aggressive checking (probably still VAE only?).

Tested with SDXL using this CLI command (hint: many features):

```
accelerate launch sdxl_train_v2.py
--pretrained_model_name_or_path="/run/media/user/Intel P4510 3/astolfo_xl/x255c-AstolfoMix-25022801-1458190.safetensors"
--in_json "/run/media/user/Intel P4510 3/just_astolfo/meta_lat_v3.json"
--train_data_dir="/run/media/user/Intel P4510 3/just_astolfo/kohyas_finetune"
--output_dir="/run/media/user/Intel P4510 3/astolfo_xl/just_astolfo/model_out"
--log_with=tensorboard --logging_dir="/run/media/user/Intel P4510 3/astolfo_xl/just_astolfo/tensorboard" --log_prefix=just_astolfo_25030801_
--seed=25030801 --save_model_as=safetensors --caption_extension=".txt" --enable_wildcard
--optimizer_type "pytorch_optimizer.CAME" --optimizer_args "weight_decay=1e-2" --learning_rate="1e-6" --train_text_encoder --learning_rate_te1="1e-5" --learning_rate_te2="1e-5"
--max_train_epochs=10
--xformers --gradient_checkpointing --gradient_accumulation_steps=4 --max_grad_norm=0
--max_data_loader_n_workers=32 --persistent_data_loader_workers --pin_memory
--train_batch_size=1 --full_bf16 --mixed_precision=bf16 --save_precision=fp16
--enable_bucket --cache_latents --skip_cache_check --save_every_n_epochs=1
#--deepspeed --mem_eff_attn --torch_compile --dynamo_backend=inductor
#--skip_until_initial_step --initial_step=1 --initial_epoch=1
#numactl --cpunodebind=1 --membind=1
```
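For context on the optimizer flags above: --optimizer_type "pytorch_optimizer.CAME" loads CAME from the pytorch-optimizer package. A minimal sketch of the equivalent direct instantiation, mirroring the flag values above (illustrative only, not the script's actual loading code):

```python
# Sketch only: instantiating CAME from pytorch-optimizer directly,
# mirroring --optimizer_type "pytorch_optimizer.CAME"
# --optimizer_args "weight_decay=1e-2" --learning_rate="1e-6".
# Requires: pip install pytorch-optimizer
import torch
from pytorch_optimizer import CAME

model = torch.nn.Linear(16, 16)  # stand-in for the UNet / text encoders
optimizer = CAME(
    model.parameters(),
    lr=1e-6,            # matches --learning_rate
    weight_decay=1e-2,  # matches --optimizer_args "weight_decay=1e-2"
)
```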
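On the profiler support mentioned above: the exact wiring isn't shown in this description, but a minimal sketch of the standard torch.profiler pattern such a feature builds on (the loop body and output directory are placeholders, not the PR's actual code):

```python
# Sketch only: wrapping a training loop with torch.profiler.
# Illustrates the standard pattern, not the PR's exact implementation.
import torch
from torch.profiler import ProfilerActivity, profile, schedule, tensorboard_trace_handler

model = torch.nn.Linear(16, 16)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],  # drop CUDA on CPU-only machines
    schedule=schedule(wait=1, warmup=1, active=3),             # record 3 steps after warmup
    on_trace_ready=tensorboard_trace_handler("./tensorboard"), # trace viewable in TensorBoard
) as prof:
    for _ in range(8):  # stand-in for the real data loader loop
        x = torch.randn(4, 16)
        loss = model(x).pow(2).mean()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        prof.step()  # advance the profiler schedule every training step
```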
And the following accelerate config:
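The config file contents aren't reproduced here; a plausible accelerate config for the 4-GPU bf16 setup described below (an assumption, not the author's actual file) would look like:

```yaml
# Hypothetical accelerate config for 4 GPUs with bf16 mixed precision;
# the author's actual file is not shown in the PR description.
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
mixed_precision: bf16
num_machines: 1
num_processes: 4
gpu_ids: all
machine_rank: 0
main_training_function: main
use_cpu: false
```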
(A bit off topic) It runs at around 15.5 s/it (4 cards x 4 accumulation steps) with 4x RTX 3090 24GB (X299 DARK, 10980XE, P4510 4TB).