Update pipeline_stable_diffusion_3_controlnet.py#8660

Merged
sayakpaul merged 2 commits into huggingface:main from haofanwang:patch-12
Jun 23, 2024

Conversation

@haofanwang
Contributor

This PR enables setting the T5 token limit, as done in #8506.
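For readers unfamiliar with the change, here is a minimal sketch of how the token limit might be passed to the ControlNet pipeline. It assumes the argument is named `max_sequence_length`, as in the base SD3 pipeline, and that the conditioning image is passed as `control_image`. The model IDs and the helper names `load_pipeline` and `generate_with_t5_limit` are illustrative assumptions, not part of this PR.

```python
def load_pipeline(device="cuda"):
    """Load an SD3 ControlNet pipeline. The model IDs below are assumptions."""
    # Imported lazily so the helper below can be used without a GPU setup.
    import torch
    from diffusers import SD3ControlNetModel, StableDiffusion3ControlNetPipeline

    controlnet = SD3ControlNetModel.from_pretrained(
        "InstantX/SD3-Controlnet-Canny", torch_dtype=torch.float16
    )
    pipe = StableDiffusion3ControlNetPipeline.from_pretrained(
        "stabilityai/stable-diffusion-3-medium-diffusers",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    )
    return pipe.to(device)


def generate_with_t5_limit(pipe, prompt, control_image, max_sequence_length=512):
    """Call the pipeline with an explicit T5 token limit; return the first image."""
    result = pipe(
        prompt=prompt,
        control_image=control_image,
        # Assumed name of the T5 token limit argument (see lead-in).
        max_sequence_length=max_sequence_length,
    )
    return result.images[0]
```

A call would then look like `generate_with_t5_limit(load_pipeline(), prompt, canny_image)`, where `canny_image` is a hypothetical pre-computed edge map.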

@sayakpaul
Member

Thanks!

Have you performed any ablations on how this choice affects the overall generation quality?

@sayakpaul sayakpaul requested a review from asomoza June 21, 2024 12:40
@haofanwang
Contributor Author

Not a comprehensive ablation, but I do find that long prompts perform better than short prompts in SD3.

@sayakpaul
Member

Cool. Could you maybe show one or two examples for comparison?

@asomoza
Member

asomoza commented Jun 21, 2024

I didn't test the original PR with the controlnet. These are my results, feel free to use them.

SD3 definitely prefers longer prompts written in natural language rather than "tags". Simply using a long prompt and enabling it for the T5 produces better generations with more detail. I still get the best results when I use two prompts: one for the CLIP text encoders and another for the T5.

prompt = "a towering humanoid robot, with wings folded over its back, engines transformed into legs, and the cockpit shifted down to form the chest, armed with a 55mm gunpod and laser cannons, ready for battle."
long_prompt = "A towering humanoid robot, with wings folded over its back, engines transformed into legs, and the cockpit shifted down to form the chest, armed with a 55mm gunpod and laser cannons, stands ready for battle amidst the smoldering ruins of a once-thriving metropolis. Its massive form, a fusion of sleek technology and battle-hardened steel, dominates the desolate landscape. The wings, once part of a nimble aircraft, now fold protectively over its back, their edges scorched from countless dogfights. The engines, stripped of their airborne purpose, have transformed into powerful legs, each step leaving seismic tremors in their wake. The cockpit, once perched high in the sky, has descended to the chest, encased in reinforced plating. The 55mm gunpod, mounted on the robot’s right arm, gleams with a deadly promise. Its ammunition belt snakes upward, connecting to a storage unit embedded in the robot’s shoulder. The gunpod swivels, tracking an incoming enemy fighter. The laser cannons charge, their glow intensifying. The city’s last survivors watch from hidden vantage points, their hope rekindled by the sight of their guardian."
[Comparison images, left to right: canny | 77 short prompt | 77 long prompt | 512 long prompt | 512 short + long prompt]
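The comparison above could be reproduced roughly as sketched below. This is a guess at the setup, not the code that was actually run. It assumes the "77" and "512" labels refer to `max_sequence_length`, that the "short + long" case sends the short prompt to the CLIP encoders through `prompt` and the long one to the T5 through `prompt_3`, and that `pipe` is an already-loaded SD3 ControlNet pipeline. The helper names `comparison_runs` and `run_comparison` are hypothetical.

```python
def comparison_runs(short_prompt, long_prompt):
    """Build keyword arguments for the four compared settings (assumed mapping)."""
    return {
        "77 short prompt": {"prompt": short_prompt, "max_sequence_length": 77},
        "77 long prompt": {"prompt": long_prompt, "max_sequence_length": 77},
        "512 long prompt": {"prompt": long_prompt, "max_sequence_length": 512},
        # Short prompt for the CLIP encoders, long prompt for the T5 only.
        "512 short + long prompt": {
            "prompt": short_prompt,
            "prompt_3": long_prompt,
            "max_sequence_length": 512,
        },
    }


def run_comparison(pipe, control_image, runs):
    """Generate one image per setting, keyed by the setting's label."""
    images = {}
    for label, kwargs in runs.items():
        images[label] = pipe(control_image=control_image, **kwargs).images[0]
    return images
```

With the `prompt` and `long_prompt` strings from this comment, `run_comparison(pipe, canny_image, comparison_runs(prompt, long_prompt))` would return one image per column. Fixing the seed across runs, for example through the pipeline's `generator` argument, would keep the settings comparable.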

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sayakpaul sayakpaul merged commit f1f542b into huggingface:main Jun 23, 2024
sayakpaul pushed a commit that referenced this pull request Dec 23, 2024
Co-authored-by: YiYi Xu <yixu310@gmail,com>