Added support for NewBieModel #11284
base: master
Conversation
I have tested this on the latest ComfyUI version and got an error. I was able to resolve the error after installing flash-attn2; now the model runs without any issues for me, and torch compile also works.
Working great here too o/!
AFAIK nothing else in Comfy core has a dependency on Flash Attention (e.g. I prefer SDPA/SageAttention, so I don't even have FA installed). See for example this file, which tries an FA import and falls back to other implementations: ComfyUI/comfy/ldm/modules/attention.py Lines 33 to 40 in 908fd7d
Maybe this could be refactored to use whatever generic attention primitives Comfy provides, removing the hard dependency on FA?
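For illustration, a minimal sketch of that optional-import pattern, assuming the flash_attn package's flash_attn_func (the real fallback chain in comfy/ldm/modules/attention.py covers more backends):

```python
import torch

try:
    from flash_attn import flash_attn_func  # optional dependency
    FLASH_ATTN_AVAILABLE = True
except ImportError:
    FLASH_ATTN_AVAILABLE = False

def attention(q, k, v):
    # Prefer Flash Attention when installed, otherwise fall back to
    # PyTorch's built-in SDPA. Note: flash_attn_func expects
    # (batch, seq, heads, dim) while SDPA expects (batch, heads, seq, dim),
    # so a real implementation must also reconcile tensor layouts.
    if FLASH_ATTN_AVAILABLE:
        return flash_attn_func(q, k, v)
    return torch.nn.functional.scaled_dot_product_attention(q, k, v)
```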
comfy/ldm/newbie/model.py
Outdated
```python
for i in range(bsz):
    img = x[i]
    C, H, W = img.size()
    img = img.view(C, H // pH, pH, W // pW, pW).permute(1, 3, 2, 4, 0).flatten(2).flatten(0, 1)
```
In this line, expressions such as H // pH and W // pW rely on integer floor division. When H or W is not divisible by the patch size (e.g. an odd dimension with a patch size of 2), the floor division silently discards the remainder, causing a mismatch between the target shape and the actual tensor size, which leads to a runtime error.
This situation commonly occurs after an upscaling step, where the resulting height or width may become odd, and therefore requires explicit handling (e.g. padding or cropping) before patchification.
File "H:\ComfyUI-aki-v1.7_alt\ComfyUI\comfy\samplers.py", line 214, in _calc_cond_batch_outer
return executor.execute(model, conds, x_in, timestep, model_options)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\ComfyUI-aki-v1.7_alt\ComfyUI\comfy\patcher_extension.py", line 112, in execute
return self.original(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\ComfyUI-aki-v1.7_alt\ComfyUI\comfy\samplers.py", line 326, in _calc_cond_batch
output = model.apply_model(input_x, timestep_, **c).chunk(batch_chunks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\ComfyUI-aki-v1.7_alt\ComfyUI\comfy\model_base.py", line 1009, in apply_model
model_output = self.diffusion_model(xc, t_val, cap_feats, cap_mask, **model_kwargs).float()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\ComfyUI-aki-v1.7_alt\python\Lib\site-packages\torch\nn\modules\module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\ComfyUI-aki-v1.7_alt\python\Lib\site-packages\torch\nn\modules\module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\ComfyUI-aki-v1.7_alt\ComfyUI\comfy\ldm\newbie\model.py", line 932, in forward
x, mask, img_size, cap_size, freqs_cis = self.patchify_and_embed(x, cap_feats, cap_mask, adaln_input)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "H:\ComfyUI-aki-v1.7_alt\ComfyUI\comfy\ldm\newbie\model.py", line 726, in patchify_and_embed
img = img.view(C, H // pH, pH, W // pW, pW).permute(1, 3, 2, 4, 0).flatten(2).flatten(0, 1)
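A minimal repro of the shape mismatch (illustrative, not taken from the PR):

```python
import torch

img = torch.zeros(4, 7, 8)  # C=4, H=7 (odd), W=8
pH = pW = 2
C, H, W = img.size()
# H // pH == 3, so the target shape holds 3 * 2 = 6 rows while the
# tensor has 7; view() raises a RuntimeError because 4*3*2*4*2 = 192
# does not match the 4*7*8 = 224 elements actually present.
img.view(C, H // pH, pH, W // pW, pW)
```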
Thank you for your review, I will fix this bug.
The patchify crash is fixed. The old implementation reshaped using H // pH and W // pW, which fails when the image height/width isn't divisible by the patch size (common after upscaling). The updated version pads inputs to the nearest patch-size multiple before patchify and crops back to the original size after unpatchify, so it no longer errors on odd/non-divisible resolutions.
In addition, I refactored the NewBie integration to reuse the Lumina NextDiT backbone (the NewBie class keeps only its model-specific logic), normalized the loader interface by sanitizing extra kwargs and correctly propagating device/dtype/operations, and fixed dtype/device mismatches to improve stability and performance during inference.
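A sketch of that pad-then-crop approach, with hypothetical helper names and assuming (C, H, W) images:

```python
import torch
import torch.nn.functional as F

def pad_to_patch_multiple(img, pH, pW):
    # Pad H and W up to the next multiple of the patch size so that
    # view(C, H // pH, pH, W // pW, pW) is exact.
    C, H, W = img.size()
    pad_h = (pH - H % pH) % pH
    pad_w = (pW - W % pW) % pW
    # F.pad pads the last dims in the order (left, right, top, bottom)
    return F.pad(img, (0, pad_w, 0, pad_h)), (H, W)

def crop_to_original(img, orig_hw):
    # After unpatchify, drop the padded rows/columns again.
    H, W = orig_hw
    return img[..., :H, :W]
```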
My KSampler reported these errors: ...
It seems like, for me at least,
Hi, I'm also trying to add NewBie to ComfyUI in a simpler, more ComfyUI-idiomatic way. I believe it could be the best anime model we have at hand before any anime finetune of Z-Image or Qwen-Image appears. I think we can just follow the direction of #11172 and make minimal changes to the existing Lumina2 class; then we don't need to define a new class for NewBie. (Update: After discussing with the NewBie devs, they think a new class for NewBie may be easier to extend in the future.) Here is a generated image, with a simple workflow embedded in it, without any custom node.
ComfyUI features such as SageAttention and memory swap just work. An interesting finding: sliding attention in Gemma is not implemented yet (there is a TODO at ComfyUI/comfy/text_encoders/llama.py Line 390 in 9304e47), so the result is not fully correct when the prompt is longer than 1024 tokens.
Update: I've implemented Jina CLIP v2 in a ComfyUI-idiomatic way at https://github.com/woct0rdho/ComfyUI/tree/newbie . The architecture is in a single .py file, and the weights are also packaged in a single file at https://huggingface.co/woctordho/comfyui-jina-clip-v2 . It does not depend on Transformers and does not download anything from the internet. I've tested that it produces the same output. Here is an image generated with my forked ComfyUI at commit woct0rdho@98b25d4, with both Gemma and Jina conditioning:
@E-Anlia You can copy my code if needed, or I can open a separate PR that adds Jina CLIP v2 to ComfyUI if that's more convenient.
@woct0rdho Thanks for your great work! I tried cherry-picking your commit woct0rdho/ComfyUI@98b25d4 and successfully ran the NewBie model with dual CLIP (Gemma and Jina), but there seems to be something wrong with the output image; see below.




🧩 What does this PR do?
This PR adds native support for NewBie image models, a Next-DiT based text-to-image architecture, to ComfyUI.
NewBie models are DiT-style (Flow-based) transformers, inspired by Lumina / Next-DiT research, but they are not compatible with existing Lumina or SD-style UNet assumptions.
This PR introduces a dedicated model class and loading path so that NewBie models can be used without modifying or breaking any existing models.
🧠 Why is this needed?
Previously, running NewBie models in ComfyUI required local forks or heavy monkey-patching, often by modifying Lumina-related code paths.
This PR:
Avoids modifying Lumina or any existing model logic
Introduces a clean, isolated NewBie model implementation
Matches the inference behavior that has already been validated in production via custom nodes
🔒 Scope & safety
This PR is intentionally conservative:
✅ No changes to existing models' behavior
✅ No changes to shared attention or sampling utilities
✅ All NewBie logic is isolated under a new model class
✅ If a checkpoint is not detected as NewBie, behavior is unchanged
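As a rough sketch of that detection gate (the key name below is a hypothetical placeholder, not the PR's actual key; ComfyUI's real detection in comfy/model_detection.py matches on state-dict keys in a similar spirit):

```python
def is_newbie_checkpoint(state_dict: dict) -> bool:
    # Take the NewBie path only when its signature keys are present;
    # every other checkpoint falls through to the existing loaders.
    signature_keys = ("newbie.placeholder_key",)  # hypothetical
    return all(k in state_dict for k in signature_keys)
```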
🧪 Testing
Verified loading and inference with NewBie image models
Confirmed correct timestep direction and conditioning behavior (see the sketch after this list)
Confirmed no regression when running existing models
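For context on the timestep-direction check: flow-matching DiTs integrate a learned velocity field, and the sign of each step must match the model's training convention. A minimal Euler-step sketch (illustrative, not the PR's sampler):

```python
def euler_flow_step(x, t, dt, model):
    # The model predicts a velocity v(x, t); sampling integrates
    # dx/dt = v. If training treats t=1 as noise and t=0 as data,
    # dt must be negative; flipping the direction breaks sampling.
    v = model(x, t)
    return x + dt * v
```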
📌 Notes
This PR does not attempt to refactor or optimize existing model code.
Its goal is solely to provide first-class support for a new DiT-based architecture that is already used by the community.