This custom node integrates the LongCat-Image pipeline into ComfyUI, enabling text-to-image generation and image editing with the LongCat-Image models.
- Text-to-Image Generation: Generate high-quality images from text prompts using LongCat-Image models
- Image Editing: Edit existing images with instruction-based prompts using LongCat-Image-Edit models
- Chinese Text Support: Excellent Chinese text rendering capabilities
- Efficient: Only 6B parameters with competitive performance
Search for `comfyui_longcat_image` in the ComfyUI custom nodes manager, or install missing nodes directly from the provided workflow.

Alternatively, install manually:

```shell
cd custom_nodes/comfyui_longcat_image
pip install -r requirements.txt
```

For ~2x faster inference, install SageAttention and select "sage" for the `attention_backend` option in the model loader node:

```shell
pip install sageattention
```

Requirements: CUDA-capable NVIDIA GPU with PyTorch CUDA support.
Download the models using the Hugging Face CLI:

```shell
pip install -U huggingface_hub

# For text-to-image
hf download meituan-longcat/LongCat-Image --local-dir models/diffusion_models/LongCat-Image

# For image editing
hf download meituan-longcat/LongCat-Image-Edit --local-dir models/diffusion_models/LongCat-Image-Edit

# For fine-tuning (optional)
hf download meituan-longcat/LongCat-Image-Dev --local-dir models/diffusion_models/LongCat-Image-Dev
```

Loads a LongCat-Image model for use with other nodes.
Inputs:
- `model_path`: Path to the model directory (e.g., "LongCat-Image" or "LongCat-Image-Edit")
- `dtype`: Data type for model weights (bfloat16, float16, float32)
- `enable_cpu_offload`: Enable CPU offload to save VRAM (true/false, default: true)
- `attention_backend`: Attention backend, "default" or "sage" (default: "default")
Outputs:
LONGCAT_PIPE: Pipeline object for use with generation nodes
The model loader supports low VRAM mode via the enable_cpu_offload option:
- Disabled: All models loaded to the GPU at once
  - Faster inference
  - Requires more VRAM (typically ~24GB+)
- Enabled (default): Models offloaded to the CPU when not in use
  - Slower inference (due to model transfers)
  - Requires only ~17-19GB VRAM
  - Prevents out-of-memory (OOM) errors on lower-end GPUs
When to use CPU offload:
- GPUs with less than 24GB VRAM
- When experiencing OOM errors
- When running multiple models simultaneously
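The offload pattern itself is simple: a pipeline component occupies VRAM only while it is actually running, and is moved back to system RAM afterward. A minimal, framework-agnostic sketch of the idea (`FakeModule` and the context manager are illustrative stand-ins, not the node's real internals, which rely on the pipeline's own offload hooks):

```python
from contextlib import contextmanager

class FakeModule:
    """Stand-in for a pipeline component (text encoder, transformer, VAE)."""
    def __init__(self, name):
        self.name = name
        self.device = "cpu"

    def to(self, device):
        self.device = device
        return self

@contextmanager
def on_device(module, device="cuda"):
    """Move a module to `device` for the duration of the block, then offload it."""
    module.to(device)
    try:
        yield module
    finally:
        module.to("cpu")  # free VRAM as soon as this component is done

# Each stage only holds VRAM while it runs:
text_encoder = FakeModule("text_encoder")
with on_device(text_encoder) as te:
    assert te.device == "cuda"  # encode the prompt here
assert text_encoder.device == "cpu"  # offloaded again afterward
```

This is why offload trades speed for memory: every stage pays a CPU-to-GPU transfer before it can run.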
The model loader supports an optional SageAttention backend for improved inference speed:
- default: Uses PyTorch's standard scaled dot-product attention
  - Works on all systems (CPU/GPU)
  - Standard performance
- sage: Uses SageAttention for accelerated attention computation
  - ~2x faster inference than the default backend
  - Requires a CUDA-capable GPU
  - Requires the `sageattention` package (see the installation section above)
  - Automatically falls back to default attention for unsupported operations
To use SageAttention:

- Install the sageattention package: `pip install sageattention`
- Set `attention_backend` to "sage" in the Model Loader node

Requirements:

- CUDA-capable NVIDIA GPU
- PyTorch with CUDA support
- The `sageattention` package installed
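The fallback behavior can be pictured as a thin dispatch wrapper: resolve the requested backend, try the SageAttention kernel, and drop to the default path if the package is missing or the operation is unsupported. A hypothetical sketch (the function names and the stub default path are illustrative, not the node's actual code):

```python
import importlib.util

def pick_backend(requested: str) -> str:
    """Resolve the attention backend: fall back to 'default' when 'sage'
    is requested but the sageattention package is not importable."""
    if requested == "sage" and importlib.util.find_spec("sageattention") is None:
        return "default"
    return requested

def attention(q, k, v, backend="default"):
    """Dispatch to the chosen kernel; any failure on the sage path
    falls through to the default implementation."""
    def default_attn(q, k, v):
        # stands in for torch.nn.functional.scaled_dot_product_attention
        return ("default", (q, k, v))

    if pick_backend(backend) == "sage":
        try:
            from sageattention import sageattn  # the real kernel, if installed
            return ("sage", sageattn(q, k, v))
        except Exception:
            pass  # unsupported shape/dtype or import failure: use the fallback
    return default_attn(q, k, v)
```

Because the fallback is automatic, selecting "sage" on a machine without the package still produces correct output, just without the speedup.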
Generates images from text prompts.
Inputs:
- `LONGCAT_PIPE`: Pipeline from the model loader
- `prompt`: Text description of the image to generate
- `negative_prompt`: Things to avoid in the generated image
- `width`: Image width (default: 1344)
- `height`: Image height (default: 768)
- `steps`: Number of inference steps (default: 50)
- `guidance_scale`: CFG scale (default: 4.5)
- `seed`: Random seed
- `enable_cfg_renorm`: Enable CFG renormalization (true/false)
- `enable_prompt_rewrite`: Enable built-in prompt rewriting (true/false)
Outputs:
IMAGE: Generated image
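`guidance_scale` and `enable_cfg_renorm` both relate to classifier-free guidance: each step predicts noise with and without the prompt, then extrapolates from the unconditional toward the conditional prediction; renormalization rescales the guided result to the conditional prediction's magnitude, which tames oversaturation at high scales. A toy 1-D sketch of the idea (LongCat's exact renorm formula may differ):

```python
import math

def cfg(uncond, cond, scale=4.5):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

def cfg_renorm(uncond, cond, scale=4.5):
    """Guided prediction rescaled to the conditional prediction's norm."""
    guided = cfg(uncond, cond, scale)
    norm_c = math.sqrt(sum(c * c for c in cond))
    norm_g = math.sqrt(sum(g * g for g in guided))
    return [g * norm_c / norm_g for g in guided] if norm_g > 0 else guided

# scale=1.0 reduces to the plain conditional prediction:
assert cfg([0.0, 0.0], [1.0, 2.0], scale=1.0) == [1.0, 2.0]
```

Higher `guidance_scale` follows the prompt more aggressively; the renorm keeps the guided prediction's overall magnitude in check while preserving its direction.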
Edits images based on instruction prompts.
Inputs:
- `LONGCAT_PIPE`: Pipeline from the model loader (must be an edit model)
- `image`: Input image to edit
- `prompt`: Edit instruction
- `negative_prompt`: Things to avoid in the edited image
- `steps`: Number of inference steps (default: 50)
- `guidance_scale`: CFG scale (default: 4.5)
- `seed`: Random seed
Outputs:
IMAGE: Edited image
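ComfyUI passes IMAGE data between nodes as float tensors with values in [0, 1] (shape [batch, height, width, channels]), so an edit node must convert to 8-bit RGB before handing the image to the pipeline. A minimal pure-Python sketch of that per-channel conversion (real code would operate on torch/numpy arrays and PIL images):

```python
def float_to_uint8(row):
    """Convert [0, 1] float channel values to clamped 8-bit values,
    mirroring the usual ComfyUI IMAGE -> PIL step (x * 255, round, clip)."""
    out = []
    for x in row:
        v = int(round(x * 255.0))
        out.append(max(0, min(255, v)))  # clamp in case of out-of-range values
    return out

assert float_to_uint8([0.0, 1.0]) == [0, 255]
```

The reverse conversion (uint8 back to [0, 1] floats) turns the edited result back into a ComfyUI IMAGE for the node's output.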
Example workflow JSON files are provided in this directory:
- `example_workflow_t2i.json` - Text-to-image generation workflow
- `example_workflow_edit.json` - Image editing workflow
You can load these workflows in ComfyUI by dragging and dropping the JSON file onto the canvas.
1. Add a LongCat-Image Model Loader node
   - Set `model_path` to "LongCat-Image"
2. Add a LongCat-Image Text to Image node
   - Connect the loader output to the pipeline input
   - Enter your prompt
   - Adjust settings as needed
3. Add a Save Image node to save the output
1. Add a LongCat-Image Model Loader node
   - Set `model_path` to "LongCat-Image-Edit"
2. Add a Load Image node to load your input image
3. Add a LongCat-Image Edit node
   - Connect the loader output to the pipeline input
   - Connect the image to edit
   - Enter your edit instruction (e.g., "将猫变成狗", "change the cat to a dog")
4. Add a Save Image node to save the output
| Model | Type | Description |
|---|---|---|
| LongCat-Image | Text-to-Image | Final release model for out-of-the-box inference |
| LongCat-Image-Dev | Text-to-Image | Mid-training checkpoint, suitable for fine-tuning |
| LongCat-Image-Edit | Image Editing | Specialized model for image editing |
- Parameters: 6B (highly efficient)
- Supported Resolutions: 768x1344 and variations
- Chinese Text Support: Industry-leading Chinese dictionary coverage
- Quality: Competitive with much larger models
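The "variations" of 768x1344 are resolutions with roughly the same pixel budget at other aspect ratios. A hypothetical helper for picking such a variant, assuming dimensions snapped to multiples of 64 (a common constraint for diffusion transformers; the model's actual resolution rules may differ):

```python
def variant_resolution(aspect_w, aspect_h, budget=768 * 1344, step=64):
    """Pick a width/height near a fixed pixel budget at a given aspect
    ratio, snapped to multiples of `step`."""
    ratio = aspect_w / aspect_h
    height = (budget / ratio) ** 0.5
    width = height * ratio
    snap = lambda x: max(step, int(round(x / step)) * step)
    return snap(width), snap(height)

# the default landscape shape round-trips exactly:
assert variant_resolution(1344, 768) == (1344, 768)
```

For example, a 1:1 request lands on 1024x1024, which carries roughly the same pixel count as 768x1344.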
| Backend | Speed | Requirements | When to Use |
|---|---|---|---|
| default | 1x (baseline) | Any system | General use, CPU inference |
| sage | ~2x faster | CUDA GPU + sageattention package | Maximum speed on NVIDIA GPUs |
Note: SageAttention provides approximately 2x speed improvement for attention operations on CUDA GPUs while maintaining output quality.
| Mode | VRAM Required | Speed | When to Use |
|---|---|---|---|
| Standard (CPU offload disabled) | ~24GB+ | Faster | High-end GPUs (e.g., RTX 3090, 4090, A100) |
| Low VRAM (CPU offload enabled) | ~17-19GB | Slower | Mid-range GPUs (e.g., RTX 3080, 4080) |
Note: The Low VRAM mode uses CPU offloading to transfer models between CPU and GPU as needed, reducing VRAM usage at the cost of slower inference speed.
- For better results, use a strong LLM for prompt engineering
- The model has excellent Chinese text rendering capabilities
- Enable prompt rewriting for enhanced generation quality
- Default guidance scale of 4.5 works well for most cases
LongCat-Image is licensed under Apache 2.0. See the LongCat-Image repository for more information.