
EEdit ⚡️: Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing

Zexuan Yan*  •  Yue Ma*  •  Chang Zou  •  Wenteng Chen  •  Qifeng Chen  •  Linfeng Zhang
* Equal Contribution     Corresponding Author

arXiv: https://arxiv.org/abs/2503.10270 · Project Page

📝 Introduction

EEdit Teaser

Inversion-based image editing is rapidly gaining momentum, but it suffers from significant computation overhead that hinders its application in real-time interactive scenarios. In this paper, we observe that redundancy in inversion-based image editing exists in both the spatial and temporal dimensions: unnecessary computation in unedited regions, and redundant steps in the inversion process.

To tackle these challenges, we propose a practical framework, named EEdit, for efficient image editing. Specifically, we introduce three techniques, one for each source of redundancy:

  • For spatial redundancy, spatial locality caching computes the edited region and its neighboring regions while skipping the unedited regions.
  • Token indexing preprocessing further accelerates the caching.
  • For temporal redundancy, inversion step skipping reuses latents for efficient editing.

Our experiments demonstrate an average 2.46× acceleration with no performance drop across a wide range of editing tasks, including prompt-guided image editing, drag-guided editing, and image composition.
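To make the temporal side concrete, below is a minimal sketch of inversion step skipping: the expensive model forward pass runs only on every `inv_skip`-th inversion step, and the cached velocity is reused in between. This is an illustration of the idea, not the repository's implementation; `velocity_fn` is a hypothetical stand-in for the FLUX transformer call.

```python
# Minimal sketch of inversion step skipping (temporal redundancy).
# NOT the repository's implementation; `velocity_fn` is a hypothetical
# stand-in for one expensive transformer forward pass.
import torch

def invert_with_skipping(latent, timesteps, velocity_fn, inv_skip=2):
    """Rectified-flow inversion that calls the model only on every
    `inv_skip`-th step and reuses the cached velocity in between."""
    cached_v = None
    for i in range(len(timesteps) - 1):
        t, t_next = timesteps[i], timesteps[i + 1]
        if i % inv_skip == 0 or cached_v is None:
            cached_v = velocity_fn(latent, t)  # expensive forward pass
        latent = latent + (t_next - t) * cached_v  # Euler step toward noise
    return latent

# Toy usage with a dummy velocity function on a random latent.
z = invert_with_skipping(
    torch.randn(1, 16, 64, 64),
    timesteps=torch.linspace(0.0, 1.0, 29),
    velocity_fn=lambda x, t: -x,  # stand-in for the FLUX transformer
)
```

With `inv_skip=2`, roughly half of the inversion forward passes are skipped, which is where the temporal savings come from.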

🛠️ Installation

conda create -n eedit python=3.12.3
conda activate eedit
pip install -r EEdit/requirements.txt

📥 Checkpoints & Datasets

All model weights and datasets come from open-source, free, and publicly available channels:

We use FLUX-dev as our experimental model. You can obtain the weights from the official black-forest-labs/FLUX.1-dev release on Hugging Face.

We use PIE-BENCH as the prompt-guided dataset; you can refer to the link.

We use the TF-ICON benchmark as the reference-guided dataset; you can refer to the link.

We use DragBench-DR and DragBench-SR as the drag-guided datasets; you can refer to the link.

For the masks generated from mapping_file.json, we provide a script:

python MyCodes/myutils.py
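For orientation, here is a hedged sketch of what this mask-generation step might look like; the real logic lives in MyCodes/myutils.py. The "mask" key and its flattened 0/1 encoding are assumptions about mapping_file.json, not a verified schema.

```python
# Hedged sketch of mask generation from mapping_file.json; the real logic
# lives in MyCodes/myutils.py. The "mask" key and flattened 0/1 encoding
# are assumptions, not the verified file schema.
import json
import numpy as np
from PIL import Image

with open("input/inpaint/mapping_file.json") as f:
    mapping = json.load(f)

entries = mapping.values() if isinstance(mapping, dict) else mapping
for idx, entry in enumerate(entries):
    flat = np.asarray(entry["mask"], dtype=np.uint8)  # assumed 0/1 values
    side = int(np.sqrt(flat.size))                    # assumed square mask
    img = Image.fromarray(flat.reshape(side, side) * 255)
    img.save(f"input/inpaint/masks/mask-{idx:03d}.png")
```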

When all the data and checkpoints are ready, please arrange them according to the directory structure below so they are compatible with the scripts.

📁 Checkpoints Structure
weights
├── flux1-dev.safetensors
├── model_index.json
├── scheduler
│   └── scheduler_config.json
├── sd_vae_ft_mse
│   ├── config.json
│   └── diffusion_pytorch_model.safetensors
├── text_encoder
│   ├── config.json
│   └── model.safetensors
├── text_encoder_2
│   ├── config.json
│   ├── model-00001-of-00002.safetensors
│   ├── model-00002-of-00002.safetensors
│   ├── model.safetensors.index-1.json
│   └── model.safetensors.index.json
├── tokenizer
│   ├── merges-1.txt
│   ├── merges.txt
│   ├── special_tokens_map-1.json
│   ├── special_tokens_map.json
│   ├── tokenizer_config-1.json
│   ├── tokenizer_config.json
│   ├── vocab-1.json
│   └── vocab.json
├── tokenizer_2
│   ├── special_tokens_map-1.json
│   ├── special_tokens_map.json
│   ├── spiece.model
│   ├── tokenizer.json
│   └── tokenizer_config.json
├── transformer_config.json
└── vae
    ├── config.json
    └── diffusion_pytorch_model.safetensors
📁 Datasets Structure
input
├── composition
│   ├── Real-Cartoon
│   │   ├── 0000 a cartoon animation of a sheep in the forest
│   │   │   ├── bg58.png
│   │   │   ├── cp_bg_fg.jpg
│   │   │   ├── dccf_image.jpg
│   │   │   ├── fg35_63d22cda1f5b66e8e5aca776.jpg
│   │   │   ├── fg35_mask.png
│   │   │   └── mask_bg_fg.jpg
│   ├── Real-Painting
│   ├── Real-Sketch
│   ├── Real-Real
    ...

├── drag_data
│   ├── dragbench-dr
│   │   ├── animals
│   │   │   ├── JH_2023-09-14-1820-16
│   │   │   │   ├── meta_data.pkl
│   │   │   │   ├── meta_data_region.pkl
│   │   │   │   ├── original_image.png
│   │   │   │   └── user_drag.png
│   └── dragbench-sr
│       ├── art_0
│       │   ├── meta_data.pkl
│       │   ├── meta_data_region.pkl
│       │   ├── original_image.png
│       │   └── user_drag.png
    ...

├── inpaint
│   ├── annotation_images 
│   │   ├── 0_random_140
│   │   │   ├── 000000000000.jpg
│   │   │   
│   │   ├── 1_change_object_80
│   │   │   ├── 1_artificial
│   │   │   │   ├── 1_animal
│   │   │   │   │   ├── 111000000000.jpg
│   │   │   │   ├── 2_human
│   │   │   │   │   ├── 112000000000.jpg
│   │   ├── 2_add_object_80
│   │   │   ├── 1_artificial
│   │   │   │   ├── 1_animal
│   │   │   │   │   ├── 211000000000.jpg
        ...
│   ├── mapping_file.json
│   └── masks
│       ├── mask-000.png
│       ├── mask-001.png
│       └── ...

🚀 Generation

cd EEdit && source run_gen.sh

✨ Hyper-parameters Guidance

Edited results are affected by many parameters and even the random seed. Below we explain the parameters and options that affect image quality, so that you can produce satisfactory editing results yourself.

Reference-guided editing (Image Composition)

| Parameter | Value | Description |
| --- | --- | --- |
| eta | 0.6 | Strength of inversion injection during denoising. Higher values preserve more of the original image. |
| gamma | 0.6 | Strength of inversion injection during denoising. Higher values preserve more of the original image. |
| blend_ratio | 0 | Deprecated, not in use. |
| start_timestep | 0 | Fixed, no adjustment needed. |
| stop_timestep | 10 | Number of timesteps during which inversion affects denoising. Higher values keep the output closer to the original image. |
| use_rf_inversion | true | Fixed, keep as true. |
| use_cache | true / false | Enables caching to accelerate inference; false runs the non-accelerated pipeline. |
| num_inference_steps | 28 | Number of inference steps in the diffusion process. |
| cascade_num | 1 / 3 / 5 | Controls the region score bonus for neighboring regions within L1 distance K. No adjustment needed. |
| fresh_ratio | 0.1 | Cache refresh ratio. Fixed; higher is not always better. |
| fresh_threshold | 1 / 2 / 3 | Complete refresh interval. Set to 2 if edit results don't follow instructions well; setting it to 1 disables acceleration. |
| soft_fresh_weight | 0.25 | Fixed, no adjustment needed. |
| tailing_step | 1 | Fixed; higher values reduce speed. |
| inv_skip | 2 / 3 / 4 | Inversion step skipping interval. The default of 2 works well. |
| cache_type | "ours_predefine" / "ours_cache" | "ours_predefine" uses token index preprocessing for extra speed; "ours_cache" disables preprocessing. |
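As a concrete starting point, the table's values can be collected into a single configuration. The plain Python dict below is illustrative only; the key names follow the table, but the exact config format expected by run_gen.sh may differ.

```python
# Illustrative image-composition config assembled from the table above.
# The dict layout is an assumption; check the configs used by run_gen.sh.
ref_guided_cfg = {
    "eta": 0.6,
    "gamma": 0.6,
    "blend_ratio": 0,                # deprecated, unused
    "start_timestep": 0,             # fixed
    "stop_timestep": 10,
    "use_rf_inversion": True,        # fixed
    "use_cache": True,               # False = non-accelerated pipeline
    "num_inference_steps": 28,
    "cascade_num": 3,
    "fresh_ratio": 0.1,
    "fresh_threshold": 2,            # 1 disables acceleration
    "soft_fresh_weight": 0.25,
    "tailing_step": 1,
    "inv_skip": 2,
    "cache_type": "ours_predefine",  # "ours_cache" disables preprocessing
}
```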
Prompt-guided editing

| Parameter | Value | Description |
| --- | --- | --- |
| use_cache | true | Enables the caching mechanism to accelerate inference. |
| num_inference_steps | 28 | Total number of denoising steps in the diffusion process. |
| cascade_num | 1 / 3 / 5 | Number of cascade levels for region scoring. |
| fresh_ratio | 0.1 | Ratio of cache entries refreshed each step; higher values reduce speed. |
| fresh_threshold | 2 / 3 | Interval between complete cache refreshes. |
| soft_fresh_weight | 0.25 | Weight factor for soft cache refreshing. |
| tailing_step | 1 | Step interval for tailing cache updates. |
| strength | 1.0 | Fixed. Overall strength of the editing effect. |
| inv_skip | 2 / 3 | Interval for skipping inversion steps. |
| eta / gamma | 0.7 | Strength of inversion injection during denoising. |
| stop_timestep | 6 | Timestep at which inversion stops influencing denoising. |
| mask_timestep | 18 | Timestep at which the editing mask stops being applied. |
| cache_type | "ours_predefine" / "ours_cache" | Cache strategy: "ours_predefine" uses TIP (token index preprocessing); "ours_cache" does not. |
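The same caveat applies to prompt-guided editing; an illustrative dict with the table's defaults:

```python
# Illustrative prompt-guided config; key names mirror the table above,
# but the exact config schema used by the repo may differ.
prompt_guided_cfg = {
    "use_cache": True,
    "num_inference_steps": 28,
    "cascade_num": 3,
    "fresh_ratio": 0.1,
    "fresh_threshold": 2,
    "soft_fresh_weight": 0.25,
    "tailing_step": 1,
    "strength": 1.0,                 # fixed
    "inv_skip": 2,
    "eta": 0.7,
    "gamma": 0.7,
    "stop_timestep": 6,
    "mask_timestep": 18,
    "cache_type": "ours_predefine",  # TIP enabled
}
```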
Drag-guided editing (some parameters are omitted)

| Parameter | Value | Description |
| --- | --- | --- |
| drag_class | "copy" / "cut" | Region handling: "copy" preserves source-region latents; "cut" swaps source and target region latents. |
| t_prime_ratio | 0.5 | Dragging strength in the (0, 1) range. Higher values reduce dragging strength. |
| alpha | 1 | Noise blending ratio in the (0, 1) range applied to inversion latents. |
| inv_skip | 2 / 3 | Interval for skipping inversion steps. |
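And for drag-guided editing (again illustrative; only the documented parameters are shown):

```python
# Illustrative drag-guided config covering the documented parameters.
drag_guided_cfg = {
    "drag_class": "copy",    # "cut" swaps source/target region latents
    "t_prime_ratio": 0.5,    # higher -> weaker drag
    "alpha": 1,              # noise blending on inversion latents
    "inv_skip": 2,
}
```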

📝 TODO List

  • Release evaluation code
  • Release notebook and Hugging Face demo
  • Develop more user-friendly interaction logic and experience, including Gradio interface

🙏 Acknowledgements

  • Thanks to ToCa for cache implementations
  • Thanks to Diffusers for pipeline implementations
  • Thanks to Region Drag for dragging implementations

📝 BibTeX

@misc{yan2025eeditrethinkingspatial,
title={EEdit: Rethinking the Spatial and Temporal Redundancy for Efficient Image Editing}, 
      author={Zexuan Yan and Yue Ma and Chang Zou and Wenteng Chen and Qifeng Chen and Linfeng Zhang},
      year={2025},
      eprint={2503.10270},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.10270}, 
}

📧 Contact

yzx_ustc@mail.ustc.edu.cn
