Break free from static formats. Our platform empowers you to transform fixed content into fully manipulatable assets. Powered by SAM 3 and multimodal large models, it enables high-fidelity reconstruction that preserves the original diagram details and logical relationships.
👆 Click above or https://www.editbanana.net/ to try Edit Banana online! Upload an image to get editable DrawIO (XML) in seconds.
Warning
Please note: Our GitHub repository currently trails behind our web-based service. For the most up-to-date features and performance, we recommend using our web platform.
Join our WeChat group to discuss and exchange ideas. Scan the QR code below:
Scan to join the Edit Banana community
Tip
If the QR code has expired, please submit an Issue to request an updated one.
For academic cooperation, technical integration, commercial licensing, project customization, and other business inquiries, please contact us by email:
E-mail: ccl@bit.edu.cn
- 📸 Effect Demonstration
- 🚀 Key Features
- 🛠️ Architecture Pipeline
- 📂 Project Structure
- 📦 Installation & Setup
- 🔤 Usage
- ⚙️ Configuration
- 📌 Development Roadmap
- 💬 Join WeChat Group
- 🤝 Contribution Guidelines
- 🤩 Contributors
- 📄 License
- 🌟 Star History
To demonstrate conversion fidelity, we provide one-to-one comparisons between the "original static format" and the "editable reconstruction result" for 4 scenarios. All elements can be individually dragged, styled, and modified.
Note
✨ Conversion Highlights:
- Preserves the layout logic, color matching, and element hierarchy of the original diagram.
- 1:1 restoration of shape stroke/fill and arrow styles (dashed lines/thickness).
- Accurate text recognition, supporting direct subsequent editing and format adjustment.
- All elements are independently selectable, supporting native DrawIO template replacement and layout optimization.
- Advanced Segmentation: Using our fine-tuned SAM 3 (Segment Anything Model 3) for segmentation of diagram elements.
- Fixed Multi-Round VLM Scanning: An extraction process guided by Multimodal LLMs.
- Text Recognition:
  - Local OCR for text localization; easy to install, runs offline.
  - Pix2Text for mathematical formula recognition and LaTeX conversion.
  - Crop-Guided Strategy: Extracts text/formula regions and sends high-res crops to the formula engine.
- User System:
  - Registration: New users receive 10 free credits.
  - Credit System: Pay-per-use model prevents resource abuse.
  - Multi-User Concurrency: Built-in support for concurrent user sessions using a Global Lock mechanism for thread-safe GPU access and an LRU Cache (Least Recently Used) to persist image embeddings across requests, ensuring high performance and stability.
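The concurrency scheme described above (one global lock around GPU work, plus an LRU cache of image embeddings) can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's implementation: `EmbeddingCache`, `GPU_LOCK`, `get_embedding`, and the `encode` callback are hypothetical names, and the capacity of 2 is arbitrary.

```python
import threading
from collections import OrderedDict


class EmbeddingCache:
    """Tiny LRU cache: keeps the most recently used image embeddings."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self._items = OrderedDict()
        self._lock = threading.Lock()  # protects the dict itself

    def get(self, key):
        with self._lock:
            if key not in self._items:
                return None
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]

    def put(self, key, value):
        with self._lock:
            self._items[key] = value
            self._items.move_to_end(key)
            while len(self._items) > self.capacity:
                self._items.popitem(last=False)  # evict least recently used


# Hypothetical globals: one lock so only one thread runs the encoder at a time.
GPU_LOCK = threading.Lock()
CACHE = EmbeddingCache(capacity=2)


def get_embedding(image_id, encode):
    """Return a cached embedding, or compute it under the global GPU lock."""
    cached = CACHE.get(image_id)
    if cached is not None:
        return cached
    with GPU_LOCK:
        # Re-check: another thread may have filled the cache while we waited.
        cached = CACHE.get(image_id)
        if cached is None:
            cached = encode(image_id)  # stand-in for the real GPU call
            CACHE.put(image_id, cached)
    return cached
```

The second lookup inside the lock avoids encoding the same image twice when two requests for it arrive together.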
- Input: Image (PNG/JPG/BMP/TIFF/WebP).
- Segmentation (SAM3): Using our fine-tuned SAM3 mask decoder.
- Text Extraction (Parallel):
  - Local OCR (Tesseract) detects text bounding boxes.
  - High-res crops of text/formula regions are sent to Pix2Text for LaTeX conversion.
- DrawIO XML Generation: Merging spatial data from SAM3 and text OCR results.
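To make the final step concrete, here is a minimal sketch of turning detected boxes and recognized text into DrawIO's `mxGraphModel` XML. It assumes each shape arrives as a hypothetical `(x, y, width, height, text)` tuple and uses one generic rectangle style; the project's actual merge logic and styling are not shown in this README.

```python
import xml.etree.ElementTree as ET


def boxes_to_drawio(shapes):
    """Build a minimal mxGraphModel from (x, y, w, h, text) tuples."""
    model = ET.Element("mxGraphModel")
    root = ET.SubElement(model, "root")
    ET.SubElement(root, "mxCell", id="0")              # required root cell
    ET.SubElement(root, "mxCell", id="1", parent="0")  # default layer
    for index, (x, y, w, h, text) in enumerate(shapes, start=2):
        cell = ET.SubElement(
            root, "mxCell",
            id=str(index), value=text,
            style="rounded=0;whiteSpace=wrap;html=1;",  # generic rectangle style
            vertex="1", parent="1",
        )
        geometry = ET.SubElement(
            cell, "mxGeometry",
            x=str(x), y=str(y), width=str(w), height=str(h),
        )
        geometry.set("as", "geometry")  # "as" is a Python keyword, so set it explicitly
    return ET.tostring(model, encoding="unicode")


if __name__ == "__main__":
    print(boxes_to_drawio([(10, 20, 120, 60, "Start")]))
```

Each shape becomes its own `mxCell`, which is why every element stays independently selectable once the file is opened in DrawIO.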
Click to expand project structure
Edit-Banana/
├── config/ # Configuration files (copy config.yaml.example → config.yaml)
├── flowchart_text/ # OCR & Text Extraction Module (standalone entry)
│ ├── src/
│ └── main.py # OCR-only entry point
├── input/ # [Manual] Input images directory
├── models/ # [Manual] Model weights (SAM3) and optional BPE vocab
├── output/ # [Manual] Results directory
├── sam3/ # SAM3 library (see Installation: install from facebookresearch/sam3)
├── sam3_service/ # SAM3 HTTP service (optional, for multi-process deployment)
├── scripts/ # Setup and utility scripts
│ ├── setup_sam3.sh # Install SAM3 lib and copy BPE to models/
│ ├── setup_rmbg.py # Download RMBG model from ModelScope
│ └── merge_xml.py # XML merge utilities
├── main.py # CLI entry (modular pipeline)
├── server_pa.py # FastAPI backend server
└── requirements.txt # Python dependencies
Follow these core phases to set up the project locally.
Configure your base environment and directory structure.
- Python 3.10+ & CUDA-capable GPU (Highly recommended)
- Install PyTorch with CUDA support (e.g., for CUDA 11.8):
  pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
git clone https://github.com/BIT-DataLab/Edit-Banana.git
cd Edit-Banana
mkdir -p input output sam3_output

Next, install the required packages and download the necessary model weights (place them in models/ and do not commit them).
pip install -r requirements.txt
- SAM3 Library & BPE: Run bash scripts/setup_sam3.sh to install the lib and copy the BPE vocab to models/. Verify with:
  python -c "from sam3.model_builder import build_sam3_image_model; print('OK')"
- SAM3 Weights: Download sam3.pt from ModelScope or Hugging Face and place it under models/sam3_ms.
- Text Local OCR (Tesseract):
  sudo apt install tesseract-ocr tesseract-ocr-chi-sim
🧩 Optional Capabilities (OCR Engine, Formula, RMBG) - Click to expand
- PaddleOCR (alternative; better for mixed text): Use paddlepaddle==3.2.2 (avoids a bug in 3.3.0).
  pip install paddlepaddle==3.2.2 paddleocr
- Formula (Pix2Text):
  pip install pix2text onnxruntime-gpu
- Background Removal (RMBG): Install the packages, then run the setup script:
  pip install onnxruntime modelscope
  python scripts/setup_rmbg.py
Copy the example config and adjust the asset paths:
cp config/config.yaml.example config/config.yaml
Edit config.yaml to ensure sam3.checkpoint_path and sam3.bpe_path match your models/ locations.
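A quick check that the two configured paths point at real files can catch mistakes before the first run. The sketch below assumes the config has already been parsed into a dict in which `checkpoint_path` and `bpe_path` sit under a `sam3` mapping, as the dotted names suggest; `missing_model_paths` is a hypothetical helper, not part of the repository.

```python
from pathlib import Path


def missing_model_paths(config, base_dir="."):
    """Return the sam3.* path keys whose files do not exist on disk."""
    sam3 = config.get("sam3", {})
    missing = []
    for key in ("checkpoint_path", "bpe_path"):
        value = sam3.get(key)
        # A key counts as missing if it is unset or does not resolve to a file.
        if not value or not (Path(base_dir) / value).is_file():
            missing.append(f"sam3.{key}")
    return missing


if __name__ == "__main__":
    # Example values only; substitute the dict loaded from config/config.yaml.
    example = {"sam3": {"checkpoint_path": "models/sam3_ms/sam3.pt", "bpe_path": ""}}
    print(missing_model_paths(example))
```

If PyYAML is available in your environment, the dict can come from `yaml.safe_load` applied to config/config.yaml.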
🛠️ Before First Run Checklist & Troubleshooting - Click to expand
Checklist:
- Config files copied and model paths set in config.yaml
- SAM3 weights (sam3.pt) and BPE vocab placed under models/
- SAM3 library installed via scripts/setup_sam3.sh
- Tesseract or PaddleOCR installed
Common Issues:
- "no kernel image is available...": GPU arch mismatch. Upgrade PyTorch or set sam3.device: "cpu".
- "Model file not found at ...rmbg/...": RMBG is optional. Enable it by downloading the model with scripts/setup_rmbg.py.
- "PaddleOCR inference failed...": Use paddlepaddle==3.2.2 or fall back to Tesseract.
Supports image files (PNG, JPG, BMP, TIFF, WebP). To process a single image:
python main.py -i input/test_diagram.png
The output XML will be saved in the output/ directory. For batch processing, put images in input/ and run python main.py without -i.
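Before a batch run it can help to see which files in input/ match the supported formats. The sketch below is a hypothetical pre-flight helper, not part of the CLI. The extension set is an assumption derived from the formats listed above, and the set actually accepted by main.py may differ.

```python
from pathlib import Path

# Assumed extensions for PNG/JPG/BMP/TIFF/WebP; the CLI's real filter may differ.
SUPPORTED = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff", ".webp"}


def split_inputs(input_dir="input"):
    """Return (supported, skipped) file names found directly in input_dir."""
    supported, skipped = [], []
    for path in sorted(Path(input_dir).iterdir()):
        if not path.is_file():
            continue  # ignore subdirectories
        target = supported if path.suffix.lower() in SUPPORTED else skipped
        target.append(path.name)
    return supported, skipped


if __name__ == "__main__":
    ok, other = split_inputs("input")
    print("will process:", ok)
    print("not an image:", other)
```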
- One-time setup
  git clone https://github.com/BIT-DataLab/Edit-Banana.git && cd Edit-Banana
  python3 -m venv .venv && source .venv/bin/activate  # Linux/macOS; Windows: .venv\Scripts\activate
  pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118  # or CPU build
  pip install -r requirements.txt
  sudo apt install tesseract-ocr tesseract-ocr-chi-sim  # OCR (or equivalent on your OS)
  Install the SAM3 library and download model weights + BPE. Then:
  mkdir -p input output
  cp config/config.yaml.example config/config.yaml
  # Edit config/config.yaml: set sam3.checkpoint_path and sam3.bpe_path to your models/ paths
- Test with CLI
  # Put a diagram image in input/, e.g. input/test.png
  python main.py -i input/test.png
  # Output appears under output/<image_stem>/ (DrawIO XML and intermediates)
- Optional: test the web API
  python server_pa.py
  # In another terminal:
  curl -X POST http://localhost:8000/convert -F "file=@input/test.png"
  # Or open http://localhost:8000/docs and use the /convert endpoint with a file upload
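The same upload can be done from Python with only the standard library. The sketch mirrors the curl call (a multipart form with a single `file` field posted to `/convert` on localhost:8000). The helper names are hypothetical, and since the response format is not documented here, the function simply returns the raw bytes.

```python
import os
import urllib.request
import uuid


def build_multipart(field, filename, data, content_type="application/octet-stream"):
    """Encode one file as multipart/form-data; returns (body, Content-Type value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def convert(image_path, url="http://localhost:8000/convert"):
    """POST an image to the /convert endpoint and return the raw response bytes."""
    with open(image_path, "rb") as handle:
        body, header = build_multipart("file", os.path.basename(image_path), handle.read())
    request = urllib.request.Request(
        url, data=body, method="POST", headers={"Content-Type": header}
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


if __name__ == "__main__":
    # Requires server_pa.py to be running locally.
    print(len(convert("input/test.png")), "bytes received")
```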
Customize the pipeline behavior in config/config.yaml:
- sam3: Adjust score thresholds, NMS (Non-Maximum Suppression) thresholds, and max iteration loops.
- paths: Set input/output directories.
- dominant_color: Fine-tune color extraction sensitivity.
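To show what the score and NMS thresholds control, here is a generic, pure-Python sketch of score filtering followed by greedy non-maximum suppression. It is a textbook version for intuition only, not the project's code; the parameter names and the 0.5 defaults are assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, score_threshold=0.5, nms_threshold=0.5):
    """Keep confident boxes, then drop boxes that overlap a higher-scoring one.

    detections is a list of (box, score) pairs.
    """
    candidates = sorted(
        (d for d in detections if d[1] >= score_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        # Keep the box only if it does not overlap any kept box too strongly.
        if all(iou(box, other) <= nms_threshold for other, _ in kept):
            kept.append((box, score))
    return kept
```

Raising the score threshold discards more low-confidence masks. Raising the NMS threshold lets more overlapping detections survive, which can matter for nested diagram elements.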
| Feature Module | Status | Description |
|---|---|---|
| Core Conversion Pipeline | ✅ Completed | Full pipeline of segmentation, reconstruction and OCR |
| Intelligent Arrow Connection | | Automatically associate arrows with target shapes |
| DrawIO Template Adaptation | 📍 Planned | Support custom template import |
| Batch Export Optimization | 📍 Planned | Batch export to DrawIO files (.drawio) |
| Local LLM Adaptation | 📍 Planned | Support local VLM deployment, independent of APIs |
Contributions of all kinds are welcome (code submissions, bug reports, feature suggestions):
- Fork this repository
- Create a feature branch (git checkout -b feature/xxx)
- Commit your changes (git commit -m 'feat: add xxx')
- Push to the branch (git push origin feature/xxx)
- Open a Pull Request

Bug Reports: Issues
Feature Suggestions: Discussions
This project is open-source under the Apache License 2.0, allowing commercial use and secondary development (with copyright notice retained).
🌟 If this project helps you, please star it to show your support!










