videowipe

Remove hardcoded subtitles, watermarks, and text overlays from video.
Auto-detect targets, generate masks, and inpaint — pip install videowipe and go.

Chinese version: README_CN.md

What it does

videowipe detects and removes hardcoded text, watermarks, logos, and timestamps from video. A full pipeline runs in one command: sample frames → detect text regions → select targets (with optional OCR and natural-language intent parsing) → generate masks → inpaint the background.

No manual mask required. The built-in detector handles multilingual content out of the box.

STTN is the default inpainting backend. Any external model can be plugged in via --external-command — ProPainter has been validated as a higher-quality alternative.

Install

Requires Python 3.8+ and either ONNX Runtime or PyTorch.

# If you already have PyTorch:
pip install videowipe

# Lightweight ONNX Runtime backend (the quotes keep shells such as zsh from expanding the brackets):
pip install "videowipe[onnx]"

# Or the PyTorch backend:
pip install "videowipe[torch]"

# Optional: OCR text recognition for better detection accuracy
pip install "videowipe[ocr]"

Model weights download automatically on first run to ~/.videowipe/weights/. No manual setup needed.
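
To see what has been cached so far, a minimal sketch along these lines lists the files under the weights directory. The path is the one stated above; the helper name `list_cached_weights` is hypothetical and not part of videowipe.

```python
from pathlib import Path

# Default cache location stated in this README.
WEIGHTS_DIR = Path.home() / ".videowipe" / "weights"


def list_cached_weights(weights_dir=WEIGHTS_DIR):
    """Return (file name, size in MiB) for each cached file, or [] if the directory is absent."""
    weights_dir = Path(weights_dir)
    if not weights_dir.is_dir():
        return []  # nothing has been downloaded yet
    return [
        (p.name, round(p.stat().st_size / 2**20, 1))
        for p in sorted(weights_dir.iterdir())
        if p.is_file()
    ]


for name, size_mib in list_cached_weights():
    print(f"{name}: {size_mib} MiB")
```

An empty result simply means the first run has not happened yet.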

Usage

Python API

from videowipe import remove_text

# Mask is optional — subtitle regions are auto-detected if omitted
remove_text(
    video="input.mp4",
    output="result/",
)

# Or provide your own mask for full control
remove_text(
    video="input.mp4",
    mask="mask.png",
    output="result/",
)
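
If you need a simple rectangular mask, one can be generated with the standard library alone. The sketch below writes an 8-bit grayscale PNG with a white rectangle on a black canvas. It assumes that white marks the region to remove and that the mask matches the video resolution; this README does not state the mask convention, so check it before relying on the result. The function name `rect_mask_png` is hypothetical.

```python
import struct
import zlib


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def rect_mask_png(width, height, box):
    """Return PNG bytes: black canvas with a white rectangle box=(x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    x0, x1 = max(0, x0), min(width, x1)  # clamp horizontally to the canvas
    rows = bytearray()
    for y in range(height):
        rows.append(0)  # filter type 0 (None) for this scanline
        if y0 <= y < y1 and x0 < x1:
            rows += b"\x00" * x0 + b"\xff" * (x1 - x0) + b"\x00" * (width - x1)
        else:
            rows += b"\x00" * width
    # IHDR: width, height, bit depth 8, color type 0 (grayscale), no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(bytes(rows)))
        + _chunk(b"IEND", b"")
    )
```

For example, `open("mask.png", "wb").write(rect_mask_png(852, 480, (0, 408, 852, 480)))` would cover the bottom 15% of an 852x480 frame; the band size is an arbitrary illustration.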

Full pipeline with target selection

Use task="clean" for the complete detection pipeline with target selection, intent parsing, and OCR:

from videowipe import WipeEngine

engine = WipeEngine(task="clean", detect_mode="balanced", ocr="auto")
engine.process(
    video="input.mp4",
    targets=["subtitle", "watermark"],
    regions=["bottom"],
    intent="remove Chinese subtitles and logo watermark",
    output="result/",
)
engine.cleanup()

Batch processing

Reuse the engine to avoid reloading the model:

from videowipe import WipeEngine

engine = WipeEngine(task="detext")
engine.process(video="clip1.mp4", output="result/")
engine.process(video="clip2.mp4", mask="mask.png", output="result/")
engine.cleanup()
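
For a whole folder of clips, the same pattern can be wrapped in a loop. This is a sketch under stated assumptions: `process_folder` is a hypothetical helper, and it relies only on the `process(video=..., output=...)` and `cleanup()` calls shown above. The `try`/`finally` makes sure `cleanup()` runs even if one clip fails.

```python
from pathlib import Path


def process_folder(engine, folder, output="result/", pattern="*.mp4"):
    """Run engine.process() on every matching clip, then always call cleanup()."""
    processed = []
    try:
        for clip in sorted(Path(folder).glob(pattern)):
            engine.process(video=str(clip), output=output)
            processed.append(clip.name)
    finally:
        engine.cleanup()  # runs even if a clip raises
    return processed
```

It would be called as `process_folder(WipeEngine(task="detext"), "clips/")`, where `clips/` is a placeholder directory name.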

CLI

# Auto-detect and remove all text overlays (recommended)
videowipe clean input.mp4 -o result/

# With manual mask
videowipe clean input.mp4 -m mask.png -o result/

`clean` command options

# Only remove specific target types
videowipe clean input.mp4 --target subtitle
videowipe clean input.mp4 --target watermark

# Target a specific screen region
videowipe clean input.mp4 --region bottom
videowipe clean input.mp4 --region top-right

# Natural language intent
videowipe clean input.mp4 --intent "remove bottom Chinese subtitles"

# Preview detection results without processing
videowipe clean input.mp4 --preview -o result/

# Interactively confirm detected targets
videowipe clean input.mp4 --confirm

Flag	Description	Default
`--target`	Target type to clean (can repeat): `subtitle`, `timestamp`, `watermark`, `logo`	auto-detect all
`--region`	Screen region (can repeat): `top`, `bottom`, `top-left`, `top-right`, `bottom-left`, `bottom-right`, `center`	all regions
`--intent`	Natural-language cleanup intent	—
`--preview`	Write detection artifacts only (no inpainting)	off
`--confirm`	Show detected targets and confirm before processing	off
`--detect-mode`	Detection preset: `fast` (24 frames), `balanced` (50), `sensitive` (80)	`balanced`
`--ocr`	OCR text recognition: `auto`, `off`, `rapidocr`	`auto`
`--agent`	Local LLM CLI for intent-based selection (e.g., `claude`, `codex`)	—
`--external-command`	External inpainting command (bypasses built-in STTN)	—
`-g, --gap`	Segment length per pass; higher = better quality, slower	`200`
`-d, --dual`	Show original video side-by-side in output	off
`-m, --mask`	Mask image path (auto-detect if omitted)	auto
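
To make the `--detect-mode` presets concrete, the sketch below turns a preset into a list of frame indices. The frame budgets come from the table above; the even-spacing strategy and the name `sample_indices` are assumptions for illustration, since this README only says that frames are sampled across the video.

```python
# Frame budgets per preset, as listed in the flag table.
PRESET_FRAMES = {"fast": 24, "balanced": 50, "sensitive": 80}


def sample_indices(total_frames, mode="balanced"):
    """Pick up to PRESET_FRAMES[mode] evenly spaced frame indices (assumed strategy)."""
    n = min(PRESET_FRAMES[mode], total_frames)
    if n <= 0:
        return []
    # Use the midpoint of each of n equal-width bins so the whole clip is covered.
    return [int((i + 0.5) * total_frames / n) for i in range(n)]


print(len(sample_indices(7200, "fast")))  # 24
```

A short clip with fewer frames than the budget simply yields every frame once.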

Legacy: detext command

The detext command auto-detects subtitles only. Prefer clean for new usage.

# Auto-detect subtitles
videowipe detext -v input.mp4 -o result/

# With manual mask
videowipe detext -v input.mp4 -m mask.png -o result/

Flag	Description	Default
`-v, --video`	Input video path	required
`-m, --mask`	Mask image path (auto-detect if omitted)	auto
`-o, --output`	Output directory	`result/`
`-w, --weight`	Model weight path. PyTorch accepts `.pth`/`.pt`; ONNX expects a prefix path ending in `.onnx` with matching `_encoder`, `_transformer`, and `_decoder` files.	auto
`-g, --gap`	Segment length per pass; higher = better quality, slower	`200`
`-d, --dual`	Show original video side-by-side in output	off
`--external-command`	External inpainting command (bypasses built-in STTN)	—
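
One way to picture `--gap` is as the size of the frame windows handed to the inpainting model. The sketch below splits a clip into consecutive segments of at most `gap` frames. It is an assumption about how the flag is applied, not a description of the actual implementation, and `split_segments` is a hypothetical name.

```python
def split_segments(total_frames, gap=200):
    """Split [0, total_frames) into consecutive (start, end) ranges of at most `gap` frames."""
    if gap <= 0:
        raise ValueError("gap must be positive")
    return [(start, min(start + gap, total_frames)) for start in range(0, total_frames, gap)]


print(split_segments(450))  # [(0, 200), (200, 400), (400, 450)]
```

Under this reading, a larger `gap` gives each pass more neighboring frames to draw on, which would fit the "higher = better quality, slower" note in the table.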

External models

Pass --external-command to use any third-party inpainting model instead of the built-in STTN. The command receives <video> <mask> <output_dir> and must produce an output video in the output directory.
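
A custom wrapper only has to honor that contract. The skeleton below is a sketch, not the bundled wrapper: it parses the three positional arguments, validates the inputs, and writes a video into the output directory. `run_inpainting` is a placeholder that merely copies the input, and the output file name is an assumption, since the contract above only requires that a video appears in the output directory.

```python
import argparse
import shutil
import sys
from pathlib import Path


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Example external inpainting wrapper")
    parser.add_argument("video", type=Path)
    parser.add_argument("mask", type=Path)
    parser.add_argument("output_dir", type=Path)
    return parser.parse_args(argv)


def run_inpainting(video: Path, mask: Path, out_path: Path) -> None:
    """Placeholder: replace with a call into your model. Here it only copies the input."""
    shutil.copyfile(video, out_path)


def main(argv=None) -> int:
    args = parse_args(argv)
    for path in (args.video, args.mask):
        if not path.is_file():
            print(f"missing input: {path}", file=sys.stderr)
            return 2
    args.output_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme for the result file.
    out_path = args.output_dir / f"{args.video.stem}_inpainted.mp4"
    run_inpainting(args.video, args.mask, out_path)
    return 0
```

Saved as a script (say `my_wrapper.py`, a placeholder name) that ends with `sys.exit(main())` under an `if __name__ == "__main__":` guard, it could then be passed as `--external-command "python my_wrapper.py"`.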

ProPainter has been validated as a higher-quality alternative. A ready-to-use wrapper is included:

# Clone ProPainter outside this repo first
git clone https://github.com/sczhou/ProPainter.git ../models/ProPainter

# Use via the named model (recommended)
videowipe clean input.mp4 --model propainter --propainter-dir ../models/ProPainter

# Or via the generic external command (equivalent; the command is invoked in argv form)
videowipe clean input.mp4 --external-command "python scripts/propainter_wipe.py"

Note: ProPainter requires a GPU with ~16GB VRAM for 480p video and is licensed under NTU S-Lab License 1.0 (non-commercial).

Quality comparison: ProPainter vs STTN

Tested on a multilingual music video (Korean + Burmese subtitles, 852x480, 10s clip). Both models used the same mask.

Comparison images (Original, ProPainter with GPU fp16, STTN with CPU ONNX) are in pics/comparison/.

Preview

Subtitle removal

[Figure: before and after frames for subtitle removal, with a link to a demo video]

Auto-detection accuracy

The built-in detector locates text regions across multilingual content without manual masks:

Video	Candidates	Selected	Types
Chinese drama	4	2	top subtitle, bottom subtitle
English clip	2	2	bottom subtitle
Music video (Korean + Burmese)	7	5	top watermark, bottom multilingual subtitles

Tested with --detect-mode balanced (50 sampled frames). Green boxes show selected regions for inpainting.

How it works

The pipeline has three stages:

1. Detection: A DBNet-based text detector samples frames across the video, finds text regions in each frame, clusters them by position, and selects the best preview frame. Supports multilingual content out of the box.
2. Target selection: Detected regions are classified by type (subtitle, watermark, logo, timestamp). Optional OCR reads the text content. An intent parser (rule-based or LLM-backed via --agent) lets you specify what to remove in natural language.
3. Inpainting: Masked regions are filled in using temporal information from neighboring frames. The default backend is STTN (8-layer spatial-temporal transformer with CNN encoder). Any external model can be substituted via --external-command.
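
The "clusters them by position" step in the detection stage can be illustrated with a small greedy grouping by box overlap. This is a sketch of the general idea, not videowipe's actual algorithm: the IoU threshold, the box format `(x0, y0, x1, y1)`, and the names `iou` and `cluster_boxes` are all assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x0, y0, x1, y1)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def cluster_boxes(frames, threshold=0.5):
    """Greedily group per-frame boxes that overlap a cluster's first box.

    frames: list of per-frame box lists. Returns clusters sorted by hit count,
    each as {"box": representative box, "hits": number of matched boxes}.
    """
    clusters = []
    for boxes in frames:
        for box in boxes:
            for cluster in clusters:
                if iou(cluster["box"], box) >= threshold:
                    cluster["hits"] += 1
                    break
            else:  # no existing cluster matched, so start a new one
                clusters.append({"box": box, "hits": 1})
    return sorted(clusters, key=lambda c: c["hits"], reverse=True)
```

The intuition is that a subtitle band or watermark recurs at the same position in many sampled frames and collects many hits, while incidental scene text shows up only briefly.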

Docker

No Python? No problem. Run videowipe directly with Docker.

CPU:

docker pull ghcr.io/kkenny0/videowipe:latest
docker run --rm -v "$(pwd)":/data ghcr.io/kkenny0/videowipe clean /data/input.mp4 -o /data/result/

GPU (requires NVIDIA Container Toolkit):

docker pull ghcr.io/kkenny0/videowipe:gpu
docker run --rm --gpus all -v "$(pwd)":/data ghcr.io/kkenny0/videowipe:gpu clean /data/input.mp4 -o /data/result/

Or use the included wrapper script (auto-detects GPU):

./scripts/docker-videowipe.sh clean input.mp4 -o result/

Image	Size	GPU	Notes
`videowipe:latest`	~480 MB	No	CPU only, smallest image
`videowipe:gpu`	~1.4 GB	Yes	ONNX Runtime with CUDA

Build from source

Use --target to select the image variant:

# CPU
docker build --target runtime-cpu -t videowipe:latest .

# GPU (requires NVIDIA Container Toolkit at build time for base image)
docker build --target runtime-gpu --build-arg VARIANT=gpu -t videowipe:gpu .

Note: The GPU image requires a machine with NVIDIA runtime to verify CUDA execution. Without it, ONNX Runtime silently falls back to CPU.
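
Because the fallback is silent, it can help to check which execution providers ONNX Runtime reports. The sketch below uses `onnxruntime.get_available_providers()`; the helper names are hypothetical. A listed CUDA provider is necessary but not sufficient: a session can still fall back to CPU at creation time, so `InferenceSession.get_providers()` on the loaded session is the more reliable check.

```python
def cuda_listed(providers):
    """True if the CUDA execution provider appears in the given provider list."""
    return "CUDAExecutionProvider" in providers


def available_providers():
    """Return ONNX Runtime's provider list, or [] if onnxruntime is not installed."""
    try:
        import onnxruntime as ort
    except ImportError:
        return []
    return ort.get_available_providers()


print("CUDA provider listed:", cuda_listed(available_providers()))
```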

Run after building:

# CPU
docker run --rm -v "$(pwd)":/data videowipe:latest clean /data/input.mp4 -o /data/result/

# GPU
docker run --rm --gpus all -v "$(pwd)":/data videowipe:gpu clean /data/input.mp4 -o /data/result/

Credits

This project builds on STTN and the original Video-Auto-Wipe implementation. The built-in text detection model is from OnnxOCR.

License

MIT
