MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.
Updated Dec 25, 2025 - HTML