Skip to content

This WordPress plugin automatically scrapes web novels and manga from various sources and publishes them on your Madara-powered website.

License

Notifications You must be signed in to change notification settings

druvx13/ultimate-manga-scraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Ultimate Web Novel & Manga Scraper

Institutional-Grade Documentation Edition

This repository contains the Ultimate Web Novel & Manga Scraper, a comprehensive WordPress plugin designed to automate the ingestion of manga and web novel content. It is engineered to integrate seamlessly with the Madara theme, transforming a standard WordPress installation into a fully automated content aggregation platform.


📖 Table of Contents

  1. Project Overview
  2. Feature Inventory
  3. System Requirements
  4. Technology Stack
  5. Directory Overview
  6. Installation
  7. Environment Setup
  8. Configuration
  9. Database Setup
  10. Admin & System Usage
  11. Development Workflow
  12. Production Deployment
  13. Security Considerations
  14. Limitations & Assumptions
  15. Maintenance
  16. Licensing

1. Project Overview

The plugin operates as a "God Object" within the WordPress ecosystem, specifically targeting the Madara manga theme. It acts as a bridge between external content sources (MangaFox, WuxiaWorld, Madara-based sites, etc.) and the local WordPress database.

It handles the entire lifecycle of content acquisition:

  • Scheduling: Cron-based execution.
  • Fetching: Multi-mode scraping (cURL, PhantomJS, Puppeteer).
  • Processing: HTML parsing, cleaning, and text spinning.
  • Translation: Automated translation via Google/DeepL/Microsoft.
  • Storage: Saving to local FS, DB, or Cloud Storage (S3).

2. Feature Inventory

  • Multi-Source Scraping: Built-in rules for major manga/novel sites.
  • Headless Browser Support: Renders JavaScript-heavy sites using PhantomJS or Puppeteer.
  • Translation Pipeline: Converts content language on-the-fly.
  • Proxy Support: Rotates proxies to bypass IP bans.
  • Cloudflare Bypass: Mechanisms to handle anti-bot protection.
  • Madara Enhancements: specialized module for cloning other Madara sites via AJAX.
  • Auto-Update: Updates existing manga with new chapters automatically.

3. System Requirements

  • CMS: WordPress 5.0+
  • Theme: Madara (Active)
  • Plugin Dependency: Madara Core (WP_MANGA_STORAGE)
  • PHP: 7.4+
  • Extensions: curl, dom, mbstring, json, libxml
  • Optional:
    • Node.js (for Puppeteer)
    • PhantomJS binary
    • shell_exec enabled

4. Technology Stack

  • Language: PHP 7/8
  • Frontend: jQuery (Admin UI)
  • Parsers: PHP Simple HTML DOM Parser, DOMDocument
  • Headless: PhantomJS (JS), Puppeteer (Node.js)
  • Database: MySQL/MariaDB (WordPress Schema + Madara Custom Tables)

5. Directory Overview

See DIRECTORY_STRUCTURE.md for a complete manifest.

  • root: Core logic (ultimate-manga-scraper.php).
  • includes/: Madara integration classes.
  • res/: Libraries, drivers, and admin UI templates.
  • images/, scripts/, styles/: Assets.

6. Installation

See DEPLOYMENT.md for detailed steps.

  1. Upload plugin to /wp-content/plugins/.
  2. Activate via WordPress Admin.
  3. Ensure Madara theme is active.

7. Environment Setup

  • Permissions: Ensure the web server can write to wp-content/uploads and wp-content/plugins/ultimate-manga-scraper.
  • Cron: Disable WP-Cron and setup a system cron for reliability.

8. Configuration

See CONFIGURATION.md.

Configuration is handled via Ultimate Web Novel & Manga Scraper -> Main Settings. Key areas:

  • Headless Settings: Paths to binaries.
  • Translation Keys: API credentials.
  • Storage Backend: Local vs Cloud.

9. Database Setup

The plugin utilizes the standard WordPress wp_options table for storing rules and settings. Content is stored in wp_posts (Manga) and wp_postmeta. Chapter data is managed by Madara's storage engine.

10. Admin & System Usage

  1. Define Rules: Go to the specific scraper tab (e.g., Manga Scraper).
  2. Add URL: Paste the TOC URL of the target manga.
  3. Set Schedule: Define how often to check for updates.
  4. Run: Click "Run This Rule Now" or wait for Cron.
  5. Monitor: Watch the "Activity & Logging" tab.

11. Development Workflow

  • Architecture: See ARCHITECTURE.md.
  • Data Flow: See DATA_FLOW.md.
  • Modifying: Edits should primarily be made in ultimate-manga-scraper.php for core logic, or includes/ for Madara-specific logic.

12. Production Deployment

  • Security: See SECURITY.md.
  • Optimization: Use Redis/Memcached object caching. Use a real Cron job.

13. Security Considerations

  • SSRF: The plugin makes outbound requests to user-defined URLs.
  • RCE: shell_exec is used for headless browsers. Secure your server accordingly.
  • Access Control: Restrict Admin access.

14. Limitations & Assumptions

  • Theme Dependency: Assumes Madara theme structure is present.
  • Site Changes: Scrapers rely on DOM structure. Target site changes will break scraping until updated.
  • Legal: User is responsible for copyright compliance of scraped content.

15. Maintenance

  • Logs: Rotate logs (auto_clear_logs).
  • Updates: Check CHANGELOG.md.

16. Licensing

Released into the Public Domain. See LICENSE for details.


Documentation Index: DOCUMENTATION_INDEX.md

About

This WordPress plugin automatically scrapes web novels and manga from various sources and publishes them on your Madara-powered website.

Resources

License

Security policy

Stars

Watchers

Forks

Packages

No packages published

Contributors 2

  •  
  •