futureg-lab/mx-scraper

mx-scraper

Download image galleries or metadata from the web.

This rewrite is expected to support the previous implementation's metadata format.

The main idea was to separate the core (mx-scraper) from user-defined plugins, which was not possible in previous implementations.

Usage

# pip install beautifulsoup4

# Plugins can be specified with -p or --plugin
# By default, the plugin is inferred from the arguments
# Each plugin may have its own set of dependencies, independent of mx-scraper
# Uses bs4
mx-scraper fetch --plugin images https://www.google.com
# Uses gallery-dl
mx-scraper fetch --meta-only -v https://x.com/afmikasenpai/status/1901323062949159354
mx-scraper fetch -p gallery-dl https://x.com/afmikasenpai/status/1901323062949159354

# Alternatively, when batching terms that target different sources/plugins, a prefix (e.g. an id or name) is often required
# The prefix is plugin specific (refer to plugin_name/__init__.py :: mx_is_supported)
mx-scraper fetch --meta-only -v img:https://www.google.com https://mto.to/series/68737
mx-scraper fetch --meta-only -v nh:177013
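To illustrate how a prefix drives plugin inference, here is a minimal sketch of a plugin-side support check, modeled on the `mx_is_supported` hook referenced above. The exact signature and prefix handling are assumptions; refer to an actual plugin's `__init__.py` for the real contract.

```python
# Hypothetical sketch of a plugin's term-support check (the real hook
# lives in plugin_name/__init__.py as mx_is_supported; this signature
# is an assumption for illustration).

PREFIX = "img:"

def mx_is_supported(term: str) -> bool:
    """Return True if this plugin should handle the given term."""
    # A prefixed term explicitly targets this plugin.
    if term.startswith(PREFIX):
        return True
    # Otherwise, try to infer support from the term itself (plain URLs).
    return term.startswith("http://") or term.startswith("https://")

print(mx_is_supported("img:https://www.google.com"))  # True
```

A batched invocation can then mix terms: each plugin's check runs against every term, and the prefix disambiguates when several plugins could match the same URL.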

Commands

mx-scraper engine

Usage: mx-scraper <COMMAND>

Commands:
  fetch        Fetch a sequence of terms
  fetch-files  Fetch a sequence of terms from a collection of files
  request      Request a url
  infos        Display various informations
  server       Spawn a graphql server interfacing mx-scraper
  help         Print this message or the help of the given subcommand(s)

Options:
  -h, --help  Print help

Each fetch strategy shares the same configuration.

Features

  • CLI

    • Fetch a list of terms
    • Fetch a list of terms from a collection of files
    • Generic URL Request
      • Print as text
      • Download (--dest flag)
    • Authentications (Basic, Bearer token)
  • Cookies

    • Loading from a file (Netscape format, key-value)
    • Loading from the config (key-value)
  • Http Client/Downloader

    • Support of older mx-scraper book schema
    • Download
    • Cache support (can be disabled with --no-cache or from config)
    • Configurable Http Client (default, Flaresolverr, cfworker)
  • Plugins

    • Python plugin
      • MxRequest with runtime context (headers, cookies, auth)
    • gallery-dl extractors
    • Subprocess (e.g. imgbrd-grabber)
  • Send context from an external source (e.g. browser)

    • Cookies, UA (through --listen-cookies, will open a callback url that can receive a FetchContext object)
    • Rendered HTML page
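The Netscape cookie format mentioned above is a plain-text format with seven tab-separated fields per line. As a rough illustration (not mx-scraper's actual loader), a minimal parser looks like this:

```python
# Minimal sketch of parsing the Netscape cookie-file format (seven
# tab-separated fields: domain, subdomain flag, path, secure flag,
# expiry, name, value). Illustrative only, not mx-scraper's loader.

def parse_netscape_cookies(text: str) -> dict[str, str]:
    """Return a {name: value} mapping from Netscape-format cookie text."""
    cookies = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) == 7:
            domain, flag, path, secure, expiry, name, value = fields
            cookies[name] = value
    return cookies

sample = ".example.com\tTRUE\t/\tFALSE\t0\tsession\tabc123"
print(parse_netscape_cookies(sample))  # {'session': 'abc123'}
```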

GraphQL server

You can also use the extractors through GraphQL queries. You will have the same options as the command-line interface.

Usage: mx-scraper server [OPTIONS]

Options:
      --port <PORT>  Server port
  -h, --help         Print help
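A client talks to the server by POSTing a standard GraphQL JSON payload. The sketch below only builds such a payload; the query shape, field names, and endpoint path are invented for illustration (the actual schema is exposed by the server's playground).

```python
import json

# Hypothetical GraphQL request payload a client would POST to the
# server (e.g. to http://localhost:<PORT>/graphql -- the path is an
# assumption). The query and its fields are invented for illustration;
# consult the playground for the real schema.
query = """
query Fetch($term: String!) {
  fetch(term: $term) {
    title
    urls
  }
}
"""

payload = json.dumps({
    "query": query,
    "variables": {"term": "nh:177013"},
})
print(payload)
```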

Playground Screenshot
