Download image galleries or metadata from the web.

This rewrite is expected to support the previous implementation's metadata format. The main idea was to separate the core (mx-scraper) from the user-defined plugins, which was not possible in previous implementations.
```sh
# pip install beautifulsoup4

# Plugins can be specified with -p or --plugin.
# By default, the plugin is inferred from the args.
# Each plugin may have its own set of dependencies, independent from mx-scraper.

# Uses bs4
mx-scraper fetch --plugin images https://www.google.com

# Uses gallery-dl
mx-scraper fetch --meta-only -v https://x.com/afmikasenpai/status/1901323062949159354
mx-scraper fetch -p gallery-dl https://x.com/afmikasenpai/status/1901323062949159354

# Alternatively, when batching terms that target various sources/plugins, a prefix
# (e.g. an id or name) is often required so the right plugin can be inferred.
# The prefix is plugin specific (refer to plugin_name/__init__.py :: mx_is_supported).
mx-scraper fetch --meta-only -v img:https://www.google.com https://mto.to/series/68737
mx-scraper fetch --meta-only -v nh:177013
```
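Concretely, the prefix lookup above means each plugin advertises what it can handle from its `__init__.py`. The sketch below is only an illustration of that idea, not code from mx-scraper: the file location and the name `mx_is_supported` come from the comment above, while the exact signature, the `PREFIX` value, and the fallback heuristic are assumptions.

```python
# plugin_name/__init__.py -- illustrative sketch only, not taken from the
# mx-scraper source. Assumption: the engine calls mx_is_supported(term) to
# decide whether this plugin should handle a given term; the real signature
# and return convention may differ.

PREFIX = "img:"  # hypothetical prefix for this plugin


def mx_is_supported(term: str) -> bool:
    """Return True when this plugin recognises the term."""
    # Accept explicitly prefixed terms, e.g. "img:https://...".
    if term.startswith(PREFIX):
        return True
    # Otherwise fall back to a plugin-specific heuristic on bare terms,
    # e.g. a host match (hypothetical).
    return "google.com" in term
```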
mx-scraper engine

```
Usage: mx-scraper <COMMAND>

Commands:
  fetch        Fetch a sequence of terms
  fetch-files  Fetch a sequence of terms from a collection of files
  request      Request a url
  infos        Display various informations
  server       Spawn a graphql server interfacing mx-scraper
  help         Print this message or the help of the given subcommand(s)

Options:
  -h, --help  Print help
```

Each fetch strategy will share the same configuration.
- CLI
  - Fetch a list of terms
  - Fetch a list of terms from a collection of files
  - Generic URL Request
    - Print as text
    - Download (`--dest` flag)
  - Authentication (Basic, Bearer token)
- Cookies
  - Loading from a file (Netscape format, key-value; see the sketch after this list)
  - Loading from the config (key-value)
- HTTP Client/Downloader
  - Support for the older mx-scraper book schema
  - Download
  - Cache support (can be disabled with `--no-cache` or from the config)
  - Configurable HTTP client (default, Flaresolverr, cfworker)
- Plugins
  - Python plugin
    - `MxRequest` with runtime context (headers, cookies, auth)
  - gallery-dl extractors
  - Subprocess (e.g. imgbrd-grabber)
- Send context from an external source (e.g. browser)
  - Cookies, UA (through `--listen-cookies`, which opens a callback URL that can receive a `FetchContext` object)
  - Rendered HTML page
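As a point of reference for the cookie loading above, the "Netscape format" is the classic cookies.txt layout exported by browsers and tools such as curl. The snippet below is not mx-scraper's loader; it only shows, using Python's standard library, what reading such a file and flattening it to key-value pairs looks like.

```python
# Illustrative only: read a Netscape-format cookies.txt with the Python
# standard library. This is not mx-scraper's loader, just the same file format.
from http.cookiejar import MozillaCookieJar

jar = MozillaCookieJar("cookies.txt")  # placeholder path
jar.load(ignore_discard=True, ignore_expires=True)

# Flatten to the simple key-value view, similar to cookies defined in the config.
cookies = {cookie.name: cookie.value for cookie in jar}
print(cookies)
```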
You can also use the extractors through GraphQL queries, with the same options as the command-line interface.
```
Usage: mx-scraper server [OPTIONS]

Options:
      --port <PORT>  Server port
  -h, --help         Print help
```
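Any GraphQL-over-HTTP client can talk to that server. The sketch below uses Python's `requests`; note that the `/graphql` endpoint path and the query/field names are hypothetical placeholders for illustration, not mx-scraper's actual schema, which is defined by the server itself.

```python
# pip install requests
# Minimal GraphQL-over-HTTP client sketch. The endpoint path and the query
# below are hypothetical placeholders; consult the running server for the
# actual schema (e.g. via introspection).
import requests

PORT = 8080  # whatever was passed to `mx-scraper server --port <PORT>`
ENDPOINT = f"http://localhost:{PORT}/graphql"  # assumed path

QUERY = """
query Fetch($term: String!) {
  fetch(term: $term) {  # hypothetical field
    title
  }
}
"""


def run_query(term: str) -> dict:
    """POST a GraphQL query and return the decoded JSON response."""
    resp = requests.post(
        ENDPOINT,
        json={"query": QUERY, "variables": {"term": term}},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


if __name__ == "__main__":
    print(run_query("nh:177013"))
```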
