A minimal MediaWiki archiver. It exports articles from a MediaWiki instance to a SQLite database.
This project's GitHub Releases are weekly archives of 52Poké Wiki, licensed under CC BY-NC-SA 3.0. They are database archives created with mwarchiver, not binaries of mwarchiver. Please review the 52Poké Wiki machine reading rules.
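Because an archive is an ordinary SQLite database, it can be inspected with any SQLite client. The following is a minimal sketch in Python, not part of mwarchiver: the helper name `summarize_archive` is hypothetical, and since this README does not document the table layout, the code discovers table names from the database itself rather than assuming a schema.

```python
import sqlite3
from pathlib import Path


def summarize_archive(db_path: str) -> dict[str, int]:
    """Return {table_name: row_count} for every user table in the database.

    Hypothetical helper for illustration; it makes no assumption about
    which tables mwarchiver creates.
    """
    # Open read-only so a mistyped path raises an error instead of
    # silently creating a new, empty database file.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        counts = {}
        for name in tables:
            # Table names cannot be bound as parameters, so quote them
            # as identifiers (doubling any embedded double quotes).
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
    finally:
        conn.close()
```

Calling `summarize_archive("mwarchiver.db")` on a downloaded archive (the file name here is only an example) returns a dictionary of table names and row counts, which is a reasonable starting point before writing queries against a specific table.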
By default, the program loads $HOME/.mwarchiver.yaml if the file exists. You can also specify a config file with --config.
Example config:
api_url: https://en.wikipedia.org/w/api.php
user_agent: "mwarchiver (contact: you@example.com)"
db_path: mwarchiver.db
limit: 1000 # Maximum number of articles exported per namespace
namespaces: [0] # List of namespace numbers: https://www.mediawiki.org/wiki/Help:Namespaces

Environment variables (prefix MWARCHIVER_):
- MWARCHIVER_API_URL
- MWARCHIVER_USER_AGENT
- MWARCHIVER_DB_PATH
- MWARCHIVER_OUTPUT_PATH
- MWARCHIVER_LIMIT
- MWARCHIVER_NAMESPACES (comma-separated, e.g. 0,1,2)

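To illustrate how the config keys relate to the environment variables, here is a small Python sketch that converts a config mapping into variable names and `docker run -e` flags. It assumes each variable is the prefix `MWARCHIVER_` followed by the upper-cased config key, which matches the names listed above but is not stated as a rule in this README. The helper names `config_to_env` and `docker_env_flags` are hypothetical and not part of mwarchiver.

```python
ENV_PREFIX = "MWARCHIVER_"


def config_to_env(config: dict) -> dict[str, str]:
    """Map config keys to environment variable names and string values.

    Assumption: variable name = prefix + upper-cased key. Lists are joined
    with commas, following the MWARCHIVER_NAMESPACES example (0,1,2).
    """
    env = {}
    for key, value in config.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(str(item) for item in value)
        env[ENV_PREFIX + key.upper()] = str(value)
    return env


def docker_env_flags(config: dict) -> list[str]:
    """Build the repeated `-e NAME=value` arguments for `docker run`."""
    flags = []
    for name, value in config_to_env(config).items():
        flags += ["-e", f"{name}={value}"]
    return flags
```

For example, `config_to_env({"limit": 1000, "namespaces": [0, 1, 2]})` yields `MWARCHIVER_LIMIT` set to `1000` and `MWARCHIVER_NAMESPACES` set to `0,1,2`.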
Run with a mounted config file:
docker run --rm \
-v "$PWD/.mwarchiver.yaml:/root/.mwarchiver.yaml:ro" \
-v "$PWD:/data" \
-w /data \
ghcr.io/52poke/mwarchiver:latest

Run with env vars instead of a config file:
docker run --rm \
-e MWARCHIVER_API_URL=https://en.wikipedia.org/w/api.php \
-e MWARCHIVER_DB_PATH=/data/mwarchiver.db \
-e MWARCHIVER_LIMIT=1000 \
-e MWARCHIVER_NAMESPACES=0 \
-v "$PWD:/data" \
-w /data \
ghcr.io/52poke/mwarchiver:latest

Optional release upload (GitHub Releases):
docker run --rm \
-e MWARCHIVER_API_URL=https://en.wikipedia.org/w/api.php \
-e MWARCHIVER_DB_PATH=/data/mwarchiver.db \
-e RELEASE_UPLOAD=1 \
-e GITHUB_TOKEN=ghp_yourtoken \
-e GITHUB_REPOSITORY=owner/repo \
-v "$PWD:/data" \
-w /data \
ghcr.io/52poke/mwarchiver:latest

Example Kubernetes CronJob:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mwarchiver
spec:
  schedule: "0 4 * * 0"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: mwarchiver
              image: ghcr.io/52poke/mwarchiver:latest
              env:
                - name: MWARCHIVER_API_URL
                  value: "https://en.wikipedia.org/w/api.php"
                - name: MWARCHIVER_DB_PATH
                  value: /data/mwarchiver.db
                - name: MWARCHIVER_LIMIT
                  value: "1000"
                - name: MWARCHIVER_NAMESPACES
                  value: "0"
                - name: RELEASE_UPLOAD
                  value: "1"
                - name: GITHUB_REPOSITORY
                  value: "OWNER/REPO"
                - name: GITHUB_TOKEN
                  valueFrom:
                    secretKeyRef:
                      name: mwarchiver-github
                      key: token
              volumeMounts:
                - name: data
                  mountPath: /data
          volumes:
            - name: data
              persistentVolumeClaim:
                claimName: mwarchiver-data

mwarchiver is licensed under the MIT license.