Paper Digest is a small but production-minded Python project for pulling the latest research papers every day and turning them into a readable digest.
The current scope is intentionally narrow:
- Fetch the newest papers from arXiv, Crossref, PubMed, Semantic Scholar, and OpenAlex.
- Apply include and exclude keyword filters on title and abstract.
- Optionally enrich selected papers with structured LLM analysis.
- Generate machine-readable JSON and human-readable Markdown.
- Build a static archive site with search, feed subscriptions, topic tracking, canonical paper detail pages, rising-paper views, trend views, notification-history views, and RSS subscription feeds.
- Persist state to avoid repeating already-sent papers.
- Optionally deliver the digest through SMTP email, Feishu webhooks, WeCom webhooks, Slack incoming webhooks, Discord incoming webhooks, or Telegram bots.
- Stay easy to automate from cron, GitHub Actions, or a notification bot.
This repository is structured to grow like a real open-source project rather than a one-off script. The baseline includes:
- Clear packaging metadata and typed Python modules.
- Config validation with actionable error messages.
- Unit tests for config loading, parsing, filtering, and service orchestration.
- CI-ready commands for coverage-gated tests, linting, type checking, and build validation.
- Contributor-facing docs such as LICENSE, CONTRIBUTING.md, and SECURITY.md.
- Maintainer automation such as pre-commit, grouped Dependabot updates, dependency review, workflow lint, community triage automation, PR hygiene checks, issue forms, tag-based release builds, and docs-reference consistency checks across Markdown pages, GitHub metadata, maintainer-doc entry maps, release-lifecycle artifacts, maintainer issue templates, lifecycle cross-link fields, issue close-out rules, summary-template default semantics, structured workflow/issue-form lifecycle blocks, and a repo-local lifecycle contract schema.
- Full config reference and commented examples: config.example.toml
- Small starting profiles for common setups: docs/config-recipes.md
- Ready-to-edit Feishu LM, agent runtime security, and Terminal/SWE agent arXiv morning digest config: examples/feishu-lm-arxiv.toml
- Runtime and platform support policy: docs/compatibility-matrix.md
- Label taxonomy and triage labels: docs/label-taxonomy.md
- Security disclosure and support routing: SECURITY.md and SUPPORT.md
- Governance, roadmap, and maintainer ownership: GOVERNANCE.md, docs/roadmap-policy.md, and docs/maintainer-rotation.md
- Proposal flow, discussion categories, and ADRs: docs/discussions-policy.md, docs/adr/README.md, and docs/adr/0000-template.md
- Review and branch protection policy: docs/review-policy.md and docs/branch-protection-policy.md
- Manual GitHub admin settings checklist: docs/repository-settings-checklist.md
- Rulesets and maintainer saved replies: docs/ruleset-policy.md and docs/saved-replies.md
- Maintainer onboarding, offboarding, and access review: docs/maintainer-access-policy.md
- Maintainer operations hub: docs/maintainer-operations-hub.md
- Quarterly repository-operations review checklist and summary template: docs/quarterly-maintainer-review.md
- Release cadence and lifecycle runbook: docs/release-cadence-policy.md and docs/release-lifecycle-runbook.md
- Release and maintainer-operations history index: docs/operations-history.md
- Post-release verification and next-cycle checklist: docs/post-release-checklist.md
- Release checklist and release-notes guidance: RELEASING.md
- Maintainer issue-handling rules: docs/issue-triage.md
- Maintainer workflow inventory and CI policy: docs/maintainer-guide.md
- Dependency update strategy: docs/dependency-policy.md
- Architecture notes: docs/architecture.md
Paper Digest intentionally keeps a narrow support surface.
| Surface | Status | Notes |
|---|---|---|
| CPython 3.12 | Supported | Required by local setup, CI, and release validation. |
| CPython 3.13+ | Expected, not CI-gated yet | Validate manually before advertising broader support. |
| PyPy or CPython < 3.12 | Unsupported | Not tested or documented. |
| GitHub Actions on ubuntu-latest | Supported | This is the production runner for CI and scheduled jobs. |
| Local macOS and Linux runs | Supported on a best-effort basis | The CLI is stdlib-only at runtime, but workflow examples assume a POSIX shell. |
If you widen the supported matrix, update the compatibility doc, CI, and release notes together.
- Create a virtual environment and install the project:
python -m venv .venv
. .venv/bin/activate
python -m pip install -e '.[dev]'
- Choose a config starting point.
  - For a fully commented reference, copy config.example.toml.
  - For a smaller profile such as "local smoke test" or "GitHub Actions schedule", start from docs/config-recipes.md.
  - For a Feishu morning digest focused on LM, agent runtime security, and Terminal/SWE agent papers from arXiv, copy examples/feishu-lm-arxiv.toml.
- Copy a config into the local ignored config.toml file:
cp config.example.toml config.toml
- For a first local smoke test, keep [analysis] enabled = false and leave [[deliveries]] commented out or use a webhook placeholder. Then generate the digest:
python -m paper_digest --config config.toml
- Inspect the outputs:
  - output/latest.json
  - output/latest.md
  - output/site/index.html
  - output/site/reading-list.html
  - output/site/review-queue.html
  - output/site/weekly-review.html
  - output/YYYY-MM-DD/digest.json
  - output/YYYY-MM-DD/digest.md
- When the local output looks right, add the full config.toml content to the GitHub repository secret PAPER_DIGEST_CONFIG_TOML and run Daily Digest manually once before relying on the daily schedule.
Example:
[app]
timezone = "Asia/Shanghai"
lookback_hours = 24
output_dir = "output"
request_delay_seconds = 3
request_timeout_seconds = 60
fetch_retry_attempts = 4
fetch_retry_backoff_seconds = 10
[[feeds]]
name = "LLM"
categories = ["cs.AI", "cs.CL", "cs.LG"]
keywords = ["agent", "reasoning", "alignment"]
exclude_keywords = ["survey"]
max_results = 100
max_items = 15
Field reference:
- timezone: Timezone used for display and output folder naming.
- lookback_hours: Papers older than this time window are ignored.
- output_dir: Directory where dated and latest digests are written.
- request_delay_seconds: Delay between source HTTP requests.
- request_timeout_seconds: Per-request timeout for arXiv, Crossref, PubMed, Semantic Scholar, and OpenAlex fetches.
- fetch_retry_attempts: Maximum number of fetch attempts for transient failures.
- fetch_retry_backoff_seconds: Base backoff used between retry attempts.
- openalex_api_key_env: Optional environment variable name for an OpenAlex API key on manual or scheduled runs.
- state: Persistent history used for deduplication across runs.
- feedback: Local per-paper feedback state keyed by canonical_id.
- notify: Feedback-driven notification focus rules.
- source: arxiv, crossref, pubmed, semantic_scholar, or openalex.
- categories: arXiv categories such as cs.AI, cs.CL, or cs.CV. When the arXiv API is temporarily rate-limited, arXiv feeds fall back to category RSS endpoints before failing the digest run.
- queries: Required for crossref, pubmed, semantic_scholar, and openalex feeds.
- types: Optional type filter: Crossref work types such as journal-article, PubMed publication types such as Journal Article or Review, Semantic Scholar publication types such as Review or JournalArticle, or OpenAlex work types such as article or preprint.
- keywords: Keep a paper when any keyword matches title or abstract.
- exclude_keywords: Drop a paper when any excluded keyword matches.
- max_results: Number of newest candidates fetched before local filtering.
- max_items: Maximum number of papers emitted for that feed.
- sort_by: Optional per-feed override for relevance, published_at, or hybrid.
- digest: Rendering options for template selection and feed-level briefings.
- ranking: Default ranking strategy and relevance-weight tuning.
- analysis: Optional structured paper analysis, currently backed by OpenAI.
- deliveries: Optional notification outputs such as email, Feishu webhook, WeCom webhook, Slack webhook, Discord webhook, or Telegram bot.
- output/site: Generated static archive site for historical browsing.
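The keywords and exclude_keywords rules in the field reference can be sketched as a small predicate. This is a minimal illustration, not the project's implementation: the function name matches_feed, the case-insensitive substring matching, and the "empty include list keeps everything" behavior are all assumptions.

```python
def matches_feed(title, abstract, keywords, exclude_keywords):
    """Return True when a paper passes the include/exclude keyword rules."""
    # Assumption: matching is a case-insensitive substring test on title + abstract.
    text = f"{title}\n{abstract}".lower()
    # Drop the paper when any excluded keyword matches.
    if any(word.lower() in text for word in exclude_keywords):
        return False
    # Assumption: an empty include list keeps every remaining paper.
    if not keywords:
        return True
    # Keep the paper when any include keyword matches.
    return any(word.lower() in text for word in keywords)
```

Checking exclusions first means a paper that matches both lists is dropped, which is one plausible reading of the two rules.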
Digest rendering:
[digest]
template = "default"
top_highlights = 3
feed_key_points = 3
Ranking strategy:
[ranking]
sort_by = "hybrid"
title_match_weight = 40
summary_match_weight = 18
doi_weight = 12
pdf_weight = 8
rich_summary_weight = 6
metadata_weight = 4
multi_source_weight = 10
freshness_weight_cap = 24
Optional LLM analysis:
[analysis]
enabled = true
provider = "openai"
model = "gpt-5-mini"
api_key_env = "OPENAI_API_KEY"
base_url = "https://api.openai.com/v1/responses"
timeout_seconds = 60
max_papers = 8
max_output_tokens = 600
language = "English"
reasoning_effort = "minimal"
Digest notes:
- feed_key_points controls how many feed-level "today's key points" lines appear before the detailed paper list.
- sort_by = "hybrid" is the default and keeps the current behavior: relevance-first ranking with published_at as the tie-breaker.
- sort_by = "published_at" keeps the newest papers first and uses relevance_score only as an explanatory secondary signal.
- sort_by = "relevance" keeps the strongest keyword and metadata matches at the top, even when several papers are similarly recent.
- template = "zh_daily_brief" switches the output into a Chinese briefing layout with a topic-organized "今日重点" (today's highlights) section plus a per-feed "本组速览" (feed overview).
- zh_daily_brief works even when analysis is disabled. In that mode, the project generates rule-based Chinese briefing scaffolding around the raw paper title and abstract summary, including high-frequency topic extraction, rule-based tags such as 方法/数据/应用 (method/data/application), and topic-oriented highlights.
- The JSON output now records the active sorting summary, per-feed sort_by, relevance_score, and match_reasons so downstream archive pages and integrations can explain why each paper surfaced.
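The hybrid and published_at orderings described in the digest notes can be sketched as follows. This is a hedged illustration: sort_papers and the dict layout are hypothetical, and the relevance strategy is left out because the notes do not say how its ordering differs from hybrid.

```python
from datetime import datetime


def sort_papers(papers, sort_by="hybrid"):
    """Order paper dicts by the chosen strategy (illustrative only)."""
    if sort_by == "hybrid":
        # Relevance first, with published_at as the tie-breaker.
        def key(paper):
            return (paper["relevance_score"], paper["published_at"])
    elif sort_by == "published_at":
        # Newest first; relevance_score is reported but not used for ordering.
        def key(paper):
            return paper["published_at"]
    else:
        raise ValueError(f"strategy not covered by this sketch: {sort_by}")
    return sorted(papers, key=key, reverse=True)


papers = [
    {"id": "a", "relevance_score": 40, "published_at": datetime(2026, 4, 10)},
    {"id": "b", "relevance_score": 58, "published_at": datetime(2026, 4, 9)},
    {"id": "c", "relevance_score": 40, "published_at": datetime(2026, 4, 11)},
]
hybrid_order = [p["id"] for p in sort_papers(papers, "hybrid")]
newest_order = [p["id"] for p in sort_papers(papers, "published_at")]
```

Papers a and c share a score, so under hybrid the more recent c ranks above a.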
Feedback loop:
[feedback]
enabled = true
path = ".paper-digest-state/feedback.json"
star_boost = 80
follow_up_boost = 35
reading_boost = 18
done_penalty = 20
ignore_penalty = 120
hide_ignored = true
- Feedback is keyed by canonical paper identity: DOI first, then arXiv id, then a normalized title fallback.
- Supported statuses are star, follow_up, reading, done, and ignore.
- star, follow_up, and reading boost ranking; done lowers priority; ignore either hides papers or down-ranks them, depending on hide_ignored.
- Each feedback entry can also carry a free-form note, a concrete next_action, an optional due_date, a temporary snoozed_until, and an optional review_interval_days, so you can record both why you marked a paper and how it should re-enter your workflow later.
- The archive site exposes a dedicated output/site/reading-list.html page that aggregates starred, follow-up, and in-progress papers.
- The archive site also exposes output/site/weekly-review.html, which groups papers into unfinished backlog, continuously overdue work, recurring-review returns, snoozed items, and completed work.
- The archive site exposes output/site/review-queue.html, which highlights overdue items, papers due within 3 days, queued next actions, newly surfaced unmarked papers, and resurfaced follow-ups.
- The archive site exposes output/site/notification-history.html, which visualizes the remembered action-notification state that suppresses duplicate Action Brief reasons across runs.
- Paper detail pages, reading lists, weekly review sections, review queues, and Focus outputs all surface those feedback notes, next actions, and due dates once they are present.
- Canonical detail pages also show the most recent remembered action notifications, so you can tell which reasons have already been sent and why a paper may be absent from today's Action Brief.
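The "DOI first, then arXiv id, then normalized title" rule can be sketched as a single function. The doi:, arxiv:, and title: prefixes follow the example feedback keys used in this README; the function name canonical_id as a callable, the DOI lowercasing, and the slug-style title normalization are assumptions for illustration.

```python
import re


def canonical_id(doi=None, arxiv_id=None, title=""):
    """Build a feedback key: DOI first, then arXiv id, then a title fallback."""
    if doi:
        # Assumption: DOIs are compared case-insensitively, so lowercase them.
        return f"doi:{doi.strip().lower()}"
    if arxiv_id:
        return f"arxiv:{arxiv_id.strip()}"
    # Assumption: normalize the title by lowercasing and collapsing every run
    # of non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"title:{slug}"
```

A paper that has both a DOI and an arXiv id resolves to the DOI key, so feedback follows the paper across sources that report different identifiers.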
Notification focus:
[notify]
feedback_only = false
include_new_starred = true
include_follow_up_resurfaced = true
include_starred_momentum = true
max_focus_items = 5
max_action_items = 5
action_overdue_only = false
# action_due_within_days = 7
- Notification outputs now include a dedicated Focus block when a paper was newly starred, a follow_up paper resurfaced in the current scan, or a starred paper newly entered the momentum view.
- feedback_only = true turns webhook or email notifications into a feedback-driven briefing that only pushes the Focus and action sections.
- Focus items explain why they were pushed, preserve the paper's star / follow_up status, and surface coverage context such as active days, feed span, and appearance count.
- Daily digests now also include a dedicated "本周该处理什么" (what to handle this week) section for newly changed action states such as snooze resumes, first due_soon entries, overdue escalations, and recurring reviews that are now due, while the weekly review page keeps the longer backlog.
- max_action_items caps how many action reminders get rendered into one run.
- action_overdue_only = true narrows action reminders to already overdue items.
- action_due_within_days = 7 is the lighter-weight alternative when you want to keep only near-term action reminders.
- These [notify] action settings define the global action pool before any delivery-specific filters are applied.
Example feedback file:
{
"version": 1,
"papers": {
"doi:10.5555/paper-circle": {
"status": "star",
"updated_at": "2026-04-10T09:15:00+08:00",
"note": "use this as the anchor paper for next week's review",
"next_action": "compare section 4 with the baseline table",
"due_date": "2026-04-18",
"snoozed_until": "2026-04-20",
"review_interval_days": 14
},
"arxiv:2604.00001": "reading",
"title:example-normalized-title": "done"
}
}
You can manage that file without editing JSON directly:
python -m paper_digest feedback set 'doi:10.5555/paper-circle' star --config config.toml
python -m paper_digest feedback set 'doi:10.5555/paper-circle' follow_up --config config.toml
python -m paper_digest feedback set 'doi:10.5555/paper-circle' reading --config config.toml
python -m paper_digest feedback set 'doi:10.5555/paper-circle' done --config config.toml
python -m paper_digest feedback set 'doi:10.5555/paper-circle' star --note 'anchor paper for review' --config config.toml
python -m paper_digest feedback note 'doi:10.5555/paper-circle' 'compare section 4 with baseline table' --config config.toml
python -m paper_digest feedback action set 'doi:10.5555/paper-circle' 'compare baseline table' --config config.toml
python -m paper_digest feedback due set 'doi:10.5555/paper-circle' 2026-04-18 --config config.toml
python -m paper_digest feedback snooze set 'doi:10.5555/paper-circle' 2026-04-20 --config config.toml
python -m paper_digest feedback interval set 'doi:10.5555/paper-circle' 14 --config config.toml
python -m paper_digest feedback action clear 'doi:10.5555/paper-circle' --config config.toml
python -m paper_digest feedback due clear 'doi:10.5555/paper-circle' --config config.toml
python -m paper_digest feedback snooze clear 'doi:10.5555/paper-circle' --config config.toml
python -m paper_digest feedback interval clear 'doi:10.5555/paper-circle' --config config.toml
python -m paper_digest feedback clear-note 'doi:10.5555/paper-circle' --config config.toml
python -m paper_digest feedback sync --direction push --config config.toml
python -m paper_digest feedback sync --direction pull --config config.toml
python -m paper_digest feedback sync --direction pull --merge-strategy newer --config config.toml
python -m paper_digest feedback clear 'doi:10.5555/paper-circle' --config config.toml
python -m paper_digest feedback list --config config.toml
To sync your local feedback state into or back out of GitHub Actions without hand-copying JSON:
python -m paper_digest feedback sync --direction push --config config.toml
python -m paper_digest feedback sync --direction push --repo X-PG13/paper-digest --secret-name PAPER_DIGEST_FEEDBACK_JSON --config config.toml
python -m paper_digest feedback sync --direction pull --config config.toml
python -m paper_digest feedback sync --direction pull --merge-strategy local --config config.toml
python -m paper_digest feedback sync --direction pull --merge-strategy remote --config config.toml
python -m paper_digest feedback sync --direction pull --dry-run --show-diff --config config.toml
python -m paper_digest feedback sync --direction push --dry-run --show-diff --config config.toml
python -m paper_digest state action list --config config.toml
python -m paper_digest state action reset 'doi:10.5555/paper-circle' --config config.toml
python -m paper_digest state action reset --reason overdue_3d --config config.toml
python -m paper_digest state action reset --reason overdue_3d --dry-run --show-match --config config.toml
python -m paper_digest state action reset --reason due_soon --before 2026-04-15 --config config.toml
python -m paper_digest state action sync --direction push --config config.toml
python -m paper_digest state action sync --direction pull --config config.toml
python -m paper_digest state action sync --direction pull --dry-run --show-diff --config config.toml
Notes:
- feedback sync --direction push writes the current local feedback payload into a GitHub Actions repository secret by calling gh secret set.
- feedback sync --direction pull dispatches a short-lived GitHub Actions workflow that materializes the current feedback secret into a one-day artifact, then downloads it back into your local feedback.json.
- Pull supports --merge-strategy newer|local|remote. newer prefers the entry with the latest updated_at, local preserves the current file when both sides define the same paper, and remote force-prefers the GitHub secret copy.
- --dry-run previews the sync result without writing the local feedback.json or mutating the GitHub Actions secret.
- --show-diff prints a field-level diff so you can inspect changes to status, note, next_action, due_date, snoozed_until, review_interval_days, and updated_at before you apply them.
- state action list shows the remembered canonical_id + reason entries that currently suppress repeated action reminders.
- state action reset re-arms action notifications for one paper or one reason code without requiring a manual edit to the persisted state file.
- state action reset --dry-run --show-match previews the exact canonical_id + reason + notified_at rows that would be re-armed.
- state action reset --before YYYY-MM-DD narrows resets to older remembered notifications, which is useful when you only want to re-arm stale entries.
- state action sync --direction push writes the current remembered action notification state into the GitHub Actions cache used by scheduled digest runs, without overwriting feed-level seen-paper history.
- state action sync --direction pull exports the current GitHub Actions-side action notification snapshot into your local state.json, so local resets can inspect or mirror the online suppression state.
- state action sync --dry-run --show-diff previews added, updated, and removed canonical_id + reason + notified_at entries before either local or remote action state is written.
- Push previews fetch the current remote feedback state through the same short-lived pull workflow, so feedback sync --direction push --dry-run shows what the secret would change to before it is overwritten.
- If --repo is omitted, the command derives owner/repo from the current git origin remote.
- Pulling uses the dedicated feedback-secret-sync.yml workflow because GitHub Actions secrets are write-only through the direct API.
- Action-state sync uses the dedicated action-state-sync.yml workflow to restore or replace only the remembered action_notifications cache that drives Action Brief suppression and notification-history.html.
- Because pull temporarily exports the secret into an artifact, use it only on repositories and GitHub accounts you trust.
- When a digest run reaches snoozed_until, that paper automatically leaves the snoozed state and can re-enter the active review queue the same day.
- Recurring review intervals are only reactivated for star, follow_up, and reading papers. done entries keep their interval metadata but do not auto-resurface into action reminders.
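The three pull merge strategies in the notes above can be sketched like this. It is a minimal illustration under stated assumptions: merge_feedback and _updated_at are hypothetical names, every entry is assumed to be an object with an ISO 8601 updated_at, entries present on only one side are kept under every strategy, and ties under newer keep the local copy.

```python
from datetime import datetime


def _updated_at(entry):
    # Assumption: every entry is a dict carrying an ISO 8601 updated_at string.
    return datetime.fromisoformat(entry["updated_at"])


def merge_feedback(local, remote, strategy="newer"):
    """Merge two {canonical_id: entry} mappings the way a pull might."""
    if strategy not in {"newer", "local", "remote"}:
        raise ValueError(f"unknown merge strategy: {strategy}")
    merged = dict(local)
    for paper_id, remote_entry in remote.items():
        local_entry = local.get(paper_id)
        if local_entry is None or strategy == "remote":
            # Remote-only papers are added; "remote" also wins every conflict.
            merged[paper_id] = remote_entry
        elif strategy == "newer":
            # Assumption: on a tie the local copy is kept.
            if _updated_at(remote_entry) > _updated_at(local_entry):
                merged[paper_id] = remote_entry
        # strategy == "local": the existing local entry stays untouched.
    return merged
```

Running the merge with each strategy on the same inputs is a cheap way to see what a pull would change before using the real --dry-run --show-diff preview.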
Analysis notes:
- Analysis is disabled by default. If the section is omitted or enabled = false, the digest keeps using the original abstract summary only.
- Analysis runs after filtering and deduplication, so you only spend tokens on papers that actually make it into the digest.
- max_papers caps analysis cost for a single run. Papers beyond that limit still appear in the digest with their raw abstract summaries.
- When analysis is enabled, the Markdown and notification outputs add: top-of-digest highlights, a one-sentence conclusion per paper, contribution bullets, best-fit audience, and likely limitations.
- A practical Chinese setup is language = "Chinese" plus [digest] template = "zh_daily_brief".
- For backward compatibility, legacy template, top_highlights, and feed_key_points values under [analysis] are still accepted when [digest] is omitted.
Preferred notification setup:
[[deliveries]]
type = "email"
smtp_host = "smtp.example.com"
smtp_port = 465
username = "bot@example.com"
password_env = "PAPER_DIGEST_SMTP_PASSWORD"
from_address = "bot@example.com"
to_addresses = ["you@example.com"]
use_tls = true
use_starttls = false
subject_prefix = "[Paper Digest]"
skip_if_empty = true
target = "digest"
include_focus = true
focus_target = "digest"
focus_statuses = ["star", "follow_up"]
focus_reasons = ["new_starred", "follow_up_resurfaced", "starred_momentum"]
focus_max_items = 5
include_actions = true
action_target = "digest"
action_only = false
action_statuses = ["star", "follow_up", "reading"]
action_reasons = ["overdue", "due_soon", "next_action_pending"]
action_max_items = 5
action_overdue_only = false
action_due_within_days = 7
[[deliveries]]
type = "feishu_webhook"
webhook_url = "https://open.feishu.cn/open-apis/bot/v2/hook/your-token"
title_prefix = "[Paper Digest]"
skip_if_empty = true
target = "per_feed"
include_focus = true
focus_target = "separate"
focus_statuses = ["star"]
focus_reasons = ["new_starred", "starred_momentum"]
focus_max_items = 3
include_actions = true
action_target = "separate"
action_only = false
action_statuses = ["reading"]
action_reasons = ["overdue"]
action_max_items = 2
action_overdue_only = true
action_due_within_days = 3
[[deliveries]]
type = "wecom_webhook"
webhook_url = "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=your-key"
title_prefix = "[Paper Digest]"
skip_if_empty = true
target = "per_feed"
include_focus = false
focus_target = "digest"
include_actions = true
action_target = "digest"
action_only = false
action_statuses = ["follow_up"]
action_reasons = ["due_soon", "next_action_pending"]
action_max_items = 3
action_overdue_only = false
action_due_within_days = 7
[[deliveries]]
type = "slack_webhook"
webhook_url = "https://hooks.slack.com/services/T000/B000/your-secret"
title_prefix = "[Paper Digest]"
skip_if_empty = true
target = "per_feed"
include_actions = true
action_target = "separate"
action_only = false
action_statuses = ["follow_up"]
action_reasons = ["due_soon"]
action_max_items = 2
action_overdue_only = false
action_due_within_days = 3
[[deliveries]]
type = "discord_webhook"
webhook_url = "https://discord.com/api/webhooks/123456789012345678/your-secret"
title_prefix = "[Paper Digest]"
skip_if_empty = true
target = "per_feed"
include_actions = true
action_target = "digest"
action_only = false
action_statuses = ["star"]
action_reasons = ["next_action_pending"]
action_max_items = 2
action_overdue_only = false
action_due_within_days = 14
[[deliveries]]
type = "telegram_bot"
bot_token = "123456:telegram-bot-token"
chat_id = "-1001234567890"
title_prefix = "[Paper Digest]"
skip_if_empty = true
target = "per_feed"
include_actions = true
action_target = "digest"
action_only = false
action_statuses = ["star", "follow_up", "reading"]
action_reasons = ["overdue", "due_soon", "next_action_pending"]
action_max_items = 4
action_overdue_only = false
action_due_within_days = 7
Notes:
- Keep the SMTP password in an environment variable instead of the config file.
- Feishu delivery uses the incoming webhook URL directly; keep it in your untracked config.toml or a GitHub secret-backed config.
- WeCom delivery uses the group robot webhook URL directly; keep it in your untracked config.toml or a GitHub secret-backed config.
- Slack delivery uses an incoming webhook URL directly; keep it in your untracked config.toml or a GitHub secret-backed config.
- Discord delivery uses an incoming webhook URL directly; keep it in your untracked config.toml or a GitHub secret-backed config.
- Telegram delivery uses a bot token plus target chat ID; keep them in your untracked config.toml or a GitHub secret-backed config.
- OpenAlex can run without an API key for lightweight usage, but an OPENALEX_API_KEY wired through app.openalex_api_key_env is the safer production path for newer OpenAlex rate-limit rules.
- Use either use_tls = true for implicit TLS, usually port 465, or use_starttls = true for STARTTLS, usually port 587.
- skip_if_empty = true suppresses notifications when a digest or feed has no new papers.
- target = "digest" sends one message for the whole run.
- target = "per_feed" sends one message per feed, with the title including the date and that feed's hit count.
- include_focus = false keeps the delivery on the normal digest path without the feedback-driven Focus block.
- focus_target = "digest" keeps Focus inline with the main digest, while focus_target = "separate" emits a second Focus Brief message for that delivery when focus items exist.
- include_actions = false keeps one delivery on the normal digest path without the weekly action section.
- action_target = "digest" keeps action reminders inline with the main digest, while action_target = "separate" emits a dedicated Action Brief message for that delivery when action items exist.
- action_only = true turns one delivery into an action-reminder-only channel without suppressing the normal digest for other deliveries.
- action_statuses = ["star", "follow_up", "reading"] narrows action reminders to specific feedback states for that delivery.
- action_reasons = ["snooze_resumed", "overdue", "overdue_7d", "due_soon", "next_action_pending", "recurring_review", "recurring_due"] narrows action reminders by why they surfaced.
- action_max_items = 2 caps how many action reminders one delivery gets, independent of the global [notify].max_action_items.
- action_overdue_only = true keeps one delivery focused on overdue work only.
- action_due_within_days = 3 keeps one delivery focused on near-term work.
- snooze_resumed marks papers whose snoozed_until ends today, and recurring_due marks recurring-review items whose interval is now due.
- Delivery-level action filters only narrow the global action pool; they do not widen past what [notify] already emitted.
- focus_statuses = ["star", "follow_up"] narrows Focus to specific feedback states for that delivery. Leave it empty to accept all Focus statuses.
- focus_reasons = ["new_starred", "follow_up_resurfaced", "starred_momentum"] narrows Focus to specific trigger types for that delivery. Leave it empty to accept all Focus reasons.
- focus_max_items = 3 overrides the global [notify].max_focus_items cap for one delivery, so you can keep chat channels tighter than email digests.
- Legacy [email] config is still supported for backward compatibility.
- Delivery failures return a non-zero exit code, keep generated artifacts on disk, and do not persist dedup state for that run.
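The "delivery filters only narrow the global pool" rule can be sketched as a pure filter over items that [notify] already emitted. The function name select_actions and the item shape are hypothetical, and treating an empty action_statuses or action_reasons list as "accept all" is an assumption borrowed from how the Focus filters are described.

```python
def select_actions(global_pool, statuses=(), reasons=(), max_items=None):
    """Narrow the global action pool for one delivery (illustrative only)."""
    selected = [
        item
        for item in global_pool
        # Assumption: an empty filter accepts every status or reason.
        if (not statuses or item["status"] in statuses)
        and (not reasons or item["reason"] in reasons)
    ]
    # The per-delivery cap trims the list; it can never add items back.
    return selected if max_items is None else selected[:max_items]


pool = [
    {"id": "doi:a", "status": "reading", "reason": "overdue"},
    {"id": "doi:b", "status": "follow_up", "reason": "due_soon"},
    {"id": "doi:c", "status": "reading", "reason": "due_soon"},
]
chat_channel = select_actions(pool, statuses=["reading"], reasons=["overdue"], max_items=2)
```

Because the function only ever filters and slices its input, a delivery cannot surface an item that the global settings already excluded.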
Additional source examples:
[[feeds]]
name = "Crossref AI"
source = "crossref"
queries = ["agent reasoning benchmark"]
types = ["journal-article", "proceedings-article"]
keywords = ["agent", "reasoning"]
exclude_keywords = []
max_results = 50
max_items = 10
[[feeds]]
name = "PubMed AI"
source = "pubmed"
queries = ["agent systems", "clinical benchmark"]
types = ["Journal Article", "Review"]
keywords = ["agent", "benchmark"]
exclude_keywords = ["protocol"]
max_results = 50
max_items = 10
[[feeds]]
name = "Semantic Scholar AI"
source = "semantic_scholar"
queries = ["large language model", "agent systems"]
types = ["Review", "JournalArticle"]
keywords = ["agent", "benchmark"]
exclude_keywords = ["survey"]
max_results = 50
max_items = 10
[[feeds]]
name = "OpenAlex AI"
source = "openalex"
queries = ["large language model", "agent systems"]
types = ["article", "preprint"]
keywords = ["agent", "benchmark"]
exclude_keywords = ["survey"]
max_results = 50
max_items = 10
Common commands:
pre-commit install
python tools/sync_lifecycle_docs.py
make check
make docs-check
make docs-check-json
make docs-check-markdown
make workflow-tools
make workflow-check
make build
make release-check
make run
python tools/sync_lifecycle_docs.py refreshes managed lifecycle blocks in
maintainer docs, issue forms, and release/ops workflows.
make docs-check-json emits the same repository-local docs-check result as a
machine-readable JSON report with structured findings.
make docs-check-markdown emits the same result as a GitHub-step-summary-ready
Markdown report.
If you want those reports written to stable paths for CI or local inspection,
run python tools/check_docs.py --json-report-file reports/docs-check-report.json --markdown-report-file reports/docs-check-summary.md or make docs-check-pr-comment to materialize the PR comment body.
If you want GitHub Actions annotations or a Markdown summary rendered back from
that JSON report, run python tools/render_docs_report.py reports/docs-check-report.json --format github-annotations or
python tools/render_docs_report.py reports/docs-check-report.json --format markdown or python tools/render_docs_report.py reports/docs-check-report.json --format pr-comment.
The current docs-check report schema is v4 and includes stable per-check
check_id values plus per-finding message, severity, and best-effort
path / line / end_line metadata so GitHub annotations, trusted PR
comment rerenders, and other machine consumers can target the affected file
and keep per-check contracts stable.
Failing pull requests now also get a maintained docs-check comment via the
trusted workflow_run workflow in
.github/workflows/docs-check-pr-comment.yml, which re-renders the comment
from the uploaded JSON artifact on the default branch and removes the comment
again once docs-check passes.
The link, registry, and lifecycle checks now emit native structured findings before report serialization, so most docs-check failures no longer rely on string parsing to recover file metadata.
Section-driven docs checks now also carry heading, issue-form-field, or workflow-block line ranges when the repository parser can resolve a stable origin, so Checks UI annotations land closer to the actual policy drift.
make workflow-check runs local GitHub workflow linting through
tools/check_workflows.py. The wrapper looks for actionlint in
ACTIONLINT_BIN, then .tools/actionlint/actionlint, then PATH.
make workflow-tools installs the pinned actionlint release into
.tools/actionlint/actionlint for macOS/Linux amd64/arm64 hosts and verifies
the downloaded archive checksum before replacing the repo-local binary.
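The actionlint lookup order described above (ACTIONLINT_BIN, then the repo-local binary, then PATH) can be sketched with the standard library. This is not the code in tools/check_workflows.py: find_actionlint is a hypothetical name, and returning the override without checking that it exists is an assumption.

```python
import os
import shutil
from pathlib import Path


def find_actionlint(repo_root="."):
    """Return the first actionlint path found, or None when nothing matches."""
    # 1. Explicit override through the ACTIONLINT_BIN environment variable.
    override = os.environ.get("ACTIONLINT_BIN")
    if override:
        return override
    # 2. Repo-local binary installed under .tools/actionlint/.
    local = Path(repo_root) / ".tools" / "actionlint" / "actionlint"
    if local.is_file():
        return str(local)
    # 3. Whatever is first on PATH; shutil.which returns None if absent.
    return shutil.which("actionlint")
```

Putting the environment variable first lets CI pin a specific binary without touching the repository checkout.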
The project currently uses only the Python standard library at runtime.
Additional maintainer docs:
- docs/architecture.md
- docs/compatibility-matrix.md
- docs/config-recipes.md
- docs/maintainer-guide.md
- RELEASING.md
The repository includes a scheduled workflow at daily-digest.yml.
The default schedule is 7 1 * * *, which means:
- 01:07 UTC every day
- about 09:07 every day in Asia/Shanghai
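If you change the cron expression, a quick way to double-check the local delivery time is to convert the UTC hour and minute yourself. The sketch below is illustrative: local_run_time is a hypothetical helper, a fixed UTC+8 offset stands in for Asia/Shanghai (which observes no daylight saving time), and the calendar date is arbitrary.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+8 offset stands in for Asia/Shanghai (no DST there).
SHANGHAI = timezone(timedelta(hours=8), "Asia/Shanghai")


def local_run_time(hour_utc, minute_utc, tz=SHANGHAI):
    """Convert a daily UTC cron time to HH:MM in the target timezone."""
    # The date is arbitrary; only the time of day matters for a daily schedule.
    utc_time = datetime(2026, 1, 1, hour_utc, minute_utc, tzinfo=timezone.utc)
    return utc_time.astimezone(tz).strftime("%H:%M")


print(local_run_time(1, 7))  # cron "7 1 * * *" -> prints 09:07
```

For timezones with daylight saving time, a fixed offset is not enough and the result depends on the date.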
GitHub Actions cron is not a real-time scheduler, so delivery can still be
delayed by a few minutes when the hosted runner queue is busy. The workflow
intentionally avoids minute 0 because top-of-hour schedules are more likely
to be delayed or skipped by GitHub.
To use it, create these GitHub repository secrets:
- PAPER_DIGEST_CONFIG_TOML: your full config.toml content
- PAPER_DIGEST_FEEDBACK_JSON: optional local feedback.json content used to seed .paper-digest-state/feedback.json before each run
- OPENAI_API_KEY: needed when [analysis] enabled = true
- OPENALEX_API_KEY: optional, only needed when an OpenAlex feed sets app.openalex_api_key_env = "OPENALEX_API_KEY"
- PAPER_DIGEST_SMTP_PASSWORD: only needed when email delivery is enabled
For manual validation runs, workflow_dispatch also accepts an optional
config_toml_override input. When you provide it, that run uses the temporary
config instead of PAPER_DIGEST_CONFIG_TOML.
For the common "LM, agent runtime security, and Terminal/SWE agent papers from
arXiv to Feishu every morning" setup, start from
examples/feishu-lm-arxiv.toml, replace the
placeholder Feishu webhook, store the full file content in
PAPER_DIGEST_CONFIG_TOML, and trigger Daily Digest manually once on main.
The same workflow also accepts an optional feedback_json_override input.
When you provide it, that run materializes the given JSON into
.paper-digest-state/feedback.json before digest generation. This is useful
for syncing a local reading-list state into GitHub Actions without hand-editing
repository secrets first.
The workflow restores and saves .paper-digest-state/ through the GitHub
Actions cache so deduplication and local feedback state survive across runs.
Feedback-state precedence for scheduled and manual runs is:
- feedback_json_override
- PAPER_DIGEST_FEEDBACK_JSON
- cached .paper-digest-state/feedback.json
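Read as "first available source wins", the precedence above amounts to a short fallback chain. The following sketch is only an illustration of that reading: resolve_feedback is a hypothetical name, the list is assumed to be ordered from highest to lowest priority, and empty or whitespace-only values are assumed to count as "not provided".

```python
from __future__ import annotations


def resolve_feedback(
    override: str | None, secret: str | None, cached: str | None
) -> tuple[str, str] | None:
    """Return (source name, JSON text) for the first non-empty source, or None."""
    candidates = [
        ("feedback_json_override", override),
        ("PAPER_DIGEST_FEEDBACK_JSON", secret),
        (".paper-digest-state/feedback.json", cached),
    ]
    for name, content in candidates:
        # Assumption: blank input is treated the same as a missing source.
        if content and content.strip():
            return name, content
    return None
```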
For bidirectional local sync, the repository also includes
feedback-secret-sync.yml,
which exports the configured feedback secret into a short-lived artifact for
paper_digest feedback sync --direction pull.
It also restores and saves output/ history through the GitHub Actions cache.
That keeps dated digest folders alive across runs, so feed pages, keyword pages,
trend views, and RSS subscriptions can reflect accumulated history instead of
only the latest execution.
Temporary manual runs with config_toml_override are intentionally isolated:
- they skip digest state cache restore and save
- they skip archive history cache restore and save
- they skip GitHub Pages deployment
That makes them safe for validating new feeds or delivery channels without polluting the formal archive, dedup state, or live Pages site.
For repositories that added archive caching after the project was already
running, there is also a manual backfill workflow at
backfill-archive-history.yml.
It downloads historical successful Daily Digest artifacts, imports the
strongest snapshot for each day into output/YYYY-MM-DD/, rebuilds the archive
site and RSS feeds, and then seeds the same output/ cache used by the daily
workflow. Synthetic validation runs such as delivery-check digests are skipped
so they do not pollute the long-term archive.
That workflow now accepts four manual inputs:
- run_limit: how many successful Daily Digest runs to inspect
- date_from: optional inclusive earliest digest date to import
- date_to: optional inclusive latest digest date to import
- dry_run: preview what would change without writing output/, cache, or Pages
That makes it practical to do a narrow backfill such as "only recover the last
30 successful runs" or "rebuild just 2026-04-01 through 2026-04-07" without
editing workflow code. It also lets you preview a risky backfill first, inspect
the run log for imported and replaced dates, and then re-run without dry_run.
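To make the interaction between run_limit and the inclusive date window concrete, here is a minimal sketch. It is not the workflow's code: select_dates and in_window are hypothetical names, and the input is assumed to be one digest date per successful run, ordered newest first.

```python
from __future__ import annotations

from datetime import date


def in_window(digest_date: date, date_from: date | None, date_to: date | None) -> bool:
    """Both bounds are inclusive; a missing bound leaves that side open."""
    if date_from is not None and digest_date < date_from:
        return False
    if date_to is not None and digest_date > date_to:
        return False
    return True


def select_dates(
    run_dates: list[date],
    *,
    run_limit: int,
    date_from: date | None = None,
    date_to: date | None = None,
) -> list[date]:
    """Pick the digest dates a backfill would import, oldest first."""
    # Assumption: run_dates is newest-first, so slicing inspects the latest runs.
    inspected = run_dates[:run_limit]
    return sorted({d for d in inspected if in_window(d, date_from, date_to)})
```

With dry_run, the same selection would only be reported in the run log rather than written anywhere.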
For scheduled stability, source fetches use bounded retry and backoff for
transient 429, 5xx, and timeout-style failures. You can tune that behavior
through request_timeout_seconds, fetch_retry_attempts, and
fetch_retry_backoff_seconds in [app].
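As a rough mental model of what those three settings control, consider the sketch below. It is not the project's fetch code: fetch_with_retry and is_transient are hypothetical names, the doubling delay is an assumed backoff policy (the settings only say retries are bounded and backed off), and attempts is assumed to count total tries rather than retries.

```python
from __future__ import annotations

import socket
import time
import urllib.error
from typing import Callable, TypeVar

T = TypeVar("T")


def is_transient(exc: Exception) -> bool:
    """Treat 429, 5xx, and timeout-style errors as retryable."""
    # HTTPError subclasses URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        return exc.code == 429 or 500 <= exc.code < 600
    if isinstance(exc, urllib.error.URLError):
        return isinstance(exc.reason, (TimeoutError, socket.timeout))
    return isinstance(exc, (TimeoutError, socket.timeout))


def fetch_with_retry(
    fetch: Callable[[], T],
    *,
    attempts: int = 3,
    backoff_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() up to `attempts` times, sleeping between transient failures."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as exc:
            # Give up on the last attempt or on non-transient errors.
            if attempt == attempts or not is_transient(exc):
                raise
            # Assumed policy: double the delay after each failed attempt.
            sleep(backoff_seconds * 2 ** (attempt - 1))
    raise AssertionError("unreachable")
```

The request timeout itself would be applied inside fetch, for example through the timeout argument of urllib.request.urlopen.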
Operational expectations for workflows, supported runners, and release
validation live in docs/maintainer-guide.md
and docs/compatibility-matrix.md.
The CLI also rebuilds output/site/index.html on every run. That static site:
- shows daily hit counts and per-feed summaries
- links to each day's Markdown and JSON
- supports feed filtering, title keyword search, and recent 7d/30d windows
- emits canonical paper detail pages under output/site/papers/ with merged source links, match reasons, and lightweight related-paper suggestions
- emits an output/site/momentum.html view for papers that keep resurfacing across multiple dates or feeds, with first-seen and last-seen timestamps
- emits an output/site/reading-list.html view for papers you have starred or marked as follow-up or reading in the local feedback state
- emits an output/site/review-queue.html view for actionable review work: new high-signal unmarked papers, resurfaced follow-ups, starred papers that still need attention, and recurring reviews ordered by effective due date
- emits an output/site/weekly-review.html view that groups papers into overdue, snoozed, pending, reading, completed, and resurfaced weekly review sections
- surfaces personal feedback notes across detail pages, the reading list, review queue, weekly review, and feedback-driven Focus blocks
- surfaces snoozed_until, recurring review intervals, next actions, and effective due dates across detail pages and feedback-centric archive views
- emits fixed feed pages under output/site/feeds/
- emits feed RSS files under output/site/feeds/*.xml
- emits keyword tracking pages under output/site/topics/ from configured feed keywords
- emits keyword RSS files under output/site/topics/*.xml
- emits an output/site/trends.html overview for feed and keyword subscription trends
- exposes canonical_id plus copyable feedback CLI snippets on each canonical paper detail page, so you can move from browsing to local feedback updates quickly
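The momentum view in the list above is the least self-explanatory of these pages, so here is a sketch of the kind of grouping it implies. Everything in it is an assumption made for illustration: the Momentum class, the rising_papers function, the (canonical_id, date, feed) input shape, and the ranking order are hypothetical, and dates stand in for timestamps.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class Momentum:
    """Where and when one canonical paper has appeared (hypothetical model)."""

    canonical_id: str
    dates: set[date] = field(default_factory=set)
    feeds: set[str] = field(default_factory=set)

    @property
    def first_seen(self) -> date:
        return min(self.dates)

    @property
    def last_seen(self) -> date:
        return max(self.dates)


def rising_papers(sightings: list[tuple[str, date, str]]) -> list[Momentum]:
    """Keep papers seen on more than one date or in more than one feed."""
    by_id: dict[str, Momentum] = {}
    for canonical_id, seen_on, feed in sightings:
        entry = by_id.setdefault(canonical_id, Momentum(canonical_id))
        entry.dates.add(seen_on)
        entry.feeds.add(feed)
    resurfaced = [m for m in by_id.values() if len(m.dates) > 1 or len(m.feeds) > 1]
    # Assumed ranking: most dates first, then most feeds, then id for stability.
    return sorted(resurfaced, key=lambda m: (-len(m.dates), -len(m.feeds), m.canonical_id))
```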
When GitHub Pages is enabled for the repository, the scheduled workflow uploads
output/site and deploys it automatically after each successful digest run.
On macOS or Linux you can run the digest every morning with cron:
0 8 * * * /absolute/path/to/.venv/bin/python -m paper_digest --config /absolute/path/to/config.toml
Use the GitHub issue forms for real usage feedback:
- Bug reports: broken source fetches, delivery failures, archive rendering issues, or regressions in CLI behavior.
- Support requests: setup questions, webhook configuration problems, or scheduled workflow debugging.
- Feature requests: new sources, delivery channels, ranking rules, or archive views.
Security reports should follow SECURITY.md, not public
issues. General support expectations are in SUPPORT.md.
Roadmap intake and prioritization rules live in
docs/roadmap-policy.md. Governance and maintainer
ownership rules live in GOVERNANCE.md.
- Add more literature sources such as Lens or CORE.
- Support more output adapters such as Matrix.
- Support additional LLM providers and richer feed-level briefings.
The project is usable today for daily arXiv monitoring, but it is still early.
Expect API and config changes while the repository matures. The project is
maintainer-led today; decision and ownership rules are documented in
GOVERNANCE.md.