A comprehensive suite of Claude Code skills for academic research, covering the full pipeline from research to publication.
AI is your copilot, not the pilot. This tool won't write your paper for you. It handles the grunt work — hunting down references, formatting citations, verifying data, checking logical consistency — so you can focus on the parts that actually require your brain: defining the question, choosing the method, interpreting what the data means, and writing the sentence after "I argue that."
Unlike a humanizer, this tool doesn't help you hide the fact that you used AI. It helps you write better. Style Calibration learns your voice from past work. Writing Quality Check catches the patterns that make prose feel machine-generated. The goal is quality, not cheating.
Lu et al. (2026, Nature 651:914-919) built The AI Scientist — the first fully autonomous AI research system to publish a paper through blind peer review at a top-tier ML venue (ICLR 2025 workshop, score 6.33/10 vs workshop average 4.87). Their Limitations section enumerates the failure modes that any fully-autonomous AI research pipeline inherits: implementation bugs, hallucinated results, shortcut reliance, bug-as-insight reframing, methodology fabrication, frame-lock, citation hallucinations.
ARS is built on the premise that a human researcher augmented by AI avoids these failure modes better than either alone. Stage 2.5 and Stage 4.5 integrity gates run a 7-mode blocking checklist (see academic-pipeline/references/ai_research_failure_modes.md); the reviewer offers an opt-in calibration mode that measures its own FNR/FPR against a user-supplied gold set.
v3.3 was inspired by PaperOrchestra (Song, Song, Pfister & Yoon, 2026, Google): Semantic Scholar API verification, anti-leakage protocol, VLM figure verification, and score trajectory tracking.
👉 docs/ARCHITECTURE.md — the full pipeline view: flow diagram, stage-by-stage matrix, data-access flow, skill dependency graph, quality gates, and mode list.
The architecture doc supersedes the sprawling pipeline description that used to live here. Everything about what runs in which stage now lives in one place.
👉 docs/SETUP.md — install Claude Code, set up API keys, optional Pandoc/tectonic for DOCX/PDF, cross-model verification (ARS_CROSS_MODEL), and four installation methods including claude.ai Project import.
👉 docs/PERFORMANCE.md — per-mode token budgets, full-pipeline estimate (~$4–6 for a 15k-word paper), and recommended Claude Code settings (Agent Team, Ralph Loop, Skip Permissions).
- Academic Writing Shouldn't Be a Solo Act — full pipeline walkthrough (English)
- 學術寫作不該是一個人的事:一套開源 AI 協作工具如何改變研究者的工作流 (complete usage guide, Traditional Chinese)
- Deep Research — 13-agent research team with Socratic guided mode, PRISMA systematic review, intent detection, dialogue health monitoring, optional cross-model DA, Semantic Scholar API verification.
- Academic Paper — 12-agent paper writing with Style Calibration, Writing Quality Check, LaTeX hardening, visualization, revision coaching, citation conversion, anti-leakage protocol, and VLM figure verification.
- Academic Paper Reviewer — 7-agent multi-perspective peer review with 0–100 quality rubrics (EIC + 3 dynamic reviewers + Devil's Advocate), concession threshold protocol, attack intensity preservation, optional cross-model DA critique / calibration, R&R traceability matrix, read-only constraint.
- Academic Pipeline: 10-stage pipeline orchestrator with adaptive checkpoints, claim verification, Material Passport, optional `repro_lock`, optional cross-model integrity verification, mid-conversation reinforcement, and score trajectory tracking.
- Data Access Level Metadata (v3.3.2+): every skill declares `data_access_level` (`raw` / `redacted` / `verified_only`); enforced by `scripts/check_data_access_level.py`. Pattern adapted from Anthropic's automated-w2s-researcher (2026). See `shared/ground_truth_isolation_pattern.md`.
- Task Type Annotation (v3.3.2+): every skill declares `task_type` (`open-ended` or `outcome-gradable`). All current ARS skills are `open-ended`.
- Benchmark Report Schema (v3.3.5+): JSON Schema + lint for honest benchmark comparisons. See `shared/benchmark_report_pattern.md`.
- Artifact Reproducibility Lockfile (v3.3.5+): optional `repro_lock` sub-block on Material Passport. Configuration documentation, not a replay guarantee, because LLM outputs are not byte-reproducible. See `shared/artifact_reproducibility_pattern.md`.
See the complete artifacts from a real 10-stage pipeline run — peer review reports, integrity verification reports, and the final paper:
Browse all pipeline artifacts →
| Artifact | Description |
|---|---|
| Final Paper (EN) | APA 7.0 formatted, LaTeX-compiled |
| Final Paper (ZH) | Chinese version, APA 7.0 |
| Integrity Report — Pre-Review | Stage 2.5: caught 15 fabricated refs + 3 statistical errors |
| Integrity Report — Final | Stage 4.5: zero regressions confirmed |
| Peer Review Round 1 | EIC + 3 Reviewers + Devil's Advocate |
| Re-Review | Verification after revisions |
| Peer Review Round 2 | Follow-up review |
| Response to Reviewers | Point-by-point author response |
| Post-Publication Audit Report | Independent full-reference audit: found issues in 21 of 68 references that 3 rounds of integrity checks had missed |
If your research involves running experiments (code or human studies) before writing, the Experiment Agent skill fills the gap between ARS Stage 1 (RESEARCH) and Stage 2 (WRITE).
ARS Stage 1 RESEARCH → RQ Brief + Methodology Blueprint
↓
experiment-agent → run/manage experiments → validate results
↓
ARS Stage 2 WRITE → write paper with verified experiment results
What it does: executes code experiments (Python, R, etc.) with real-time monitoring, manages human study protocols with IRB ethics checklist, interprets statistics with 11-type fallacy detection, and verifies reproducibility.
How to use together: pause the ARS pipeline after Stage 1, run experiments in a separate experiment-agent session, then bring the results (with Material Passport) back to ARS Stage 2. ARS requires zero modification. See the experiment-agent README for setup instructions.
# Start a full research pipeline
You: "I want to write a research paper on AI's impact on higher education QA"
# Start with Socratic guidance
You: "Guide my research on AI in educational evaluation"
# Write a paper with guided planning
You: "Guide me through writing a paper on demographic decline"
# Review an existing paper
You: "Review this paper" (then provide the paper)
# Check pipeline status
You: "status"
"Research the impact of AI on higher education" → full mode
"Give me a quick brief on X" → quick mode
"Do a systematic review on X with PRISMA" → systematic-review mode
"Guide my research on X" → socratic mode (guided)
"Fact-check these claims" → fact-check mode
"Do a literature review on X" → lit-review mode
"Review this paper's research quality" → review mode
"Write a paper on X" → full mode
"Guide me through writing a paper" → plan mode (guided)
"Build a paper outline" → outline-only mode
"I have a draft, here are reviewer comments" → revision mode
"Parse these reviewer comments into a roadmap" → revision-coach mode
"Write an abstract for this paper" → abstract-only mode
"Turn this into a literature review paper" → lit-review mode
"Convert to LaTeX" / "Convert citations to IEEE" → format-convert mode
"Check citations" → citation-check mode
"Generate an AI disclosure statement for NeurIPS" → disclosure mode
"Review this paper" → full mode (EIC + R1/R2/R3 + Devil's Advocate)
"Quick assessment of this paper" → quick mode
"Guide me to improve this paper" → guided mode
"Check the methodology" → methodology-focus mode
"Verify the revisions" → re-review mode
"Calibrate this reviewer against my gold set" → calibration mode
"I want to write a complete research paper" → full pipeline from Stage 1
"I already have a paper, review it" → mid-entry at Stage 2.5 (integrity first)
"I received reviewer comments" → mid-entry at Stage 4
Pipeline ends with Stage 6: Process Summary — auto-generates a paper creation process record with 6-dimension Collaboration Quality Evaluation (1–100 scoring).
- Traditional Chinese (繁體中文) — default when user writes in Chinese
- English — default when user writes in English
- Bilingual abstracts (Chinese + English) for academic papers
Using a different language? Socratic mode (deep-research) and Plan mode (academic-paper) use intent-based activation — they detect the meaning of your request, not specific keywords. This means they work in any language without modification.
However, the general `Trigger Keywords` section (which determines whether the skill is activated at all) still lists English and Traditional Chinese keywords. If you find the skill isn't activating reliably in your language, you can add your language's keywords to the `### Trigger Keywords` section in each `SKILL.md` file to improve matching confidence.
- APA 7.0 (default, including Chinese citation rules)
- Chicago (Notes & Author-Date)
- MLA
- IEEE
- Vancouver
- IMRaD (empirical research)
- Thematic Literature Review
- Theoretical Analysis
- Case Study
- Policy Brief
- Conference Paper
Per-agent responsibilities and per-stage artifacts now live in docs/ARCHITECTURE.md. Version numbers are anchored here so release metadata stays in one place.
13-agent research team. Modes: full, quick, review, lit-review, fact-check, socratic, systematic-review. Full agent roster and artifacts: see ARCHITECTURE.md §3.
12-agent paper writing pipeline. Modes: full, plan, outline-only, revision, revision-coach, abstract-only, lit-review, format-convert, citation-check, disclosure. Output: MD + DOCX (via Pandoc when available) + LaTeX (APA 7.0 apa7 class / IEEE / Chicago) → PDF via tectonic. Full agent roster and per-phase responsibilities: see ARCHITECTURE.md §3.
7-agent multi-perspective review with 0-100 quality rubrics. Modes: full, re-review, quick, methodology-focus, guided, calibration. Decision mapping: ≥80 Accept, 65-79 Minor Revision, 50-64 Major Revision, <50 Reject. First-round review team vs. narrow re-review team boundary: see ARCHITECTURE.md §3 Stage 3 / Stage 3'.
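The decision mapping above can be sketched as a small threshold lookup. This is an illustrative sketch, not the reviewer skill's actual logic: the function name `decision_for_score` is hypothetical, and it assumes the bands are contiguous, so a fractional score such as 79.5 lands in Minor Revision.

```python
def decision_for_score(score: float) -> str:
    """Map a 0-100 rubric score to an editorial decision.

    Thresholds follow the documented mapping; treating the bands as
    contiguous for fractional scores is an assumption of this sketch.
    """
    if not 0 <= score <= 100:
        raise ValueError(f"score must be within 0-100, got {score}")
    if score >= 80:
        return "Accept"
    if score >= 65:
        return "Minor Revision"
    if score >= 50:
        return "Major Revision"
    return "Reject"


if __name__ == "__main__":
    for sample in (92, 79, 64.5, 49):
        print(sample, "->", decision_for_score(sample))
```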
10-stage orchestrator with integrity verification, two-stage review, Socratic coaching, and collaboration evaluation. Pipeline guarantees: every stage requires user confirmation checkpoint; integrity verification (Stage 2.5 + 4.5) cannot be skipped; R&R Traceability Matrix (Schema 11) independently verifies author revision claims. Stage-by-stage matrix with agents, artifacts, and gates: see ARCHITECTURE.md §3.
While using ARS to write a reflection article about AI in higher education, I ran into three structural problems that no amount of prompt engineering could fix:
- Frame-lock: I asked the AI to run a devil's advocate debate against its own thesis. It did: four rounds, each more refined than the last. But every round stayed inside the frame I'd set. The DA attacked arguments, never premises. It never asked "are we even discussing the right question?" This is the same pattern that caused the 31% citation error rate in v2.7's stress test: the verifying AI and the generating AI share the same cognitive frame.
- Sycophancy under pushback: Every time I challenged the DA's attacks, it conceded too quickly. It retracted findings faster than it launched them. The model's training rewards conversational harmony, so "the user pushed back" was treated as evidence that the attack was wrong, when often it just meant the user was persistent.
- Intent misdetection: The Socratic Mentor kept trying to converge and produce deliverables ("Want me to write this up?") when I was still exploring. It couldn't distinguish "the user wants a deep philosophical discussion" from "the user wants an RQ brief." Both look like engagement, but they need opposite AI behaviors.
Devil's Advocate — Concession Threshold Protocol (deep-research + academic-paper-reviewer)
- DA must now score every rebuttal on a 1-5 scale before responding
- Concession only allowed at score ≥4 (rebuttal directly addresses core attack with evidence)
- Score ≤3: hold position and restate the original attack
- Anti-sycophancy rules: no consecutive concessions, concession rate tracking, frame-lock detection after each checkpoint
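The concession rules above can be read as a small state machine. The sketch below is one possible reading, not the protocol's actual implementation: the class name `ConcessionTracker` is hypothetical, and it assumes that a strong rebuttal arriving right after a concession results in a hold, which is how "no consecutive concessions" is interpreted here.

```python
from dataclasses import dataclass, field

CONCESSION_THRESHOLD = 4  # documented rule: concede only at score >= 4


@dataclass
class ConcessionTracker:
    """Tracks Devil's Advocate responses to user rebuttals (illustrative)."""

    history: list = field(default_factory=list)  # True means "conceded"

    def respond(self, rebuttal_score: int) -> str:
        if rebuttal_score not in range(1, 6):
            raise ValueError("rebuttal score must be an integer from 1 to 5")
        last_was_concession = bool(self.history) and self.history[-1]
        # Assumption: a concession right after another one is blocked.
        concede = rebuttal_score >= CONCESSION_THRESHOLD and not last_was_concession
        self.history.append(concede)
        return "concede" if concede else "hold"

    @property
    def concession_rate(self) -> float:
        """Share of rebuttals that ended in a concession."""
        if not self.history:
            return 0.0
        return sum(self.history) / len(self.history)
```

A caller would score each rebuttal first and pass the score to `respond`; a "hold" result means the original attack is restated.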
Socratic Mentor — Intent Detection Layer (deep-research)
- Classifies user intent as exploratory vs. goal-oriented at dialogue start and every 3 turns
- Exploratory mode: disables auto-convergence, raises max rounds to 60, prohibits "want me to summarize?" prompts
- Goal-oriented mode: standard convergence behavior
- Anti-premature-closure rules: in exploratory mode, the user decides when to stop
Socratic Mentor — Dialogue Health Indicator (deep-research)
- Silent self-assessment every 5 turns on three dimensions: persistent agreement, conflict avoidance, premature convergence
- Auto-injects challenging questions when agreement pattern detected
- Invisible to user (to prevent gaming), but log available for post-session review
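The cadence of the two Socratic Mentor checks can be pictured as a per-turn schedule. This is a rough sketch under stated assumptions: the function name `checks_due` and the check labels are hypothetical, turns are counted from 1, and "every N turns" is read as turn numbers divisible by N.

```python
INTENT_INTERVAL = 3   # intent is re-classified every 3 turns
HEALTH_INTERVAL = 5   # dialogue health is self-assessed every 5 turns


def checks_due(turn: int) -> list:
    """Return the silent checks that would run on a given 1-based turn."""
    if turn < 1:
        raise ValueError("turn numbers start at 1")
    due = []
    # Intent is classified at dialogue start and then on the interval.
    if turn == 1 or turn % INTENT_INTERVAL == 0:
        due.append("intent_detection")
    if turn % HEALTH_INTERVAL == 0:
        due.append("dialogue_health")
    return due


if __name__ == "__main__":
    for turn in range(1, 16):
        print(turn, checks_due(turn))
```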
These optimizations don't solve AI's structural limits — they make the limits visible and manageable. The DA will still eventually concede if pushed hard enough. The Socratic Mentor will still have some convergence bias. But now there are explicit checkpoints that slow down the sycophancy, force the DA to justify concessions, and prevent the Mentor from wrapping up before the user is ready.
The deeper lesson: AI literacy isn't about learning to use AI as a tool, following ethics rules, or fearing AI risks. It's about engaging AI deeply enough to discover its structural limits yourself — and your own thinking limits in the process.
This work is licensed under CC-BY-NC 4.0.
You are free to:
- Share — copy and redistribute the material
- Adapt — remix, transform, and build upon the material
Under the following terms:
- Attribution — You must give appropriate credit
- NonCommercial — You may not use the material for commercial purposes
Attribution format:
Based on Academic Research Skills by Cheng-I Wu
https://github.com/Imbad0202/academic-research-skills
Cheng-I Wu (吳政宜) — Author and maintainer
aspi6246 — Contributor. The v3.1 optimization was inspired by patterns from Claude-Code-Skills-for-Academics: read-only constraint pattern, anti-pattern codification as first-class design, cognitive framework approach (teaching "how to think" not just procedures), and lean skill size philosophy.
mchesbro1 — Contributor. Originally proposed and drafted the IS Basket of 8 journals for academic-paper-reviewer/references/top_journals_by_field.md (Issue #5).
cloudenochcsis — Contributor. Extended the IS section from the Basket of 8 to the full Senior Scholars' Basket of 11 — adding Decision Support Systems, Information & Management, and Information and Organization (Issue #7, PR #8). Sourced from the AIS Senior Scholars' List of Premier Journals.
- Added `docs/ARCHITECTURE.md` as the single source of truth for pipeline structure (flow, matrix, data-access, dependency graph, quality gates, modes). Merged into main via PR #18.
- Added `docs/SETUP.md` (prerequisites, API keys, Pandoc/tectonic, cross-model verification, installation methods) and `docs/PERFORMANCE.md` (token budgets, recommended Claude Code settings). README links to both instead of inlining them.
- Streamlined README: removed the ASCII pipeline diagram and 16-point key-feature list (superseded by ARCHITECTURE.md); Skill Details section now anchors version numbers and points readers to ARCHITECTURE.md §3 for per-agent rosters.
- Note: no functional change to any skill. Pure documentation reorganization. Suite version bumped to `3.3.6`.
- Added `benchmark_report.schema.json` + `repro_lock` optional block on Material Passport. Both ship with pattern docs, lints, and examples. First formal Python dev dep manifest (`requirements-dev.txt`).
- Synced the embedded changelog sections in `README.md` and `README.zh-TW.md` so they include the missing `v3.3.3` and `v3.3.2` release summaries.
- Extended `scripts/check_spec_consistency.py` so future README changelog drift fails CI.
- Hardened SKILL frontmatter linting: missing closing `---` fences now fail cleanly instead of being parsed as valid YAML.
- Frontmatter that parses as valid YAML but not as a mapping now reports a readable error instead of crashing.
- Fixed the broken showcase link for the post-publication audit report in both READMEs.
- Added README relative-link validation to the spec consistency check so dead links fail CI.
- Aligned the DOCX output contract across the docs: direct `.docx` generation is Pandoc-dependent, with Markdown + conversion instructions as fallback.
- Prepared the `v3.3.3` release: suite version bump, `academic-paper` -> v3.0.2, `academic-pipeline` -> v3.2.2.
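The two frontmatter failure cases in this release (a missing closing fence, and YAML that parses but is not a mapping) can be illustrated with a short sketch. It is not the repository's lint script: `extract_frontmatter` and `require_mapping` are hypothetical names, and YAML parsing itself is left to a library such as PyYAML's `yaml.safe_load`, which is deliberately not imported here.

```python
def extract_frontmatter(text: str) -> str:
    """Return the raw frontmatter between the opening and closing '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---' fence")
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return "\n".join(lines[1:index])
    # Fail cleanly instead of treating the rest of the file as YAML.
    raise ValueError("missing closing '---' fence")


def require_mapping(parsed: object) -> dict:
    """Reject parsed frontmatter that is valid YAML but not a key/value mapping."""
    if not isinstance(parsed, dict):
        raise ValueError(
            f"frontmatter must be a mapping, got {type(parsed).__name__}"
        )
    return parsed
```

In use, the string returned by `extract_frontmatter` would be handed to the YAML parser, and the parsed result to `require_mapping`, so both failure cases surface as readable `ValueError` messages.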
- Added `metadata.data_access_level` to all top-level `SKILL.md` files with enforced vocabulary: `raw`, `redacted`, `verified_only`.
- Added `metadata.task_type` to all top-level `SKILL.md` files with enforced vocabulary: `open-ended`, `outcome-gradable`.
- Added lint scripts and unit tests for both metadata fields, wired into the GitHub Actions spec consistency workflow.
- Added `shared/ground_truth_isolation_pattern.md` and linked the new vocabulary from `shared/handoff_schemas.md`.
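A vocabulary check of this kind might look like the sketch below. It is not the repository's lint script: the function name `lint_metadata` is hypothetical, and the input is assumed to be the already-parsed `metadata` mapping from a `SKILL.md` frontmatter block. Only the two field names and their allowed values come from the release notes.

```python
ALLOWED_VALUES = {
    "data_access_level": {"raw", "redacted", "verified_only"},
    "task_type": {"open-ended", "outcome-gradable"},
}


def lint_metadata(metadata: dict) -> list:
    """Return human-readable errors; an empty list means the metadata passes."""
    errors = []
    for key, allowed in ALLOWED_VALUES.items():
        if key not in metadata:
            errors.append(f"metadata.{key} is missing")
            continue
        value = metadata[key]
        # Non-string values (lists, numbers) are rejected before the set lookup.
        if not isinstance(value, str) or value not in allowed:
            errors.append(
                f"metadata.{key}={value!r} is not one of {sorted(allowed)}"
            )
    return errors


if __name__ == "__main__":
    print(lint_metadata({"data_access_level": "raw", "task_type": "graded"}))
```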
- Synced README, `.claude/CLAUDE.md`, `MODE_REGISTRY.md`, and `SKILL.md` files to the current mode counts and published skill versions.
- Corrected cross-model wording: integrity sample checks and independent DA critique are implemented today; sixth-reviewer peer review remains planned.
- Clarified adaptive checkpoint semantics so SLIM checkpoints still wait for explicit user confirmation.
- Reaffirmed that Stage 2.5 and Stage 4.5 integrity gates cannot be skipped.
- Added a lightweight spec consistency check and GitHub Actions workflow to catch future drift.
Integrates techniques from PaperOrchestra (Song, Song, Pfister & Yoon, 2026, Google).
- Semantic Scholar API Verification: Tier 0 programmatic reference existence check via the S2 API. Title matching at normalized Levenshtein similarity >= 0.70, DOI mismatch detection, bibliography deduplication via S2 IDs. Graceful degradation if the API is unavailable.
- Anti-Leakage Protocol: Knowledge Isolation Directive prioritizes session materials over LLM parametric memory. Flags `[MATERIAL GAP]` for missing content instead of filling from memory. Reduces Mode 5/6 failure risk.
- VLM Figure Verification (optional): Closed-loop verification of rendered figures using a vision-capable LLM. 10-point checklist, max 2 refinement iterations.
- Score Trajectory Protocol — Per-dimension rubric score delta tracking across revision rounds (7 dimensions). Detects regressions (delta < -3) and triggers mandatory checkpoint.
- Stage 2 Parallelization — Visualization and argument building can run in parallel after outline completion.
- New versions: deep-research v2.8, academic-paper v3.0, academic-pipeline v3.2
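The 0.70 title-matching threshold can be illustrated with a plain edit-distance computation. This sketch assumes the threshold refers to a normalized similarity (1 minus distance divided by the longer title's length) over lowercased, whitespace-collapsed titles. The function names are hypothetical, and the call to the Semantic Scholar API is left out entirely.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping two rows in memory."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def title_similarity(cited: str, found: str) -> float:
    """Similarity in [0, 1]; normalization scheme is an assumption of this sketch."""
    cited = " ".join(cited.lower().split())
    found = " ".join(found.lower().split())
    longest = max(len(cited), len(found))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(cited, found) / longest


def titles_match(cited: str, found: str, threshold: float = 0.70) -> bool:
    return title_similarity(cited, found) >= threshold
```

A cited title that differs from the indexed one only in casing or spacing scores 1.0, while an unrelated title falls well below the threshold and would be flagged.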
Integrates insights from Lu et al. (2026, Nature 651:914-919) — the first end-to-end autonomous AI research system to pass blind peer review.
- 7-mode AI Research Failure Mode Checklist — blocks pipeline at Stage 2.5/4.5 on suspected implementation bugs, hallucinated results, shortcut reliance, bug-as-insight, methodology fabrication, frame-lock. Extends existing 5-type citation hallucination taxonomy.
- Reviewer Calibration Mode (academic-paper-reviewer v1.8) — opt-in FNR/FPR/balanced-accuracy measurement against user-supplied gold set. 5× ensembling, cross-model default-on, session-scoped confidence disclosure.
- Disclosure Mode (academic-paper v2.9) — venue-specific AI-usage statement generator. v1 covers ICLR, NeurIPS, Nature, Science, ACL, EMNLP.
- Early-Stopping Criterion (academic-pipeline v3.1) — convergence check + budget transparency at pipeline start.
- Fidelity-Originality Mode Spectrum — classifies all modes across 3 skills per Lu 2026 Fig 1c.
- New versions: academic-paper v2.9, academic-paper-reviewer v1.8, academic-pipeline v3.1
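The calibration metrics named above follow standard confusion-matrix definitions. The sketch below assumes a binary gold set in which `True` marks an item the reviewer should flag. The function name `calibration_metrics` is hypothetical, and the 5× ensembling and cross-model steps are not modeled.

```python
def calibration_metrics(gold: list, predicted: list) -> dict:
    """Compute FNR, FPR, and balanced accuracy against a gold set.

    gold[i] is True when item i should be flagged (assumed convention);
    predicted[i] is True when the reviewer actually flagged it.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)
    fn = sum(1 for g, p in pairs if g and not p)
    fp = sum(1 for g, p in pairs if not g and p)
    tn = sum(1 for g, p in pairs if not g and not p)
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("gold set needs at least one positive and one negative item")
    fnr = fn / (tp + fn)  # missed flags among items that should be flagged
    fpr = fp / (fp + tn)  # false alarms among items that should pass
    balanced_accuracy = ((1 - fnr) + (1 - fpr)) / 2
    return {"fnr": fnr, "fpr": fpr, "balanced_accuracy": balanced_accuracy}
```

Balanced accuracy is the mean of the true-positive and true-negative rates, so it stays informative when the gold set is lopsided.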
External contributions: @mchesbro1 originally proposed and drafted the IS Basket of 8 journals (Issue #5); @cloudenochcsis extended it to the full Senior Scholars' Basket of 11 (Issue #7, PR #8). Updated academic-paper-reviewer/references/top_journals_by_field.md Section 7, adding Decision Support Systems, Information & Management, and Information and Organization. Source: AIS Senior Scholars' List of Premier Journals.
Inspired by patterns from aspi6246/Claude-Code-Skills-for-Academics.
Wave 1: Anti-Context-Rot Anchors
- 29 explicit Anti-Patterns across all 4 skills (7-8 per skill, tabular format with "Why It Fails" + "Correct Behavior")
- 22 IRON RULE markers on critical rules that must not be violated even in long conversations
- Read-only constraint on academic-paper-reviewer (reviewers cannot modify the manuscript)
Wave 2: Traceability + Cognitive Frameworks + Reinforcement
- R&R Traceability Matrix (Schema 11): adds "Author's Claim" and "Verified?" columns to re-review output, enabling independent verification of revision claims
- 3 cognitive framework reference files teaching agents "how to think" not just "what to do":
  - `argumentation_reasoning_framework.md`: Toulmin model, Bradford Hill causal reasoning, inference to best explanation, epistemic status classification
  - `review_quality_thinking.md`: three lenses (internal validity, external validity, contribution), common reviewer traps, calibration questions
  - `writing_judgment_framework.md`: clarity test, reader's journey, discipline-specific voice, revision decision matrix
- Mid-conversation reinforcement protocol: stage-specific IRON RULE + Anti-Pattern reminders at every pipeline transition
- Self-check questions at every FULL checkpoint (citation integrity, sycophantic concession, quality trajectory, scope discipline, completeness)
Wave 3: Lean Skill Size
- SKILL.md total size reduced from 142KB to 85KB (−40%) by extracting detailed protocols to `references/` files
- ~15 new reference files created (re-review protocol, guided mode, systematic review, process summary, external review, etc.)
- All IRON RULE markers preserved in SKILL.md; detailed content loaded on demand
- New versions: deep-research v2.7, academic-paper v2.8, academic-paper-reviewer v1.7, academic-pipeline v3.0
- Devil's Advocate Concession Threshold (deep-research + academic-paper-reviewer): DA must score rebuttals 1-5 before responding. Concession only at ≥4. No consecutive concessions. Concession rate tracking. Frame-lock detection after each checkpoint.
- Attack Intensity Preservation (academic-paper-reviewer): DA does not soften under pushback. Rebuttal assessment protocol with explicit deflection detection. Anti-sycophancy rules prevent persistent pushback from being treated as valid evidence.
- Intent Detection Layer (deep-research socratic): Classifies user intent as exploratory vs. goal-oriented. Exploratory mode disables auto-convergence, raises max rounds, prohibits premature closure. Re-assesses every 3 turns.
- Dialogue Health Indicator (deep-research socratic): Silent self-check every 5 turns for persistent agreement, conflict avoidance, premature convergence. Auto-injects challenges when agreement pattern detected.
- Cross-Model Verification Protocol (shared, optional): Use GPT-5.4 Pro or Gemini 3.1 Pro for integrity verification sample cross-checks and independent DA critique. Sixth-reviewer peer review remains planned, not yet implemented. Activated by setting the `ARS_CROSS_MODEL` env var; without it, everything works as before. See `shared/cross_model_verification.md` for full setup guide, API patterns, and cost estimates.
- AI Self-Reflection Report (academic-pipeline Stage 6): Post-pipeline self-assessment of AI behavioral patterns: DA concession rate, checkpoint skip rate, health alerts, sycophancy risk rating (LOW/MEDIUM/HIGH), frame-lock incidents, convergence pattern analysis. Includes irony caveat: "this self-reflection is itself produced by the same AI that may have been sycophantic."
- Origin: Discovered through a 4-round dialectic experiment where the DA conceded too quickly, the Socratic Mentor tried to converge prematurely, and the entire debate stayed locked in a frame the human set.
- Versions: deep-research v2.5, academic-paper-reviewer v1.5, academic-pipeline v2.8
- Style Calibration (academic-paper intake Step 10, optional): Provide 3+ past papers and the pipeline learns your writing voice: sentence rhythm, vocabulary preferences, citation integration style. Applied as a soft guide during drafting; discipline conventions always take priority. Priority system: discipline norms (hard) > journal conventions (strong) > personal style (soft). See `shared/style_calibration_protocol.md`
- Writing Quality Check (`academic-paper/references/writing_quality_check.md`): Writing quality checklist applied during draft self-review. 5 categories: AI high-frequency term warnings (25 terms), punctuation pattern control (em dash ≤3), throat-clearing opener detection, structural pattern warnings (Rule of Three, uniform paragraphs, synonym cycling), and burstiness checks (sentence length variation). These are good writing rules, not detection evasion
- Style Profile carried through academic-pipeline Material Passport (Schema 10 in `shared/handoff_schemas.md`)
- deep-research report compiler also consumes both features optionally
- Versions: academic-paper v2.5, deep-research v2.4, academic-pipeline v2.7
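Two of the Writing Quality Check categories, the em dash limit and the burstiness check, lend themselves to a quick illustration. This is a sketch, not the checklist's actual implementation: the function names are hypothetical, burstiness is approximated as the coefficient of variation of sentence lengths, the `min_burstiness` value of 0.3 is an invented placeholder, and the scope of the em dash limit (per paragraph, section, or document) is not stated in the source, so it is applied to whatever text is passed in.

```python
import re
import statistics

EM_DASH = "\u2014"


def em_dash_count(text: str) -> int:
    return text.count(EM_DASH)


def sentence_lengths(text: str) -> list:
    """Word counts per sentence, using a naive split on ., ! and ?."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(s.split()) for s in sentences]


def burstiness(text: str) -> float:
    """Coefficient of variation of sentence length (assumed proxy)."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)


def quality_warnings(text: str, max_em_dashes: int = 3,
                     min_burstiness: float = 0.3) -> list:
    """Collect warnings; min_burstiness is a hypothetical threshold."""
    warnings = []
    if em_dash_count(text) > max_em_dashes:
        warnings.append(f"more than {max_em_dashes} em dashes")
    if burstiness(text) < min_burstiness:
        warnings.append("sentence lengths are too uniform")
    return warnings
```

A passage of three equally long sentences would trigger the uniformity warning, while a short sentence followed by a long one would pass.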
- Socratic Mentor Agent (deep-research + academic-paper): SCR (State-Challenge-Reflect) protocol integration
- Commitment Gates: Collect user predictions before presenting evidence at each layer/chapter transition
- Certainty-Triggered Contradiction: Detect high-confidence language ("obviously", "clearly") and introduce counterpoints
- Adaptive Intensity: Track commitment accuracy, dynamically adjust challenge frequency
- Self-Calibration Signal (S5): New convergence signal tracking user's self-calibration growth across dialogue
- SCR Switch: Users can say "skip the predictions" to disable or "turn predictions back on" to re-enable mid-dialogue; Socratic questioning continues normally
- `deep-research/references/socratic_questioning_framework.md`: SCR Overlay Protocol mapping SCR phases to Socratic functions
- Added `CHANGELOG.md`
- integrity_verification_agent v2.0: Anti-Hallucination Mandate (no AI memory verification), eliminated gray-zone classifications (VERIFIED/NOT_FOUND/MISMATCH only), mandatory WebSearch audit trail for every reference, Stage 4.5 fresh independent verification, Gray-Zone Prevention Rule
- Known Hallucination Patterns: 5-type taxonomy (TF/PAC/IH/PH/SH) from GPTZero × NeurIPS 2025 study, 5 compound deception patterns, real-world case study, literature statistics
- Post-publication audit: Full WebSearch verification of all 68 references found 21 issues (31% error rate) that passed 3 rounds of integrity checks — proving the necessity of external verification
- Paper corrections: Removed 4 fabricated references, fixed 6 author errors, corrected 7 metadata errors, fixed 2 format issues
- deep-research: Socratic mode now uses intent-based activation instead of keyword matching. Works in any language — detects meaning (e.g., "user wants guided thinking") rather than matching specific strings.
- academic-paper: Plan mode now uses intent-based activation. Detects intent signals like "user is uncertain how to start" or "user wants step-by-step guidance" in any language.
- Both modes now have a default rule: when intent is ambiguous, prefer `socratic`/`plan` over `full`, since it is safer to guide first.
- Two-layer architecture: Layer 1 (skill activation) uses bilingual keywords for matching confidence; Layer 2 (mode routing) uses language-agnostic intent signals.
- deep-research: Added Traditional Chinese trigger keywords for general activation and Socratic mode.
- academic-paper: Added Traditional Chinese trigger keywords and Plan Mode trigger section.
- Both mode selection guides now include bilingual examples and Chinese-specific misselection scenarios.
- deep-research v2.3: New systematic-review / PRISMA mode (7th); 3 new agents (risk_of_bias, meta_analysis, monitoring); PRISMA protocol/report templates; Socratic convergence criteria (4 signals + auto-end); Quick Mode Selection Guide
- academic-paper v2.4: 2 new agents (visualization, revision_coach); revision tracking template with 4 status types; citation format conversion (APA↔Chicago↔MLA↔IEEE↔Vancouver); statistical visualization standards; Socratic convergence criteria; revision recovery example; LaTeX output hardening: mandatory `apa7` document class, text justification fix (`ragged2e` + `etoolbox`), table column width formula, bilingual abstract centering, standardized font stack (Times New Roman + Source Han Serif TC VF + Courier New), PDF via tectonic only
- academic-paper-reviewer v1.4: Quality rubrics with 0-100 scoring and behavioral indicators; decision mapping (≥80 Accept, 65-79 Minor, 50-64 Major, <50 Reject); Quick Mode Selection Guide
- academic-pipeline v2.6: Adaptive checkpoint system (FULL/SLIM/MANDATORY); Phase E Claim Verification in integrity checks; Material Passport for mid-entry provenance; cross-skill mode advisor (14 scenarios); team collaboration protocol; enhanced handoff schemas (9 schemas); integrity failure recovery example
- academic-pipeline v2.4: New Stage 6 PROCESS SUMMARY — auto-generates structured paper creation process record (MD → LaTeX → PDF, bilingual); mandatory final chapter: Collaboration Quality Evaluation with 6 dimensions scored 1–100 (Direction Setting, Intellectual Contribution, Quality Gatekeeping, Iteration Discipline, Delegation Efficiency, Meta-Learning), honest feedback, and improvement recommendations; pipeline expanded from 9 to 10 stages
- academic-pipeline v2.3: Stage 5 FINALIZE now prompts for formatting style (APA 7.0 / Chicago / IEEE); PDF must compile from LaTeX via `tectonic` (no HTML-to-PDF); APA 7.0 uses `apa7` document class (`man` mode) with XeCJK for bilingual CJK support; font stack: Times New Roman + Source Han Serif TC VF + Courier New
- Cross-Agent Quality Alignment: unified definitions (peer-reviewed, currency rule, CRITICAL severity, source tier) across all agents
- deep-research v2.2: synthesis anti-patterns, Socratic auto-end conditions, DOI+WebSearch verification, enhanced ethics integrity check, mode transition matrix
- academic-paper v2.2: 4-level argument scoring, plagiarism screening, 2 new failure paths (F11 Desk-Reject Recovery, F12 Conference-to-Journal), Plan→Full mode conversion
- academic-paper-reviewer v1.3: DA vs R3 role boundaries, CRITICAL finding criteria, consensus classification (4/3/SPLIT/DA-CRITICAL), confidence score weighting, Asian & Regional Journals reference
- academic-pipeline v2.2: checkpoint confirmation semantics, mode switching matrix, failure fallback matrix, state ownership protocol, material version control
- Simplify 4 SKILL.md (-371 lines, -16.5%): remove cross-skill duplication, inline templates → file references, redundant routing tables, duplicate mode selection sections
- Fix revision loop cap contradiction between academic-paper and academic-pipeline
- academic-pipeline v2.0: 5→9 stages, mandatory integrity verification, two-stage review, Socratic revision coaching, reproducibility guarantees
- academic-paper-reviewer v1.1: +Devil's Advocate Reviewer (7th agent), +re-review mode (verification), +post-review Socratic coaching
- New agent: `integrity_verification_agent` (100% reference/data verification with audit trail)
- New agent: `devils_advocate_reviewer_agent` (8-dimension thesis challenger)
- Output order: MD → DOCX via Pandoc when available (else instructions) → ask LaTeX → confirm → PDF
- Initial release
- deep-research v2.0 (10 agents, 6 modes including socratic)
- academic-paper v2.0 (10 agents, 8 modes including plan)
- academic-paper-reviewer v1.0 (6 agents, 4 modes including guided)
- academic-pipeline v1.0 (orchestrator)