Professional-grade user interviews at scale —
powered by prompt engineering, not bigger budgets.
34 → 94 % follow-up hit rate · 50+ iteration rounds · 2 500+ production calls
50 s demo · research question → auto-classify → pick methodology → generate full prompt
```bash
git clone https://github.com/CyannSHI/ai-interview-kit.git
cd ai-interview-kit
# Open with any supported AI tool and say "generate prompt"
```

| Skill | Trigger | What It Does |
|---|---|---|
| generate-prompt | "generate prompt" / "new project" | Guided info collection → methodology pick → auto-assembled prompt |
| generate-input | "prepare input variables" | Natural language → structured input variables |
| evaluate | "evaluate calls" | Batch review call transcripts; detect bad cases & output Excel report |
Your ceiling is 5–8 deep interviews a week. Outsourcing just trades one problem for another: when the budget runs out, the sample size gets cut, and quality never goes up.

You know you should talk to users, but what do you ask, and how deep do you go? ChatGPT gives surface-level answers that dead-end after two turns. And when you finally invest the time, you realize you missed every key question.
The idea: engineer research methodologies into AI prompts. Non-experts get professional interview quality. Experts get 10× scale.
Most AI interview prompts are either too rigid or too loose. Different goals need different AI latitude — and seasoned researchers dial this by instinct. We parameterized that instinct.
```text
AI freedom:  Low ◄━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━► High
             Precise Control       Balanced Mode       Exploratory Mode
             confirmatory          most projects       discovery
```
| | Precise Control | Balanced | Exploratory |
|---|---|---|---|
| Use case | Targeted validation | Clear direction, some flex | Open-ended discovery |
| Key-info markers | Per question | Per question | None — AI decides |
| Probing limit | 4 rounds / Q | 6 rounds / Q | AI discretion |
| Min coverage | 80 % required Qs | 75 % | 60 % |
| Example | NPS callback | JTBD migration | New-product exploration |
We distilled interviewers' tacit knowledge — when to drill down, when to skip, when to follow a thread — into three tunable parameters: info-point density, probing-round cap, and minimum coverage rate.
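To make the three parameters concrete, here is a minimal Python sketch of them as a config. It is hypothetical: the kit itself expresses these rules as plain-language Markdown, not code. The numbers are copied from the mode table above, `None` stands for "AI discretion", and "min coverage" is read as the share of required questions answered. The names `InterviewMode`, `may_probe_again`, and `meets_coverage` are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical encoding of the three AI-latitude modes (values from the mode table).
@dataclass(frozen=True)
class InterviewMode:
    name: str
    key_info_markers: bool           # per-question key-info markers?
    max_probe_rounds: Optional[int]  # probing cap per question; None = AI discretion
    min_coverage: float              # minimum share of required questions covered


MODES = {
    "precise": InterviewMode("Precise Control", True, 4, 0.80),
    "balanced": InterviewMode("Balanced", True, 6, 0.75),
    "exploratory": InterviewMode("Exploratory", False, None, 0.60),
}


def may_probe_again(mode: InterviewMode, rounds_so_far: int) -> bool:
    """True if another follow-up on the current question is still allowed."""
    if mode.max_probe_rounds is None:
        return True  # exploratory: the cap is left to the model's judgment
    return rounds_so_far < mode.max_probe_rounds


def meets_coverage(mode: InterviewMode, covered: int, required: int) -> bool:
    """True if enough required questions were covered for the interview to count."""
    if required == 0:
        return True
    return covered / required >= mode.min_coverage
```

Under these assumptions, a Precise Control interview that covered 7 of 10 required questions falls short of its 80 % floor, while the same result would pass in Exploratory mode.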
Universal framework (framework/base.md) + methodology module (3 SLOTs) → auto-assembled into a production-ready prompt
Add a new methodology by writing just 3 SLOTs — no framework changes needed. Framework upgrades automatically benefit every methodology.
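The SLOT mechanism works like template substitution: the framework holds placeholders, and each methodology supplies the text for them. The Python sketch below illustrates the idea. The `{{SLOT:NAME}}` marker syntax, the slot names, and `assemble_prompt` are all hypothetical and may not match the real framework/base.md. In the kit, the AI tool performs this assembly by following skills/generate-prompt.md.

```python
import re

# Hypothetical placeholder syntax; the real markers in framework/base.md may differ.
SLOT_PATTERN = re.compile(r"\{\{SLOT:([A-Z_]+)\}\}")


def assemble_prompt(framework: str, methodology: dict[str, str]) -> str:
    """Fill every {{SLOT:NAME}} in the framework with the methodology's text for NAME."""
    required = set(SLOT_PATTERN.findall(framework))
    missing = required - set(methodology)
    if missing:
        raise KeyError(f"methodology is missing slots: {sorted(missing)}")
    # A function replacement inserts the slot text literally (no backslash escapes).
    return SLOT_PATTERN.sub(lambda m: methodology[m.group(1)], framework)


# Illustrative framework and methodology; all slot names are made up.
framework = (
    "You are a user researcher conducting a phone interview.\n"
    "Method: {{SLOT:METHOD_INTRO}}\n"
    "Dimensions to cover: {{SLOT:DIMENSIONS}}\n"
    "Probing guidance: {{SLOT:PROBING_RULES}}\n"
)
jtbd = {
    "METHOD_INTRO": "JTBD migration interview",
    "DIMENSIONS": "Push, Pull, Anxiety, Habit, Destination",
    "PROBING_RULES": "Ask for a concrete recent switching moment before generalizing.",
}
print(assemble_prompt(framework, jtbd))
```

Failing loudly on a missing slot reflects the "3 SLOTs, no framework changes" contract: a new methodology is incomplete until it fills every placeholder the framework declares.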
| Methodology | Best For | Core Dimensions |
|---|---|---|
| JTBD Migration | User decisions · churn · competitor switching | Push · Pull · Anxiety · Habit · Destination |
| Journey Mapping | Experience flows · friction points · action chains | Stage · Touchpoint · Behavior · Emotion · Breakpoint |
| NPS / Satisfaction | Satisfaction callback · service improvement | Positive driver · Negative driver · Expectation gap |
| Laddering | Deep motivation · value discovery | Attribute · Functional benefit · Emotional benefit · Core value |
| User Lifecycle | Conversion · retention · churn | Acquisition · Conversion · Usage · Retention · Churn |
| Brand Diagnostics | Brand perception · competitive positioning | Awareness · Association · Preference · Comparison · Loyalty |
Custom methodology? Copy methodologies/_template.md, fill in 3 SLOTs, save — done.
Follow-up hit rate by version:

```text
v0.1  ███████░░░░░░░░░░░░░  34%  Flat question list, no probing
v0.2  ████████████░░░░░░░░  61%  + Key-info markers
v0.3  ██████████████████░░  89%  + Probing cap & 3-strike rule
v0.4  ███████████████████░  94%  + Methodology SLOT mechanism
```
Stress-test plan: 6 extreme scenarios
| # | Scenario | What It Tests |
|---|---|---|
| 1 | Memory activation | Can AI gently help users recall when they say "I don't remember"? |
| 2 | Deep drill-down without leading | Can AI ask purely open-ended questions — no options, no nudging? |
| 3 | Factual contradiction detection | Can AI catch and probe when users contradict themselves? |
| 4 | High-pressure emotion handling | Can AI de-escalate anger and steer back on track? |
| 5 | Signal extraction from noise | Can AI identify key info when users ramble? |
| 6 | Identity stability under challenge | How does AI respond when users ask "Are you a robot?" |
Evaluation dimensions: pacing · probing depth · information leakage · abnormal hang-up rate · prompt robustness
Production validation data
| Metric | Industry Baseline | Project A (Test) | Project B (Prod) | Project C (Prod) |
|---|---|---|---|---|
| Call volume (baseline: per day; projects: campaign total) | 50–100 / day | 1 267 | 202 | 1 031 |
| Connect rate | 30–40 % | 47 % | 61 % | 51 % |
| Effective interview rate | 10–15 % | 6 % | 21 % | 7 % |
| Time cost | 1–2 people × 2–3 days | 2 lines × 4 h | 2 lines × 30 min | 2 lines × 3.5 h |
Project B achieved a 21 % effective interview rate, above the industry baseline of 10–15 %.
The campaign isn't the finish line. Feed call transcripts back to AI — say "evaluate this campaign" or "find bad cases" — and it returns an Excel report with per-call scoring, issue pinpointing, and concrete improvement suggestions that feed directly into your next prompt iteration.
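The evaluate skill runs as plain-language instructions inside your AI tool. If you want to post-process per-call scores yourself, here is a minimal Python sketch under stated assumptions. Each call is scored 1 to 5 on the evaluation dimensions listed above, with higher always better (so 5 means no information leakage). An abnormal hang-up is recorded as a yes/no flag. A call counts as a bad case if it hung up abnormally or scored below 3 on any dimension. `CallScore`, `find_bad_cases`, `to_csv`, the scale, and the threshold are all hypothetical. The sketch writes CSV with the standard `csv` module rather than `.xlsx`, which would need a third-party library such as openpyxl.

```python
import csv
import io
from dataclasses import dataclass

# Per-call dimensions from the evaluation list; the 1-5 scale is an assumption.
DIMENSIONS = ["pacing", "probing_depth", "information_leakage", "prompt_robustness"]


@dataclass
class CallScore:
    call_id: str
    scores: dict[str, int]        # dimension -> score, higher is better
    abnormal_hangup: bool = False
    notes: str = ""


def find_bad_cases(calls: list[CallScore], threshold: int = 3) -> list[CallScore]:
    """Calls with an abnormal hang-up or any dimension below threshold (missing = 0)."""
    return [
        c for c in calls
        if c.abnormal_hangup or any(c.scores.get(d, 0) < threshold for d in DIMENSIONS)
    ]


def to_csv(calls: list[CallScore]) -> str:
    """Render one row per call, with a bad_case column, as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["call_id", *DIMENSIONS, "abnormal_hangup", "bad_case", "notes"])
    bad_ids = {c.call_id for c in find_bad_cases(calls)}
    for c in calls:
        writer.writerow([
            c.call_id,
            *(c.scores.get(d, "") for d in DIMENSIONS),
            c.abnormal_hangup,
            c.call_id in bad_ids,
            c.notes,
        ])
    return buf.getvalue()
```

Treating a missing score as 0 is deliberate: an unscored dimension is surfaced as a bad case instead of being silently skipped.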
All skills are written in plain natural language with zero API dependencies, so they work with major AI coding tools out of the box:
| AI Tool | Entry File |
|---|---|
| Qoder | AGENTS.md → .qoder/skills/ |
| Claude Code | CLAUDE.md |
| Cursor | .cursor/rules/ai-interview-skills.mdc |
| GitHub Copilot | .github/copilot-instructions.md |
| Windsurf | .windsurfrules |
| Others | INSTRUCTIONS.md |
Every entry file points to skills/ — skill logic lives in one place, zero duplication.
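Because each entry file is only a pointer, the main way this layout can drift is a pointer going missing. A small check like the hypothetical one below (not shipped with the kit) could run in CI. It assumes each entry file mentions the path `skills/` literally, which may not match how the real files are worded. The file list is copied from the table above.

```python
from pathlib import Path

# Entry files from the table above (Qoder's AGENTS.md routes to .qoder/skills/).
ENTRY_FILES = [
    "AGENTS.md",
    "CLAUDE.md",
    ".cursor/rules/ai-interview-skills.mdc",
    ".github/copilot-instructions.md",
    ".windsurfrules",
    "INSTRUCTIONS.md",
]


def entry_files_missing_pointer(repo_root: str, marker: str = "skills/") -> list[str]:
    """Return entry files that are absent or never mention the shared skills directory."""
    root = Path(repo_root)
    problems = []
    for rel in ENTRY_FILES:
        path = root / rel
        if not path.is_file() or marker not in path.read_text(encoding="utf-8"):
            problems.append(rel)
    return problems


if __name__ == "__main__":
    missing = entry_files_missing_pointer(".")
    print("all entry files point to skills/" if not missing else f"check: {missing}")
```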
Project Structure

```text
.
├── skills/                   # Skill instructions (single source of truth)
│   ├── generate-prompt.md    # Prompt generation
│   ├── generate-input.md     # Input variable generation
│   └── evaluate.md           # Call quality evaluation
├── framework/
│   └── base.md               # Universal interview framework (with SLOT placeholders)
├── methodologies/            # Methodology library (pluggable)
│   ├── jtbd.md               # JTBD Migration
│   ├── journey.md            # Journey Mapping
│   ├── nps.md                # NPS / Satisfaction
│   ├── laddering.md          # Laddering
│   ├── lifecycle.md          # User Lifecycle
│   └── brand.md              # Brand Diagnostics
├── examples/                 # Example files
└── assets/                   # Image assets
```
Issues and PRs welcome — especially new methodology modules, real-world case studies, and improvements to the probing logic.
Vision: Democratize Insight
Let bootstrapped startups and nonprofits (teams that can't afford a research agency)
hear their users at low cost, so product design can truly become human-centered again.
If this project helps you, consider giving it a ⭐