Production-grade adaptive testing engine powered by Item Response Theory (IRT). Dynamically selects optimal questions based on real-time student ability estimation, delivering precise assessments in fewer questions than traditional fixed-length tests.
Traditional tests give every student the same questions. Adaptive tests are smarter — they adjust in real time:
- Start with a medium-difficulty question
- Student answers correctly → next question is harder
- Student answers incorrectly → next question is easier
- Converge on the student's true ability in fewer questions
This is the same approach used by the GRE, GMAT, and many standardized assessments. The underlying math is Item Response Theory (IRT) — a psychometric framework that models the relationship between student ability and question difficulty.
```
Student answers → Update ability estimate → Select optimal next question → Repeat
       ↑                                                                      │
       └──────────────────────────────────────────────────────────────────────┘
```
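The cycle above can be sketched in a few lines of Python. This is an illustrative skeleton only, not the engine's actual code; `select_question`, `estimate_theta`, and `answer_fn` are hypothetical stand-ins for the Fisher-information selector, the MLE routine, and the student's response.

```python
def run_adaptive_test(pool, select_question, estimate_theta, answer_fn,
                      max_questions=20, stopping_se=0.3):
    """Skeleton of the adaptive cycle: select, ask, re-estimate, repeat."""
    responses = []
    remaining = list(pool)
    theta, se = 0.0, float("inf")  # start at average ability, unknown precision
    while remaining and len(responses) < max_questions and se > stopping_se:
        question = select_question(theta, remaining)  # most informative item
        remaining.remove(question)
        responses.append((question, answer_fn(question)))
        theta, se = estimate_theta(responses)  # updated ability + standard error
    return theta, se, responses
```

The loop terminates on any of the three stopping conditions: pool exhaustion, question cap, or sufficient precision.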
- 2-Parameter Logistic (2PL) IRT model — models both difficulty and discrimination
- Maximum Likelihood Estimation (MLE) — precise ability estimation with Bayesian regularization
- Fisher Information question selection — picks the most informative question at each step
- Adaptive stopping rules — ends when measurement precision is sufficient
- Real-time REST API — create sessions, submit answers, get next question
- Simulation endpoint — validate algorithm behavior with known true abilities
- Security hardened — input validation, session limits, TTL eviction, overflow protection
- Health monitoring — `/health` endpoint for orchestration and load balancers
```bash
# Clone and setup
git clone https://github.com/woodstocksoftware/adaptive-question-selector.git
cd adaptive-question-selector
python3.12 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Start the server
python -m uvicorn src.server:app --reload --port 8002
```

The API is now running at http://localhost:8002. Interactive docs at http://localhost:8002/docs.
```bash
curl -X POST http://localhost:8002/sessions \
  -H "Content-Type: application/json" \
  -d '{
    "question_pool": [
      {"id": "q1", "difficulty": -2.0, "discrimination": 1.2, "content": "What is 2+2?"},
      {"id": "q2", "difficulty": -1.0, "discrimination": 1.0, "content": "Solve: 3x = 12"},
      {"id": "q3", "difficulty": 0.0, "discrimination": 1.5, "content": "Factor: x² - 4"},
      {"id": "q4", "difficulty": 1.0, "discrimination": 0.8, "content": "Derivative of sin(x)"},
      {"id": "q5", "difficulty": 2.0, "discrimination": 1.3, "content": "Evaluate: ∫ e^x dx"}
    ],
    "selection_method": "max_info",
    "max_questions": 10,
    "stopping_se": 0.4
  }'
```

The response includes the first selected question and the initial ability estimate (θ = 0.0).
```bash
curl -X POST http://localhost:8002/sessions/{session_id}/answer \
  -H "Content-Type: application/json" \
  -d '{"question_id": "q3", "correct": true}'
```

Each answer returns:
- Updated ability estimate (θ) with standard error
- The next optimal question (or session completion)
- 95% confidence interval on ability
When the session completes (SE threshold met or max questions reached), the response includes a full summary:
```json
{
  "theta": 0.85,
  "standard_error": 0.38,
  "confidence_interval": [-0.105, 1.805],
  "percentile": 80.2,
  "performance_level": "Proficient",
  "questions_answered": 7,
  "correct": 5,
  "accuracy": 71.4
}
```

Test the algorithm against a known true ability:
```bash
curl "http://localhost:8002/simulate?true_theta=1.5&num_questions=20&pool_size=100"
```

Returns step-by-step convergence history showing how the estimate approaches the true value.
The probability of a correct response is:

```
P(θ) = 1 / (1 + e^(−a(θ − b)))
```
Where:
- θ (theta) — student ability, typically in [-3, +3]
- b — item difficulty, same scale as θ
- a — item discrimination, how well the item differentiates ability levels (0.1 to 3.0)
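As a minimal sketch of the formula above (illustrative, not the project's source code):

```python
import math

def p_correct(theta: float, a: float, b: float) -> float:
    """2PL model: probability that a student of ability theta answers
    an item of difficulty b and discrimination a correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))
```

When θ = b the probability is exactly 0.5; a larger discrimination a makes the curve steeper around b.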
The information a question provides about ability:

```
I(θ) = a² · P(θ) · Q(θ)
```
Where Q(θ) = 1 - P(θ). Information is maximized when P(θ) = 0.5 — when the question difficulty matches the student's ability. Higher discrimination (a) yields more information.
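A sketch of the information function under the 2PL model (illustrative, not the engine's code):

```python
import math

def item_information(theta: float, a: float, b: float) -> float:
    """Fisher information of a 2PL item at ability theta: a^2 * P * Q."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)
```

Information peaks at θ = b, where P(θ) = 0.5 and I(θ) = a²/4.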
Student ability is estimated by finding the θ that maximizes the log-likelihood:

```
ℓ(θ) = Σᵢ [ xᵢ ln Pᵢ(θ) + (1 − xᵢ) ln Qᵢ(θ) ] − θ² / (2σ²)
```

where xᵢ ∈ {0, 1} is the correctness of response i.
A weak N(0, σ²) prior provides regularization, preventing extreme estimates with sparse data. Standard error is derived from the inverse of total Fisher information.
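One way to implement this step is Newton-Raphson on the penalized log-likelihood. The sketch below is an assumption-laden illustration, not the project's actual routine: the prior variance `sigma**2`, iteration cap, and tolerance are invented values, and the prior's curvature is folded into the information used for the standard error.

```python
import math

def estimate_theta(responses, sigma=2.0, max_iter=50, tol=1e-8):
    """MAP estimate of ability under the 2PL model with a weak N(0, sigma^2)
    prior. responses: list of (a, b, correct) tuples.
    Returns (theta, standard_error)."""
    theta = 0.0
    for _ in range(max_iter):
        grad = -theta / sigma**2      # gradient of the log-prior
        hess = -1.0 / sigma**2        # curvature of the log-prior
        for a, b, correct in responses:
            p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
            grad += a * ((1.0 if correct else 0.0) - p)
            hess -= a * a * p * (1.0 - p)   # minus the item's Fisher information
        step = grad / hess
        theta -= step
        if abs(step) < tol:
            break
    # SE from the inverse of total information (likelihood + prior terms)
    total_info = 1.0 / sigma**2
    for a, b, _ in responses:
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        total_info += a * a * p * (1.0 - p)
    return theta, 1.0 / math.sqrt(total_info)
```

The prior keeps the estimate finite even for all-correct or all-incorrect response patterns, where unregularized MLE diverges to ±∞.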
| Method | Path | Status | Description |
|---|---|---|---|
| GET | `/health` | 200 | Health check and session count |
| POST | `/sessions` | 201 | Create adaptive test session |
| GET | `/sessions/{id}` | 200 | Get session status and ability estimate |
| POST | `/sessions/{id}/answer` | 200 | Submit answer, receive next question |
| DELETE | `/sessions/{id}` | 200 | Delete session |
| POST | `/estimate` | 200 | Standalone ability estimation |
| GET | `/simulate` | 200 | Simulate adaptive test |
Returns server status, version, and active session count. Use for health checks and monitoring.
Create a new adaptive testing session. Question pool is capped at 1,000 items; duplicate IDs are rejected.
Request body:
| Field | Type | Default | Description |
|---|---|---|---|
| `question_pool` | `QuestionCreate[]` | required | Questions with IRT parameters (max 1,000) |
| `selection_method` | `"max_info"` \| `"target_50"` | `"max_info"` | Selection strategy |
| `max_questions` | `int` | `20` | Maximum questions to administer (1-100) |
| `stopping_se` | `float` | `0.3` | Stop when SE falls below this (0.1-1.0) |
Question parameters:
| Field | Type | Range | Description |
|---|---|---|---|
| `id` | `string` | — | Unique identifier (auto-generated if omitted) |
| `difficulty` | `float` | [-3, 3] | Item difficulty (b parameter) |
| `discrimination` | `float` | [0.1, 3] | Item discrimination (a parameter) |
| `content` | `string` | — | Question text |
| `topic_id` | `string` | — | Optional topic grouping |
Submit an answer and receive the next question.
Request body:
| Field | Type | Description |
|---|---|---|
| `question_id` | `string` | ID of the question being answered |
| `correct` | `bool` | Whether the answer was correct |
Standalone ability estimation from a batch of responses. Each response is validated via Pydantic.
Request body: Array of ResponseInput objects:
```json
[
  {"difficulty": -1.0, "discrimination": 1.0, "correct": true},
  {"difficulty": 0.5, "discrimination": 1.2, "correct": false},
  {"difficulty": -0.5, "discrimination": 0.8, "correct": true}
]
```

| Field | Type | Range | Default | Description |
|---|---|---|---|---|
| `difficulty` | `float` | [-3, 3] | required | Item difficulty (b) |
| `discrimination` | `float` | [0.1, 3] | `1.0` | Item discrimination (a) |
| `correct` | `bool` | — | required | Whether the response was correct |
| Parameter | Type | Range | Default | Description |
|---|---|---|---|---|
| `true_theta` | `float` | [-4, 4] | `0.0` | Simulated student ability |
| `num_questions` | `int` | [1, 200] | `20` | Questions to administer |
| `pool_size` | `int` | [1, 1000] | `100` | Random question pool size |
| Method | Strategy | Best For |
|---|---|---|
| `max_info` | Maximize Fisher information at current θ | Fastest convergence, most precise |
| `target_50` | Select the question closest to 50% success probability | Balanced student experience |
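The two strategies can be sketched as follows. This is an illustration, not the engine's implementation; the dict field names mirror the question schema used in the API examples, and `p_correct` is a plain 2PL probability.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def select_max_info(theta, pool):
    """max_info: item with the largest Fisher information a^2 * P * Q."""
    def info(q):
        p = p_correct(theta, q["discrimination"], q["difficulty"])
        return q["discrimination"] ** 2 * p * (1.0 - p)
    return max(pool, key=info)

def select_target_50(theta, pool):
    """target_50: item whose success probability is closest to 50%."""
    return min(pool, key=lambda q: abs(
        p_correct(theta, q["discrimination"], q["difficulty"]) - 0.5))
```

When an item sits exactly at the current θ, the two strategies usually agree; they diverge when a distant item has much higher discrimination.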
| θ Range | Level | Percentile |
|---|---|---|
| ≥ 1.5 | Advanced | ~93rd+ |
| 0.5 to 1.5 | Proficient | ~69th - 93rd |
| -0.5 to 0.5 | Basic | ~31st - 69th |
| -1.5 to -0.5 | Below Basic | ~7th - 31st |
| < -1.5 | Needs Support | Below ~7th |
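Assuming abilities are standard-normally distributed in the reference population (the convention behind the percentiles in the table above), θ converts to a percentile via the normal CDF. A sketch:

```python
import math

def theta_to_percentile(theta: float) -> float:
    """Percentile rank of theta under a standard normal ability
    distribution: Phi(theta) * 100, computed via the error function."""
    return 50.0 * (1.0 + math.erf(theta / math.sqrt(2.0)))
```

For example, θ = 0.85 maps to roughly the 80th percentile, consistent with the session-summary example earlier.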
The API is hardened for production use:
- Input validation — all endpoints use Pydantic models with enforced parameter ranges
- Session limits — max 10,000 concurrent sessions with 1-hour TTL eviction
- Session IDs — 128-bit entropy via `secrets.token_hex(16)`
- DoS protection — question pool capped at 1,000; simulation params bounded
- Overflow guards — exponent clamping in probability calculation; `math.isfinite()` on MLE output
- Error handling — global exception handler prevents stack trace leakage
- CORS — wildcard origins without credentials (safe default)
- Duplicate rejection — duplicate question IDs in a pool return 400
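The exponent-clamping guard mentioned above can be sketched like this (the clamp value of 30 is illustrative; the project's actual bound may differ):

```python
import math

def safe_p_correct(theta: float, a: float, b: float, clamp: float = 30.0) -> float:
    """2PL probability with the exponent clamped so math.exp can never
    overflow, even for extreme theta or discrimination values."""
    z = max(-clamp, min(clamp, a * (theta - b)))
    return 1.0 / (1.0 + math.exp(-z))
```

At |z| = 30 the probability already differs from 0 or 1 by less than 1e-13, so clamping does not measurably change the estimates.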
```bash
# Run all tests
python -m pytest tests/ -v

# With coverage
python -m pytest tests/ -v --cov=src --cov-report=term-missing
```

72 tests with 99% coverage:
- IRT probability calculations and overflow protection
- MLE ability estimation (normal, all-correct, all-incorrect, empty)
- Fisher information computation
- Question selection (max_info and target_50 methods)
- API session lifecycle (create → answer → complete → delete)
- Stopping rules (SE threshold, max questions, pool exhaustion)
- Input validation (parameter ranges, duplicate IDs, invalid methods)
- Session security (capacity limits, TTL eviction)
- Standalone endpoints (/estimate validation, /simulate bounds)
- Baker, F. B., & Kim, S.-H. (2004). Item Response Theory: Parameter Estimation Techniques. Marcel Dekker.
- Lord, F. M. (1980). Applications of Item Response Theory to Practical Testing Problems. Lawrence Erlbaum.
- van der Linden, W. J., & Glas, C. A. W. (2010). Elements of Adaptive Testing. Springer.
- Weiss, D. J. (1982). Improving measurement quality and efficiency with adaptive testing. Applied Psychological Measurement, 6(4), 473-492.
- Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of Item Response Theory. Sage.
| Component | Description |
|---|---|
| Adaptive Question Selector | IRT-based adaptive testing (this repo) |
| Question Bank MCP | Question management |
| Student Progress Tracker | Performance analytics |
| Simple Quiz Engine | Real-time quizzes |
| Learning Curriculum Builder | Curriculum design |
| Real-Time Event Pipeline | Event routing |
Contributions welcome! Please:
- Fork the repository
- Create a feature branch (`git checkout -b feature/improvement`)
- Ensure tests pass (`python -m pytest tests/ -v`)
- Ensure linting passes (`ruff check src/ tests/`)
- Submit a pull request
MIT
Built by Jim Williams | GitHub