Model Qualification Report: Qwen2.5-Coder-1.5B-Instruct
Date: 2026-01-30
Qualified By: apr-model-qa-playbook v0.1.0
Model: Qwen/Qwen2.5-Coder-1.5B-Instruct
Format: GGUF Q4_K_M (1.04 GB)
Summary
| Metric | Value |
|---|---|
| MQS Score | 200/1000 |
| Grade | F (Partial - QUAL only) |
| Gateways | 4/4 PASSED |
| Tests | 50/50 PASSED (100%) |
| Duration | 195.4s |
Gateway Status
| Gateway | Status | Description |
|---|---|---|
| G1-LOAD | ✅ PASS | Model loads successfully |
| G2-INFER | ✅ PASS | Basic inference works |
| G3-STABLE | ✅ PASS | No crashes or panics |
| G4-VALID | ✅ PASS | Output is not garbage |
Performance Metrics
| Metric | Value |
|---|---|
| Tokens/second | 5.9 - 21.2 tok/s |
| Generation time (32 tokens) | ~1.5s |
| Total latency (incl. load) | ~3.8s |
| Backend | CPU |
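
As a rough cross-check of the figures above, throughput can be estimated as generated tokens divided by generation time, and load overhead as total latency minus generation time. The sketch below is illustrative only: `tokens_per_second` is a hypothetical helper, not part of apr-model-qa-playbook, and the inputs are the approximate values from the table.

```python
def tokens_per_second(num_tokens: int, generation_seconds: float) -> float:
    """Estimate throughput as tokens divided by wall-clock generation time."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return num_tokens / generation_seconds


# Approximate values from the table: 32 tokens generated in ~1.5s.
estimate = tokens_per_second(32, 1.5)

# Load overhead implied by the table: total latency minus generation time.
load_overhead = 3.8 - 1.5

print(f"{estimate:.1f} tok/s, ~{load_overhead:.1f}s load overhead")
```

With these rounded inputs the estimate comes out at roughly 21 tok/s, which is close to the upper end of the reported range; the lower end presumably reflects slower scenarios that the table does not break out.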
Test Matrix
- Modalities: run, chat
- Backends: cpu
- Formats: gguf
- Scenarios per combination: 25
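
The total of 50 tests follows from the matrix above: 2 modalities × 1 backend × 1 format × 25 scenarios. A minimal sketch of that arithmetic (the variable names are illustrative, not taken from the playbook's configuration):

```python
from itertools import product

# Values copied from the test matrix above.
modalities = ["run", "chat"]
backends = ["cpu"]
formats = ["gguf"]
scenarios_per_combination = 25

# Every (modality, backend, format) triple is one combination.
combinations = list(product(modalities, backends, formats))
total_tests = len(combinations) * scenarios_per_combination

for modality, backend, fmt in combinations:
    print(f"{modality}/{backend}/{fmt}: {scenarios_per_combination} scenarios")
print(f"total: {total_tests}")
```

Adding a backend or format to the lists multiplies the total accordingly, which is one way to see how the matrix grows toward a larger qualification run.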
Artifacts
- `evidence.json` - Full test evidence (50 entries)
- `junit.xml` - JUnit XML for CI integration
- `mqs.json` - Machine-readable MQS score
- `report.html` - Interactive HTML dashboard
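
For CI integration, a JUnit XML file can be summarized with the Python standard library. The sketch below assumes the common JUnit layout (`<testcase>` elements with optional `<failure>`, `<error>`, or `<skipped>` children); the exact structure of this report's `junit.xml` is not shown here, so the sample document and the `summarize_junit` helper are hypothetical.

```python
import xml.etree.ElementTree as ET


def summarize_junit(xml_text: str) -> dict:
    """Count passed, failed, and skipped test cases in a JUnit XML document."""
    root = ET.fromstring(xml_text)
    total = failed = skipped = 0
    # iter() finds <testcase> elements at any depth (e.g. under <testsuites>/<testsuite>).
    for case in root.iter("testcase"):
        total += 1
        if case.find("failure") is not None or case.find("error") is not None:
            failed += 1
        elif case.find("skipped") is not None:
            skipped += 1
    return {
        "total": total,
        "failed": failed,
        "skipped": skipped,
        "passed": total - failed - skipped,
    }


# Hypothetical sample; the real junit.xml layout and test names may differ.
SAMPLE = """<testsuites>
  <testsuite name="qualification">
    <testcase name="example-pass"/>
    <testcase name="example-fail"><failure message="refuted"/></testcase>
  </testsuite>
</testsuites>"""

summary = summarize_junit(SAMPLE)
print(summary)
```

To use it on the real artifact, read the file contents and pass them in, for example `summarize_junit(open("junit.xml").read())`.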
Methodology
Tests follow the Popperian Falsification protocol:
- Each test is a falsifiable hypothesis
- Outcome: `Corroborated` (survived refutation) or `Falsified` (refuted)
- All 50 hypotheses were corroborated
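
The protocol above can be sketched as follows: each test states a claim together with a check that tries to refute it, and the outcome records whether the claim survived. This is a minimal illustration, not the playbook's implementation; the `Hypothesis` class, the `evaluate` function, and the example claim are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Outcome(Enum):
    CORROBORATED = "Corroborated"  # survived the refutation attempt
    FALSIFIED = "Falsified"        # refuted by the observed output


@dataclass
class Hypothesis:
    claim: str
    # Returns True if the observed output refutes the claim.
    refuted_by: Callable[[str], bool]


def evaluate(hypothesis: Hypothesis, output: str) -> Outcome:
    """Try to refute the hypothesis against one observed output."""
    if hypothesis.refuted_by(output):
        return Outcome.FALSIFIED
    return Outcome.CORROBORATED


# Hypothetical example claim; real test scenarios are defined by the playbook.
non_empty = Hypothesis(
    claim="The model produces non-empty output",
    refuted_by=lambda output: output.strip() == "",
)

print(evaluate(non_empty, "def add(a, b): return a + b").value)
print(evaluate(non_empty, "   ").value)
```

Note that a corroborated hypothesis is not proven; it has only survived the refutation attempts made so far, which is why a larger test run gives stronger evidence.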
Recommendations
- Production Ready: Yes for CPU inference, based on this partial (QUAL-only) run of 50 tests
- Performance: Acceptable (5.9+ tok/s on CPU)
- Stability: No crashes observed in 50 tests
Next Steps
- Run full qualification (1800 tests) for comprehensive coverage
- Add GPU backend testing
- Test additional quantizations (Q5_K_M, Q8_0)
Generated by apr-model-qa-playbook