The System Hallucination Scale (SHS) is a lightweight, human-centered evaluation instrument for assessing hallucination tendencies in Large Language Models (LLMs).
Inspired by established psychometric tools such as the System Usability Scale (SUS) and the System Causability Scale (SCS), SHS provides a fast, interpretable, and domain-agnostic way to capture how humans perceive factual inconsistency, incoherence, and misleading reasoning in model-generated text.
Müller, H., Steiger, D., Plass, M., & Holzinger, A. (2026).
"The System Hallucination Scale (SHS): A Minimal yet Effective Human-Centered Instrument for Evaluating Hallucination-Related Behavior in Large Language Models."
In submission.
The term hallucination in the context of LLMs describes instances where a model generates content that is not grounded in input data, contextual constraints, or verifiable external knowledge. Such outputs may be subtle or overt and often appear reliable due to the model's fluent and coherent language. Hallucinations differ from adversarial errors, as they typically arise from the model's normal generative behavior and are therefore difficult to detect automatically, particularly in open-domain or under-specified settings.
SHS is explicitly not an automatic hallucination detector or benchmark metric. Instead, it serves as a subjective measurement instrument that captures how hallucinations manifest from a user perspective under realistic interaction conditions. This human-centered approach addresses the gap between quantifiable performance indicators and the broader dimensions of trust, reliability, and human–AI interaction.
SHS consists of:
- 10 items organized into 5 dimension pairs
- 5-point Likert scale (-2 to +2: strongly disagree to strongly agree)
- Alternating positive and negative statements to reduce response bias
- Dimension-based scoring with consistency checking
The five dimensions evaluated by SHS are:
- Factual Accuracy (q1, q2): Assesses the factual reliability of model outputs
- Source Reliability (q3, q4): Evaluates source traceability and verification
- Logical Coherence (q5, q6): Measures logical structure and reasoning quality
- Deceptiveness (q7, q8): Captures how misleading false information is presented
- Responsiveness to Guidance (q9, q10): Assesses the model's ability to correct errors when prompted
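Programmatically, the pairing above is just a small lookup table. The dimension names and item IDs come directly from the list; the Python structure itself is only an illustrative sketch, not the reference implementation's API:

```python
# Illustrative mapping of SHS dimensions to their question pairs (q_a, q_b),
# taken from the dimension list above. Not the reference implementation's API.
SHS_DIMENSIONS = {
    "Factual Accuracy": ("q1", "q2"),
    "Source Reliability": ("q3", "q4"),
    "Logical Coherence": ("q5", "q6"),
    "Deceptiveness": ("q7", "q8"),
    "Responsiveness to Guidance": ("q9", "q10"),
}
```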
For each dimension pair (questions a and b):
- Dimension Score:
  score = (response_a - response_b) / 4
  Yields a score from -1.0 (high hallucination risk) to +1.0 (low hallucination risk)
- Consistency Score:
  consistency = (response_a + response_b) / 4
  Low absolute value (≤ 0.1): Very consistent
  Medium absolute value (≤ 0.5): Good consistency
  High absolute value (> 0.5): Inconsistent (warning)
- Overall Score: Average of all 5 dimension scores (range: -1.0 to +1.0)
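The scoring rules above can be sketched in a few lines of Python; the function names are illustrative, not the reference implementation's API. Note that with responses in [-2, +2], a pair answered (+2, -2) yields the best possible dimension score with perfect consistency:

```python
# Minimal sketch of the SHS scoring rules; names are illustrative.
def dimension_score(response_a: int, response_b: int) -> float:
    """(a - b) / 4 -> -1.0 (high hallucination risk) .. +1.0 (low risk)."""
    return (response_a - response_b) / 4

def consistency_score(response_a: int, response_b: int) -> float:
    """(a + b) / 4 -> values near 0 indicate a consistent answer pair."""
    return (response_a + response_b) / 4

def consistency_label(consistency: float) -> str:
    """Map a consistency value to the bands described above."""
    c = abs(consistency)
    if c <= 0.1:
        return "Very consistent"
    if c <= 0.5:
        return "Good consistency"
    return "Inconsistent (warning)"

def overall_score(response_pairs: list[tuple[int, int]]) -> float:
    """Average of all dimension scores; range -1.0 .. +1.0."""
    return sum(dimension_score(a, b) for a, b in response_pairs) / len(response_pairs)

# Example: "strongly agree" (+2) on the positive item, "strongly disagree" (-2)
# on its negated counterpart.
print(dimension_score(2, -2))                        # 1.0
print(consistency_label(consistency_score(2, -2)))   # Very consistent
```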
While hallucinations are widely recognized as a core limitation of modern LLMs, the field still lacks a simple, standardized, and user-facing instrument to assess them systematically.
SHS addresses this gap by offering:
- a 10-item Likert-scale questionnaire
- alternating positive and negative statements to reduce response bias
- a scoring scheme that yields interpretable scores in the range [-1, +1]
The scale is designed for quick application in research, evaluation studies, benchmarking, and real-world deployments.
This repository hosts:
- The SHS scale definition and questionnaire items
- Reference implementations in Python and JavaScript
- Interactive web calculator for conducting evaluations
- Batch processing tools for large-scale studies
- Documentation and usage guidelines
- Supporting materials for applying SHS in research and deployment
- Web Component (`web-components/`): Interactive browser-based calculator with multi-language support
- Python Implementation (`python/`): Reference implementation for programmatic use and batch processing
- Batch Processor (`python/batch_processor.py`): Tool for processing multiple evaluations and generating statistics
The public interactive calculator is available at:
👉 https://hmmc.at/shs/
- Human evaluation of LLM outputs: Quick assessment of model-generated text
- Comparative testing: Systematic comparison of multiple models, prompting strategies, or configurations
- Deployment monitoring: Ongoing evaluation of hallucination tendencies in production systems
- Research applications: Studies on trust, reliability, and human–AI interaction
- User studies: Structured data collection with built-in consistency checking
- Educational purposes: Teaching and demonstrating hallucination assessment concepts
- Development integration: Embedding SHS evaluation in automated testing pipelines
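As a sketch of the "development integration" use case, a CI step might fail the build when the mean SHS overall score across collected evaluations drops below a chosen threshold. Everything here (the gate function, the result format as a list of overall scores in [-1, +1], and the threshold value) is a hypothetical illustration, not part of SHS itself:

```python
# Hypothetical CI gate: fail the pipeline if the mean SHS overall score
# across collected evaluations falls below a chosen threshold.
# Result format and threshold are illustrative assumptions.
def shs_gate(overall_scores: list[float], threshold: float = 0.0) -> bool:
    """Return True if the mean overall score meets the threshold."""
    if not overall_scores:
        raise ValueError("no SHS evaluations collected")
    mean = sum(overall_scores) / len(overall_scores)
    return mean >= threshold

# Example: overall scores from three evaluation sessions of a user study.
print(shs_gate([0.4, 0.7, 0.1], threshold=0.3))  # True
```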
The easiest way to use SHS is through the interactive web calculator:
- Visit https://hmmc.at/shs/ or open the local `web-components/demo.html`
- Answer all 10 questions using the 5-point Likert scale
- Click "Calculate SHS" to see your results
- Export results as JSON or CSV for further analysis
```python
from python.shs_calculator import SHSCalculator

# Define responses (q1-q10, values from -2 to +2)
responses = {
    "q1": 2, "q2": -2, "q3": 1, "q4": -1, "q5": 2,
    "q6": -2, "q7": 1, "q8": -1, "q9": 1, "q10": -1
}

# Calculate scores
result = SHSCalculator.calculate(responses)
print(f"Overall Score: {result.overall_score:.3f}")
```

```shell
# Process multiple evaluations from JSON file
python python/batch_processor.py input.json --output results.json --stats

# Process CSV file and export statistics
python python/batch_processor.py input.csv --format csv --stats-output stats.json
```

See the web-components README and python README for detailed documentation.
system-hallucination-scale/
├── README.md # This file
├── LICENSE # Apache-2.0 (code)
├── LICENSE-SCALE # CC BY-NC-ND 4.0 (scale text)
├── System_Hallucination_Scale_SHS_V2-2.pdf # Research paper
├── web-components/ # Interactive web calculator
│ ├── shs-calculator.js # Main calculator class
│ ├── shs-calculator.css # Styles
│ ├── demo.html # Demo page
│ ├── README.md # Web component documentation
│ └── tests/ # Unit tests
└── python/ # Python reference implementation
├── shs_calculator.py # Core calculator class
├── batch_processor.py # Batch processing tool
├── requirements.txt # Dependencies (none required)
└── README.md # Python documentation
SHS is an active research instrument and continues to evolve.
Feedback, discussion, and contributions are welcome.
If you use SHS in your research, please cite:
```bibtex
@article{muller2026system,
  title={The System Hallucination Scale (SHS): A Minimal yet Effective Human-Centered Instrument for Evaluating Hallucination-Related Behavior in Large Language Models},
  author={M{\"u}ller, Heimo and Steiger, Dominik and Plass, Markus and Holzinger, Andreas},
  journal={In submission},
  year={2026}
}
```

SHS is inspired by established psychometric instruments:
- System Usability Scale (SUS): a "quick and dirty" usability scale
- System Causability Scale (SCS): Explainability assessment tool
These instruments demonstrate how complex, subjective phenomena can be operationalized through standardized measurement tools.
This repository uses dual licensing:
- Scale text (questionnaire items, instructions, scoring descriptions): CC BY-NC-ND 4.0. See LICENSE-SCALE for details. You can share the scale for non-commercial purposes with attribution; commercial use and modifications require separate permission.
- Code (scripts, calculators, dashboards, notebooks): Apache-2.0. See LICENSE for details.