Inter-Observer Reliability Analysis

Statistical validation of labeling consistency across three independent raters for a handwritten digit classification dataset.

ZPD-Numbers -- 2025/2026

About

This repository contains the inter-observer reliability analysis for a handwritten digit classification task. Three raters -- Michal, Olivier, and Vincenzo -- independently labeled 800 images into 10 categories (digits 0 through 9). The goal is to statistically verify that the human annotations are consistent and trustworthy before using them as ground truth.

Methodology

The analysis applies standard inter-rater reliability metrics:

Metric	Scope	Result
Fleiss' Kappa	All 3 raters simultaneously	0.9981 (almost perfect)
Cohen's Kappa	Michal vs Olivier	0.9972
Cohen's Kappa	Michal vs Vincenzo	0.9972
Cohen's Kappa	Olivier vs Vincenzo	1.0000 (perfect)
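
Both statistics in the table compare observed agreement with the agreement expected by chance. The following is a minimal from-scratch sketch of the two formulas, not the notebook's own code (the dependency list under How to Run suggests the notebook relies on scikit-learn and statsmodels). The function names `cohen_kappa` and `fleiss_kappa` and the toy labels are assumptions made for illustration only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters need the same, non-zero number of labels")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the two raters' marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case (one label used throughout); returning 1.0 is a
        # convention chosen for this sketch, not a universal one.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def fleiss_kappa(items):
    """Fleiss' kappa; `items` holds one tuple per item with one label per rater."""
    if not items:
        raise ValueError("at least one item is required")
    n_raters = len(items[0])
    if n_raters < 2 or any(len(item) != n_raters for item in items):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    category_totals = Counter()
    agreement_sum = 0.0
    for item in items:
        counts = Counter(item)
        category_totals.update(counts)
        # Share of rater pairs that agree on this item.
        agreement_sum += sum(c * (c - 1) for c in counts.values()) / (
            n_raters * (n_raters - 1)
        )
    p_bar = agreement_sum / len(items)
    total_ratings = len(items) * n_raters
    # Chance agreement from the pooled category proportions.
    p_e = sum((t / total_ratings) ** 2 for t in category_totals.values())
    if p_e == 1.0:
        return 1.0  # same convention as in cohen_kappa above
    return (p_bar - p_e) / (1.0 - p_e)


# Toy labels for illustration; these are NOT the repository's data.
rater_1 = [0, 1, 0, 1]
rater_2 = [0, 1, 1, 1]
rater_3 = [0, 1, 1, 1]
print("Cohen (1 vs 2):", cohen_kappa(rater_1, rater_2))
print("Fleiss (all 3):", fleiss_kappa(list(zip(rater_1, rater_2, rater_3))))
```

Writing the formulas out shows why a handful of disagreements barely moves kappa when the labels are spread over many categories: chance agreement is low, so almost all of the observed agreement counts as real.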

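Out of 800 images, only 2 showed any disagreement between raters, and both involved confusion between digits 8 and 9. On no image did all three raters assign different labels: in each case two raters agreed and one differed. Because Olivier and Vincenzo agree perfectly (Cohen's Kappa of 1.0000), both disagreements set Michal's label against the other two.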
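
Disagreements of this kind can be located by comparing the raters' labels item by item. Here is a small sketch under stated assumptions: the function name `find_disagreements`, the generic rater names, and the toy labels are hypothetical, and the labels are assumed to be already aligned so that position i refers to the same image for every rater.

```python
def find_disagreements(labels_by_rater):
    """Split disagreeing items into partial (some raters agree) and full (all differ).

    `labels_by_rater` maps a rater name to a list of labels, with all lists
    in the same image order.
    """
    names = list(labels_by_rater)
    if len({len(v) for v in labels_by_rater.values()}) != 1:
        raise ValueError("all raters must have labeled the same number of items")
    partial, full = [], []
    for index, votes in enumerate(zip(*labels_by_rater.values())):
        distinct = len(set(votes))
        if distinct == 1:
            continue  # unanimous label, nothing to report
        record = (index, dict(zip(names, votes)))
        if distinct == len(votes):
            full.append(record)  # every rater chose a different label
        else:
            partial.append(record)  # at least two raters agree
    return partial, full


# Toy labels for illustration; these are NOT the repository's data.
toy_labels = {
    "rater_a": [3, 8, 9, 1],
    "rater_b": [3, 9, 9, 1],
    "rater_c": [3, 9, 9, 1],
}
partial, full = find_disagreements(toy_labels)
for index, votes in partial:
    print(f"item {index}: {votes}")
print("items where all raters differ:", len(full))
```

Keeping partial and full disagreements apart matters in practice: a 2-versus-1 split can be settled by majority vote, while an item on which every rater differs usually needs re-annotation.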

Repository Contents

inter_observer_reliability.ipynb   Main analysis notebook
inter_observer_reliability.html    Rendered notebook (viewable in browser)
__merged_michal_.csv               Labels from Rater 1
__merged_olivier_.csv              Labels from Rater 2
__merged_vincenzo_.csv             Labels from Rater 3

How to Run

pip install pandas numpy scikit-learn statsmodels jupyter
jupyter notebook inter_observer_reliability.ipynb

Interpretation

A Fleiss' Kappa of 0.9981 falls into the "almost perfect" agreement range (0.81 -- 1.00) on the Landis and Koch scale. This indicates that the labeling process is highly reliable and that the resulting annotations can be used with confidence as ground truth for downstream tasks.

Kappa Range	Interpretation
< 0.00	Poor
0.00 -- 0.20	Slight
0.21 -- 0.40	Fair
0.41 -- 0.60	Moderate
0.61 -- 0.80	Substantial
0.81 -- 1.00	Almost Perfect
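
The table can be turned into a small lookup for labeling a computed kappa. This is an illustrative sketch: the function name `landis_koch_label` is hypothetical, and because the table lists ranges to two decimals, values that fall between two rows (for example 0.205) are assigned to the higher band here, which is an assumption of this sketch.

```python
def landis_koch_label(kappa):
    """Return the Landis and Koch agreement category for a kappa value."""
    if not -1.0 <= kappa <= 1.0:
        raise ValueError("kappa must lie in [-1, 1]")
    if kappa < 0.0:
        return "Poor"
    # Upper bounds of each band, in ascending order.
    bands = [
        (0.20, "Slight"),
        (0.40, "Fair"),
        (0.60, "Moderate"),
        (0.80, "Substantial"),
        (1.00, "Almost Perfect"),
    ]
    for upper, label in bands:
        if kappa <= upper:
            return label
    return "Almost Perfect"  # not reached for valid input; kept as a safe default


# Values taken from the results reported in this README.
for value in (0.9981, 0.9972, 1.0):
    print(value, "->", landis_koch_label(value))
```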

ZPD-Numbers-2025-2026 | Michal Tarnowski, Olivier, Vincenzo

