Info: current statistics #22

Open
wollmers opened this issue Jul 18, 2020 · 0 comments
Labels
Release 1.1.1 Refers to https://github.com/UB-Mannheim/AustrianNewspapers/tree/1.1.1

wollmers commented Jul 18, 2020

Compared the original XML ("ONB_newseye") with the current line texts ("AustrianNewspapers").

compare_xml.pl Version 0.01

Compare XML text output against ground truth (GRT):
XML: ONB_newseye
GRT: AustrianNewspapers

Summary:

              lines   words   chars
items ocr:    57541  326524 2198240 matches + inserts + substitutions
items grt:    57541  326394 2198051 matches + deletions + substitutions
matches:      23961  265356 2125325 matches
edits:        33580   61346   73806 inserts + deletions + substitutions
 subss:       33580   60860   71835 substitutions
 inserts:         0     308    1080 inserts
 deletions:       0     178     891 deletions
precision:   0.4164  0.8127  0.9668 matches / (matches + substitutions + inserts)
recall:      0.4164  0.8130  0.9669 matches / (matches + substitutions + deletions)
accuracy:    0.4164  0.8122  0.9664 matches / (matches + substitutions + inserts + deletions)
f-score:     0.4164  0.8128  0.9669 ( 2 * recall * precision ) / (recall + precision )
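
The scores in the summary follow directly from the four counts per column. A minimal Python sketch that recomputes the character column, assuming the formulas printed next to each row; the function name `ocr_metrics` is hypothetical and is not taken from compare_xml.pl:

```python
def ocr_metrics(matches, substitutions, inserts, deletions):
    """Recompute the summary rows from the four basic counts (illustrative helper)."""
    precision = matches / (matches + substitutions + inserts)
    recall = matches / (matches + substitutions + deletions)
    accuracy = matches / (matches + substitutions + inserts + deletions)
    f_score = (2 * recall * precision) / (recall + precision)
    return {
        "items_ocr": matches + inserts + substitutions,
        "items_grt": matches + deletions + substitutions,
        "edits": inserts + deletions + substitutions,
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "f_score": f_score,
    }


# Character-level counts copied from the summary table above.
chars = ocr_metrics(matches=2125325, substitutions=71835, inserts=1080, deletions=891)

for name, value in chars.items():
    if isinstance(value, float):
        print(f"{name:10s} {value:.4f}")
    else:
        print(f"{name:10s} {value}")
```

The same call with the word or line counts should reproduce the other two columns.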

Shortened list of the edits/mismatches:

Character match (confusion) table:
GRT => OCR  ratio  errors   count
---    --- ------ ------- -------
'ſ' => 's' 0.9985   56885   56971
'⸗' => '-' 0.0052      61   11639
'⸗' => '=' 0.3232    3762   11639
'⸗' => '¬' 0.6691    7788   11639
                    -----
SUM                 68496
+ transcription      1000   estimated transcription level 1 -> 2
                    -----
TOTAL transcription 69496

edits               73806
- transcription    -69496
                    -----
corrections          4310  (0.20% of all characters)

Rough estimate of errors still remaining in the GRT: 1000 to 2000.
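
The split between transcription-level changes and real corrections can be re-derived from the confusion table. A minimal Python sketch, assuming that `ratio` is `errors / count` and that the percentage is taken relative to the GRT character count; both are readings of the table, not confirmed behaviour of compare_xml.pl, and the variable names are hypothetical:

```python
# (GRT char, OCR char, errors, count) rows copied from the confusion table.
confusions = [
    ("ſ", "s", 56885, 56971),
    ("⸗", "-", 61, 11639),
    ("⸗", "=", 3762, 11639),
    ("⸗", "¬", 7788, 11639),
]

# Assumption: ratio is the share of GRT occurrences rendered as the OCR char.
ratios = {(grt, ocr): errors / count for grt, ocr, errors, count in confusions}

systematic = sum(errors for _, _, errors, _ in confusions)
estimated_level_changes = 1000      # the report's own estimate (level 1 -> 2)
transcription = systematic + estimated_level_changes

total_edits = 73806                 # character edits from the summary
grt_chars = 2198051                 # assumption: denominator is "items grt" chars
corrections = total_edits - transcription
share = corrections / grt_chars

for (grt, ocr), ratio in ratios.items():
    print(f"'{grt}' => '{ocr}' {ratio:.4f}")
print(f"systematic={systematic} transcription={transcription}")
print(f"corrections={corrections} ({share:.2%} of all characters)")
```

Using the OCR character count as the denominator instead gives the same value at two decimal places.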

@JKamlah JKamlah added the Release 1.1.1 Refers to https://github.com/UB-Mannheim/AustrianNewspapers/tree/1.1.1 label Apr 26, 2023