If you're using any Speech-to-Text or Speech Recognition system to generate transcriptions from your audio/video content, then you can use this tool to compare how well it is doing against a human generated transcription. If you're not sure how to generate transcription, you can take a look here for list of tutorials to help you get started.
This is a simple utility to perform a quick evaluation on the results generated by any Speech to text (STT) or Automatic Speech Recognition (ASR) System.
This utility can calculate following metrics -
- Word Error Rate (WER), which is a most common metric of measuring the performance of a Speech Recognition or Machine translation system
- Word Information Loss (WIL), which is a simple approximation to the proportion of word information lost. Refer to this paper for more info.
- Levenshtein Distance calculated at word level.
- Number of Word level insertions, deletions and mismatches between the original file and the generated file.
- Number of Phrase level insertions, deletions and mismatches between the original file and the generated file.
- Color Highlighted text Comparison to visualize the differences.
- General Statistics about the original and generated files (bytes, characters, words, new lines etc.)
The utility also performs the pre-processing or normalization of the text in the provided files based on following operations -
- Remove Speaker Name: Remove the Speaker name at the beginning of the line.
- Remove Annotations: Remove any custom annotations added during transcriptions.
- Remove Whitespaces: Remove any extra white spaces.
- Remove Quotes: Remove any double quotes
- Remove Dashes: Remove any dashes
- Remove Punctuations: Remove any punctuations (.,?!)
- Convert contents to lower case
Make sure that you have NodeJS v8+ installed on your system.
npm install -g speech-recognition-evaluation
Verify installation by simply running:
asr-eval
Simplest way to run your first evaluation is by simply passing original
and generated
options to asr-eval
command.
Where, original
is a plain text file containing original transcript to be used as reference; usually this is generated by human beings.
And generated
is a plain text file containing generated transcript by the STT/ASR system.
asr-eval --original ./original-file.txt --generated ./generated-file.txt
This would print simply the Word Error Rate (WER) between the provided files. This is how the output should look like:
Word Error Rate (WER): 13.61350109561817%
To find more information about all the available options:
asr-eval --help
All the available usage options would be printed:
Synopsis
$ asr-eval --original file --generated file
$ asr-eval [options] --original file --generated file
$ asr-eval --help
Options
-o, --original file Original File to be used as reference. Usually, this should be the
transcribed file by a Human being.
-g, --generated file File with the output generated by Speech Recognition System.
-e, --wer [true|false] Default: true. Print Word Error Rate (WER).
-i, --wil [true|false] Default: true. Print Word Information Loss (WIL).
--distance [true|false] Default: false. Print total word distance after comparison.
--stats [true|false] Default: false. Print statistics about original and generate files, before
and after pre-processing. Also prints statistics about word level and phrase
level differences.
--pairs [true|false] Default: false. Print all the difference pairs with type of difference.
--textcomparison [true|false] Default: false. Print the text comparison between two files with
highlighting.
--removespeakers [true|false] Default: true. Remove the speaker at the start of each line in files before
calculations. The speaker should be separated by colon ":" i.e. speaker_name:
text For e.g. "John Doe: Hello, I am John." would get converted to simply
"Hello, I am John."
--removeannotations [true|false] Default: true. Remove any custom annotations in the transcript before
calculations. This is useful when removing custom annotations done by human
transcribers. Anything in square brackets [] are detected as annotations.
For e.g. "Hello, I am [inaudible 00:12] because of few reasons." would get
converted to "Hello, I am because of few reasons."
--removewhitespaces [true|false] Default: true. Remove any extra white spaces before calculations.
--removequotes [true|false] Default: true. Remove any double quotes '"' from the files before
calculations.
--removedashes [true|false] Default: true. Remove any dashes (hyphens) "-" from the files before
calculations.
--removepunctuations [true|false] Default: true. Remove any punctuations ".,?!" from the files before
calculations.
--lowercase [true|false] Default: true. Convert both files to lower case before calculations. This is
useful if evaluation needs to be done in case-insensitive way.
--help [true|false] Print this usage guide.
If you need help installing or using the utility, please give a shout out in our slack channel
If you've instead found a bug or would like new features added, go ahead and open issues or pull requests against this repo!