symblai/speech-recognition-evaluation

Automatic Speech Recognition (ASR) Evaluation

If you're using any Speech-to-Text or Speech Recognition system to generate transcriptions from your audio/video content, you can use this tool to measure how well it performs against a human-generated transcription. If you're not sure how to generate a transcription, you can take a look here for a list of tutorials to help you get started.

What can this utility do?

This is a simple utility for quickly evaluating the results generated by any Speech-to-Text (STT) or Automatic Speech Recognition (ASR) system.

This utility can calculate the following metrics:

  • Word Error Rate (WER), the most common metric for measuring the performance of a speech recognition or machine translation system.
  • Word Information Loss (WIL), a simple approximation of the proportion of word information lost. Refer to this paper for more info.
  • Levenshtein distance calculated at the word level.
  • Number of word-level insertions, deletions and mismatches between the original file and the generated file.
  • Number of phrase-level insertions, deletions and mismatches between the original file and the generated file.
  • Color-highlighted text comparison to visualize the differences.
  • General statistics about the original and generated files (bytes, characters, words, new lines, etc.).
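To make the first three metrics concrete, here is a minimal Python sketch of how WER, WIL and the word-level Levenshtein distance are commonly defined. It is not the utility's own implementation (the utility is a Node.js package), and the function names `align_counts`, `word_error_rate` and `word_information_lost` are hypothetical. It assumes the usual definitions: WER = (S + D + I) / N, where N is the number of reference words, and WIL = 1 - H² / (N_ref × N_hyp), where H is the number of matching words.

```python
from typing import List, Tuple


def align_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int, int]:
    """Return (hits, substitutions, deletions, insertions) for one
    minimum-cost word-level alignment of hyp against ref."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # dist[i][j] = word-level Levenshtein distance between ref[:i] and hyp[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete every reference word
    for j in range(cols):
        dist[0][j] = j  # insert every hypothesis word
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
            )

    # Walk back from the bottom-right corner to count each operation type.
    hits = subs = dels = ins = 0
    i, j = len(ref), len(hyp)
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            if dist[i][j] == dist[i - 1][j - 1] + cost:
                if cost == 0:
                    hits += 1
                else:
                    subs += 1
                i, j = i - 1, j - 1
                continue
        if i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return hits, subs, dels, ins


def word_error_rate(ref: List[str], hyp: List[str]) -> float:
    """WER = (S + D + I) / number of reference words."""
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    _, subs, dels, ins = align_counts(ref, hyp)
    return (subs + dels + ins) / len(ref)


def word_information_lost(ref: List[str], hyp: List[str]) -> float:
    """WIL = 1 - H^2 / (N_ref * N_hyp)."""
    if not ref or not hyp:
        return 1.0  # nothing in common can be recovered
    hits, _, _, _ = align_counts(ref, hyp)
    return 1.0 - (hits * hits) / (len(ref) * len(hyp))


if __name__ == "__main__":
    reference = "the quick brown fox jumps".split()
    hypothesis = "the quick brown dog jumps high".split()
    print(align_counts(reference, hypothesis))
    print(word_error_rate(reference, hypothesis))
    print(word_information_lost(reference, hypothesis))
```

The word-level Levenshtein distance is the sum of substitutions, deletions and insertions returned by `align_counts`. How the utility defines phrase-level differences is not described here, so the sketch does not attempt them.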

The utility also pre-processes (normalizes) the text in the provided files using the following operations:

  • Remove Speaker Name: Remove the speaker name at the beginning of the line.
  • Remove Annotations: Remove any custom annotations added during transcription.
  • Remove Whitespaces: Remove any extra white spaces.
  • Remove Quotes: Remove any double quotes.
  • Remove Dashes: Remove any dashes.
  • Remove Punctuations: Remove any punctuation (.,?!).
  • Convert contents to lower case.
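The normalization steps above can be sketched in Python as follows. This is an illustration only: the regular expressions, the order of the steps, and the function name `normalize` are assumptions, and the utility's actual rules may differ (for example in how it detects a speaker name or what it puts in place of a dash).

```python
import re


def normalize(text: str) -> str:
    """Apply the listed pre-processing steps to every line of a transcript.

    Assumed order: speaker, annotations, quotes, dashes, punctuation,
    lower case, then whitespace clean-up.
    """
    cleaned = []
    for line in text.splitlines():
        # Remove Speaker Name: drop everything up to the first colon.
        # Assumption: any leading "name:" prefix counts as a speaker label.
        line = re.sub(r"^[^:]+:\s*", "", line)
        # Remove Annotations: anything inside square brackets.
        line = re.sub(r"\[[^\]]*\]", "", line)
        # Remove Quotes: double quotes only.
        line = line.replace('"', "")
        # Remove Dashes: assumption, dashes are deleted rather than
        # replaced by a space.
        line = line.replace("-", "")
        # Remove Punctuations: only the four characters listed above.
        line = re.sub(r"[.,?!]", "", line)
        # Convert contents to lower case.
        line = line.lower()
        # Remove Whitespaces: collapse runs of spaces/tabs and trim the ends.
        line = re.sub(r"[ \t]+", " ", line).strip()
        cleaned.append(line)
    return "\n".join(cleaned)


if __name__ == "__main__":
    print(normalize("John Doe: Hello, I am [inaudible 00:12] because of few reasons."))
```

Running whitespace clean-up last keeps the gaps left by removed annotations and dashes from turning into double spaces, which would otherwise create spurious empty tokens when the text is split into words.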

Pre-requisites

Make sure that you have Node.js v8 or later installed on your system.

Installation

npm install -g speech-recognition-evaluation

Verify installation by simply running:

asr-eval

Usage

The simplest way to run your first evaluation is to pass the original and generated options to the asr-eval command. original is a plain text file containing the reference transcript, usually produced by a human. generated is a plain text file containing the transcript produced by the STT/ASR system.

asr-eval --original ./original-file.txt --generated ./generated-file.txt

This prints only the Word Error Rate (WER) between the provided files. The output should look like this:

Word Error Rate (WER): 13.61350109561817%
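If you want to use the reported value in a script, the line above can be parsed with a small helper. The sketch below is hypothetical and assumes the output line has exactly the format shown here; `parse_wer_percent` is not part of the utility.

```python
import re
from typing import Optional

# Matches a line such as "Word Error Rate (WER): 13.61350109561817%".
# Assumption: the number is a plain decimal followed by a percent sign.
WER_LINE = re.compile(r"Word Error Rate \(WER\):\s*([0-9]+(?:\.[0-9]+)?)%")


def parse_wer_percent(output: str) -> Optional[float]:
    """Return the WER percentage found in the tool's output, or None."""
    match = WER_LINE.search(output)
    return float(match.group(1)) if match else None


if __name__ == "__main__":
    print(parse_wer_percent("Word Error Rate (WER): 13.61350109561817%"))
```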

To find more information about all the available options:

asr-eval --help

This prints all the available usage options:

Synopsis

  $ asr-eval --original file --generated file           
  $ asr-eval [options] --original file --generated file 
  $ asr-eval --help                                     

Options

  -o, --original file                 Original File to be used as reference. Usually, this should be the            
                                      transcribed file by a Human being.                                            
  -g, --generated file                File with the output generated by Speech Recognition System.                  
  -e, --wer [true|false]              Default: true. Print Word Error Rate (WER).                                   
  -i, --wil [true|false]              Default: true. Print Word Information Loss (WIL).                             
  --distance [true|false]             Default: false. Print total word distance after comparison.                   
  --stats [true|false]                Default: false. Print statistics about original and generate files, before    
                                      and after pre-processing. Also prints statistics about word level and phrase  
                                      level differences.                                                            
  --pairs [true|false]                Default: false. Print all the difference pairs with type of difference.       
  --textcomparison [true|false]       Default: false. Print the text comparison between two files with              
                                      highlighting.                                                                 
  --removespeakers [true|false]       Default: true. Remove the speaker at the start of each line in files before   
                                      calculations. The speaker should be separated by colon ":" i.e. speaker_name: 
                                      text For e.g. "John Doe: Hello, I am John." would get converted to simply     
                                      "Hello, I am John."                                                           
  --removeannotations [true|false]    Default: true. Remove any custom annotations in the transcript before         
                                      calculations. This is useful when removing custom annotations done by human   
                                      transcribers.  Anything in square brackets [] are detected as annotations.    
                                      For e.g. "Hello, I am [inaudible 00:12] because of few reasons." would get    
                                      converted to "Hello, I am because of few reasons."                            
  --removewhitespaces [true|false]    Default: true. Remove any extra white spaces before calculations.             
  --removequotes [true|false]         Default: true. Remove any double quotes '"' from the files before             
                                      calculations.                                                                 
  --removedashes [true|false]         Default: true. Remove any dashes (hyphens) "-" from the files before          
                                      calculations.                                                                 
  --removepunctuations [true|false]   Default: true. Remove any punctuations ".,?!" from the files before           
                                      calculations.                                                                 
  --lowercase [true|false]            Default: true. Convert both files to lower case before calculations. This is  
                                      useful if evaluation needs to be done in case-insensitive way.                
  --help [true|false]                 Print this usage guide.                                                                                   
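When running many evaluations, it can be convenient to assemble the command line programmatically. The following Python sketch builds an asr-eval invocation from the options listed above and runs it with `subprocess`. The helper names `build_command` and `run_evaluation` are hypothetical, and passing each boolean as an explicit `true` or `false` value is an assumption based on the `[true|false]` notation in the usage guide.

```python
import subprocess
from typing import Dict, List, Optional


def build_command(original: str, generated: str,
                  flags: Optional[Dict[str, bool]] = None) -> List[str]:
    """Build the argument list for asr-eval.

    `flags` maps long option names from the usage guide (for example
    "stats" or "lowercase") to the boolean value to pass.
    """
    cmd = ["asr-eval", "--original", original, "--generated", generated]
    for name, enabled in (flags or {}).items():
        cmd += [f"--{name}", "true" if enabled else "false"]
    return cmd


def run_evaluation(original: str, generated: str,
                   flags: Optional[Dict[str, bool]] = None) -> str:
    """Run asr-eval and return its standard output.

    Requires asr-eval to be installed and available on PATH.
    """
    result = subprocess.run(
        build_command(original, generated, flags),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Only prints the command; call run_evaluation() to execute it.
    print(build_command("./original-file.txt", "./generated-file.txt",
                        {"stats": True, "lowercase": False}))
```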

Getting help

If you need help installing or using the utility, please reach out in our Slack channel.

If you've instead found a bug or would like new features added, go ahead and open issues or pull requests against this repo!