
Merge pull request #160 from parlance/feature/doc_fixes
Doc fixes
SeanNaren authored Aug 22, 2020
2 parents 4487b31 + 1b1f425 commit 9cc6e54
Showing 2 changed files with 28 additions and 18 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -50,7 +50,7 @@ beam_results, beam_scores, timesteps, out_lens = decoder.decode(output)
will make your beam search exponentially slower. Furthermore, the longer your outputs, the more time large beams will take.
This is an important parameter that represents a tradeoff you need to make based on your dataset and needs.
- `num_processes` Parallelize the batch using num_processes workers. You probably want to pass the number of cpus your computer has. You can find this in python with `import multiprocessing` then `n_cpus = multiprocessing.cpu_count()`. Default 4.
- `blank_id` This should be the index of the blank token (probably 0) used when training your model so that ctcdecode can remove it during decoding.
- `blank_id` This should be the index of the CTC blank token (probably 0).
- `log_probs_input` If your outputs have passed through a softmax and represent probabilities, this should be False; if they have passed through a LogSoftmax and represent negative log likelihoods, pass True. If you're not sure, run `print(output[0][0].sum())`: if it's a negative number you've probably got NLL and need to pass True; if it sums to ~1.0 you should pass False. Default False.
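The `log_probs_input` check above can be sketched with plain Python in place of tensors (a minimal sketch; the logit values below are made up for illustration):

```python
import math

# Hypothetical logits for one timestep of one batch item.
logits = [0.5, -1.2, 2.0, 0.1]

# Softmax: probabilities that sum to ~1.0 -> pass log_probs_input=False.
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

# LogSoftmax: log probabilities whose sum is negative -> pass log_probs_input=True.
log_probs = [math.log(p) for p in probs]

print(round(sum(probs), 6))   # ~1.0
print(sum(log_probs) < 0)     # True
```

The same sum test works on a real model output tensor: a sum near 1.0 along the label axis means probabilities, a negative sum means log probabilities.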

### Inputs to the `decode` method
@@ -60,7 +60,7 @@ beam_results, beam_scores, timesteps, out_lens = decoder.decode(output)

4 things get returned from `decode`
1. `beam_results` - Shape: BATCHSIZE x N_BEAMS X N_TIMESTEPS A batch containing the series of characters (these are ints; you still need to decode them back to your text) representing results from a given beam search. Note that the beams are almost always shorter than the total number of timesteps, and the additional data is nonsensical, so to see the top beam (as int labels) from the first item in the batch, you need to run `beam_results[0][0][:out_lens[0][0]]`.
1. `beam_scores` - Shape: BATCHSIZE x N_BEAMS x N_TIMESTEPS A batch with the likelihood of each beam (I think this is p=1/e\**beam_score). If this is true, you can get the model's confidence that that beam is correct with `p=1/np.exp(beam_score)` **more info needed**
1. `beam_scores` - Shape: BATCHSIZE x N_BEAMS x N_TIMESTEPS A batch with the approximate CTC score of each beam (look at the code [here](https://github.com/parlance/ctcdecode/blob/master/ctcdecode/src/ctc_beam_search_decoder.cpp#L191-L192) for more info). You can get the model's confidence that the beam is correct with `p=1/np.exp(beam_score)`.
1. `timesteps` - Shape: BATCHSIZE x N_BEAMS The timestep at which the nth output character has peak probability. Can be used as alignment between the audio and the transcript.
1. `out_lens` - Shape: BATCHSIZE x N_BEAMS. `out_lens[i][j]` is the length of the jth beam_result, of item i of your batch.
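Putting the four outputs above together, here is a minimal sketch of decoding the top beam back to text. The `labels` vocabulary and tensor contents are made-up stand-ins, shown as plain Python lists for clarity:

```python
import math

# Hypothetical vocabulary; index 0 is the CTC blank (an assumption for this sketch).
labels = ["_", "a", "b", "c"]

# Stand-ins for decode() outputs, shown as nested lists instead of tensors.
beam_results = [[[1, 2, 1, 0, 0]]]   # BATCHSIZE x N_BEAMS x N_TIMESTEPS (ints)
beam_scores = [[4.6]]                # lower is better (approximately -log(p))
out_lens = [[3]]                     # valid length of each beam

# Slice off the nonsensical tail, then map ints back to characters.
top_beam = beam_results[0][0][:out_lens[0][0]]
text = "".join(labels[i] for i in top_beam)
print(text)                          # aba

# Rough confidence for the top beam, per the formula above.
confidence = 1 / math.exp(beam_scores[0][0])
print(0 < confidence < 1)            # True
```

With real tensors the slicing is identical; only the indexing syntax comes from PyTorch instead of nested lists.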

42 changes: 26 additions & 16 deletions ctcdecode/__init__.py
@@ -4,21 +4,24 @@

class CTCBeamDecoder(object):
"""
Pytorch wrapper for DeepSpeech PaddlePaddle Beam Search Decoder
PyTorch wrapper for DeepSpeech PaddlePaddle Beam Search Decoder.
Args:
labels (list): The tokens/vocab used to train your model. They should be in the same order as they are in your model's outputs.
labels (list): The tokens/vocab used to train your model.
They should be in the same order as they are in your model's outputs.
model_path (basestring): The path to your external KenLM language model (LM).
alpha (float): Weighting associated with the LMs probabilities. A weight of 0 means the LM has no effect.
alpha (float): Weighting associated with the LM's probabilities.
A weight of 0 means the LM has no effect.
beta (float): Weight associated with the number of words within our beam.
cutoff_top_n (int): Cutoff number in pruning. Only the top cutoff_top_n characters with the highest probability in the vocab will be used in beam search.
cutoff_top_n (int): Cutoff number in pruning. Only the top cutoff_top_n characters
with the highest probability in the vocab will be used in beam search.
cutoff_prob (float): Cutoff probability in pruning. 1.0 means no pruning.
beam_width (int): This controls how broad the beam search is. Higher values are more likely to find top beams, but they also
will make your beam search exponentially slower.
beam_width (int): This controls how broad the beam search is. Higher values are more likely to find top beams,
but they also will make your beam search exponentially slower.
num_processes (int): Parallelize the batch using num_processes workers.
blank_id (int): Index of the blank token (probably 0) used when training your model so that ctcdecode can remove it during decoding.
log_probs_input (bool): Pass False if your model has passed through a softmax and output probabilities sum to 1. Pass True otherwise.
blank_id (int): Index of the CTC blank token (probably 0) used when training your model.
log_probs_input (bool): False if your model has passed through a softmax and output probabilities sum to 1; True otherwise.
"""

def __init__(self, labels, model_path=None, alpha=0, beta=0, cutoff_top_n=40, cutoff_prob=1.0, beam_width=100,
num_processes=4, blank_id=0, log_probs_input=False):
self.cutoff_top_n = cutoff_top_n
@@ -36,19 +36,26 @@ def __init__(self, labels, model_path=None, alpha=0, beta=0, cutoff_top_n=40, cu

def decode(self, probs, seq_lens=None):
"""
Conduct the beamsearch on model outputs and return results
Conducts the beam search on model outputs and returns results.
Args:
probs (Tensor) - A rank 3 tensor representing model outputs. Shape is batch x num_timesteps x num_labels.
seq_lens (Tensor) - A rank 1 tensor representing the sequence length of the items in the batch. Optional, if not provided the size of axis 1 (num_timesteps) of `probs` is used for all items
seq_lens (Tensor) - A rank 1 tensor representing the sequence length of the items in the batch. Optional;
if not provided, the size of axis 1 (num_timesteps) of `probs` is used for all items.
Returns:
tuple: (beam_results, beam_scores, timesteps, out_lens)
beam_results (Tensor): A rank 3 tensor representing the top n beams of a batch of items. Shape: batchsize x num_beams x num_timeteps. Results are still encoded as ints at this stage.
beam_scores (Tensor): A rank 3 tensor representing the likelihood of each beam in beam_results. Shape: batchsize x num_beams x num_timeteps
timesteps (Tensor): A rank 2 tensor representing the timesteps at which the nth output character has peak probability. To be used as alignment between audio and transcript. Shape: batchsize x num_beams
out_lens (Tensor): A rank 2 tensor representing the length of each beam in beam_results. Shape: batchsize x n_beams.
beam_results (Tensor): A 3-dim tensor representing the top n beams of a batch of items.
Shape: batchsize x num_beams x num_timesteps.
Results are still encoded as ints at this stage.
beam_scores (Tensor): A 3-dim tensor representing the likelihood of each beam in beam_results.
Shape: batchsize x num_beams x num_timesteps
timesteps (Tensor): A 2-dim tensor representing the timesteps at which the nth output character
has peak probability.
To be used as alignment between audio and transcript.
Shape: batchsize x num_beams
out_lens (Tensor): A 2-dim tensor representing the length of each beam in beam_results.
Shape: batchsize x n_beams.
"""
probs = probs.cpu().float()
