
Merge pull request #160 from parlance/feature/doc_fixes
Doc fixes
SeanNaren authored Aug 22, 2020
2 parents 4487b31 + 1b1f425 commit 9cc6e54
Showing 2 changed files with 28 additions and 18 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -50,7 +50,7 @@ beam_results, beam_scores, timesteps, out_lens = decoder.decode(output)
will make your beam search exponentially slower. Furthermore, the longer your outputs, the more time large beams will take.
This is an important parameter that represents a tradeoff you need to make based on your dataset and needs.
- `num_processes` Parallelize the batch using num_processes workers. You probably want to pass the number of cpus your computer has. You can find this in python with `import multiprocessing` then `n_cpus = multiprocessing.cpu_count()`. Default 4.
- `blank_id` This should be the index of the blank token (probably 0) used when training your model so that ctcdecode can remove it during decoding.
- `blank_id` This should be the index of the CTC blank token (probably 0).
- `log_probs_input` If your outputs have passed through a softmax and represent probabilities, this should be False; if they have passed through a LogSoftmax and represent negative log likelihoods, pass True. If you're not sure, run `print(output[0][0].sum())`: if it's a negative number you've probably got NLL and need to pass True; if it sums to ~1.0 you should pass False. Default False.
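The `log_probs_input` check above can be sketched with plain Python in place of tensors (a minimal sketch; the logit values below are made up for illustration):

```python
import math

# Hypothetical logits for one timestep of one batch item.
logits = [0.5, -1.2, 2.0, 0.1]

# Softmax: probabilities that sum to ~1.0 -> pass log_probs_input=False.
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

# LogSoftmax: log probabilities whose sum is negative -> pass log_probs_input=True.
log_probs = [math.log(p) for p in probs]

print(round(sum(probs), 6))   # ~1.0
print(sum(log_probs) < 0)     # True
```

The same sum test works on a real model output tensor: a sum near 1.0 along the label axis means probabilities, a negative sum means log probabilities.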

### Inputs to the `decode` method
@@ -60,7 +60,7 @@ beam_results, beam_scores, timesteps, out_lens = decoder.decode(output)

4 things get returned from `decode`
1. `beam_results` - Shape: BATCHSIZE x N_BEAMS X N_TIMESTEPS A batch containing the series of characters (these are ints; you still need to decode them back to your text) representing results from a given beam search. Note that the beams are almost always shorter than the total number of timesteps, and the additional data is nonsensical, so to see the top beam (as int labels) from the first item in the batch, you need to run `beam_results[0][0][:out_lens[0][0]]`.
1. `beam_scores` - Shape: BATCHSIZE x N_BEAMS x N_TIMESTEPS A batch with the likelihood of each beam (I think this is p=1/e\**beam_score). If this is true, you can get the model's confidence that that beam is correct with `p=1/np.exp(beam_score)` **more info needed**
1. `beam_scores` - Shape: BATCHSIZE x N_BEAMS x N_TIMESTEPS A batch with the approximate CTC score of each beam (look at the code [here](https://github.com/parlance/ctcdecode/blob/master/ctcdecode/src/ctc_beam_search_decoder.cpp#L191-L192) for more info). You can get the model's confidence that the beam is correct with `p=1/np.exp(beam_score)`.
1. `timesteps` - Shape: BATCHSIZE x N_BEAMS The timestep at which the nth output character has peak probability. Can be used as alignment between the audio and the transcript.
1. `out_lens` - Shape: BATCHSIZE x N_BEAMS. `out_lens[i][j]` is the length of the jth beam_result, of item i of your batch.
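Putting the four outputs above together, here is a minimal sketch of decoding the top beam back to text. The `labels` vocabulary and tensor contents are made-up stand-ins, shown as plain Python lists for clarity:

```python
import math

# Hypothetical vocabulary; index 0 is the CTC blank (an assumption for this sketch).
labels = ["_", "a", "b", "c"]

# Stand-ins for decode() outputs, shown as nested lists instead of tensors.
beam_results = [[[1, 2, 1, 0, 0]]]   # BATCHSIZE x N_BEAMS x N_TIMESTEPS (ints)
beam_scores = [[4.6]]                # lower is better (approximately -log(p))
out_lens = [[3]]                     # valid length of each beam

# Slice off the nonsensical tail, then map ints back to characters.
top_beam = beam_results[0][0][:out_lens[0][0]]
text = "".join(labels[i] for i in top_beam)
print(text)                          # aba

# Rough confidence for the top beam, per the formula above.
confidence = 1 / math.exp(beam_scores[0][0])
print(0 < confidence < 1)            # True
```

With real tensors the slicing is identical; only the indexing syntax comes from PyTorch instead of nested lists.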

42 changes: 26 additions & 16 deletions ctcdecode/__init__.py
@@ -4,21 +4,24 @@

class CTCBeamDecoder(object):
"""
Pytorch wrapper for DeepSpeech PaddlePaddle Beam Search Decoder
PyTorch wrapper for DeepSpeech PaddlePaddle Beam Search Decoder.
Args:
labels (list): The tokens/vocab used to train your model. They should be in the same order as they are in your model's outputs.
labels (list): The tokens/vocab used to train your model.
They should be in the same order as they are in your model's outputs.
model_path (basestring): The path to your external KenLM language model (LM).
alpha (float): Weighting associated with the LMs probabilities. A weight of 0 means the LM has no effect.
alpha (float): Weighting associated with the LM's probabilities.
A weight of 0 means the LM has no effect.
beta (float): Weight associated with the number of words within our beam.
cutoff_top_n (int): Cutoff number in pruning. Only the top cutoff_top_n characters with the highest probability in the vocab will be used in beam search.
cutoff_top_n (int): Cutoff number in pruning. Only the top cutoff_top_n characters
with the highest probability in the vocab will be used in beam search.
cutoff_prob (float): Cutoff probability in pruning. 1.0 means no pruning.
beam_width (int): This controls how broad the beam search is. Higher values are more likely to find top beams, but they also
will make your beam search exponentially slower.
beam_width (int): This controls how broad the beam search is. Higher values are more likely to find top beams,
but they also will make your beam search exponentially slower.
num_processes (int): Parallelize the batch using num_processes workers.
blank_id (int): Index of the blank token (probably 0) used when training your model so that ctcdecode can remove it during decoding.
log_probs_input (bool): Pass False if your model has passed through a softmax and output probabilities sum to 1. Pass True otherwise.
blank_id (int): Index of the CTC blank token (probably 0) used when training your model.
log_probs_input (bool): False if your model has passed through a softmax and output probabilities sum to 1; True otherwise.
"""

def __init__(self, labels, model_path=None, alpha=0, beta=0, cutoff_top_n=40, cutoff_prob=1.0, beam_width=100,
num_processes=4, blank_id=0, log_probs_input=False):
self.cutoff_top_n = cutoff_top_n
@@ -36,19 +36,26 @@ def __init__(self, labels, model_path=None, alpha=0, beta=0, cutoff_top_n=40, cu

def decode(self, probs, seq_lens=None):
"""
Conduct the beamsearch on model outputs and return results
Conducts the beam search on model outputs and returns results.
Args:
probs (Tensor) - A rank 3 tensor representing model outputs. Shape is batch x num_timesteps x num_labels.
seq_lens (Tensor) - A rank 1 tensor representing the sequence length of the items in the batch. Optional, if not provided the size of axis 1 (num_timesteps) of `probs` is used for all items
seq_lens (Tensor) - A rank 1 tensor representing the sequence length of the items in the batch. Optional;
if not provided, the size of axis 1 (num_timesteps) of `probs` is used for all items.
Returns:
tuple: (beam_results, beam_scores, timesteps, out_lens)
beam_results (Tensor): A rank 3 tensor representing the top n beams of a batch of items. Shape: batchsize x num_beams x num_timeteps. Results are still encoded as ints at this stage.
beam_scores (Tensor): A rank 3 tensor representing the likelihood of each beam in beam_results. Shape: batchsize x num_beams x num_timeteps
timesteps (Tensor): A rank 2 tensor representing the timesteps at which the nth output character has peak probability. To be used as alignment between audio and transcript. Shape: batchsize x num_beams
out_lens (Tensor): A rank 2 tensor representing the length of each beam in beam_results. Shape: batchsize x n_beams.
beam_results (Tensor): A 3-dim tensor representing the top n beams of a batch of items.
Shape: batchsize x num_beams x num_timesteps.
Results are still encoded as ints at this stage.
beam_scores (Tensor): A 3-dim tensor representing the likelihood of each beam in beam_results.
Shape: batchsize x num_beams x num_timesteps
timesteps (Tensor): A 2-dim tensor representing the timesteps at which the nth output character
has peak probability.
To be used as alignment between audio and transcript.
Shape: batchsize x num_beams
out_lens (Tensor): A 2-dim tensor representing the length of each beam in beam_results.
Shape: batchsize x n_beams.
"""
probs = probs.cpu().float()
