This repository has been archived by the owner on Oct 13, 2022. It is now read-only.

[WIP] 2-state HMM topo as an alternative to CTC topo #126

Open
wants to merge 12 commits into
base: master
Fix HMM topo sorting and followup tokens
pzelasko committed Mar 13, 2021
commit d2696ca9f4d18ed52408a773118c46a6e2730c00
8 changes: 6 additions & 2 deletions snowfall/training/hmm_topo.py
@@ -16,7 +16,11 @@ def build_hmm_topo_2state(tokens: List[int]) -> k2.Fsa:
Returns:
An FST that converts a sequence of HMM state IDs to a sequence of token IDs.
"""
-followup_tokens = range(len(tokens), len(tokens) * 2)
+min_token_id = min(tokens)
+followup_tokens = list(range(
+    len(tokens) + min_token_id,
+    2 * len(tokens) + min_token_id
Contributor

Are you making an assumption here that `tokens` is contiguous?

+))
num_states = len(tokens) + 2 # + start state, + final state
arcs = []
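
On the contiguity question raised in the review comment: the range-based construction produces follow-up IDs that are disjoint from `tokens` only when `tokens` is a contiguous run starting at `min(tokens)`. The sketch below illustrates this with plain Python lists. It assumes that follow-up IDs only need to be unique and must not collide with the token IDs; `make_followup_tokens` is a hypothetical helper, not part of snowfall, and whether the rest of the topology or the model output layer needs densely packed IDs is not addressed here.

```python
from typing import List


def followup_tokens_from_range(tokens: List[int]) -> List[int]:
    """The construction from this commit: offset from the smallest token ID."""
    min_token_id = min(tokens)
    return list(range(len(tokens) + min_token_id, 2 * len(tokens) + min_token_id))


def make_followup_tokens(tokens: List[int]) -> List[int]:
    """Hypothetical alternative: start just past the largest token ID."""
    start = max(tokens) + 1
    return list(range(start, start + len(tokens)))


contiguous = [1, 2, 3]
gappy = [1, 2, 5]  # token 5 leaves a gap after 2

# For contiguous input both constructions agree.
same_for_contiguous = (
    followup_tokens_from_range(contiguous) == make_followup_tokens(contiguous)
)

# For non-contiguous input the range-based IDs can overlap the token IDs.
collisions = set(followup_tokens_from_range(gappy)) & set(gappy)
safe = set(make_followup_tokens(gappy)) & set(gappy)
```

With contiguous input the two versions coincide, so the alternative would only change behavior in the case the reviewer is asking about.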

@@ -42,7 +46,7 @@ def build_hmm_topo_2state(tokens: List[int]) -> k2.Fsa:
arcs += [f'{num_states - 1}']

# Build the FST
-arcs = '\n'.join(sorted(arcs))
+arcs = '\n'.join(sorted(arcs, key=lambda arc: int(arc.split()[0])))
ans = k2.Fsa.from_str(arcs)
ans = k2.arc_sort(ans)
return ans
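
The sorting change matters because each arc is a string, so a plain `sorted(arcs)` compares them character by character and places source state `10` before source state `2`. A minimal sketch of the difference, using made-up arc lines (the field layout here is only illustrative and is an assumption, not taken from the elided part of the function):

```python
# Illustrative arc lines: the first field is the source state.
# The last entry mimics the final-state line, which holds only a state ID.
arcs = [
    "10 11 3 3 0.0",
    "2 10 1 1 0.0",
    "0 2 1 1 0.0",
    "11",
]

# Old behavior: lexicographic string comparison.
lexicographic = sorted(arcs)

# New behavior: compare the source state as an integer.
numeric = sorted(arcs, key=lambda arc: int(arc.split()[0]))

lex_sources = [int(a.split()[0]) for a in lexicographic]
num_sources = [int(a.split()[0]) for a in numeric]
```

The two orderings only diverge once a state ID reaches two digits, which is likely why the problem would not show up with very small token sets.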