Closed
Description
NOTE
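This is not a straight copy of the NeMo rules. They were simplified on top of NeMo's, which cut FST compile time from 460s to 56s and FST size from 40M to 4M, while the basic functionality stays the same.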
High priority
- basic component, like cardinal and word (integers, 12345) feat(tn): [cr_id_skip] Support english tn, cardinal and word #203 @xingchensong
- ordinal (ordinal numbers, 1st -> first) [tn] english tn, support ordinal #204 @xingchensong
- date (dates, 2024/04/03) [tn] english tn, support date #205 @xingchensong
- decimal (decimals, 12.01 dollars) [tn] english tn, support decimal #207 @xingchensong
- fraction (fractions, 1/2) [tn] english tn, support fraction #209 @xingchensong
- time (times, 2:30 a.m.) [tn] english tn, support time #210 @xingchensong
- measure (units of measure, 23kg) [tn] english tn, support measure #211 @xingchensong
- money (money, $23) [tn] english, support money #212 @xingchensong
- whitelist [tn] english tn whitelist for asr and tts #206 [tn] english tn, support whitelist #216 @whiteshirt0429
Low priority
- telephone (phone numbers, 010-2345-232323) [tn] english, support telephone #213 @xingchensong
- electronic (URLs, email addresses, cdf1@abc.edu) [tn] english, support electronic #214 @xingchensong
- roman (Roman numerals, IV -> four) [tn] tn english, support roman #215
Lowest priority (can be left unimplemented for now)
- serial (combinations of digits, letters, and special characters, c2365-zh)
- range (ranges, 10% to 20%) [tn] english tn, support range #233
ref:
- build_tagger(): https://github.com/NVIDIA/NeMo-text-processing/tree/main/nemo_text_processing/text_normalization/en/taggers
- build_verbalizer(): https://github.com/NVIDIA/NeMo-text-processing/tree/main/nemo_text_processing/text_normalization/en/verbalizers
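For readers new to the tagger/verbalizer split referenced above, here is a minimal sketch of the idea in plain Python, covering only the cardinal and ordinal categories. It is an illustration, not this project's implementation: the real rules are compiled to FSTs, whereas this sketch uses regular expressions, and every name in it (`tag`, `verbalize`, `normalize`, `cardinal`, `ordinal`) is hypothetical. It also assumes numbers below one million and leaves anything else untouched.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
# Ordinal forms that do not follow the plain "+th" rule.
IRREGULAR = {"one": "first", "two": "second", "three": "third",
             "five": "fifth", "eight": "eighth", "nine": "ninth",
             "twelve": "twelfth"}


def cardinal(n: int) -> str:
    """Spell out an integer in the range 0 <= n < 1_000_000."""
    if not 0 <= n < 1_000_000:
        raise ValueError("sketch only supports 0 <= n < 1_000_000")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, rest = divmod(n, 10)
        return TENS[tens] + ("" if rest == 0 else " " + ONES[rest])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = "" if rest == 0 else " " + cardinal(rest)
        return ONES[hundreds] + " hundred" + tail
    thousands, rest = divmod(n, 1000)
    tail = "" if rest == 0 else " " + cardinal(rest)
    return cardinal(thousands) + " thousand" + tail


def ordinal(n: int) -> str:
    """Turn the last word of the cardinal form into its ordinal form."""
    words = cardinal(n).split()
    last = words[-1]
    if last in IRREGULAR:
        last = IRREGULAR[last]
    elif last.endswith("y"):
        last = last[:-1] + "ieth"   # twenty -> twentieth
    else:
        last = last + "th"          # four -> fourth, hundred -> hundredth
    return " ".join(words[:-1] + [last])


def tag(text: str):
    """Stage 1 (tagger): classify each whitespace-separated token."""
    tokens = []
    for tok in text.split():
        m = re.fullmatch(r"([0-9]{1,6})(st|nd|rd|th)", tok)
        if m:
            tokens.append(("ordinal", m.group(1)))
        elif re.fullmatch(r"[0-9]{1,6}", tok):
            tokens.append(("cardinal", tok))
        else:
            tokens.append(("word", tok))  # passed through unchanged
    return tokens


def verbalize(tokens) -> str:
    """Stage 2 (verbalizer): render each tagged token as spoken words."""
    out = []
    for kind, value in tokens:
        if kind == "cardinal":
            out.append(cardinal(int(value)))
        elif kind == "ordinal":
            out.append(ordinal(int(value)))
        else:
            out.append(value)
    return " ".join(out)


def normalize(text: str) -> str:
    return verbalize(tag(text))
```

Keeping the two stages separate means each category in the checklist above can, in principle, contribute one tagging rule and one verbalization rule without touching the others.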
TODO
- C++ runtime adaptation [runtime] support english tn #219 [runtime] fix english tn #220
- Compilation optimization: compile time is currently close to ten minutes and the FST is too large, over 40M [tn] simplify tn #221
BUGFIX