The model is described in Kriman et al., 2019 (QuartzNet: Deep Automatic Speech Recognition with 1D Time-Channel Separable Convolutions).
The data can be downloaded from Mozilla Common Voice: https://commonvoice.mozilla.org/en/datasets
Additional files needed to run the model (sorted indexes, preprocessed TSV, model weights) are available here: https://yadi.sk/d/tT-N6DRHkB5XTw?w=1
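
A minimal sketch of how a TSV manifest could be read into (audio path, transcript) pairs before training. It assumes the file has a header row with `path` and `sentence` columns, as the Common Voice TSV files do; the preprocessed TSV linked above may use a different layout, so the column names are parameters. The function name `read_manifest` and the sample rows are hypothetical and not part of this repository.

```python
import csv
import io


def read_manifest(tsv_file, audio_col="path", text_col="sentence"):
    """Yield (audio_path, transcript) pairs from a tab-separated manifest.

    Assumption: the file has a header row containing `audio_col` and
    `text_col`. Rows with an empty transcript are skipped.
    """
    # QUOTE_NONE keeps quotation marks inside sentences as literal text.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        text = (row.get(text_col) or "").strip()
        if text:
            yield row[audio_col], text


# Tiny in-memory stand-in for a real TSV file (hypothetical rows).
sample = io.StringIO("path\tsentence\nclip_1.mp3\tHello world\nclip_2.mp3\t\n")
pairs = list(read_manifest(sample))
print(pairs)
```

For a file on disk, the same function can be called as `read_manifest(open("validated.tsv", encoding="utf-8", newline=""))`, where the file name is only an example.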