
Commit eddd255

committed: update to pytorch1.2 and python3
1 parent 90403b9 commit eddd255

34 files changed: +1367 −1422 lines

.gitignore  (+2)

@@ -3,3 +3,5 @@ __pycache__/
 
 my_863_corpus/*
 log/
+checkpoint/
+data/

README.md  (+25 −27)

@@ -1,9 +1,12 @@
-# End-to-End Automatic Speech recogniton
-This is an END-To-END system for speech recognition based on CTC implemented with pytorch.
+## Update
+Update to pytorch1.2 and python3.
+
+# CTC-based Automatic Speech Recognition
+This is a CTC-based speech recognition system built with pytorch.
 
 At present, the system only supports phoneme recognition.
 
-You can also do it at word-level, but you may get a high error rate.
+You can also run it at the word level, but you may get a high error rate.
 
 Another way is to decode with a lexicon and a word-level language model using WFST, which is not included in this system.
 

@@ -36,41 +39,35 @@ Chinese Corpus: 863 Corpus
 
 ## Install
 - Install [Pytorch](http://pytorch.org/)
-- Install [warp-ctc](https://github.com/SeanNaren/warp-ctc) and bind it to pytorch.
-Notice: If use python2, reinstall the pytorch with source code instead of pip.
-- Install pytorch audio:
-```bash
-sudo apt-get install sox libsox-dev libsox-fmt-all
-git clone https://github.com/pytorch/audio.git
-cd audio
-pip install cffi
-python setup.py install
-```
+- ~~Install [warp-ctc](https://github.com/SeanNaren/warp-ctc) and bind it to pytorch.~~
+~~Notice: If using python2, reinstall pytorch from source instead of pip.~~
+Use the pytorch1.2 built-in CTC loss (nn.CTCLoss) now.
 - Install [Kaldi](https://github.com/kaldi-asr/kaldi). We use kaldi to extract mfcc and fbank features.
-- Install [KenLM](https://github.com/kpu/kenlm). Training n-gram Languange Model if needed.
-- Install other python packages
+- Install pytorch [torchaudio](https://github.com/pytorch/audio.git) (needed when using the raw waveform as input).
+- ~~Install [KenLM](https://github.com/kpu/kenlm). Train an n-gram language model if needed.~~
+Use IRSTLM from the kaldi tools instead.
+- Install and start visdom
 ```
-pip install -r requirements.txt
+pip3 install visdom
+python -m visdom.server
 ```
-- Start visdom
+- Install other python packages
 ```
-python -m visdom.server
+pip install -r requirements.txt
 ```
 
 ## Usage
-1. Install all the things according to the Install part.
-2. Open the top script run.sh and alter the directory of data and config file.
-3. Change the $feats if you want to use fbank or mfcc and revise conf file under the directory conf.
-4. Open the config file to revise the super-parameters about everything
+1. Install all the packages according to the Install part.
+2. Revise the top script run.sh.
+3. Open the config file to revise the hyper-parameters.
 4. Run the top script with four conditions
 ```bash
 bash run.sh data_prepare + AM training + LM training + testing
 bash run.sh 1 AM training + LM training + testing
 bash run.sh 2 LM training + testing
 bash run.sh 3 testing
 ```
-LM training are not implemented yet. They are added to the todo-list.
-So only when you prepare the data, run.sh will work.
+RNN LM training is not implemented yet; it is on the todo list.
 
 ## Data Prepare
 1. Extract 39dim mfcc and 40dim fbank features from kaldi.
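The switch from warp-ctc to the built-in nn.CTCLoss mentioned in the Install hunk can be sketched as follows; the tensor shapes and the choice of 0 as the blank index are illustrative assumptions, not values taken from this repo's code.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumed): T frames, batch N, C output classes.
T, N, C = 50, 8, 40          # e.g. 39 phonemes + 1 CTC blank
S = 20                       # max target length per utterance

log_probs = torch.randn(T, N, C, requires_grad=True).log_softmax(2)
targets = torch.randint(1, C, (N, S), dtype=torch.long)   # index 0 reserved for blank
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(10, S + 1, (N,), dtype=torch.long)

ctc_loss = nn.CTCLoss(blank=0)   # built into pytorch since 1.0, so warp-ctc is no longer needed
loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
loss.backward()                  # gradients flow back through log_probs
```

Inputs must be log-probabilities of shape (T, N, C); with the default reduction the result is a scalar mean negative log-likelihood.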
@@ -81,17 +78,17 @@ So only when you prepare the data, run.sh will work.
 - RNN + DNN + CTC
 RNN here can be replaced by nn.LSTM or nn.GRU
 - CNN + RNN + DNN + CTC
-CNN is use to reduce the variety of spectrum which can be caused by the speaker and environment difference.
+CNN is used to reduce spectral variability caused by speaker and environment differences.
 - How to choose
-Use add_cnn to choose one of two models. If add_cnn is True, then CNN+RNN+DNN+CTC will be chosen.
+Use add_cnn to choose between the two models. If add_cnn is True, CNN+RNN+DNN+CTC is used.
 
 ## Training:
 - initial-lr = 0.001
 - decay = 0.5
 - weight-decay = 0.005
 
 Adjust the learning rate when the dev loss has stayed around the same value ten times.
-Times of adjusting learning rate is 8 which can be alter in steps/ctc_train.py(line367).
+The learning rate is adjusted at most 8 times, which can be altered in steps/train_ctc.py (line 367).
 Optimizer is nn.optimizer.Adam with weight decay 0.005
 
 ## Decoder

@@ -108,3 +105,4 @@ Phoneme-level language model is inserted into the beam search decoder now.
 - Combine with RNN-LM
 - Beam search with RNN-LM
 - The code in 863_corpus is a mess. It needs arranging.
+
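The schedule described in the Training section (halve the learning rate after the dev loss stalls for ten epochs, at most 8 times) can be sketched in plain python; the function name, the improvement tolerance, and the return value are assumptions for illustration, not the repo's actual steps/train_ctc.py logic.

```python
def plateau_schedule(dev_losses, init_lr=0.001, decay=0.5, patience=10, max_adjust=8):
    """Return the learning rate used at each epoch (hypothetical helper).

    Halve lr after `patience` epochs without sufficient improvement of the
    dev loss, at most `max_adjust` times (8, as the README states).
    """
    lr, best, stall, adjusts, history = init_lr, float('inf'), 0, 0, []
    for loss in dev_losses:
        history.append(lr)
        if loss < best - 1e-3:        # "around a specific loss" -> small tolerance (assumed)
            best, stall = loss, 0
        else:
            stall += 1
            if stall >= patience and adjusts < max_adjust:
                lr *= decay           # decay = 0.5 as in the README
                adjusts += 1
                stall = 0
    return history

# A dev loss that never improves triggers a halving every 10 epochs:
lrs = plateau_schedule([1.0] * 21)
```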

requirements.txt  (+1 −2)

@@ -1,4 +1,3 @@
-h5py
 numpy
 scipy
-librosa
+visdom

File renamed without changes.

timit/conf/ctc_config.yaml  (+60, new file)

@@ -0,0 +1,60 @@
+#exp name and save dir
+exp_name: 'ctc_fbank_cnn'
+checkpoint_dir: 'checkpoint/'
+
+#Data
+vocab_file: 'data/units'
+train_scp_path: 'data/train/fbank.scp'
+train_lab_path: 'data/train/phn_text'
+valid_scp_path: 'data/dev/fbank.scp'
+valid_lab_path: 'data/dev/phn_text'
+left_ctx: 0
+right_ctx: 2
+n_skip_frame: 2
+n_downsample: 2
+num_workers: 1
+shuffle_train: True
+feature_dim: 81
+output_class_dim: 39
+mel: False
+feature_type: "fbank"
+
+#Model
+rnn_input_size: 243
+rnn_hidden_size: 384
+rnn_layers: 4
+rnn_type: "nn.LSTM"
+bidirectional: True
+batch_norm: True
+drop_out: 0.2
+
+#CNN
+add_cnn: True
+layers: 2
+channel: "[(1, 32), (32, 32)]"
+kernel_size: "[(3, 3), (3, 3)]"
+stride: "[(1, 2), (2, 2)]"
+padding: "[(1, 1), (1, 1)]"
+pooling: "None"
+batch_norm: True
+activation_function: "relu"
+
+#[Training]
+use_gpu: True
+init_lr: 0.001
+num_epoches: 500
+end_adjust_acc: 2
+lr_decay: 0.5
+batch_size: 8
+weight_decay: 0.0005
+seed: 1
+verbose_step: 50
+
+#[test]
+test_scp_path: 'data/test/fbank.scp'
+test_lab_path: 'data/test/phn_text'
+decode_type: "Greedy"
+beam_width: 10
+lm_alpha: 0.1
+lm_path: 'data/lm_phone_bg.arpa'
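Several fields in this new config (channel, kernel_size, stride, padding, pooling) are strings holding Python literals rather than native YAML lists. A minimal sketch of how such fields might be read, assuming PyYAML is available (the repo's actual loader may differ):

```python
import ast

import yaml

# Inline excerpt of the config above, so the sketch is self-contained.
text = """
rnn_input_size: 243
add_cnn: True
channel: "[(1, 32), (32, 32)]"
pooling: "None"
"""

conf = yaml.safe_load(text)
# The quoted fields arrive as plain strings; ast.literal_eval turns them
# back into real Python objects safely (no arbitrary code execution).
channels = ast.literal_eval(conf['channel'])   # -> list of (in, out) tuples
pooling = ast.literal_eval(conf['pooling'])    # "None" -> None
```

Keeping the tuple lists as quoted strings sidesteps YAML's lack of a tuple type, at the cost of a second parsing step.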

timit/conf/ctc_model_setting_fbank.conf  (−46)

This file was deleted.

timit/conf/ctc_model_setting_mfcc.conf  (−46)

This file was deleted.

timit/conf/fbank.conf  (+2 −1)

@@ -1,3 +1,4 @@
 --window-type=hamming
---num-mel-bins=40
+--num-mel-bins=80
+--use-energy
 
