#Read/write Kaldi features

##Raw Feature location

Most Kaldi features are stored in the mfcc folder.

###The ark file

The ark file is the raw feature file and has the extension '.ark'. An ark file is typically a few hundred MB in size. For example, a 20-dimensional MFCC feature matrix is stored in the ark file as follows:

UtteranceID1 [d1 d2 d3 d4 d5 .. d20\n d1 d2 d3 d4 d5 .. d20\n d1 d2 d3 d4 d5 .. d20\n ...]\n UtteranceID2 [d1 d2 d3 d4 d5 .. d20\n d1 d2 d3 d4 d5 .. d20\n ]\n

Here \n means a new line, so each row is one frame of 20 values.
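The layout above can be read back into memory with a few lines of Python. The following is a minimal sketch, assuming the text form of the features (utterance ID, an opening bracket, one frame per line, a closing bracket after the last frame). The function name `parse_text_ark` and the sample values are hypothetical and used only for illustration; they are not part of Kaldi.

```python
from typing import Dict, Iterable, List


def parse_text_ark(lines: Iterable[str]) -> Dict[str, List[List[float]]]:
    """Parse text-format features into {utterance_id: list of frames}.

    Hypothetical helper, not a Kaldi API.
    """
    feats: Dict[str, List[List[float]]] = {}
    current_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if current_id is None:
            # Header line: "UtteranceID [" optionally followed by the first row.
            utt_id, _, line = line.partition("[")
            current_id = utt_id.strip()
            feats[current_id] = []
            line = line.strip()
        # A trailing "]" closes the matrix of the current utterance.
        closes = line.endswith("]")
        if closes:
            line = line[:-1]
        values = line.split()
        if values:
            feats[current_id].append([float(v) for v in values])
        if closes:
            current_id = None
    return feats


# Made-up sample with 3-dimensional frames to keep the example short.
sample = """utt1  [
  1.0 2.0 3.0
  4.0 5.0 6.0 ]
utt2  [
  7.0 8.0 9.0 ]
"""
features = parse_text_ark(sample.splitlines())
for utt, matrix in features.items():
    print(utt, len(matrix), "frames x", len(matrix[0]), "dims")
```

Each utterance maps to a list of frames, and each frame is a list of floats, so the result can be handed to NumPy or similar tools if needed.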

To view the features, type the following command in the terminal:

copy-feats ark:./abc.ark ark,t:

This command copies the features from the input source (ark:./abc.ark) to the output target (ark,t:). Here the target filename is left empty, so the features are printed to the terminal in text form. The following two commands dump the features to a text file:

copy-feats ark:./abc.ark ark,t: > a.txt

copy-feats ark:./abc.ark ark,t:a.txt
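The same dump can be scripted. The sketch below only assembles the command line shown above and runs it if a `copy-feats` binary happens to be on the PATH; the helper names `build_copy_feats_cmd` and `dump_features` are hypothetical, and the file names are the placeholders used in this document.

```python
import shutil
import subprocess
from typing import List, Optional


def build_copy_feats_cmd(ark_path: str, out_path: Optional[str] = None) -> List[str]:
    """Build the copy-feats argument list (hypothetical helper).

    With out_path=None the output target is "ark,t:", i.e. the terminal.
    """
    target = "ark,t:" + (out_path or "")
    return ["copy-feats", "ark:" + ark_path, target]


def dump_features(ark_path: str, out_path: Optional[str] = None) -> None:
    """Run copy-feats; assumes Kaldi's copy-feats is installed and on the PATH."""
    subprocess.run(build_copy_feats_cmd(ark_path, out_path), check=True)


print(build_copy_feats_cmd("./abc.ark"))
print(build_copy_feats_cmd("./abc.ark", "a.txt"))

# Only attempt the real call when the tool is available.
if shutil.which("copy-feats") is None:
    print("copy-feats not found on PATH; skipping the actual dump")
```

Passing the arguments as a list avoids shell quoting issues with paths that contain spaces.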

###The scp file

An scp file with the same base name often accompanies the ark file and describes its contents. The scp file is a plain text file; each line holds one pair of utterance ID and feature location.

For example: abc.ark and abc.scp
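The scp layout is simple enough to parse by hand. The sketch below assumes that each line is an utterance ID followed by a location, and that the location may end in `:<byte offset>` pointing into the ark file. The function name `parse_scp`, the sample paths, and the offsets are made up for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple


def parse_scp(lines: Iterable[str]) -> Dict[str, Tuple[str, Optional[int]]]:
    """Map utterance ID -> (ark path, byte offset or None). Hypothetical helper."""
    table: Dict[str, Tuple[str, Optional[int]]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The first whitespace separates the utterance ID from the location.
        utt_id, location = line.split(None, 1)
        # Treat a trailing ":<digits>" as an offset into the ark file.
        path, sep, offset = location.rpartition(":")
        if sep and offset.isdigit():
            table[utt_id] = (path, int(offset))
        else:
            table[utt_id] = (location, None)
    return table


# Made-up sample lines; real paths and offsets will differ.
sample_scp = [
    "utt1 /data/mfcc/abc.ark:5",
    "utt2 /data/mfcc/abc.ark:1042",
]
index = parse_scp(sample_scp)
for utt, (path, offset) in index.items():
    print(utt, path, offset)
```

Because the scp file only stores locations, it stays small even when the ark file it points to is hundreds of MB.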

The following two commands give the same result:

copy-feats scp:./abc.scp ark,t:

copy-feats ark:./abc.ark ark,t:

##Features in the data folder

feats.scp and vad.scp are two feature descriptors in the Kaldi data folder.