Merged
63 commits
e45cfec
Creating a new recipe directory
jtrmal Nov 9, 2015
cd43ce0
adding lists
jtrmal Nov 9, 2015
7ba714a
Improvements in the pipeline, fixes, syllab search
jtrmal Nov 27, 2015
b720e41
Transplanting the diff to s5d
jtrmal Nov 27, 2015
2dfe1ca
added TDNN, LSTM and BLSTM scripts.
vijayaditya Dec 6, 2015
2faaa8a
added blstm script and top level commands
vijayaditya Dec 7, 2015
fe658b6
improved keyword search, new lang configs
jtrmal Dec 8, 2015
b850d91
removing not needed scripts
jtrmal Dec 8, 2015
df4409f
added blstm results
vijayaditya Dec 8, 2015
75b86b6
some keyword-search optimization binaries
jtrmal Dec 8, 2015
6c79ba0
removing some extra files + kwsearch pipeline improvement
jtrmal Dec 9, 2015
a9e2129
adding configs for the OP3 langs
jtrmal Dec 10, 2015
a3b33e3
configs for the rest of the OP3 langs
jtrmal Dec 10, 2015
6ba17f0
Added updated configs for IndusDB.20151208.Babel.tar.bz2
jtrmal Dec 11, 2015
1eebb25
fixes of the pipeline, added langp (re)estimation
jtrmal Dec 11, 2015
e107d09
adding the kaldi-native search pipeline and a bunch of changes relate…
jtrmal Feb 1, 2016
2b43dcc
removing extra files
jtrmal Feb 1, 2016
7e296b0
A couple of fixes
jtrmal Feb 2, 2016
a6737c0
KWS improvements and fixes
jtrmal Feb 15, 2016
d781a00
Fixes of a couple of issues reported by Fred Richardson <frichard@ll.…
jtrmal Feb 15, 2016
b4ac85e
A separate script for lexicon expansion
jtrmal Feb 15, 2016
96abe7f
A couple of fixes and tweaks. Added checks for tools, especially sox.
jtrmal Feb 18, 2016
1a39d66
adding a couple of changes -- new style options and results for BP langs
jtrmal Feb 18, 2016
9a2adec
adding new results(still will need to be updated)
jtrmal Mar 4, 2016
583541d
added langp and some details tweaked
jtrmal Mar 4, 2016
16c7631
updated STT results, new KWS results and a couple of small fixes all …
jtrmal Mar 31, 2016
a531f88
adding file lists for dev languages
jtrmal Aug 15, 2016
4776f23
miniature fixes and cleanups
jtrmal Aug 15, 2016
fea979c
one more batch of small fixes -- mostly whitespace cleanup
jtrmal Aug 15, 2016
e8ba6f7
small fixes -- location of files and removal of trailing slash in th…
jtrmal Aug 15, 2016
2d8e473
enabling stage-2 KWS pipeline
jtrmal Aug 15, 2016
121006b
adding some directories to .gitignore
jtrmal Aug 15, 2016
f672ff7
some quick fixes
jtrmal Aug 15, 2016
823e0d3
latest fixes
jtrmal Aug 18, 2016
57d6226
making the script split_compound_set to conform to the naming
jtrmal Aug 18, 2016
9f84a71
some last minute fixes for the combination scoring
jtrmal Aug 18, 2016
4d7c31f
do not attempt to score when the scoring data is not available
jtrmal Aug 18, 2016
da6e2d4
bug fixes and --ntrue-from option
jtrmal Aug 19, 2016
9135a70
another batch of fixes
jtrmal Aug 19, 2016
4fa31f9
adding +x permission to split_compound_set.sh
jtrmal Aug 19, 2016
137ccab
fixing whitespaces
jtrmal Oct 13, 2016
c1922d0
fixing whitespaces
jtrmal Oct 13, 2016
21010bf
a couple of fixes
jtrmal Dec 8, 2016
35c596a
adding the cleanup script and chain models training
jtrmal Dec 8, 2016
379a164
adding the graphemic/unicode lexicon feature
jtrmal Dec 8, 2016
971975f
adding the graphemic/unicode lexicon feature
jtrmal Dec 8, 2016
ef10503
fixing the cc files headers, adding c info
jtrmal Dec 8, 2016
f1f4e25
use the user-provided kwset id, not the filename
jtrmal Dec 12, 2016
7005a88
use _cleaned affix
jtrmal Dec 15, 2016
25660de
fixes w.r.t. getting chain models independent on other systems
jtrmal Jan 5, 2017
8ce5320
small fixes as reported by Fred Richardson and Yenda
jtrmal Jan 19, 2017
4153dac
another issue reported by Fred Richardson
jtrmal Jan 19, 2017
307578d
fixing KWS for the chain systems
jtrmal Jan 24, 2017
16d5859
fixes in the KWS hitlist combination
jtrmal Jan 26, 2017
731db0a
adding 40hrs pashto config and fixes for the unicode system
jtrmal Jan 31, 2017
0edc266
fixing some bugs as reported by Ni Chongjia (I2R)
jtrmal Jan 31, 2017
e64d544
fixing some bugs as reported by Fred Richardson
jtrmal Jan 31, 2017
04af0e7
adding 40hrs Pashto OP3 setup
jtrmal Jan 31, 2017
2214571
addressing Dan's comments, some further cleanup
jtrmal Feb 3, 2017
420eb60
improving the make_index script
jtrmal Feb 6, 2017
5f686d1
remove fsts-scale
jtrmal Feb 8, 2017
9bad2a6
adding 'see also' to some of the fst tools
jtrmal Feb 8, 2017
bb395ad
adding back accidentally removed svn check
jtrmal Feb 8, 2017
11 changes: 11 additions & 0 deletions .gitignore
@@ -44,6 +44,9 @@ GRTAGS
GPATH
GSYMS

# python compiled sources
*.pyc

# Make dependencies
.depend.mk

@@ -111,5 +114,13 @@ GSYMS
/tools/pthreads*.zip
/tools/sequitur
/tools/srilm.tgz
/tools/liblbfgs-1.10.tar.gz
/tools/liblbfgs-1.10/
/tools/openfst-1.5.0.tar.gz
/tools/openfst-1.5.0/
/tools/srilm-1.7.2-beta.tar.gz
/tools/liblbfgs/
/tools/sequitur-g2p/

/kaldiwin_vs*

18 changes: 9 additions & 9 deletions egs/babel/s5/local/make_pitch.sh
@@ -1,4 +1,4 @@
#!/bin/bash
#!/bin/bash

# Copyright 2012-2013 Johns Hopkins University (Author: Daniel Povey)
# Bagher BabaAli
@@ -50,7 +50,7 @@ mkdir -p $expdir/log || exit 1;

scp=$data/wav.scp

[ ! -s $KALDI_ROOT ] && KALDI_ROOT=../../..
[ ! -s $KALDI_ROOT ] && KALDI_ROOT=../../..

( # this is for back compatiblity:
cd $KALDI_ROOT/tools
@@ -92,7 +92,7 @@ done
basename=`basename $data`
wavdir=$pitchdir/temp_wav_$basename
mkdir -p $wavdir

if [ -f $data/segments ] || grep '|' $data/wav.scp >/dev/null; then
wav_scp=$expdir/wav.scp
cat $data/segments | awk -v dir=$wavdir '{key=$1; printf("%s %s/%s.wav\n", key, dir, key);}' \
@@ -104,7 +104,7 @@ if [ -f $data/segments ] || grep '|' $data/wav.scp >/dev/null; then
else
# create a fake segments file that takes the whole file; this is an easy way
# to copy to static wav files. Note: probably this has not been tested.
cat $data/wav.scp | awk '{print $1, $1, 0.0, -1.0}' > $expdir/fake_segments
cat $data/wav.scp | awk '{print $1, $1, 0.0, -1.0}' > $expdir/fake_segments
segments=$expdir/fake_segments
fi
if [ $stage -le 0 ]; then
@@ -155,11 +155,11 @@ if [ $stage -le 1 ]; then
fi

# I don't want to put a separate script in svn just for this, so creating a temporary
# script file in the experimental directory. Quotes around 'EOF' disable any
# script file in the experimental directory. Quotes around 'EOF' disable any
# interpretation in the here-doc.
cat <<'EOF' > $expdir/convert.sh
#!/bin/bash
sacc_flist=$1
sacc_flist=$1
scpfile=$2
[ $# -ne 2 ] && echo "Usage: convert.sh <sacc-flist-in> <scpfile-out>" && exit 1;

@@ -247,7 +247,7 @@ exit 0;
# rm $expdir/.error 2>/dev/null

# # for ((n=1; n<=nj; n++)); do
# # mkdir -p "$expdir/$n"
# # mkdir -p "$expdir/$n"
# # done

# # $cmd JOB=1:$nj $expdir/make_pitch.JOB.log \
@@ -297,8 +297,8 @@ exit 0;

# rm $expdir/wav.*.scp $expdir/segments.* 2>/dev/null

# nf=`cat $data/pitchs.scp | wc -l`
# nu=`cat $data/utt2spk | wc -l`
# nf=`cat $data/pitchs.scp | wc -l`
# nu=`cat $data/utt2spk | wc -l`
# if [ $nf -ne $nu ]; then
# echo "It seems not all of the feature files were successfully ($nf != $nu);"
# echo "consider using utils/fix_data_dir.sh $data"
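A note on the `[ ! -s $KALDI_ROOT ] && KALDI_ROOT=../../..` context line in make_pitch.sh above: because the variable is unquoted, an empty `KALDI_ROOT` reduces the test to `[ ! -s ]`, which checks whether the string `-s` is empty, so the fallback would not be applied in exactly the case it seems meant for. The following is a minimal sketch of a quoted variant, not code from this PR. The function name `default_kaldi_root` is hypothetical, and the use of `-d` assumes `KALDI_ROOT` is expected to name a directory.

```shell
#!/bin/bash
# Hypothetical helper, not part of the PR: fall back to a default when
# KALDI_ROOT is unset, empty, or does not name an existing directory.
default_kaldi_root() {
  local fallback=${1:-../../..}
  if [ -z "${KALDI_ROOT:-}" ] || [ ! -d "$KALDI_ROOT" ]; then
    KALDI_ROOT=$fallback
  fi
}

default_kaldi_root
echo "KALDI_ROOT=$KALDI_ROOT"
```

The default `../../..` is the one the script already uses; passing an argument overrides it.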
18 changes: 9 additions & 9 deletions egs/babel/s5c/local/CHECKPOINT.sh
@@ -1,11 +1,11 @@
#!/bin/bash

function GETAPPROVAL {
until false ; do
until false ; do
echo "Do you want to run the command (y/n)?"
read -n 1 WISH
if [ "$WISH" == "y" ]; then

if [ "$WISH" == "y" ]; then
return true;
elif [ "$WISH" == "n" ]; then
return false;
@@ -21,11 +21,11 @@ function ESCAPE_PARAMS {

if [[ "$v" == *"<"* ]]; then
out="$out \"$v\""
elif [[ "$v" == *">"* ]] ; then
elif [[ "$v" == *">"* ]] ; then
out="$out \"$v\""
elif [[ "$v" == *"|"* ]] ; then
elif [[ "$v" == *"|"* ]] ; then
out="$out \'$v\'"
elif [[ "$v" == *" "* ]] ; then
elif [[ "$v" == *" "* ]] ; then
out="$out \"$v\""
else
out="$out $v"
@@ -76,7 +76,7 @@ function CHECKPOINT {

if [ !$INTERACTIVE_CHECKPOINT ] ; then
eval `ESCAPE_PARAMS "$@"`
else
else
APPROVAL=GETAPPROVAL
if $APPROVAL ; then
eval `ESCAPE_PARAMS $@`
@@ -87,7 +87,7 @@ function CHECKPOINT {
echo -e ${COLOR_RED}"CHECKPOINT FAILURE: The command returned non-zero status" >&2
echo -e " rerun the script with the parameter -c $LAST_GOOD_NAME=$COUNTER" >&2
echo -e "COMMAND">&2
echo -e " " "$@" ${COLOR_RED} >&2
echo -e " " "$@" ${COLOR_RED} >&2

exit 1
fi
@@ -97,7 +97,7 @@ function CHECKPOINT {
echo -e "$@"${COLOR_DEFAULT} >&2
fi

COUNTER=$(( $COUNTER + 1 ))
COUNTER=$(( $COUNTER + 1 ))
eval export $COUNTER_NAME=$COUNTER
}
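A note on `GETAPPROVAL` in CHECKPOINT.sh above: in bash, `return` expects a numeric status, so `return true` and `return false` would both report an error rather than signal success or refusal. Similarly, `[ !$INTERACTIVE_CHECKPOINT ]` tests a non-empty string and so appears to be always true. The following is a minimal sketch of an approval prompt that reports its answer through the exit status. It is not code from this PR: the name `get_approval` is hypothetical, and it reads a whole line rather than a single keypress.

```shell
#!/bin/bash
# Hypothetical helper, not part of the PR: ask for confirmation and report
# the answer through the exit status (0 for "y", 1 for "n").
get_approval() {
  local wish
  while true; do
    echo "Do you want to run the command (y/n)?" >&2
    # read returns non-zero at end of input; treat that as a refusal
    read -r wish || return 1
    case "$wish" in
      y) return 0 ;;
      n) return 1 ;;
    esac
  done
}
```

A caller could then write `if get_approval; then ...; fi` and compare the interactive flag explicitly, for example `[ "$INTERACTIVE_CHECKPOINT" = "true" ]`.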

4 changes: 2 additions & 2 deletions egs/babel/s5c/local/ali_to_rttm.sh
@@ -42,7 +42,7 @@ if [ $# != 3 ]; then
exit 1;
fi

set -e
set -e
set -o pipefail
set -u

@@ -65,7 +65,7 @@ fi
$cmd $dir/log/align_to_words.log \
ali-to-phones $dir/final.mdl "ark:gunzip -c $dir/ali.*.gz|" ark,t:- \| \
phones-to-prons $lang/L_align.fst $wbegin $wend ark:- "ark,s:utils/sym2int.pl -f 2- --map-oov '$oov' $lang/words.txt <$data/text|" ark,t:- \| \
prons-to-wordali ark:- "ark:ali-to-phones --write-lengths=true $dir/final.mdl 'ark:gunzip -c $dir/ali.*.gz|' ark,t:- |" ark,t:$dir/align.txt
prons-to-wordali ark:- "ark:ali-to-phones --write-lengths=true $dir/final.mdl 'ark:gunzip -c $dir/ali.*.gz|' ark,t:- |" ark,t:$dir/align.txt

echo "$0: done writing alignments."

4 changes: 2 additions & 2 deletions egs/babel/s5c/local/annotated_kwlist_to_KWs.pl
@@ -26,7 +26,7 @@
Allowed options:
EOU

GetOptions();
GetOptions();

@ARGV >= 2 || die $Usage;

@@ -77,7 +77,7 @@
if ($count == 0) {
$output .= "$value";
$count ++; next;
}
}
if ($count == 6) {
$output .= ", ...";
last;
12 changes: 6 additions & 6 deletions egs/babel/s5c/local/apply_g2p.sh
@@ -2,7 +2,7 @@
# Copyright 2014 Johns Hopkins University (Author: Yenda Trmal)
# Apache 2.0

# Begin configuration section.
# Begin configuration section.
iters=5
stage=0
encoding='utf-8'
@@ -82,15 +82,15 @@ cat $output/output.* > $output/output
#Remap the words from output file back to the original casing
#Conversion of some of thems might have failed, so we have to be careful
#and use the transform_map file we generated beforehand
#Also, because the sequitur output is not readily usable as lexicon (it adds
#Also, because the sequitur output is not readily usable as lexicon (it adds
#one more column with ordering of the pron. variants) convert it into the proper lexicon form
output_lex=$output/lexicon.lex
if [ ! -z $icu_transform ] ; then
#also, the transform is generally N -> 1, i.e. we have to take
#extra care of words that might have been mapped into the same one
perl -e 'open(WORDS, $ARGV[0]) or die "Could not open file $ARGV[0]";
while(<WORDS>) { chomp; @F=split;
if ($MAP{$F[0]} ) { push @{$MAP{$F[0]}}, $F[1]; }
perl -e 'open(WORDS, $ARGV[0]) or die "Could not open file $ARGV[0]";
while(<WORDS>) { chomp; @F=split;
if ($MAP{$F[0]} ) { push @{$MAP{$F[0]}}, $F[1]; }
else { $MAP{$F[0]} = [$F[1]]; }
}
close(WORDS);
@@ -101,7 +101,7 @@ if [ ! -z $icu_transform ] ; then
next;
}
foreach $word (@{$MAP{$F[0]}} ) {
print "$word\t$F[2]\t$F[3]\n";
print "$word\t$F[2]\t$F[3]\n";
}
}
close(LEX);
12 changes: 6 additions & 6 deletions egs/babel/s5c/local/apply_map_tab_preserving.pl
@@ -12,8 +12,8 @@
# this version preserves tabs.

if (@ARGV > 0 && $ARGV[0] eq "-f") {
shift @ARGV;
$field_spec = shift @ARGV;
shift @ARGV;
$field_spec = shift @ARGV;
if ($field_spec =~ m/^\d+$/) {
$field_begin = $field_spec - 1; $field_end = $field_spec - 1;
}
@@ -26,7 +26,7 @@
}
}
if (!defined $field_begin && !defined $field_end) {
die "Bad argument to -f option: $field_spec";
die "Bad argument to -f option: $field_spec";
}
}

@@ -70,20 +70,20 @@
$field_offset = 0;
for ($n = 0; $n < @A; $n++) {
@B = split(" ", $A[$n]);

for ($x = 0; $x < @B; $x++) {
$y = $x + $field_offset;
if ( (!defined $field_begin || $y >= $field_begin)
&& (!defined $field_end || $y <= $field_end)) {
$b = $B[$x];
if (!defined $map{$b}) {
if (!$permissive) {
die "apply_map.pl: undefined key $a\n";
die "apply_map.pl: undefined key $a\n";
} else {
print STDERR "apply_map.pl: warning! missing key $a\n";
}
} else {
$B[$x] = $map{$b};
$B[$x] = $map{$b};
}
}
}
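A note on the hunk above from apply_map_tab_preserving.pl: the token being looked up is `$b`, while the error and warning messages interpolate `$a`, so the reported key may not be the one that is missing. The following is a minimal sketch of the tab-preserving mapping idea, written in shell with awk. It is not the script's actual implementation: the function name is hypothetical, it maps each key to a single word, it passes unknown tokens through unchanged instead of dying, and it assumes a non-empty map file.

```shell
#!/bin/bash
# Hypothetical sketch, not part of the PR.
# $1: map file with "key value" per line; stdin: tab-separated columns whose
# space-separated tokens are mapped while the tabs are kept as they are.
apply_map_tab_preserving() {
  awk -F'\t' -v OFS='\t' '
    # first file: load the map, splitting on any whitespace
    NR == FNR { n = split($0, kv, " "); if (n >= 2) map[kv[1]] = kv[2]; next }
    {
      for (i = 1; i <= NF; i++) {
        n = split($i, tok, " ")
        out = ""
        for (j = 1; j <= n; j++) {
          # unknown tokens are passed through (an assumption of this sketch)
          t = (tok[j] in map) ? map[tok[j]] : tok[j]
          out = out (j > 1 ? " " : "") t
        }
        $i = out
      }
      print
    }' "$1" -
}
```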
2 changes: 1 addition & 1 deletion egs/babel/s5c/local/augment_original_stm.pl
@@ -8,7 +8,7 @@
#As a result, the scoring will be done on per-speaker basis as well
#As the segment from segment mapping generally do not correspond to
#the segmentation of the original STM file, it combines the files
#segments and utt2spk to work out the correct speaker ID for
#segments and utt2spk to work out the correct speaker ID for
#the reference segment
#In case of overlay, it will either use the previous speaker or
#prints out an error message
18 changes: 9 additions & 9 deletions egs/babel/s5c/local/best_path_weights.sh
@@ -16,19 +16,19 @@
# limitations under the License.


# This script combines frame-level posteriors from different decode
# directories. The first decode directory is assumed to be the primary
# This script combines frame-level posteriors from different decode
# directories. The first decode directory is assumed to be the primary
# and is used to get the best path. The posteriors from other decode
# directories are interpolated with the posteriors of the best path.
# The output is a new directory with final.mdl, tree from the primary
# decode-dir and the best path alignments and weights in a decode-directory
# directories are interpolated with the posteriors of the best path.
# The output is a new directory with final.mdl, tree from the primary
# decode-dir and the best path alignments and weights in a decode-directory
# with the same basename as the primary directory.
# This is typically used to get better posteriors for semisupervised training
# of DNN
# e.g. local/combine_posteriors.sh exp/tri6_nnet/decode_train_unt.seg
# e.g. local/combine_posteriors.sh exp/tri6_nnet/decode_train_unt.seg
# exp/sgmm_mmi_b0.1/decode_fmllr_train_unt.seg_it4 exp/combine_dnn_sgmm
# Here the final.mdl and tree are copied from exp/tri6_nnet to
# exp/combine_dnn_sgmm. best_path_ali.*.gz obtained from the primary dir and
# Here the final.mdl and tree are copied from exp/tri6_nnet to
# exp/combine_dnn_sgmm. best_path_ali.*.gz obtained from the primary dir and
# the interpolated posteriors in weights.*.gz are placed in
# exp/combine_dnn_sgmm/decode_train_unt.seg

@@ -115,7 +115,7 @@ for i in `seq 0 $[num_sys-1]`; do
echo $nj > $out_decode/num_jobs
else
if [ $nj != `cat $decode_dir/num_jobs` ]; then
echo "$0: number of decoding jobs mismatches, $nj versus `cat $decode_dir/num_jobs`"
echo "$0: number of decoding jobs mismatches, $nj versus `cat $decode_dir/num_jobs`"
exit 1;
fi
fi
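The job-count check in the hunk above can be sketched on its own as follows. This is an illustration under assumptions, not code from this PR: the function name `check_num_jobs` is hypothetical, and it only assumes that each decode directory holds a `num_jobs` file, as the hunk suggests.

```shell
#!/bin/bash
# Hypothetical helper, not part of the PR: succeed only if every given decode
# directory has a num_jobs file and all of them hold the same value; on
# success, print that value.
check_num_jobs() {
  local nj="" dir this
  for dir in "$@"; do
    if [ ! -f "$dir/num_jobs" ]; then
      echo "$0: missing $dir/num_jobs" >&2
      return 1
    fi
    this=$(cat "$dir/num_jobs")
    if [ -z "$nj" ]; then
      nj=$this   # the first (primary) directory sets the reference value
    elif [ "$nj" != "$this" ]; then
      echo "$0: number of decoding jobs mismatches, $nj versus $this" >&2
      return 1
    fi
  done
  echo "$nj"
}
```

With the directories from the header comment, a call might look like `nj=$(check_num_jobs exp/tri6_nnet/decode_train_unt.seg exp/sgmm_mmi_b0.1/decode_fmllr_train_unt.seg_it4) || exit 1`.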
2 changes: 1 addition & 1 deletion egs/babel/s5c/local/check_models.sh
@@ -4,7 +4,7 @@
check_model () {
model=$1
if [ -s $model ]; then echo $model
else
else
dir=`dirname $model`
latest_model=`ls -lt $dir/{?,??}.mdl 2>/dev/null | head -1 | awk '{print $9}'`
echo "*$model is not there, latest is: $latest_model"
4 changes: 2 additions & 2 deletions egs/babel/s5c/local/check_wers.sh
@@ -4,7 +4,7 @@

check_wer () {
dir=$1
if [ -d $dir ]; then
if [ -d $dir ]; then
seen_dir=false
for ddir in $dir/decode*; do
if [ -d $ddir ]; then
@@ -34,7 +34,7 @@ for n in `seq 10`; do
fi
done

if [ $# != 0 ]; then
if [ $# != 0 ]; then
echo "Usage: local/check_wers.sh [--final] [--char]"
exit 1;
fi
28 changes: 14 additions & 14 deletions egs/babel/s5c/local/cmu_uem2kaldi_dir.sh
@@ -30,12 +30,12 @@ mkdir -p $datadir

echo "Converting `basename $database` to kaldi directory $datadir "
cat $database | perl -pe 's:.+(BABEL):BABEL:; s:\}\s+\{FROM\s+: :; s:\}\s+\{TO\s+: :; s:\}.+::;' | \
perl -ne '@K = split;
$utteranceID = @K[0];
$utteranceID =~ s:[^_]+_[^_]+_[^_]+_::;
$utteranceID =~ s:([^_]+)_(.+)_(inLine|scripted):${1}_A_${2}:;
$utteranceID =~ s:([^_]+)_(.+)_outLine:${1}_B_${2}:;
$utteranceID .= sprintf ("_%06i", (100*@K[2]));
perl -ne '@K = split;
$utteranceID = @K[0];
$utteranceID =~ s:[^_]+_[^_]+_[^_]+_::;
$utteranceID =~ s:([^_]+)_(.+)_(inLine|scripted):${1}_A_${2}:;
$utteranceID =~ s:([^_]+)_(.+)_outLine:${1}_B_${2}:;
$utteranceID .= sprintf ("_%06i", (100*@K[2]));
printf("%s %s %.2f %.2f\n", $utteranceID, @K[0], @K[1], @K[2]);' | sort > $datadir/segments

if [ ! -z $filelist ] ; then
@@ -66,12 +66,12 @@ perl -ne '{chomp; @K=split; $utt{@K[1]}.=" @K[0]";}
# 4. Create the wav.scp file:
sph2pipe=`which sph2pipe || which $KALDI_ROOT/tools/sph2pipe_v2.5/sph2pipe`
if [ $? -ne 0 ] ; then
echo "Could not find sph2pipe binary. Add it to PATH"
echo "Could not find sph2pipe binary. Add it to PATH"
exit 1;
fi
sox=`which sox`
if [ $? -ne 0 ] ; then
echo "Could not find sox binary. Add it to PATH"
echo "Could not find sox binary. Add it to PATH"
exit 1;
fi
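The tool checks in the hunk above rely on `which`, whose output and exit status vary between systems. As a hedged alternative, the same lookup can be sketched with the POSIX `command -v`. This is not code from this PR, and the helper name `find_tool` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper, not part of the PR: print the path of the first
# candidate that resolves to an executable, or fail with a message that
# names the first candidate.
find_tool() {
  local name=$1 candidate path
  for candidate in "$@"; do
    if path=$(command -v "$candidate"); then
      echo "$path"
      return 0
    fi
  done
  echo "Could not find $name binary. Add it to PATH" >&2
  return 1
}
```

Usage mirroring the script would be `sph2pipe=$(find_tool sph2pipe "$KALDI_ROOT/tools/sph2pipe_v2.5/sph2pipe") || exit 1` and `sox=$(find_tool sox) || exit 1`.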

@@ -84,19 +84,19 @@ echo "Creating the $datadir/wav.scp file"
elif [ -f $audiopath/audio/$file.wav ] ; then
echo "$file $sox $audiopath/audio/$file.wav -r 8000 -c 1 -b 16 -t wav - downsample |"
else
echo "Audio file $audiopath/audio/$file.sph does not exist!" >&2
echo "Audio file $audiopath/audio/$file.sph does not exist!" >&2
exit 1
fi
done | sort -u > $datadir/wav.scp
if [ $? -ne 0 ] ; then
echo "Error producing the wav.scp file"
done | sort -u > $datadir/wav.scp
if [ $? -ne 0 ] ; then
echo "Error producing the wav.scp file"
exit 1
fi
) || exit 1
) || exit 1

l1=`wc -l $datadir/wav.scp | cut -f 1 -d ' ' `
echo "wav.scp contains $l1 files"
if [ ! -z $filelist ] ; then
if [ ! -z $filelist ] ; then
l2=`wc -l $filelist | cut -f 1 -d ' '`
echo "filelist `basename $filelist` contains $l2 files"
