Skip to content

Conversation

jtrmal
Copy link
Contributor

@jtrmal jtrmal commented Jan 19, 2017

replaces #1282
couple of fixes as reported by Fred.
rebased
I did a new PR instead of rebasing the old one and doing push -f, as other people are probably using the branch and I don't want to mess up their clones in case they do pull.

jtrmal and others added 30 commits January 19, 2017 10:16
added Telugu conf files.
@jtrmal jtrmal mentioned this pull request Jan 19, 2017
Copy link
Contributor

@danpovey danpovey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm reviewing the parts outside of the local egs directory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we normally do this like

print STDERR  "str1\n" .
    "str2\n" .
    "str3\n";

(not that it really matters).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two print statements seem redundant given that it says 'finished at date' below and prints the time taken.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wrong program name and usage message.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be if (po.NumArgs() != 3)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be if (po.NumArgs() != 2)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should indicate the optional weights as:
[<weights-wspecifier>]
here and remove the [] in the "e.g." line.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

&& should be ||, and no need for parentheses.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if weight_wspecifier is really optional, then you should do:
if (weight_wspecifier != "")
here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the usage message at line 51, need angle brackets around command line args, and need to indicate the optional weights-rspecifier using
[<weights-rspecifier>]
there should probably be some explanation there of what the weights are.
Actually I am not really happy about it being called "weights-rspecifier" if they actually represent
the minus-log of the weight, because "weights" on Kaldi command lines tend to refer to things
that should be >= 0, i.e. actual weights. I'd prefer to see it called "costs-rspecifier" and the
variable be called a "cost".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is a bit ugly, and I think unnecessary... the weights_reader should be made a RandomAccessDoubleReader, and you can use HasKey(key) and Value(key). ,s,cs can be added on the command line to make it efficient, if they are actually sorted. If they are not sorted (and this might relate to the special things that are done in the keyword-search pipeline), then we'll just take the memory hit. I prefer tools like this to be general purpose.

@jtrmal
Copy link
Contributor Author

jtrmal commented Feb 3, 2017 via email

@jtrmal
Copy link
Contributor Author

jtrmal commented Feb 6, 2017

@danpovey this is open for review again

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we have a "see also:" line? Like:
"see also: fstproject (from the OpenFst toolkit)"
and the same for similar tools?

@jtrmal
Copy link
Contributor Author

jtrmal commented Feb 8, 2017 via email

@jtrmal
Copy link
Contributor Author

jtrmal commented Feb 8, 2017 via email

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you intend to remove this? Some recipes use it to download CMUDict.

@jtrmal
Copy link
Contributor Author

jtrmal commented Feb 8, 2017 via email

@danpovey danpovey merged commit dc454cc into kaldi-asr:master Feb 8, 2017
danpovey pushed a commit that referenced this pull request Feb 9, 2017
* [build]: resolving OpenFst compilation issue with  gcc-6.x (#1392)

* [egs] Add new graphemic system for Gale Arabic, with newer nnet scripts (#1298)

* [build] Windows build: generate missing base/version.h; cosmetic changes (#1397)

* [build]: Enable cross compilation, including to android. (#726)

If a user has a number of tool chains installed and they do not want to
use the default, they must currently edit the kaldi.mk file after
running configure to change the CC, CXX, AR, AS, and RANLIB variables.
This is something that should be exposed via the configure script. This
patch exposes an option to set the host triple for the desired tool
chain in the configure script.

Building Kaldi on my Raspberry Pi boards is not particularly fast.  I
have been using the following patch to build kaldi executables for use
on the Pi boards for the better part of a year.  A typical invocation
for me is something like:

$ ./configure --static --atlas-root=/opt/cross/armv8hf \
--fst-root=/opt/cross/armv8hf --host=armv8-rpi3-linux-gnueabihf \
--fst-version=1.4.1

This way I can build on my much faster x86 desktop, but still run
experiments on ARM.

I have included support for cross compiling for ppc64le and it works for
me (at least it produces binaries for ppc64le I don't have a ppc64
machine to test it).

Signed-off-by: Eric B Munson <eric@cobaltspeech.com>

* Add mk file and configure options for building for Android

Building for Android requires a toolchain that can be built using the
Android NDK.  It works similiarly to the linux build except that it only
uses clang, only supports the openBLAS math library, and requires an
additional include directory for the system C++ headers.

A typical configure invocation looks like:

./configure --static --openblas-root=/opt/cross/arm-linux-androideabi \
--fst-root=/opt/cross/arm-linux-androideabi \
--host=arm-linux-androideabi --fst-version=1.4.1 \
--android-includes=/opt/cross/arm-linux-androideabi/sysroot/usr/include

Signed-off-by: Eric B Munson <eric@cobaltspeech.com>

* Make pthread cancel symbols noops for Android

The Android C library does not support cancelling pthreads so the
symbols PTHREAD_CANCEL_STATE and pthread_setcancelstate are undefined.
Because a pthread cannot be cancelled in Android, it is reasonable to
make the pthread_setcancelstate() call a noop.

Signed-off-by: Eric B Munson <eric@cobaltspeech.com>

* [build] fixing issue introduced in the previous win commit (#1399)

* [egs] Fix to HKUST nnet2/3 scripts. (#1401)

when training ubm, we should just use the 40 dimention mfcc
so change the train directory for avoiding dimention mismatching
this script won't get error when run after nnet2's scripts.

* [egs,scripts,src] Add BABEL s5d recipe; various associated fixes (#1356)

* Creating a new recipe directory

* adding lists

* Improvements in the pipeline, fixes, syllab search

* Transplanting the diff to s5d

* added TDNN, LSTM and BLSTM scripts.
added Telugu conf files.

* added blstm script and top level commands

* improved keyword search, new lang  configs

* removing not needed scripts

* added blstm results

* some keyword-search optimization binaries

* removing some extra files + kwsearch pipeline improvement

* adding configs for the OP3 langs

* configs for the rest of the OP3 langs

* Added updated configs for IndusDB.20151208.Babel.tar.bz2

* fixes of the pipeline, added langp (re)estimation

* adding the kaldi-native search pipeline and a bunch of changes related to this

* removing extra files

* A couple of fixes

* KWS improvements and fixes

* Fixes of a couple of issues reported by Fred Richardson <frichard@ll.mit.edu>

* A separate script for lexicon expansion

* A couple of fixes and tweaks. Added checks for tools, especially sox.

* adding a couple of changes -- new style options and results for BP langs

* adding new results(still will need to be updated)

* added langp and some details tweaked

* updated STT results, new KWS results and a couple of small fixes all around

* adding file lists for dev languages

* miniature fixes and cleanups

* one more batch of small fixes -- mostly whitespace cleanup

* small fixes -- location of files and removal of trailing slash inn the pathname

* enabling stage-2 KWS pipeline

* adding some directories to .gitignore

* some quick fixes

* latest fixes

* making the script split_compound_set to conform to the naming

* some last minute fixes for the combination scoring

* do not attempt to score when the scoring data is not available

* bug fixes and --ntrue-from option

* another batch of fixes

* adding +x permission to split_compound_set.sh

* fixing whitespaces

* fixing whitespaces

* a couple of fixes

* adding the cleanup script and chain models training

* adding the graphemic/unicode lexicon feature

* adding the graphemic/unicode lexicon feature

* fixing the the cc files headers, adding c info

* use the user-provided kwset id, not the filename

* use _cleaned affix

* fixes w.r.t. getting chain models independent on other systems

* small fixes as reported by Fred Richardson and Yenda

* another issue reported by Fred Richarson

* fixing KWS for the chain systems

* fixes in the KWS hitlist combination

* adding 40hrs pashto config and fixes for the unicode system

* fixing some bugs as reported by Ni Chongjia (I2R)

* fixing some bugs as reported by Fred Richardson

* adding 40hrs Pashto OP3 setup

* addressing Dan's comments, some further cleanup

* improving the make_index script

* remove  fsts-scale

* adding 'see also' to some of the fst tools

* adding back accidentaly removed svn check

* [egs] removing empty files in BABEL recipe (#1406)

These caused a problem on MacOS, as reported by @dogancan.

* Add online extension to travis build.

* Fix parallel online extension build. Randomly choose between single and double precision BaseFloats in travis build.

* Remove parantheses that were unintentinally added to the travis script in the previous commit.
@jtrmal jtrmal deleted the babel-s5d-to-pr-2 branch February 22, 2017 21:42
david-ryan-snyder pushed a commit to david-ryan-snyder/kaldi that referenced this pull request Apr 12, 2017
…#1407)

* [build]: resolving OpenFst compilation issue with  gcc-6.x (kaldi-asr#1392)

* [egs] Add new graphemic system for Gale Arabic, with newer nnet scripts (kaldi-asr#1298)

* [build] Windows build: generate missing base/version.h; cosmetic changes (kaldi-asr#1397)

* [build]: Enable cross compilation, including to android. (kaldi-asr#726)

If a user has a number of tool chains installed and they do not want to
use the default, they must currently edit the kaldi.mk file after
running configure to change the CC, CXX, AR, AS, and RANLIB variables.
This is something that should be exposed via the configure script. This
patch exposes an option to set the host triple for the desired tool
chain in the configure script.

Building Kaldi on my Raspberry Pi boards is not particularly fast.  I
have been using the following patch to build kaldi executables for use
on the Pi boards for the better part of a year.  A typical invocation
for me is something like:

$ ./configure --static --atlas-root=/opt/cross/armv8hf \
--fst-root=/opt/cross/armv8hf --host=armv8-rpi3-linux-gnueabihf \
--fst-version=1.4.1

This way I can build on my much faster x86 desktop, but still run
experiments on ARM.

I have included support for cross compiling for ppc64le and it works for
me (at least it produces binaries for ppc64le I don't have a ppc64
machine to test it).

Signed-off-by: Eric B Munson <eric@cobaltspeech.com>

* Add mk file and configure options for building for Android

Building for Android requires a toolchain that can be built using the
Android NDK.  It works similiarly to the linux build except that it only
uses clang, only supports the openBLAS math library, and requires an
additional include directory for the system C++ headers.

A typical configure invocation looks like:

./configure --static --openblas-root=/opt/cross/arm-linux-androideabi \
--fst-root=/opt/cross/arm-linux-androideabi \
--host=arm-linux-androideabi --fst-version=1.4.1 \
--android-includes=/opt/cross/arm-linux-androideabi/sysroot/usr/include

Signed-off-by: Eric B Munson <eric@cobaltspeech.com>

* Make pthread cancel symbols noops for Android

The Android C library does not support cancelling pthreads so the
symbols PTHREAD_CANCEL_STATE and pthread_setcancelstate are undefined.
Because a pthread cannot be cancelled in Android, it is reasonable to
make the pthread_setcancelstate() call a noop.

Signed-off-by: Eric B Munson <eric@cobaltspeech.com>

* [build] fixing issue introduced in the previous win commit (kaldi-asr#1399)

* [egs] Fix to HKUST nnet2/3 scripts. (kaldi-asr#1401)

when training ubm, we should just use the 40 dimention mfcc
so change the train directory for avoiding dimention mismatching
this script won't get error when run after nnet2's scripts.

* [egs,scripts,src] Add BABEL s5d recipe; various associated fixes (kaldi-asr#1356)

* Creating a new recipe directory

* adding lists

* Improvements in the pipeline, fixes, syllab search

* Transplanting the diff to s5d

* added TDNN, LSTM and BLSTM scripts.
added Telugu conf files.

* added blstm script and top level commands

* improved keyword search, new lang  configs

* removing not needed scripts

* added blstm results

* some keyword-search optimization binaries

* removing some extra files + kwsearch pipeline improvement

* adding configs for the OP3 langs

* configs for the rest of the OP3 langs

* Added updated configs for IndusDB.20151208.Babel.tar.bz2

* fixes of the pipeline, added langp (re)estimation

* adding the kaldi-native search pipeline and a bunch of changes related to this

* removing extra files

* A couple of fixes

* KWS improvements and fixes

* Fixes of a couple of issues reported by Fred Richardson <frichard@ll.mit.edu>

* A separate script for lexicon expansion

* A couple of fixes and tweaks. Added checks for tools, especially sox.

* adding a couple of changes -- new style options and results for BP langs

* adding new results(still will need to be updated)

* added langp and some details tweaked

* updated STT results, new KWS results and a couple of small fixes all around

* adding file lists for dev languages

* miniature fixes and cleanups

* one more batch of small fixes -- mostly whitespace cleanup

* small fixes -- location of files and removal of trailing slash inn the pathname

* enabling stage-2 KWS pipeline

* adding some directories to .gitignore

* some quick fixes

* latest fixes

* making the script split_compound_set to conform to the naming

* some last minute fixes for the combination scoring

* do not attempt to score when the scoring data is not available

* bug fixes and --ntrue-from option

* another batch of fixes

* adding +x permission to split_compound_set.sh

* fixing whitespaces

* fixing whitespaces

* a couple of fixes

* adding the cleanup script and chain models training

* adding the graphemic/unicode lexicon feature

* adding the graphemic/unicode lexicon feature

* fixing the the cc files headers, adding c info

* use the user-provided kwset id, not the filename

* use _cleaned affix

* fixes w.r.t. getting chain models independent on other systems

* small fixes as reported by Fred Richardson and Yenda

* another issue reported by Fred Richarson

* fixing KWS for the chain systems

* fixes in the KWS hitlist combination

* adding 40hrs pashto config and fixes for the unicode system

* fixing some bugs as reported by Ni Chongjia (I2R)

* fixing some bugs as reported by Fred Richardson

* adding 40hrs Pashto OP3 setup

* addressing Dan's comments, some further cleanup

* improving the make_index script

* remove  fsts-scale

* adding 'see also' to some of the fst tools

* adding back accidentaly removed svn check

* [egs] removing empty files in BABEL recipe (kaldi-asr#1406)

These caused a problem on MacOS, as reported by @dogancan.

* Add online extension to travis build.

* Fix parallel online extension build. Randomly choose between single and double precision BaseFloats in travis build.

* Remove parantheses that were unintentinally added to the travis script in the previous commit.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants