Babel s5d recipe #1356
added Telugu conf files.
I'm reviewing the parts outside of the local egs directory.
we normally do this like
print STDERR "str1\n" .
"str2\n" .
"str3\n";
(not that it really matters).
These two print statements seem redundant given that it says 'finished at date' below and prints the time taken.
src/fstbin/fsts-difference.cc
Outdated
wrong program name and usage message.
src/fstbin/fsts-difference.cc
Outdated
this should be if (po.NumArgs() != 3)
src/fstbin/fsts-project.cc
Outdated
should be if (po.NumArgs() != 2)
src/kwsbin/print-proxy-keywords.cc
Outdated
should indicate the optional weights as:
[<weights-wspecifier>]
here and remove the [] in the "e.g." line.
src/kwsbin/print-proxy-keywords.cc
Outdated
&& should be ||, and no need for parentheses.
src/kwsbin/print-proxy-keywords.cc
Outdated
if weight_wspecifier is really optional, then you should do:
if (weight_wspecifier != "")
here.
In the usage message at line 51, we need angle brackets around command line args, and need to indicate the optional weights-rspecifier using
[<weights-rspecifier>]
There should probably be some explanation there of what the weights are.
Actually I am not really happy about it being called "weights-rspecifier" if they actually represent the minus-log of the weight, because "weights" on Kaldi command lines tend to refer to things that should be >= 0, i.e. actual weights. I'd prefer to see it called "costs-rspecifier" and the variable be called a "cost".
src/kwsbin/transcripts-to-fsts.cc
Outdated
This code is a bit ugly, and I think unnecessary... the weights_reader should be made a RandomAccessDoubleReader, and you can use HasKey(key) and Value(key). ,s,cs can be added on the command line to make it efficient, if they are actually sorted. If they are not sorted (and this might relate to the special things that are done in the keyword-search pipeline), then we'll just take the memory hit. I prefer tools like this to be general purpose.
Thanks, I'll address those.
Y.
…On Feb 2, 2017 8:58 PM, "Daniel Povey" ***@***.***> wrote:
***@***.**** commented on this pull request.
I'm reviewing the parts outside of the local egs directory.
------------------------------
In egs/wsj/s5/utils/make_lexicon_fst.pl
<#1356 (review)>:
> $pron_probs = 1;
shift @ARGV;
}
if (@ARGV != 1 && @ARGV != 3 && @ARGV != 4) {
- print STDERR
- "Usage: make_lexicon_fst.pl [--pron-probs] lexicon.txt [silprob silphone [sil_disambig_sym]] >lexiconfst.txt
-Creates a lexicon FST that transduces phones to words, and may allow optional silence.
-Note: ordinarily, each line of lexicon.txt is: word phone1 phone2 ... phoneN; if the --pron-probs option is
-used, each line is: word pronunciation-probability phone1 phone2 ... phoneN. The probability 'prob' will
-typically be between zero and one, and note that it's generally helpful to normalize so the largest one
-for each word is 1.0, but this is your responsibility. The silence disambiguation symbol, e.g. something
-like #5, is used only when creating a lexicon with disambiguation symbols, e.g. L_disambig.fst, and was
-introduced to fix a particular case of non-determinism of decoding graphs.\n";
+ print STDERR "Usage: make_lexicon_fst.pl [--pron-probs] lexicon.txt [silprob silphone [sil_disambig_sym]] >lexiconfst.txt\n\n";
we normally do this like
print STDERR "str1\n" .
"str2\n" .
"str3\n";
(not that it really matters).
------------------------------
In egs/wsj/s5/utils/slurm.pl
<#1356 (review)>:
> print Q "time2=\`date +\"%s\"\`\n";
+print Q "echo '#' Accounting: begin_time=\$time1 >>$logfile\n";
These two print statements seem redundant given that it says 'finished at
date' below and prints the time taken.
------------------------------
In src/fstbin/fsts-difference.cc
<#1356 (review)>:
> +
+
+int main(int argc, char **argv) {
+ try {
+ using namespace kaldi;
+ using namespace fst;
+ typedef kaldi::int32 int32;
+ typedef kaldi::uint64 uint64;
+
+ const char *usage =
+ "Reads a table of FSTs; for each element, performs the fst subtract\n"
+ "operation. This operation computes the difference between two FSAs.\n"
+ "Only strings that are in the first automaton but not in second are\n"
+ "retained in the result.\n"
+ "\n"
+ "Usage: fsts-subtract [options] <fsts-rspecifier> "
wrong program name and usage message.
------------------------------
In src/fstbin/fsts-difference.cc
<#1356 (review)>:
> + "\n"
+ "Usage: fsts-subtract [options] <fsts-rspecifier> "
+ "<fsts-rspecifier> "
+ "<fsts-wspecifier>\n"
+ " e.g.: fsts-subtract ark:A.fsts ark:B.fsts ark,t:C.fsts\n";
+
+ ParseOptions po(usage);
+
+ bool project_output = false;
+
+ po.Register("project-output", &project_output,
+ "If true, project output vs input");
+
+ po.Read(argc, argv);
+
+ if (po.NumArgs() < 3 || po.NumArgs() > 4) {
this should be if (po.NumArgs() != 3)
------------------------------
In src/fstbin/fsts-project.cc
<#1356 (review)>:
> + "operation either on input (default) or on the output (if the option\n"
+ "--project-output is true).\n"
+ "\n"
+ "Usage: fsts-project [options] <fsts-rspecifier> <fsts-wspecifier>\n"
+ " e.g.: fsts-project ark:train.fsts ark,t:train.fsts\n";
+
+ ParseOptions po(usage);
+
+ bool project_output = false;
+
+ po.Register("project-output", &project_output,
+ "If true, project output vs input");
+
+ po.Read(argc, argv);
+
+ if (po.NumArgs() < 2 || po.NumArgs() > 3) {
should be if (po.NumArgs() != 2)
------------------------------
In src/fstbin/fsts-scale.cc
<#1356 (review)>:
> +
+int main(int argc, char *argv[]) {
+ try {
+ using namespace kaldi;
+ using namespace fst;
+ using kaldi::int32;
+
+ double alpha = 1.0;
+ double beta = 0.0;
+
+ const char *usage =
+ "Scales the FST scores using new_score = alpha * old_score + beta\n"
+ "where alpha and beta can be set as command line parameters. Typically\n"
+ "one would set beta!=0 and alpha=0 for logarithmic weights (tropical\n"
+ "semiring, for example) and alpha!=0 and beta=0 for probabilistic\n"
+ "weightsa\n"
typo weightsa
------------------------------
In src/fstbin/fsts-scale.cc
<#1356 (review)>:
> + try {
+ using namespace kaldi;
+ using namespace fst;
+ using kaldi::int32;
+
+ double alpha = 1.0;
+ double beta = 0.0;
+
+ const char *usage =
+ "Scales the FST scores using new_score = alpha * old_score + beta\n"
+ "where alpha and beta can be set as command line parameters. Typically\n"
+ "one would set beta!=0 and alpha=0 for logarithmic weights (tropical\n"
+ "semiring, for example) and alpha!=0 and beta=0 for probabilistic\n"
+ "weightsa\n"
+ "\n"
+ "Usage: fsts-scale --alpha=1 --beta=0 (fst-rxfilename|fst-rspecifier) "
Should be:
Usage: fsts-scale --alpha=<alpha> --beta=<beta> (<fst-rxfilename>|<fst-rspecifier>)
------------------------------
In src/fstbin/fsts-union.cc
<#1356 (review)>:
> + using namespace kaldi;
+ using namespace fst;
+ typedef kaldi::int32 int32;
+ typedef kaldi::uint64 uint64;
+
+ const char *usage = "Reads a archive of FSTs. Performs FST operation union"
+ "on all fsts having the same key. Assumes the archive is sorted by key"
+ "\n"
+ "Usage: fsts-union [options] <fsts-rspecifier> <fsts-wspecifier>\n"
+ " e.g.: fsts-union ark:keywords_tmp.fsts ark,t:keywords.fsts\n";
+
+ ParseOptions po(usage);
+
+ po.Read(argc, argv);
+
+ if (po.NumArgs() < 2 || po.NumArgs() > 3) {
should be if (po.NumArgs() != 2)
------------------------------
In src/fstbin/fsts-union.cc
<#1356 (review)>:
> +
+ if (po.NumArgs() < 2 || po.NumArgs() > 3) {
+ po.PrintUsage();
+ exit(1);
+ }
+
+ std::string fsts_rspecifier = po.GetArg(1),
+ fsts_wspecifier = po.GetArg(2);
+
+
+ SequentialTableReader<VectorFstHolder> fst_reader(fsts_rspecifier);
+ TableWriter<VectorFstHolder> fst_writer(fsts_wspecifier);
+
+ int32 n_out_done = 0,
+ n_in_done = 0;
+ std::string res_key = "" ;
remove extra space
------------------------------
In src/fstbin/fsts-union.cc
<#1356 (review)>:
> + std::string res_key = "" ;
+ VectorFst<StdArc> res_fst;
+
+ for (; !fst_reader.Done(); fst_reader.Next()) {
+ std::string key = fst_reader.Key();
+ VectorFst<StdArc> fst(fst_reader.Value());
+
+ n_in_done++;
+ if (key == res_key) {
+ fst::Union(&res_fst, fst);
+ } else {
+ if (res_key != "") {
+ VectorFst<StdArc> out_fst;
+ fst::Minimize(&res_fst);
+ fst::RmEpsilon(&res_fst);
+ // fst::Determinize(res_fst, &out_fst);
remove comments
------------------------------
In src/fstbin/fsts-union.cc
<#1356 (review)>:
> + // fst::RmEpsilon(&out_fst);
+ // fst::Minimize(&out_fst);
+ // fst_writer.Write(res_key, out_fst);
+ fst_writer.Write(res_key, res_fst);
+ n_out_done++;
+ }
+ res_fst = fst;
+ res_key = key;
+ }
+
+ }
+ if (res_key != "") {
+ VectorFst<StdArc> out_fst;
+ fst::Minimize(&res_fst);
+ fst::RmEpsilon(&res_fst);
+ // fst::Determinize(res_fst, &out_fst);
remove comments
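The loop under review merges consecutive entries that share a key and has to flush the last group after the loop ends. A minimal standalone sketch of that pattern follows; it is plain C++ with no Kaldi or OpenFst, `Entry` and `MergeByKey` are hypothetical names, and appending integers stands in for `fst::Union`. Like the tool, it assumes the input is sorted by key.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for one archive entry: (key, payload).
typedef std::pair<std::string, std::vector<int> > Entry;

// Merges consecutive entries with equal keys. The input is assumed to be
// sorted by key; appending payloads stands in for the FST union.
std::vector<Entry> MergeByKey(const std::vector<Entry> &in) {
  std::vector<Entry> out;
  for (std::size_t i = 0; i < in.size(); i++) {
    if (!out.empty() && out.back().first == in[i].first) {
      // Same key as the current group: fold this entry into it.
      out.back().second.insert(out.back().second.end(),
                               in[i].second.begin(), in[i].second.end());
    } else {
      // New key: the previous group (if any) is already complete in `out`.
      out.push_back(in[i]);
    }
  }
  return out;
}
```

Accumulating into the output vector means the last group needs no separate flush after the loop, which is the part the original code duplicates.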
------------------------------
In src/kwsbin/kws-search.cc
<#1356 (review)>:
> + vector<double> out;
+ double score;
+ int32 tbeg, tend, uid;
+
+ uint64 osymbol = label_decoder.find(paths[i].last)->second;
+ uid = kaldi::DecodeLabelUid(osymbol);
+ tbeg = paths[i].weight.Value2().Value1().Value();
+ tend = paths[i].weight.Value2().Value2().Value();
+ score = paths[i].weight.Value1().Value();
+
+ out.push_back(uid);
+ out.push_back(tbeg);
+ out.push_back(tend);
+ out.push_back(score);
+
+ //KALDI_ASSERT(paths[i].path[0] == 0);
remove comments
------------------------------
In src/kwsbin/kws-search.cc
<#1356 (review)>:
>
- fst::MapSymbolsAction OutputSymbolsAction() const { return fst::MAP_COPY_SYMBOLS;}
+ fst::MapSymbolsAction OutputSymbolsAction() const {
+ return fst::MAP_COPY_SYMBOLS;
+ }
uint64 Properties(uint64 props) const { return props; }
};
need documentation for this struct and the two functions.
------------------------------
In src/kwsbin/kws-search.cc
<#1356 (review)>:
> "kw utterance_id beg_frame end_frame negated_log_probs\n"
- " e.g.: KW1 1 23 67 0.6074219\n"
+ " e.g.: \n"
+ "KW1 1 23 67 0.6074219\n\n"
+ "The second parameter is voluntary and allows the user to gather more\n"
voluntary->optional
------------------------------
In src/kwsbin/kws-search.cc
<#1356 (review)>:
>
const char *usage =
- "Search the keywords over the index. This program can be executed parallely, either\n"
- "on the index side or the keywords side; we use a script to combine the final search\n"
- "results. Note that the index archive has a only key \"global\".\n"
- "The output file is in the format:\n"
+ "Search the keywords over the index. This program can be executed\n"
+ "parallely, either on the index side or the keywords side; we use\n"
parallely -> in parallel
------------------------------
In src/kwsbin/kws-search.cc
<#1356 (review)>:
>
const char *usage =
- "Search the keywords over the index. This program can be executed parallely, either\n"
- "on the index side or the keywords side; we use a script to combine the final search\n"
- "results. Note that the index archive has a only key \"global\".\n"
- "The output file is in the format:\n"
+ "Search the keywords over the index. This program can be executed\n"
+ "parallely, either on the index side or the keywords side; we use\n"
+ "a script to combine the final search results. Note that the index\n"
+ "archive has a only key \"global\".\n\n"
a only key -> only a single key
------------------------------
In src/kwsbin/kws-search.cc
<#1356 (review)>:
>
const char *usage =
- "Search the keywords over the index. This program can be executed parallely, either\n"
- "on the index side or the keywords side; we use a script to combine the final search\n"
- "results. Note that the index archive has a only key \"global\".\n"
- "The output file is in the format:\n"
+ "Search the keywords over the index. This program can be executed\n"
+ "parallely, either on the index side or the keywords side; we use\n"
+ "a script to combine the final search results. Note that the index\n"
+ "archive has a only key \"global\".\n\n"
+ "Search has one or two outputs. The first one, is mandatory and will\n"
remove comma
------------------------------
In src/kwsbin/kws-search.cc
<#1356 (review)>:
> "\n"
- "Usage: kws-search [options] index-rspecifier keywords-rspecifier results-wspecifier\n"
- " e.g.: kws-search ark:index.idx ark:keywords.fsts ark:results\n";
+ "Usage: kws-search [options] index-rspecifier keywords-rspecifier "
Use angle brackets, and indicate optional args (however, the code wasn't quite right; I'll suggest separate changes). E.g.:
Usage: kws-search [options] <index-rspecifier> <keywords-rspecifier> <results-wspecifier> [<stats-wspecifier>]
I think the stats need more explanation.
------------------------------
In src/kwsbin/kws-search.cc
<#1356 (review)>:
> po.PrintUsage();
exit(1);
}
std::string index_rspecifier = po.GetArg(1),
keyword_rspecifier = po.GetOptArg(2),
- result_wspecifier = po.GetOptArg(3);
change the 1st two GetOptArg() -> GetArg(), they are not really optional.
Note: empty string is not a usable r/wspecifier.
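The difference between the two accessors can be sketched with a small stand-in class. This is not Kaldi's `ParseOptions`; `PositionalArgs` is a hypothetical class that only imitates the behaviour the comment relies on: a required argument fails loudly when missing, while an optional one silently comes back as an empty string.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the positional-argument part of an option parser.
class PositionalArgs {
 public:
  explicit PositionalArgs(const std::vector<std::string> &args) : args_(args) {}

  int NumArgs() const { return static_cast<int>(args_.size()); }

  // Required argument (1-based index): throws if it was not supplied.
  std::string GetArg(int i) const {
    if (i < 1 || i > NumArgs()) throw std::out_of_range("missing argument");
    return args_[i - 1];
  }

  // Optional argument (1-based index): returns "" if it was not supplied.
  std::string GetOptArg(int i) const {
    assert(i >= 1);
    return i <= NumArgs() ? GetArg(i) : std::string();
  }

 private:
  std::vector<std::string> args_;
};
```

With this shape, reading a mandatory rspecifier through `GetOptArg` would hide a missing argument until a table is opened on an empty string, which is why the required ones should go through `GetArg`.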
------------------------------
In src/kwsbin/kws-search.cc
<#1356 (review)>:
> }
po.Read(argc, argv);
- if (po.NumArgs() < 3 || po.NumArgs() > 4) {
+ if (po.NumArgs() < 3 || po.NumArgs() > 5) {
should be if (po.NumArgs() < 3 || po.NumArgs() > 4)
no need to change because the original code was wrong, and it cancels out.
------------------------------
In src/kwsbin/kws-search.cc
<#1356 (review)>:
> @@ -198,6 +295,17 @@ int main(int argc, char *argv[]) {
KwsLexicographicFst result_fst;
Map(keyword, &keyword_fst, VectorFstToKwsLexicographicFstMapper());
Compose(keyword_fst, index, &result_fst);
+
+ if (stats_wspecifier != "") {
+ //match_writer.Write(key, result_fst);
remove code in comment
------------------------------
In src/kwsbin/lattice-to-kws-index.cc
<#1356 (review)>:
> po.Register("max-silence-frames", &max_silence_frames, "Maximum #frames for"
- " silence arc.");
+ " silence arc. The actuall number of frames will be computed as"
I think it would be clearer to change this so it reads:
If --frame-subsampling-factor is used, --max-silence-frames is relative to the input, not the output frame rate (we divide by frame-subsampling-factor and round to the closest integer, to get the number of symbols in the lattice).
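As a rough illustration of the rounding described here, the sketch below repeats the expression from the diff in a standalone function. `SubsampledFrames` is a hypothetical name, and the result is assumed to be stored in an integer so that the conversion truncates.

```cpp
#include <cassert>

// Hypothetical helper: converts a frame count given at the input frame rate
// into a count at the subsampled (lattice) frame rate. Adding 0.5 before the
// truncating conversion rounds to the closest integer.
int SubsampledFrames(int input_frames, int frame_subsampling_factor) {
  assert(input_frames >= 0 && frame_subsampling_factor > 0);
  return static_cast<int>(
      0.5 + input_frames / static_cast<float>(frame_subsampling_factor));
}
```

For example, 50 input frames with a factor of 3 give 17 lattice symbols, and a factor of 1 leaves the value unchanged.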
------------------------------
In src/kwsbin/lattice-to-kws-index.cc
<#1356 (review)>:
> @@ -67,6 +71,7 @@ int main(int argc, char *argv[]) {
exit(1);
}
+ max_silence_frames = 0.5 + max_silence_frames / static_cast<float>(frame_subsampling_factor);
line length
------------------------------
In src/kwsbin/lattice-to-kws-index.cc
<#1356 (review)>:
> @@ -67,6 +71,7 @@ int main(int argc, char *argv[]) {
exit(1);
}
+ max_silence_frames = 0.5 + max_silence_frames / static_cast<float>(frame_subsampling_factor);
std::string usymtab_rspecifier = po.GetOptArg(1),
lats_rspecifier = po.GetArg(2),
index_wspecifier = po.GetOptArg(3);
change GetOptArg() -> GetArg(). If it were optional, the code should be
testing if the string is nonempty or if the table is open. And fix the
check of po.NumArgs().
------------------------------
In src/fstbin/fsts-project.cc
<#1356 (review)>:
> + ParseOptions po(usage);
+
+ bool project_output = false;
+
+ po.Register("project-output", &project_output,
+ "If true, project output vs input");
+
+ po.Read(argc, argv);
+
+ if (po.NumArgs() < 2 || po.NumArgs() > 3) {
+ po.PrintUsage();
+ exit(1);
+ }
+
+ std::string fsts_rspecifier = po.GetArg(1),
+ fsts_wspecifier = po.GetOptArg(2);
GetOptArg->GetArg
------------------------------
In src/kwsbin/lattice-to-kws-index.cc
<#1356 (review)>:
> @@ -67,6 +71,7 @@ int main(int argc, char *argv[]) {
exit(1);
}
+ max_silence_frames = 0.5 + max_silence_frames / static_cast<float>(frame_subsampling_factor);
std::string usymtab_rspecifier = po.GetOptArg(1),
Change GetOptArg() -> GetArg() here [program anyway cannot be called with
only one arg.]
------------------------------
In src/kwsbin/lattice-to-kws-index.cc
<#1356 (review)>:
> @@ -67,6 +71,7 @@ int main(int argc, char *argv[]) {
exit(1);
should be if (po.NumArgs() != 3) [obviously this predates your PR.]
------------------------------
In src/kwsbin/print-proxy-keywords.cc
<#1356 (review)>:
> @@ -0,0 +1,129 @@
+// kwsbin/print-proxy-keywords.cc
+//
+// Copyright 2014 Johns Hopkins University (Author: Guoguo Chen)
I assume this header is wrong?
------------------------------
In src/kwsbin/print-proxy-keywords.cc
<#1356 (review)>:
> +
+int main(int argc, char *argv[]) {
+ try {
+ using namespace kaldi;
+ using namespace fst;
+ typedef kaldi::int32 int32;
+ typedef kaldi::uint64 uint64;
+ typedef StdArc::StateId StateId;
+ typedef StdArc::Weight Weight;
+
+ const char *usage =
+ "Reads in the proxy keywords FSTs and print them to a file where each\n"
+ "line is \"kwid weight proxies\"\n"
+ "\n"
+ "Usage: print-proxy-keywords [options] <proxy-rspecifier> \\\n"
+ " <kwlist-wspecifier>\n"
should indicate the optional weights as:
[<weights-wspecifier>]
here and remove the [] in the "e.g." line.
------------------------------
In src/kwsbin/print-proxy-keywords.cc
<#1356 (review)>:
> + typedef StdArc::StateId StateId;
+ typedef StdArc::Weight Weight;
+
+ const char *usage =
+ "Reads in the proxy keywords FSTs and print them to a file where each\n"
+ "line is \"kwid weight proxies\"\n"
+ "\n"
+ "Usage: print-proxy-keywords [options] <proxy-rspecifier> \\\n"
+ " <kwlist-wspecifier>\n"
+ " e.g.: print-proxy-keywords ark:proxy.fsts ark,t:kwlist.txt [ark,t:weights.txt]\n";
+
+ ParseOptions po(usage);
+
+ po.Read(argc, argv);
+
+ if ((po.NumArgs() < 2) && (po.NumArgs() > 3)) {
&& should be ||, and no need for parentheses.
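To see why the original condition can never trigger the usage message, here is a small standalone sketch. Both function names are made up for illustration, and `num_args` stands in for `po.NumArgs()`.

```cpp
#include <cassert>

// Original condition: no integer is both < 2 and > 3, so this is always false
// and the usage message would never be printed.
bool OriginalCheck(int num_args) {
  return (num_args < 2) && (num_args > 3);
}

// Suggested condition: true whenever the count falls outside [2, 3].
bool SuggestedCheck(int num_args) {
  assert(num_args >= 0);
  return num_args < 2 || num_args > 3;
}
```

With the suggested form, calling the program with one or four positional arguments prints the usage message, while two or three are accepted.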
------------------------------
In src/kwsbin/print-proxy-keywords.cc
<#1356 (review)>:
> + }
+
+ vector<vector<StdArc::Label> > paths;
+ vector<StdArc::Weight> weights;
+ PrintProxyFstPath(proxy, &paths, &weights, proxy.Start(),
+ vector<StdArc::Label>(), StdArc::Weight::One());
+ KALDI_ASSERT(paths.size() == weights.size());
+ for (int32 i = 0; i < paths.size(); i++) {
+ vector<int32> kwlist;
+ vector<double> weight;
+ weight.push_back(weights[i].Value());
+ for (int32 j = 0; j < paths[i].size(); j++) {
+ kwlist.push_back(paths[i][j]);
+ }
+ kwlist_writer.Write(key, kwlist);
+ weight_writer.Write(key, weight);
if weight_wspecifier is really optional, then you should do:
if (weight_wspecifier != "")
here.
------------------------------
In src/kwsbin/transcripts-to-fsts.cc
<#1356 (review)>:
> @@ -46,21 +60,34 @@ int main(int argc, char *argv[]) {
In the usage message at line 51, need angle brackets around command line args, and need to indicate the optional weights-rspecifier using
[<weights-rspecifier>]
there should probably be some explanation there of what the weights are.
Actually I am not really happy about it being called "weights-rspecifier" if they actually represent the minus-log of the weight, because "weights" on Kaldi command lines tend to refer to things that should be >= 0, i.e. actual weights. I'd prefer to see it called "costs-rspecifier" and the variable be called a "cost".
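The naming concern is easier to see with numbers. The sketch below assumes the relation mentioned in the comment, cost = -log(weight), using the natural logarithm; `WeightToCost` and `CostToWeight` are hypothetical helper names, not functions from the PR.

```cpp
#include <cassert>
#include <cmath>

// A "weight" here is a non-negative, probability-like value (it must be > 0
// for the logarithm to be defined); a "cost" is its negative natural log.
double WeightToCost(double weight) {
  assert(weight > 0.0);
  return -std::log(weight);
}

// Inverse mapping: any real cost maps back to a strictly positive weight.
double CostToWeight(double cost) { return std::exp(-cost); }
```

A weight of 1.0 has cost 0, smaller weights have positive costs, and a weight above 1.0 has a negative cost, so a value read from the command line can legitimately be negative only if it is a cost.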
------------------------------
In src/kwsbin/transcripts-to-fsts.cc
<#1356 (review)>:
> @@ -81,6 +108,15 @@ int main(int argc, char *argv[]) {
VectorFst<StdArc> fst;
MakeLinearAcceptor(transcript, &fst);
+ if (weights_rspecifier != "" ) {
+ std::string weights_key = weights_reader.Key();
+ double weight = weights_reader.Value();
+ weights_reader.Next();
+
+ KALDI_ASSERT(weights_key == key);
+
This code is a bit ugly, and I think unnecessary... the weights_reader
should be made a RandomAccessDoubleReader, and you can use HasKey(key) and
Value(key). ,s,cs can be added on the command line to make it efficient, if
they are actually sorted. If they are not sorted (and this might relate to
the special things that are done in the keyword-search pipeline), then
we'll just take the memory hit. I prefer tools like this to be general
purpose.
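The lookup pattern suggested here might look roughly like the following. `CostTable` is a hypothetical `std::map` wrapper that only imitates the `HasKey(key)` / `Value(key)` interface named in the comment; it is not Kaldi's reader, and the fallback to a default cost for missing keys is an assumption, since the review does not say what should happen in that case.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a random-access table of doubles keyed by id.
class CostTable {
 public:
  void Add(const std::string &key, double cost) { costs_[key] = cost; }

  bool HasKey(const std::string &key) const { return costs_.count(key) != 0; }

  // Must only be called for keys that are present.
  double Value(const std::string &key) const {
    std::map<std::string, double>::const_iterator it = costs_.find(key);
    assert(it != costs_.end());
    return it->second;
  }

 private:
  std::map<std::string, double> costs_;
};

// Returns the cost stored for `key`, or `default_cost` if there is none
// (the default is an assumption made for this sketch).
double CostForKey(const CostTable &table, const std::string &key,
                  double default_cost) {
  return table.HasKey(key) ? table.Value(key) : default_cost;
}
```

Because each transcript looks up its own key, the two inputs no longer have to be in the same order, which removes the need for the lock-step `Next()` and the key assertion.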
@danpovey this is open for review again
src/fstbin/fsts-project.cc
Outdated
can we have a "see also:" line? Like:
"see also: fstproject (from the OpenFst toolkit)"
and the same for similar tools?
Ok. Makes sense.
Y.
…On Feb 7, 2017 9:28 PM, "Daniel Povey" ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/fstbin/fsts-project.cc
<#1356 (review)>:
> +
+
+int main(int argc, char *argv[]) {
+ try {
+ using namespace kaldi;
+ using namespace fst;
+ typedef kaldi::int32 int32;
+ typedef kaldi::uint64 uint64;
+
+ const char *usage =
+ "Reads a table of FSTs; for each element, performs the fst project\n"
+ "operation either on input (default) or on the output (if the option\n"
+ "--project-output is true).\n"
+ "\n"
+ "Usage: fsts-project [options] <fsts-rspecifier> <fsts-wspecifier>\n"
+ " e.g.: fsts-project ark:train.fsts ark,t:train.fsts\n";
can we have a "see also:" line? Like:
"see also: fstproject (from the OpenFst toolkit)"
and the same for similar tools?
ok, done.
Did you intend to remove this? Some recipes use it to download CMUDict.
added back. I remember we were talking about it a couple of months back and
you were in favor of keeping it there. Sorry about this, just sneaked under
my radar.
y.
…On Tue, Feb 7, 2017 at 9:55 PM, Daniel Povey ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In tools/extras/check_dependencies.sh
<#1356 (review)>:
> @@ -41,10 +41,6 @@ if ! which libtoolize >&/dev/null && ! which glibtoolize >&/dev/null; then
add_packages libtool libtool libtool
fi
-if ! which svn >&/dev/null; then
Did you intend to remove this? Some recipes use it to download CMUDict.
* [build]: resolving OpenFst compilation issue with gcc-6.x (#1392)
* [egs] Add new graphemic system for Gale Arabic, with newer nnet scripts (#1298)
* [build] Windows build: generate missing base/version.h; cosmetic changes (#1397)
* [build]: Enable cross compilation, including to android. (#726)

  If a user has a number of tool chains installed and they do not want to use the default, they must currently edit the kaldi.mk file after running configure to change the CC, CXX, AR, AS, and RANLIB variables. This is something that should be exposed via the configure script. This patch exposes an option to set the host triple for the desired tool chain in the configure script.

  Building Kaldi on my Raspberry Pi boards is not particularly fast. I have been using the following patch to build kaldi executables for use on the Pi boards for the better part of a year. A typical invocation for me is something like:

      $ ./configure --static --atlas-root=/opt/cross/armv8hf \
          --fst-root=/opt/cross/armv8hf --host=armv8-rpi3-linux-gnueabihf \
          --fst-version=1.4.1

  This way I can build on my much faster x86 desktop, but still run experiments on ARM. I have included support for cross compiling for ppc64le and it works for me (at least it produces binaries for ppc64le; I don't have a ppc64 machine to test it).

  Signed-off-by: Eric B Munson <eric@cobaltspeech.com>

  * Add mk file and configure options for building for Android

    Building for Android requires a toolchain that can be built using the Android NDK. It works similarly to the linux build except that it only uses clang, only supports the openBLAS math library, and requires an additional include directory for the system C++ headers. A typical configure invocation looks like:

        ./configure --static --openblas-root=/opt/cross/arm-linux-androideabi \
            --fst-root=/opt/cross/arm-linux-androideabi \
            --host=arm-linux-androideabi --fst-version=1.4.1 \
            --android-includes=/opt/cross/arm-linux-androideabi/sysroot/usr/include

    Signed-off-by: Eric B Munson <eric@cobaltspeech.com>

  * Make pthread cancel symbols noops for Android

    The Android C library does not support cancelling pthreads so the symbols PTHREAD_CANCEL_STATE and pthread_setcancelstate are undefined. Because a pthread cannot be cancelled in Android, it is reasonable to make the pthread_setcancelstate() call a noop.

    Signed-off-by: Eric B Munson <eric@cobaltspeech.com>

* [build] fixing issue introduced in the previous win commit (#1399)
* [egs] Fix to HKUST nnet2/3 scripts. (#1401)

  When training the UBM, we should just use the 40-dimensional MFCCs, so change the train directory to avoid a dimension mismatch; this script won't get an error when run after nnet2's scripts.

* [egs,scripts,src] Add BABEL s5d recipe; various associated fixes (#1356)
  * Creating a new recipe directory
  * adding lists
  * Improvements in the pipeline, fixes, syllab search
  * Transplanting the diff to s5d
  * added TDNN, LSTM and BLSTM scripts. added Telugu conf files.
  * added blstm script and top level commands
  * improved keyword search, new lang configs
  * removing not needed scripts
  * added blstm results
  * some keyword-search optimization binaries
  * removing some extra files + kwsearch pipeline improvement
  * adding configs for the OP3 langs
  * configs for the rest of the OP3 langs
  * Added updated configs for IndusDB.20151208.Babel.tar.bz2
  * fixes of the pipeline, added langp (re)estimation
  * adding the kaldi-native search pipeline and a bunch of changes related to this
  * removing extra files
  * A couple of fixes
  * KWS improvements and fixes
  * Fixes of a couple of issues reported by Fred Richardson <frichard@ll.mit.edu>
  * A separate script for lexicon expansion
  * A couple of fixes and tweaks. Added checks for tools, especially sox.
  * adding a couple of changes -- new style options and results for BP langs
  * adding new results (still will need to be updated)
  * added langp and some details tweaked
  * updated STT results, new KWS results and a couple of small fixes all around
  * adding file lists for dev languages
  * miniature fixes and cleanups
  * one more batch of small fixes -- mostly whitespace cleanup
  * small fixes -- location of files and removal of trailing slash in the pathname
  * enabling stage-2 KWS pipeline
  * adding some directories to .gitignore
  * some quick fixes
  * latest fixes
  * making the script split_compound_set to conform to the naming
  * some last minute fixes for the combination scoring
  * do not attempt to score when the scoring data is not available
  * bug fixes and --ntrue-from option
  * another batch of fixes
  * adding +x permission to split_compound_set.sh
  * fixing whitespaces
  * fixing whitespaces
  * a couple of fixes
  * adding the cleanup script and chain models training
  * adding the graphemic/unicode lexicon feature
  * adding the graphemic/unicode lexicon feature
  * fixing the cc files headers, adding c info
  * use the user-provided kwset id, not the filename
  * use _cleaned affix
  * fixes w.r.t. getting chain models independent on other systems
  * small fixes as reported by Fred Richardson and Yenda
  * another issue reported by Fred Richardson
  * fixing KWS for the chain systems
  * fixes in the KWS hitlist combination
  * adding 40hrs pashto config and fixes for the unicode system
  * fixing some bugs as reported by Ni Chongjia (I2R)
  * fixing some bugs as reported by Fred Richardson
  * adding 40hrs Pashto OP3 setup
  * addressing Dan's comments, some further cleanup
  * improving the make_index script
  * remove fsts-scale
  * adding 'see also' to some of the fst tools
  * adding back accidentally removed svn check
* [egs] removing empty files in BABEL recipe (#1406)

  These caused a problem on MacOS, as reported by @dogancan.

* Add online extension to travis build.
* Fix parallel online extension build. Randomly choose between single and double precision BaseFloats in travis build.
* Remove parentheses that were unintentionally added to the travis script in the previous commit.
…#1407)
* [build]: resolving OpenFst compilation issue with gcc-6.x (kaldi-asr#1392)
* [egs] Add new graphemic system for Gale Arabic, with newer nnet scripts (kaldi-asr#1298)
* [build] Windows build: generate missing base/version.h; cosmetic changes (kaldi-asr#1397)
* [build]: Enable cross compilation, including to Android. (kaldi-asr#726)
  If a user has a number of tool chains installed and does not want to use the default, they must currently edit the kaldi.mk file after running configure to change the CC, CXX, AR, AS, and RANLIB variables. This is something that should be exposed via the configure script. This patch exposes an option to set the host triple for the desired tool chain in the configure script.
  Building Kaldi on my Raspberry Pi boards is not particularly fast. I have been using the following patch to build kaldi executables for use on the Pi boards for the better part of a year. A typical invocation for me is something like:
    $ ./configure --static --atlas-root=/opt/cross/armv8hf \
        --fst-root=/opt/cross/armv8hf --host=armv8-rpi3-linux-gnueabihf \
        --fst-version=1.4.1
  This way I can build on my much faster x86 desktop, but still run experiments on ARM. I have included support for cross compiling for ppc64le and it works for me (at least it produces binaries for ppc64le; I don't have a ppc64 machine to test it).
  Signed-off-by: Eric B Munson <eric@cobaltspeech.com>
* Add mk file and configure options for building for Android
  Building for Android requires a toolchain that can be built using the Android NDK. It works similarly to the Linux build except that it only uses clang, only supports the OpenBLAS math library, and requires an additional include directory for the system C++ headers.
  A typical configure invocation looks like:
    ./configure --static --openblas-root=/opt/cross/arm-linux-androideabi \
        --fst-root=/opt/cross/arm-linux-androideabi \
        --host=arm-linux-androideabi --fst-version=1.4.1 \
        --android-includes=/opt/cross/arm-linux-androideabi/sysroot/usr/include
  Signed-off-by: Eric B Munson <eric@cobaltspeech.com>
* Make pthread cancel symbols noops for Android
  The Android C library does not support cancelling pthreads, so the symbols PTHREAD_CANCEL_STATE and pthread_setcancelstate are undefined. Because a pthread cannot be cancelled in Android, it is reasonable to make the pthread_setcancelstate() call a noop.
  Signed-off-by: Eric B Munson <eric@cobaltspeech.com>
* [build] fixing issue introduced in the previous win commit (kaldi-asr#1399)
* [egs] Fix to HKUST nnet2/3 scripts. (kaldi-asr#1401)
  When training the UBM, we should use only the 40-dimensional MFCCs, so change the training directory to avoid a dimension mismatch. With this change the script no longer fails when run after the nnet2 scripts.
* [egs,scripts,src] Add BABEL s5d recipe; various associated fixes (kaldi-asr#1356)
* Creating a new recipe directory
* adding lists
* Improvements in the pipeline, fixes, syllab search
* Transplanting the diff to s5d
* added TDNN, LSTM and BLSTM scripts. added Telugu conf files.
* added blstm script and top level commands
* improved keyword search, new lang configs
* removing not needed scripts
* added blstm results
* some keyword-search optimization binaries
* removing some extra files + kwsearch pipeline improvement
* adding configs for the OP3 langs
* configs for the rest of the OP3 langs
* Added updated configs for IndusDB.20151208.Babel.tar.bz2
* fixes of the pipeline, added langp (re)estimation
* adding the kaldi-native search pipeline and a bunch of changes related to this
* removing extra files
* A couple of fixes
* KWS improvements and fixes
* Fixes of a couple of issues reported by Fred Richardson <frichard@ll.mit.edu>
* A separate script for lexicon expansion
* A couple of fixes and tweaks. Added checks for tools, especially sox.
* adding a couple of changes -- new style options and results for BP langs
* adding new results (still will need to be updated)
* added langp and some details tweaked
* updated STT results, new KWS results and a couple of small fixes all around
* adding file lists for dev languages
* miniature fixes and cleanups
* one more batch of small fixes -- mostly whitespace cleanup
* small fixes -- location of files and removal of trailing slash in the pathname
* enabling stage-2 KWS pipeline
* adding some directories to .gitignore
* some quick fixes
* latest fixes
* making the script split_compound_set to conform to the naming
* some last minute fixes for the combination scoring
* do not attempt to score when the scoring data is not available
* bug fixes and --ntrue-from option
* another batch of fixes
* adding +x permission to split_compound_set.sh
* fixing whitespaces
* fixing whitespaces
* a couple of fixes
* adding the cleanup script and chain models training
* adding the graphemic/unicode lexicon feature
* adding the graphemic/unicode lexicon feature
* fixing the cc file headers, adding c info
* use the user-provided kwset id, not the filename
* use _cleaned affix
* fixes w.r.t. getting chain models independent of other systems
* small fixes as reported by Fred Richardson and Yenda
* another issue reported by Fred Richardson
* fixing KWS for the chain systems
* fixes in the KWS hitlist combination
* adding 40hrs Pashto config and fixes for the unicode system
* fixing some bugs as reported by Ni Chongjia (I2R)
* fixing some bugs as reported by Fred Richardson
* adding 40hrs Pashto OP3 setup
* addressing Dan's comments, some further cleanup
* improving the make_index script
* remove fsts-scale
* adding 'see also' to some of the fst tools
* adding back accidentally removed svn check
* [egs] removing empty files in BABEL recipe (kaldi-asr#1406)
  These caused a problem on MacOS, as reported by @dogancan.
* Add online extension to travis build.
* Fix parallel online extension build. Randomly choose between single and double precision BaseFloats in travis build.
* Remove parentheses that were unintentionally added to the travis script in the previous commit.
replaces #1282
couple of fixes as reported by Fred.
rebased
I opened a new PR instead of rebasing the old one and doing push -f, since other people are probably using the branch and I don't want to mess up their clones if they pull.