[WIP] Lattice-faster-decoder-combine #3061
base: master
Conversation
I talked with Hang and understand the main idea of his design.
Let's take lattice-faster-decoder.cc as the baseline. Before detailed reviews, I have several general comments:
1. next_cutoff in this code is theoretically worse than in the baseline:
Say we are decoding frame T (the current decoding step). The baseline obtains next_cutoff from the best token among all emitting and non-emitting tokens of the last decoding step, while the current code obtains it from the best token among only the emitting tokens of the last decoding step (non-emitting tokens are processed in the current decoding step). Hence, the "best token" is not really the best in the current code. Moreover, looking at lines 811-817, it assumes the next emitting arc follows the previous emitting arc, which is unusual in speech and won't give a reasonable cutoff. Although we keep updating next_cutoff once we obtain better tokens, next_cutoff will start from a bad value and become a much better one, which not only wastes many tokens in the beginning, but also makes beam tuning harder, since the "better one" changes greatly. (See the sketch at the end of this comment.)
2. the current code will have higher latency:
For the last frame, say T, the baseline needs to decode the emitting and non-emitting arcs of frame T, while the current code needs to decode all of the above plus the extra non-emitting arcs of frame T-1.
3. since the current code changes the pruning behavior, if we want to replace the baseline, we need to test both 1-best and lattice performance versus decoding speed:
for 1-best, we need to test real-time factor (RTF) versus WER;
for lattices, we need to test lattice density versus lattice oracle error rate and lattice-rescored error rate.
To sum up, I am a little worried about the decoding efficiency because of comments 1 and 2 above. I have done some experiments on your code; I share the results and some improvement ideas here: LvHang#2
Please improve your implementation according to the above PR, and if you get similar speed after incorporating these ideas, please start to test WER etc. as discussed in point 3 above, to see whether my points 1 and 2 are really problems.
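To make point 1 concrete, here is a minimal sketch of the two cutoff strategies being compared (hypothetical, simplified code; not taken from either decoder):

#include <algorithm>
#include <limits>
#include <vector>

typedef float BaseFloat;
struct Token { BaseFloat tot_cost; };

// Baseline: the estimate is taken over ALL tokens of the previous step
// (emitting and non-emitting), so "best" really is the best token and the
// initial cutoff is tight.
BaseFloat NextCutoffBaseline(const std::vector<Token*> &all_prev_toks,
                             BaseFloat adaptive_beam) {
  BaseFloat best = std::numeric_limits<BaseFloat>::infinity();
  for (Token *tok : all_prev_toks)
    best = std::min(best, tok->tot_cost);
  return best + adaptive_beam;
}

// Combined design: only the emitting tokens of the previous step exist when
// the estimate is made (the non-emitting ones are created later, during the
// current step), so the initial cutoff is looser and has to be tightened on
// the fly as better tokens appear:
//   if (tot_cost + adaptive_beam < next_cutoff)
//     next_cutoff = tot_cost + adaptive_beam;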
Improvements:
1. With the same config, the previous code results in about 10% more tokens per frame than the baseline. The reason is that the GetCutoff function runs over emitting tokens only, while for the baseline it runs over emitting and non-emitting tokens. (Please check it using --verbose 6; the number of all tokens is shown at line 895, but not at line 693, which only counts emitting tokens.) Hence we should use a 10% smaller max_active config.
2. The baseline uses the Kaldi-version HashList, while the previous code uses std::unordered_map. Please still use HashList or one of its variants: HashList allocates memory in pool/block fashion, hence it is about 10% faster than std::unordered_map. To show this problem, I temporarily did a hack at line 247; please check it. (See the sketch below.)
After these two improvements, the current code has similar speed to the baseline. HOWEVER, please check the quality of the 1-best output and the lattices; I have discussed my worries here: kaldi-asr#3061 (review)
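For intuition on point 2, here is a minimal sketch of the block-allocation idea behind kaldi::HashList (simplified, hypothetical names; the real class is in util/hash-list.h): elements are carved out of large blocks and recycled through a free list, so inserting a token usually costs a pointer bump instead of a malloc.

#include <cstddef>
#include <vector>

template <typename I, typename T>
struct Elem { I key; T val; Elem *tail; };

template <typename I, typename T>
class ElemPool {
 public:
  Elem<I, T>* New() {
    if (free_head_ != NULL) {  // recycle a previously freed element
      Elem<I, T> *e = free_head_;
      free_head_ = e->tail;
      return e;
    }
    if (blocks_.empty() || used_ == kBlockSize) {  // carve out a new block
      blocks_.push_back(new Elem<I, T>[kBlockSize]);
      used_ = 0;
    }
    return &(blocks_.back()[used_++]);
  }
  void Delete(Elem<I, T> *e) {  // O(1): push onto the free list
    e->tail = free_head_;
    free_head_ = e;
  }
  ~ElemPool() {
    for (size_t i = 0; i < blocks_.size(); i++) delete[] blocks_[i];
  }
 private:
  static const int kBlockSize = 1024;
  std::vector<Elem<I, T>*> blocks_;
  int used_ = 0;
  Elem<I, T> *free_head_ = NULL;
};

A typical std::unordered_map does one heap allocation per inserted node, which is the per-insert overhead the comment above refers to.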
// bin/latgen-faster-mapped.cc
// Copyright 2009-2012 Microsoft Corporation, Karel Vesely
//           2013 Johns Hopkins University (author: Daniel Povey)
This is OK for testing but eventually this should just be a change to lattice-faster-decoder.cc. It's just a small optimization.
Zhehuai:
You are right that the cutoff may not be optimal, but this is an issue for
the emitting arcs as well, and I don't think it will affect nonemitting
arcs any worse than it already does for emitting ones.
I don't think the need to reprocess the last frame is a big problem, it's
only one frame. The speed advantage of this method is that we don't need
to iterate over all the states twice, to process emitting and nonemitting
arcs separately.
However, I have a lot of concerns about (a) the design of this, with
recover_, and recover_map_, and (b) the quality of the comments. It
doesn't look even close to something that I would merge, at this point, in
terms of clarity and documentation; in fact, it might be faster for me to
just take it, figure out what is going on, and rewrite it myself. I am
hoping that Zhehuai can work with Hang on this. I might in principle be OK
with the core design, but first I would need to be able to understand it,
and the comments do not help with that. Maybe Zhehuai can understand it
and write the comments himself?
Dan: In my design, recover_ and recover_map_ will be used when we call GetRawLattice() in the middle of decoding an utterance.
Dan, Zhehuai: I updated the functions and comments. Now ProcessNonemitting() takes an argument "std::unordered_map<Token*, BaseFloat> *recover_map". In the normal case it will be NULL. But when we call GetRawLattice() during decoding, we pass in a local variable "recover_map" of GetRawLattice() and then recover the "active_toks_" list at the end of GetRawLattice(), so it will not introduce extra links.
I reviewed the coding style and code comments inline. Please test speed and precision once you finish them.
Regarding max-active: see how much difference it makes to the WER/RTF curve
if you just leave a single option max-active and redefine it as the total
number of active states. If it's not a huge incompatibility I'd rather
keep it simple; there will be other slight differences in the pruning
behavior as well, that may somewhat cancel out this effect.
------------------------------
In src/decoder/lattice-faster-decoder-combine.h
<#3061 (comment)>:
> + fst::Fst<fst::StdArc>, a.k.a. StdFst, or GrammarFst. If you invoke it with
+ FST == StdFst and it notices that the actual FST type is
+ fst::VectorFst<fst::StdArc> or fst::ConstFst<fst::StdArc>, the decoder object
+ will internally cast itself to one that is templated on those more specific
+ types; this is an optimization for speed.
+ */
+template <typename FST, typename Token = decodercombine::StdToken>
+class LatticeFasterDecoderCombineTpl {
+ public:
+ using Arc = typename FST::Arc;
+ using Label = typename Arc::Label;
+ using StateId = typename Arc::StateId;
+ using Weight = typename Arc::Weight;
+ using ForwardLinkT = decodercombine::ForwardLink<Token>;
+
+ using StateIdToTokenMap = typename std::unordered_map<StateId, Token*>;
using StateIdToTokenMap = typename std::unordered_map<StateId, Token*,
    std::hash<StateId>, std::equal_to<StateId>,
    fst::PoolAllocator<std::pair<const StateId, Token*>>>;
------------------------------
In src/decoder/lattice-faster-decoder-combine.h
<#3061 (comment)>:
> + // It's called by PruneActiveTokens if any forward links have been pruned
+ void PruneTokensForFrame(int32 frame_plus_one);
+
+
+ // Go backwards through still-alive tokens, pruning them if the
+ // forward+backward cost is more than lat_beam away from the best path. It's
+ // possible to prove that this is "correct" in the sense that we won't lose
+ // anything outside of lat_beam, regardless of what happens in the future.
+ // delta controls when it considers a cost to have changed enough to continue
+ // going backward and propagating the change. larger delta -> will recurse
+ // less far.
+ void PruneActiveTokens(BaseFloat delta);
+
+ /// Processes non-emitting (epsilon) arcs and emitting arcs for one frame
+ /// together. It takes the emitting tokens in "cur_toks_" from the last frame.
+ /// Generates non-emitting tokens for current frame and emitting tokens for
Generates non-emitting tokens for the previous frame and emitting tokens
for the current frame.
It would be better to keep the definitions consistent with the previous
decoders: "the emitting tokens for the current frame" means the tokens that
take acoustic scores of the current frame.
If you agree with this change, maybe you would also like to change all
"cur_toks_" to "prev_toks_", and all "next_toks_" to "cur_toks_".
------------------------------
In src/decoder/lattice-faster-decoder-combine.h
<#3061 (comment)>:
> + void PruneActiveTokens(BaseFloat delta);
+
+ /// Processes non-emitting (epsilon) arcs and emitting arcs for one frame
+ /// together. It takes the emitting tokens in "cur_toks_" from the last frame.
+ /// Generates non-emitting tokens for current frame and emitting tokens for
+ /// next frame.
+ void ProcessForFrame(DecodableInterface *decodable);
+
+ /// Processes nonemitting (epsilon) arcs for one frame.
+ /// Calls this function once when all frames were processed.
+ /// Or calls it in GetRawLattice() to generate the complete token list for
+ /// the last frame. [Deal with the tokens in map "next_toks_", which would
+ /// only contain emitting tokens from the previous frame.]
+ /// If "recover_map" isn't NULL, we build the recover_map which will be used
+ /// to recover "active_toks_[last_frame]" token list for the last frame.
+ void ProcessNonemitting(std::unordered_map<Token*, BaseFloat> *recover_map);
please merge this function into ProcessForFrame as we discussed
------------------------------
In src/decoder/lattice-faster-decoder-combine.h
<#3061 (comment)>:
> + /// Returns true if result is nonempty (using the return status is deprecated,
+ /// it will become void). If "use_final_probs" is true AND we reached the
+ /// final-state of the graph then it will include those as final-probs, else
+ /// it will treat all final-probs as one. Note: this just calls GetRawLattice()
+ /// and figures out the shortest path.
+ bool GetBestPath(Lattice *ofst,
+ bool use_final_probs = true);
+
+ /// Outputs an FST corresponding to the raw, state-level
+ /// tracebacks. Returns true if result is nonempty.
+ /// If "use_final_probs" is true AND we reached the final-state
+ /// of the graph then it will include those as final-probs, else
+ /// it will treat all final-probs as one.
+ /// The raw lattice will be topologically sorted.
+ /// The function can be called during decoding, it will take "next_toks_" map
+ /// and generate the complete token list for the last frame. Then recover it
it will take "next_toks_" map and generate the complete token list -> it
will process non-emitting tokens from the "next_toks_" map to get both
emitting and non-emitting tokens for generating the raw lattice
------------------------------
In src/decoder/lattice-faster-decoder-combine.cc
<#3061 (comment)>:
> +
+// instantiate this class once for each thing you have to decode.
+template <typename FST, typename Token>
+LatticeFasterDecoderCombineTpl<FST, Token>::LatticeFasterDecoderCombineTpl(
+ const FST &fst,
+ const LatticeFasterDecoderCombineConfig &config):
+ fst_(&fst), delete_fst_(false), config_(config), num_toks_(0) {
+ config.Check();
+}
+
+
+template <typename FST, typename Token>
+LatticeFasterDecoderCombineTpl<FST, Token>::LatticeFasterDecoderCombineTpl(
+ const LatticeFasterDecoderCombineConfig &config, FST *fst):
+ fst_(fst), delete_fst_(true), config_(config), num_toks_(0) {
+ config.Check();
add the following to be consistent with previous version:
next_toks_.reserve(1000);
cur_toks_.reserve(1000);
------------------------------
In src/decoder/lattice-faster-decoder-combine.cc
<#3061 (comment)>:
> +// an unusual search error.
+template <typename FST, typename Token>
+bool LatticeFasterDecoderCombineTpl<FST, Token>::Decode(DecodableInterface *decodable) {
+ InitDecoding();
+
+ // We use 1-based indexing for frames in this decoder (if you view it in
+ // terms of features), but note that the decodable object uses zero-based
+ // numbering, which we have to correct for when we call it.
+
+ while (!decodable->IsLastFrame(NumFramesDecoded() - 1)) {
+ if (NumFramesDecoded() % config_.prune_interval == 0)
+ PruneActiveTokens(config_.lattice_beam * config_.prune_scale);
+ ProcessForFrame(decodable);
+ }
+ // Process non-emitting arcs for the last frame.
+ ProcessNonemitting(NULL);
remove this separate call as we discussed
------------------------------
In src/decoder/lattice-faster-decoder-combine.cc
<#3061 (comment)>:
> +template <typename FST, typename Token>
+bool LatticeFasterDecoderCombineTpl<FST, Token>::GetRawLattice(
+ Lattice *ofst,
+ bool use_final_probs) {
+ typedef LatticeArc Arc;
+ typedef Arc::StateId StateId;
+ typedef Arc::Weight Weight;
+ typedef Arc::Label Label;
+ // Note: you can't use the old interface (Decode()) if you want to
+ // get the lattice with use_final_probs = false. You'd have to do
+ // InitDecoding() and then AdvanceDecoding().
+ if (decoding_finalized_ && !use_final_probs)
+ KALDI_ERR << "You cannot call FinalizeDecoding() and then call "
+ << "GetRawLattice() with use_final_probs == false";
+
+ std::unordered_map<Token*, BaseFloat> *recover_map = NULL;
No need to use a pointer and allocate it with new. Just use a local
std::unordered_map<Token*, BaseFloat> recover_map
and pass its address where a pointer is needed (see the sketch below).
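That is, something like this inside GetRawLattice() (a sketch only; the guard on decoding_finalized_ is my assumption about where the recovery logic belongs):

// Plain local; no new/delete needed.
std::unordered_map<Token*, BaseFloat> recover_map;
if (!decoding_finalized_) {
  // Complete the last frame's token list, remembering original tot_costs.
  ProcessNonemitting(&recover_map);
}
// ... build the output lattice from active_toks_ ...
if (!decoding_finalized_) {
  // Put active_toks_[last_frame] back the way it was.
  RecoverLastTokenList(&recover_map);
}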
------------------------------
In src/decoder/lattice-faster-decoder-combine.h
<#3061 (comment)>:
> + void RecoverLastTokenList(std::unordered_map<Token*, BaseFloat> *recover_map);
+
+
+ /// The "cur_toks_" and "next_toks_" actually allow us to maintain current
+ /// and next frames. They are indexed by StateId. It is indexed by frame-index
+ /// plus one, where the frame-index is zero-based, as used in decodable object.
+ /// That is, the emitting probs of frame t are accounted for in tokens at
+ /// toks_[t+1]. The zeroth frame is for nonemitting transition at the start of
+ /// the graph.
+ StateIdToTokenMap cur_toks_;
+ StateIdToTokenMap next_toks_;
+
+ /// Gets the weight cutoff.
+ /// Notice: In the traditional version, the histogram pruning method is applied
+ /// on the complete token list of one frame. But, in this version, it is used
+ /// on a token list which only contains the emitting part. So the max_active
Because of this, how about changing "max_active" to "max_emit"? And to
make it forward-compatible with our previous scripts, we can set
max_emit = 0.9 * max_active.
------------------------------
In src/decoder/lattice-faster-decoder-combine.h
<#3061 (comment)>:
> + /// and next frames. They are indexed by StateId. It is indexed by frame-index
+ /// plus one, where the frame-index is zero-based, as used in decodable object.
+ /// That is, the emitting probs of frame t are accounted for in tokens at
+ /// toks_[t+1]. The zeroth frame is for nonemitting transition at the start of
+ /// the graph.
+ StateIdToTokenMap cur_toks_;
+ StateIdToTokenMap next_toks_;
+
+ /// Gets the weight cutoff.
+ /// Notice: In the traditional version, the histogram pruning method is applied
+ /// on the complete token list of one frame. But, in this version, it is used
+ /// on a token list which only contains the emitting part. So the max_active
+ /// and min_active values might be narrowed.
+ BaseFloat GetCutoff(const StateIdToTokenMap& toks,
+ BaseFloat *adaptive_beam,
+ StateId *best_elem_id, Token **best_elem);
best_elem_id -> best_state_id
best_elem -> best_token
------------------------------
In src/decoder/lattice-faster-decoder-combine.cc
<#3061 (comment)>:
> + // pruning "online" before having seen all tokens
+
+ // "next_cutoff" is used to limit a new token in next frame should be handle
+ // or not. It will be updated along with the further processing.
+ BaseFloat next_cutoff = std::numeric_limits<BaseFloat>::infinity();
+ // "cost_offset" contains the acoustic log-likelihoods on current frame in
+ // order to keep everything in a nice dynamic range. Reduce roundoff errors.
+ BaseFloat cost_offset = 0.0;
+
+ // First process the best token to get a hopefully
+ // reasonably tight bound on the next cutoff. The only
+ // products of the next block are "next_cutoff" and "cost_offset".
+ // Notice: As a difference between the combine version and the traditional
+ // version, this "best_tok" is chosen from the emitting tokens. Normally, the
+ // best token of one frame comes from an epsilon non-emitting arc. So the best
+ // token is a looser boundary. Use it to estimate a bound on the next cutoff.
So the best token for now is a looser boundary. We use it to estimate a
bound on the next cutoff, and we will update the cutoff once we have better
tokens.
------------------------------
In src/decoder/lattice-faster-decoder-combine.cc
<#3061 (comment)>:
> + *adaptive_beam = config_.beam;
+ return beam_cutoff;
+ }
+ }
+}
+
+template <typename FST, typename Token>
+void LatticeFasterDecoderCombineTpl<FST, Token>::ProcessForFrame(
+ DecodableInterface *decodable) {
+ KALDI_ASSERT(active_toks_.size() > 0);
+ int32 frame = active_toks_.size() - 1; // frame is the frame-index
+ // (zero-based) used to get likelihoods
+ // from the decodable object.
+ active_toks_.resize(active_toks_.size() + 1);
+
+ cur_toks_.clear();
It may be easier to understand as:
cur_toks_.swap(next_toks_);
next_toks_.clear();
------------------------------
In src/decoder/lattice-faster-decoder-combine.cc
<#3061 (comment)>:
> + cur_cost = tok->tot_cost,
+ tot_cost = cur_cost + ac_cost + graph_cost;
+ if (tot_cost > next_cutoff) continue;
+ else if (tot_cost + adaptive_beam < next_cutoff)
+ next_cutoff = tot_cost + adaptive_beam; // a tighter boundary for emitting
+
+ // no change flag is needed
+ Token *next_tok = FindOrAddToken(arc.nextstate, frame + 1, tot_cost,
+ tok, &next_toks_, NULL);
+ // Add ForwardLink from tok to next_tok. Put it on the head of tok->link
+ // list
+ tok->links = new ForwardLinkT(next_tok, arc.ilabel, arc.olabel,
+ graph_cost, ac_cost, tok->links);
+ }
+ } // for all arcs
+ tok->in_current_queue = false; // out of queue
let's do this at line 838 instead, in case there is a state with a self-loop
------------------------------
In src/decoder/lattice-faster-decoder-combine.cc
<#3061 (comment)>:
> + if (tot_cost > next_cutoff) continue;
+ else if (tot_cost + adaptive_beam < next_cutoff)
+ next_cutoff = tot_cost + adaptive_beam; // a tighter boundary for emitting
+
+ // no change flag is needed
+ Token *next_tok = FindOrAddToken(arc.nextstate, frame + 1, tot_cost,
+ tok, &next_toks_, NULL);
+ // Add ForwardLink from tok to next_tok. Put it on the head of tok->link
+ // list
+ tok->links = new ForwardLinkT(next_tok, arc.ilabel, arc.olabel,
+ graph_cost, ac_cost, tok->links);
+ }
+ } // for all arcs
+ tok->in_current_queue = false; // out of queue
+ } // end of while loop
+ KALDI_VLOG(6) << "toks after: " << cur_toks_.size();
KALDI_VLOG(6) << "Number of tokens active on frame " << NumFramesDecoded()-1
<< " is " << cur_toks_.size();
------------------------------
In src/decoder/lattice-faster-decoder-combine.cc
<#3061 (comment)>:
> + BaseFloat w = static_cast<BaseFloat>(iter->second->tot_cost);
+ tmp_array_.push_back(w);
+ if (w < best_weight) {
+ best_weight = w;
+ if (best_elem) {
+ *best_elem_id = iter->first;
+ *best_elem = iter->second;
+ }
+ }
+ }
+
+ BaseFloat beam_cutoff = best_weight + config_.beam,
+ min_active_cutoff = std::numeric_limits<BaseFloat>::infinity(),
+ max_active_cutoff = std::numeric_limits<BaseFloat>::infinity();
+
+ KALDI_VLOG(6) << "Number of tokens active on frame " << NumFramesDecoded()
KALDI_VLOG(6) << "Number of tokens emitting on frame " <<
NumFramesDecoded()-1
According to the suggestions from Dan and Zhehuai, I updated the comments and code.
Thanks. Zhehuai, please check carefully.
I will also wait to hear about the efficiency. Ideally it would be nice to
be able to separate the effects of no longer using HashList versus
combining non-emitting/emitting propagation, but perhaps that is too much
to hope for.
I uploaded a simple testing script. It calls "nnet3-compute" to generate posteriors, and then uses "latgen-faster-mapped" or "latgen-faster-mapped-combine" to decode.
Various requested changes.
Just treat it as zero, since we do in fact normalize costs to be not far
from zero on each frame. It's not necessary; let the Push() code handle it.
------------------------------
In src/decoder/lattice-faster-decoder-combine-bucketqueue.h
<#3061 (comment)>:
> + in_queue(false) { }
+};
+
+} // namespace decoder
+
+
+template<typename Token>
+class BucketQueue {
+ public:
+ /** Constructor. 'cost_scale' is a scale that we multiply the token costs by
+ * before intergerizing; a larger value means more buckets.
+ * 'best_cost_estimate' is an estimate of the best (lowest) cost that
+ * we are likely to encounter (e.g. the best cost that we have seen so far).
+ * It is used to initialize 'bucket_storage_begin_'.
+ */
+ BucketQueue(BaseFloat best_cost_estimate, BaseFloat cost_scale = 1.0);
Dan, how are you thinking about removing "best_cost_estimate" from the
constructor of BucketQueue?
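To illustrate Dan's "treat it as zero" answer above, here is a sketch (my interpretation; member names such as bucket_offset_ are assumed, not taken from this PR) of a Push() that needs no best_cost_estimate, because the cost-to-bucket mapping treats zero as the reference point and the bucket vector simply grows on demand:

template <typename Token>
void BucketQueue<Token>::Push(Token *tok) {
  // Integerize the cost relative to 0.0; costs are normalized to be near
  // zero on each frame, so no a-priori best-cost estimate is needed.
  int32 bucket_index =
      static_cast<int32>(std::floor(tok->tot_cost * cost_scale_)) +
      bucket_offset_;
  if (bucket_index < 0) bucket_index = 0;  // simplification: clamp outliers
  if (bucket_index >= static_cast<int32>(buckets_.size()))
    buckets_.resize(bucket_index + 1);  // grow on demand
  buckets_[bucket_index].push_back(tok);
  tok->in_queue = true;
  if (bucket_index < first_nonempty_bucket_index_)
    first_nonempty_bucket_index_ = bucket_index;
}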
In src/decoder/lattice-faster-decoder-combine-bucketqueue.cc
<#3061 (comment)>:
> + if (buckets_[vec_index].empty()) { // This bucket is empty. Update vec_index
+ int32 next_vec_index = vec_index + 1;
+ for (; next_vec_index < buckets_.size(); next_vec_index++) {
+ if (!buckets_[next_vec_index].empty()) break;
+ }
+ vec_index = next_vec_index;
+ first_occupied_vec_index_ = vec_index;
+ }
+
+ if (tok->in_queue) { // This is an effective token
+ tok->in_queue = false;
+ return tok;
+ }
+ }
+ return NULL;
+}
Sorry, Dan, I didn't get the difference between
"first_nonempty_bucket_index_" and "first_occupied_vec_index_".
I was referring to the terminology in a comment in the code, which you got
from an email of mine. They are the same; my point is that you no longer
have a temporary named 'bucket_index' in the code.
Dan, with the new design, I think the Pop() function still looks like the
current code. We cannot return the token directly as you wrote:
if (!buckets_[first_nonempty_bucket_index_].empty()) {
Token *ans = buckets_[first_nonempty_bucket_index_].back();
buckets_[first_nonempty_bucket_index_].pop_back();
return ans;
}
That is because we cannot guarantee that 'ans' is an effective token.
OK, then check that ans->in_queue == true before returning it. But don't
start with a while loop, because you are checking a condition that is not
necessary to check at that point (because, as I said, you would always
ensure that first_nonempty_bucket_index_ was less than buckets_.size(),
precisely to avoid that check).
while (first_nonempty_bucket_index_ < buckets_.size()) {
  Token *tok = buckets_[first_nonempty_bucket_index_].back();
  if (tok->in_queue) break;
  buckets_[first_nonempty_bucket_index_].pop_back();  // remove the invalid token
  if (buckets_[first_nonempty_bucket_index_].empty()) {
    first_nonempty_bucket_index_++;
    for (; first_nonempty_bucket_index_ < buckets_.size();
         first_nonempty_bucket_index_++) {
      if (!buckets_[first_nonempty_bucket_index_].empty()) break;
    }
  }
}
At the same time, we still need "first_nonempty_bucket_index_ ==
buckets_.size()" to determine that the BucketQueue is empty at the
beginning of the function:
if (first_nonempty_bucket_index_ == buckets_.size()) {  // no token left
  return NULL;
}
Read more carefully what I said. I am proposing to always ensure that
first_nonempty_bucket_index_ < buckets_.size(), and to change what that
variable means.
Hi Dan, I understand that you want to use the new "bucket_index" formula instead of the old "bucket_index" and "vec_index". Let me explain what makes me confused. Firstly, we have the definition of "first_nonempty_bucket_index_":

/// first_nonempty_bucket_index_ is an integer in the range [0, buckets_.size() - 1]
/// which is not larger than the index of the first nonempty element of buckets_.
int32 first_nonempty_bucket_index_;

So the key problem is when and how to update "first_nonempty_bucket_index_". Assume the Push() function adds tokens into the BucketQueue one by one and sets "first_nonempty_bucket_index_" to the first non-empty bucket. From your previous comments, I guess that after a successful Pop operation (ans->in_queue), you just leave "first_nonempty_bucket_index_" where it is, and only update it once that bucket is empty (buckets_[first_nonempty_bucket_index_].empty()).
(1) But what should we do if the Pop operation does not succeed? We need a while loop to repeat the attempt until the BucketQueue is empty.
(2) How can we know the BucketQueue is empty? I think the most convenient way is "first_nonempty_bucket_index_ == buckets_.size()", but that conflicts with your definition, so I suggest the following definition:

/// first_nonempty_bucket_index_ is an integer whose valid range is [0, buckets_.size() - 1]
/// and which is not larger than the index of the first nonempty element of buckets_.
/// If "first_nonempty_bucket_index_" corresponds to a value past the end of buckets_
/// (i.e. first_nonempty_bucket_index_ == buckets_.size()), we interpret it as "there are
/// no buckets with entries."
int32 first_nonempty_bucket_index_;

And the code will be (please see the comments to understand what I'm concerned about):

template<typename Token>
Token* BucketQueue<Token>::Pop() {
  while (true) {
    if (!buckets_[first_nonempty_bucket_index_].empty()) {
      Token *ans = buckets_[first_nonempty_bucket_index_].back();
      buckets_[first_nonempty_bucket_index_].pop_back();
      if (ans->in_queue) {
        ans->in_queue = false;
        return ans;
      }
    }
    // Two cases trigger the following update of
    // "first_nonempty_bucket_index_": the last token was invalid
    // (i.e. ans->in_queue == false), or the bucket is empty.
    if (buckets_[first_nonempty_bucket_index_].empty()) {
      first_nonempty_bucket_index_++;
      for (; first_nonempty_bucket_index_ < buckets_.size();
           first_nonempty_bucket_index_++) {
        if (!buckets_[first_nonempty_bucket_index_].empty()) break;
      }
    }
    // Two cases reach the following check: first_nonempty_bucket_index_ ==
    // buckets_.size() [but that conflicts with the definition], or we found
    // an appropriate new "first_nonempty_bucket_index_".
    if (first_nonempty_bucket_index_ == buckets_.size()) return NULL;
  }
}
The whole point is to make the normal code path very fast.
Just set first_nonempty_bucket_index_ to buckets_.size() - 1 if it would
naturally be more than buckets_.size(). That way you never have to check
(in the normally taken code path) whether that index is valid or not.
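For concreteness, here is a sketch of Pop() under that invariant (my interpretation of the proposal, not code from this PR; buckets_ is assumed to be a std::vector<std::vector<Token*>>). first_nonempty_bucket_index_ is always kept in [0, buckets_.size() - 1], so the hot path never needs a bounds check:

template <typename Token>
Token* BucketQueue<Token>::Pop() {
  while (true) {
    // Invariant: 0 <= first_nonempty_bucket_index_ < buckets_.size().
    std::vector<Token*> &bucket = buckets_[first_nonempty_bucket_index_];
    if (!bucket.empty()) {
      Token *tok = bucket.back();
      bucket.pop_back();
      if (tok->in_queue) {  // skip stale (superseded) entries
        tok->in_queue = false;
        return tok;
      }
      continue;
    }
    // Advance, but never past the last bucket; if we are clamped at the
    // last bucket and it is empty, the queue is empty.
    if (first_nonempty_bucket_index_ + 1 ==
        static_cast<int32>(buckets_.size()))
      return NULL;
    first_nonempty_bucket_index_++;
  }
}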
Some small comments.
This is getting closer to being ready to commit. One more big thing that needs to be done is to make a clean version of this where this code replaces the code in lattice-faster-decoder.{h,cc}. There is no reason to keep both versions around.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed by a bot strictly because of inactivity. This does not mean that we think that this issue is not important! If you believe it has been closed hastily, add a comment to the issue and mention @kkm000, and I'll gladly reopen it.
This issue has been automatically marked as stale by a bot solely because it has not had recent activity. Please add any comment (simply 'ping' is enough) to prevent the issue from being closed for 60 more days if you believe it should be kept open.
Combined ProcessEmitting() and ProcessNonemitting() into ProcessForFrame(), and updated GetRawLattice() according to this modification.
Put it here so that @chenzhehuai can review it easily.
Thanks.