
Merging already-merged chain egs #2916

Open
@danpovey

Description

@hhadian, when you have time for this, it would be nice if you could do it. It's not urgent; it will be a while before the other pieces are all ready.

This is something I am going to need for the new adaptation framework I am
working on. Currently, in nnet-example-utils.cc and nnet-chain-example.cc, the
example-merging code does not support merging already-merged egs (search for "already-merged").
I'm going to need this to be supported at least in NnetChainExample, and
I think it would also need to be supported in the NnetExample merging code, since
I think the chain example merging code relies on that code. If it helps the
implementation, you may assume that all the egs to be merged have the same number
of 'n' values (e.g. 4; this is the number of chunks per speaker that we use
for adaptation).

After the examples have been merged, I'd like the following variable to be set in
the NnetChainSupervision object:

 // This will be 1 in normal cases, but in the 'chaina' code (chain training
 // with adaptation) it will be set to the number of chunks per speaker in
 // this minibatch.  For example if it's 4, then we are asserting that
 // sequences n=0 through 3 all come from the same speaker, n=4 through 7
 // all come from the same speaker, and so on.
 int32 chunks_per_spk;
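A small sketch of how consumers might interpret `chunks_per_spk` as the comment above describes, i.e. sequence `n` belongs to speaker group `n / chunks_per_spk`. Both helpers are hypothetical (not part of Kaldi), use `int32_t` in place of Kaldi's `int32` so the block compiles on its own, and the divisibility check is an assumption that follows from merging equal-sized groups.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the speaker group that sequence 'n' belongs to.
// With chunks_per_spk == 4, n = 0..3 map to group 0, n = 4..7 to group 1, etc.
// With the default chunks_per_spk == 1, every sequence is its own group.
int32_t SpeakerGroupOfSequence(int32_t n, int32_t chunks_per_spk) {
  assert(n >= 0 && chunks_per_spk >= 1);
  return n / chunks_per_spk;
}

// Hypothetical check: a minibatch of 'num_sequences' sequences splits into
// whole speaker groups only if chunks_per_spk divides it exactly.
bool IsValidChunksPerSpk(int32_t num_sequences, int32_t chunks_per_spk) {
  return chunks_per_spk >= 1 && num_sequences % chunks_per_spk == 0;
}
```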

Please make sure that this is 1 by default (e.g. set in the constructor); that the
on-disk format stays the same when it's 1 (e.g. only write it if it's not 1), to
minimize code-version compatibility headaches; and that it is only set to
a value other than 1 when merging chain supervision objects that were
already merged (you can check that the sizes of the things being merged match).
We may later introduce such a variable in the NnetSupervision object, but
it's not needed just yet.
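The three requirements above (default of 1, unchanged on-disk format when it is 1, and a non-1 value only when merging already-merged objects of matching size) can be sketched as follows. `ChainSupervisionSketch`, `MergeSupervisionSketch()` and the `<ChunksPerSpk>` token are hypothetical; the sketch uses plain text streams so it is self-contained, whereas real Kaldi code would presumably go through its existing token I/O helpers (e.g. `WriteToken()`/`PeekToken()`) and handle binary mode too.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for NnetChainSupervision.
struct ChainSupervisionSketch {
  int32_t num_sequences = 1;   // number of 'n' values (1 for an unmerged eg)
  int32_t chunks_per_spk = 1;  // 1 by default

  // The optional token is written only when chunks_per_spk != 1, so objects
  // with the default value serialize exactly as before.
  void Write(std::ostream &os) const {
    os << "<NumSequences> " << num_sequences << " ";
    if (chunks_per_spk != 1) os << "<ChunksPerSpk> " << chunks_per_spk << " ";
    os << "</Supervision> ";
  }

  // Accepts both the old format and the one with the optional token.
  void Read(std::istream &is) {
    std::string token;
    is >> token;
    assert(token == "<NumSequences>");
    is >> num_sequences >> token;
    chunks_per_spk = 1;
    if (token == "<ChunksPerSpk>") is >> chunks_per_spk >> token;
    assert(token == "</Supervision>");
  }
};

// Hypothetical merge: chunks_per_spk ends up != 1 only when the inputs were
// already merged (num_sequences > 1); input sizes must always match.
ChainSupervisionSketch MergeSupervisionSketch(
    const std::vector<ChainSupervisionSketch> &inputs) {
  assert(!inputs.empty());
  const int32_t per_input = inputs[0].num_sequences;
  ChainSupervisionSketch output;
  output.num_sequences = 0;
  for (const ChainSupervisionSketch &sup : inputs) {
    assert(sup.num_sequences == per_input);  // sizes being merged must match
    assert(sup.chunks_per_spk == 1);         // nested grouping not handled here
    output.num_sequences += sup.num_sequences;
  }
  output.chunks_per_spk = per_input;  // stays 1 for ordinary merges
  return output;
}
```

For example, merging three objects with `num_sequences == 4` gives `num_sequences == 12` and `chunks_per_spk == 4`, while merging unmerged objects leaves `chunks_per_spk` at 1 and writes no extra token.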

This PR can go to my svd_draft branch in my personal repo, as it's part of
that project.

Metadata

Assignees: No one assigned

Labels: in progress (issue has been taken and is being worked on), stale (stale bot on the loose)
