support ivector training in pytorch model #3969
Conversation
Thanks a lot for reviewing!
On Tue, Mar 3, 2020 at 9:47 AM, Fangjun Kuang commented on this pull request, in egs/aishell/s10/local/run_ivector_common.sh:
> + ${temp_data_root}/${train_set}_sp_hires_max2 \
+ exp/nnet3${nnet3_affix}/extractor $ivectordir
+
+fi
+
+if [[ $stage -le 8 ]]; then
+ # Also extract iVectors for the test data, but in this case we don't need the speed
+ # perturbation (sp) or small-segment concatenation (comb).
+ for data in dev test; do
+ steps/online/nnet2/extract_ivectors_online.sh --cmd "$train_cmd" --nj 10 \
+ data/${data}_hires exp/nnet3${nnet3_affix}/extractor \
+ exp/nnet3${nnet3_affix}/ivectors_${data}_hires
+ done
+fi
+
+exit 0;
this file should end with a newline.
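The missing final newline flagged above is easy to check for mechanically. A minimal sketch in Python (the helper name and the idea of auto-appending are mine, not part of the PR):

```python
def ensure_trailing_newline(path):
    """Return True if a newline was appended, False if the file already ended with one."""
    with open(path, "rb") as f:
        data = f.read()
    if data.endswith(b"\n"):
        return False
    # Append the missing final newline in binary append mode.
    with open(path, "ab") as f:
        f.write(b"\n")
    return True
```

A pre-commit hook running something like this over shell scripts would catch the issue before review.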
Guys, I just want to mention something. I think it would be better if we shifted (not necessarily right now) to exposing the Kaldi egs as a DataLoader instead of as a Dataset. That way we could use the existing command-line tools for things like shuffling and time-shifting, and it would be much more efficient for I/O. The idea is that the dataloader would, on every epoch, create a suitable command line and read from it as a pipe. If it were a distributed data-loader, probably the easiest way to do it would be to make sure there is an appropriately split scp file and give each loader the appropriate one. We could use the scripts in #3765 to generate the scp files. I want to merge this soon; one option is to merge into pybind11 first to test it.
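The "read from a pipe on every epoch" idea could be sketched roughly as below. This is only an illustration of the shape, not code from the PR: the class name is hypothetical, and the command template would in practice invoke real Kaldi tools (e.g. an egs-copy/shuffle pipeline over a per-worker split of the scp file) rather than the placeholder used here.

```python
import subprocess

class PipeEgsLoader:
    """Iterable that, on every epoch, builds a command line and
    streams examples from the command's stdout as a pipe."""

    def __init__(self, cmd_template, epoch=0):
        # cmd_template is re-formatted each epoch, e.g. so the
        # shuffling tool can vary its random seed per epoch.
        self.cmd_template = cmd_template
        self.epoch = epoch

    def __iter__(self):
        cmd = self.cmd_template.format(epoch=self.epoch)
        # Let the existing command-line tools do shuffling and
        # time-shifting; we just consume their output line by line.
        with subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE) as p:
            for line in p.stdout:
                yield line.rstrip(b"\n")
        self.epoch += 1
```

In a distributed setting, each rank would be handed the command for its own split of the scp file, so the pipe naturally shards the data.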
On Tue, Mar 3, 2020 at 10:11 AM, fanlu commented on this pull request, in egs/aishell/s10/chain/feat_dataset.py:
>
with open(feats_scp, 'r') as f:
for line in f:
split = line.split()
assert len(split) == 2
- items.append(split)
-
- self.items = items
+ uttid, rxfilename =split
OK
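For context, the parsing pattern being discussed in the hunk above, unpacking each scp line into an uttid and an rxfilename, can be written as a small standalone helper. This is a sketch in the spirit of feat_dataset.py, not the file's actual code:

```python
def read_feats_scp(feats_scp):
    """Parse a Kaldi feats.scp file into (uttid, rxfilename) pairs.

    Each line has the form '<uttid> <rxfilename>', where rxfilename
    is typically an ark path with a byte offset, e.g. 'foo.ark:12'.
    """
    items = []
    with open(feats_scp, 'r') as f:
        for line in f:
            split = line.split()
            assert len(split) == 2
            uttid, rxfilename = split
            items.append((uttid, rxfilename))
    return items
```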
@csukuangfj I have fixed the code with your suggestion. Please have a look.
OK, I'll run it after it's merged.
I'll take a look at that PR and start to do this.
OK, merging.
Great, thanks! Firstly, just doing the merge and figuring out how to use
those newer scripts to prepare the egs would be a great start.
Updated with the latest results.