
training with multiple machines #3966

Merged (1 commit, Mar 2, 2020)

Conversation

@qindazhu (Contributor) commented Mar 1, 2020

  • Remove run_cleanup_segmentation.sh, as it does not improve the results.
  • Remove duplicate code and support single-GPU, multi-GPU, and multi-machine training in one script (see the sketch after this list).
  • Add online-cmvn, as @fanlu showed before.
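As a rough illustration of the "one script" point above (this is not the actual code in this PR; the function name and the environment-variable convention are assumptions), single-GPU, multi-GPU, and multi-machine runs can share one entry point by initializing torch.distributed only when a launcher has set the usual environment variables:

```python
# Hypothetical sketch, not the script in this PR: one training entry point
# covering single-GPU, multi-GPU, and multi-machine runs.
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def wrap_model_for_training(model, device_id):
    # A launcher (one process per GPU, possibly spread over several machines)
    # sets WORLD_SIZE/RANK/MASTER_ADDR/MASTER_PORT; if they are absent, this
    # is a plain single-GPU run and no process group is needed.
    world_size = int(os.environ.get('WORLD_SIZE', 1))
    device = torch.device('cuda', device_id)
    model = model.to(device)
    if world_size > 1:
        dist.init_process_group(backend='nccl', init_method='env://')
        model = DDP(model, device_ids=[device_id])
    return model
```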

Result

|          | tdnn_1c_rd_rmc_rng_without_cleanup | tdnn_1c_rd_rmc_rng |
|----------|------------------------------------|--------------------|
| dev_cer  | 5.92                               | 5.99               |
| dev_wer  | 13.71                              | 13.86              |
| test_cer | 7.03                               | 7.08               |
| test_wer | 15.35                              | 15.72              |

|          | TDNN-F (PyTorch, Adam, fanlu's previous result) | TDNN-F (PyTorch, Adam, this pull request with 4 GPUs) |
|----------|-------------------------------------------------|--------------------------------------------------------|
| dev_cer  | 6.16                                            | 6.29                                                   |
| dev_wer  | 14.01                                           | 14.10                                                  |
| test_cer | 7.31                                            | 7.57                                                   |
| test_wer | 15.97                                           | 15.80                                                  |

@danpovey (Contributor) commented Mar 1, 2020

OK, I assume this is OK to merge? Any objections?

```diff
@@ -28,7 +29,11 @@ def main():
         logging.warning('No GPU detected! Use CPU for inference.')
         device = torch.device('cpu')
     else:
-        device = torch.device('cuda', args.device_id)
+        devices = allocate_gpu_devices(1)
```
Contributor:

Should we follow the kaldi-style SelectGpu() and select the GPU that has the largest available memory?

@qindazhu (Contributor Author):

I prefer to do this with --q options in queue.pl, considering that GPUs on a single machine nowadays are commonly of the same type (and memory), and we assume they are in Exclusive_Process mode.
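For reference, kaldi's SelectGpu() picks the device with the most free memory. A rough PyTorch equivalent could look like the sketch below (assuming a PyTorch version that provides torch.cuda.mem_get_info; allocate_gpu_devices in this PR does not necessarily work this way):

```python
# Sketch of a kaldi-style SelectGpu(): choose the CUDA device with the most
# free memory. Assumes torch.cuda.mem_get_info is available; this is an
# illustration, not the implementation of allocate_gpu_devices in this PR.
import torch


def select_freest_gpu():
    free_and_id = []
    for i in range(torch.cuda.device_count()):
        free_bytes, _total_bytes = torch.cuda.mem_get_info(i)
        free_and_id.append((free_bytes, i))
    _, device_id = max(free_and_id)  # device with the largest free memory
    return torch.device('cuda', device_id)
```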

```diff
@@ -28,7 +29,11 @@ def main():
         logging.warning('No GPU detected! Use CPU for inference.')
         device = torch.device('cpu')
     else:
-        device = torch.device('cuda', args.device_id)
+        devices = allocate_gpu_devices(1)
```
Contributor:

What if the user provides device_id on the command line?

@qindazhu (Contributor Author):

Let me do this in another PR; I think what we need is device_ids instead of device_id. Actually, I haven't thought this through completely yet, because:

  • Multiple GPU ids are required when training with multiple GPUs on a single machine; that's why we need device_ids.

  • Assigning GPU ids when training with multiple machines is relatively complex for users.

@danpovey any comments about this?
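For illustration only (this option does not exist in the PR; the flag name and semantics here are hypothetical), a --device-ids flag could look like:

```python
# Hypothetical sketch of the --device-ids option discussed above;
# not part of this PR.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--device-ids', type=str, default=None,
                    help='comma-separated CUDA device ids, e.g. "0,1,2,3"; '
                         'if omitted, devices are allocated automatically')
args = parser.parse_args()

if args.device_ids is not None:
    device_ids = [int(s) for s in args.device_ids.split(',')]
else:
    device_ids = None  # fall back to automatic allocation
```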

@csukuangfj (Contributor) commented Mar 2, 2020

> Remove run_cleanup_segmentation.sh as it does help to the WER/CER

does help OR does NOT help?

"improve CER/WER" may be better than "help to CER/WER".

@qindazhu force-pushed the haowen-ddp-multiple-machine branch from 99ee14b to a0343b6 on March 2, 2020, 03:31
@qindazhu (Contributor Author) commented Mar 2, 2020

It does not help. Thanks, I've updated the text.

We can't say "improve" here; adding or removing the cleanup doesn't make a difference, at least according to my experiment.


@csukuangfj (Contributor):

I just feel that it's unnatural to say help + to + noun

@qindazhu (Contributor Author) commented Mar 2, 2020

OK, updated.

@danpovey merged commit 756f490 into kaldi-asr:pybind11 on Mar 2, 2020
@danpovey (Contributor) commented Mar 2, 2020

Merged, thanks!!

@fanlu commented Mar 2, 2020

I have implemented online-ivector in the PyTorch model, but I get no gain; in fact, the result is worse when using online-ivector. @qindazhu @csukuangfj please review this code.
This code computes nnet_output from the input chunk by chunk, in inference.py:

```python
    for batch_idx, batch in enumerate(dataloader):
        key_list, padded_feat, output_len_list, padded_ivector, ivector_len_list = batch
        padded_feat = padded_feat.to(device)
        padded_ivector = padded_ivector.to(device)
        with torch.no_grad():
            nnet_outputs = []
            input_num_frames = padded_feat.shape[1] + 2 - args.model_left_context - args.model_right_context
            # chunk_len of 17 output frames, same as kaldi
            for i in range(0, output_len_list[0], 17):
                # e.g. input_len 418 -> i in [0, 17, 34, 51, 68, 85, 102, 119, 136]
                first_output = i * 3  # subsampling factor 3
                last_output = min(first_output + (17 - 1) * 3, input_num_frames)
                first_input = first_output
                last_input = last_output + args.model_left_context + args.model_right_context
                input_x = padded_feat[:, first_input:last_input + 1, :]
                # one i-vector per 10 frames; take the one at the chunk's middle frame
                ivector_index = (first_output + last_output) // 2 // 10
                input_ivector = padded_ivector[:, ivector_index, :]
                # repeat the chunk's i-vector across all its frames and append to the features
                feat = torch.cat((input_x, input_ivector.repeat((1, input_x.shape[1], 1))), dim=-1)
                nnet_output_temp, _ = model(feat)
                nnet_outputs.append(nnet_output_temp)
            nnet_output = torch.cat(nnet_outputs, dim=1)
```
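If I read the indexing right: with a subsampling factor of 3, output frame i corresponds to input frame 3 * i, so each 17-frame output chunk spans (17 - 1) * 3 + 1 = 49 input frames plus the model's left and right context, and the single i-vector taken at the chunk's middle input frame (one i-vector per 10 frames, hence the // 10) is repeated across the whole chunk.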

This is the result:

| exp     | ddp     | test cer | test wer | dev cer | dev wer | global objf | validation objf | output-affine |
|---------|---------|----------|----------|---------|---------|-------------|-----------------|---------------|
| ivector | l2 5e-5 | 8.27     | 17.12    | 7.10    | 15.20   | -0.050093   | -0.065751       | 143.3         |

@csukuangfj (Contributor):

Could you please open a pull request so that we can view the whole picture?

@fanlu commented Mar 2, 2020

OK
