
How to replace the new dataset #41

Open · a897456 opened this issue Mar 5, 2024 · 7 comments

a897456 commented Mar 5, 2024

Hi @v-iashin
If I want to retrain the model with a new dataset, such as LJSpeech, which .py file should I start with?

v-iashin (Owner) commented Mar 5, 2024

Hi, thank you for your question.

Assuming you are only considering training the first stage (autoencoder), here are a few hints:

  1. Create a dataset module similar to VGGSound in specvqgan/data/vggsound.py (see the sketch after this list).
  2. Adapt the config configs/vggsound_codebook.yaml to your dataset.
  3. Then follow the instructions in "Training a spectrogram codebook".
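
For example, a bare-bones dataset module could look roughly like the sketch below. This is only an illustration: the file name, class name, and the 'input' key are my placeholders, so mirror whatever specvqgan/data/vggsound.py actually does (split handling, returned dict keys, normalization).

```python
# specvqgan/data/ljspeech.py (hypothetical) -- a minimal stand-in for the
# VGGSound dataset module: one .npy mel spectrogram per utterance.
from pathlib import Path

import numpy as np
import torch


class LJSpeechSpecs(torch.utils.data.Dataset):
    def __init__(self, spec_dir_path='./data/LJSpeech', mel_num=80, spec_crop_len=848):
        super().__init__()
        self.mel_num = mel_num
        self.spec_crop_len = spec_crop_len
        self.files = sorted(Path(spec_dir_path).glob('*.npy'))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        spec = np.load(self.files[idx]).astype(np.float32)  # (mel_num, spec_len)
        assert spec.shape[0] == self.mel_num
        spec = spec[:, :self.spec_crop_len]  # crop to a fixed number of frames
        return {'input': spec, 'file_path_': str(self.files[idx])}
```

You would then wrap or subclass this for the train/validation/test splits and point the corresponding targets in your config at those classes.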

a897456 (Author) commented Mar 6, 2024

Hi @v-iashin
Thank you for your reply.
I extracted mel spectrograms from the LJSpeech dataset with tacotron2 and generated 13100 .npy files, but the spec_len of these spectrograms varies a lot, and only about 1/3 of them, roughly 470 files, have a spec_len larger than 848 (the length your code requires).
What should I do to keep going? Please give me some hints. Thank you.
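
Would padding the shorter files out to 848 frames be a reasonable way forward? Something like this sketch is what I have in mind (padding with each file's minimum value as a stand-in for log-mel silence is my own guess):

```python
# Pad every mel spectrogram of shape (mel_num, spec_len) to >= 848 frames.
from pathlib import Path

import numpy as np

TARGET_LEN = 848

for f in Path('./data/LJSpeech').glob('*.npy'):
    spec = np.load(f)
    if spec.shape[1] < TARGET_LEN:
        pad = TARGET_LEN - spec.shape[1]
        # log-mel "silence" is roughly the spectrogram's minimum value
        spec = np.pad(spec, ((0, 0), (0, pad)), constant_values=spec.min())
        np.save(f, spec)
```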

a897456 (Author) commented Mar 6, 2024

```
E:\ProgramData\anaconda3\envs\py39\python.exe C:\Users\User1\Downloads\SpecVQGAN-main\train.py python train.py --base configs/LJSpeech_codebook.yaml -t True --gpus 0,
2024-03-06T20-39-39_LJSpeech_codebook
Global seed set to 23
Running on GPUs 0,
loaded pretrained LPAPS loss from specvqgan/modules/autoencoder/lpaps\vggishish16.pt
VQLPAPSWithDiscriminator running with hinge loss.
E:\ProgramData\anaconda3\envs\py39\lib\site-packages\pytorch_lightning\utilities\distributed.py:68: UserWarning: ModelCheckpoint(save_last=True, monitor=None) is a redundant configuration. You can save the last checkpoint with ModelCheckpoint(save_top_k=None, monitor=None).
warnings.warn(*args, **kwargs)
We will not save audio for conditioning and conditioning_rec
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Global seed set to 23
initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/1
[W C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:558] [c10d] The client socket has failed to connect to [localhost.sangfor.com.cn]:64851 (system error: 10049 - The requested address is not valid in its context.).
[W C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:558] [c10d] The client socket has failed to connect to [localhost.sangfor.com.cn]:64851 (system error: 10049 - The requested address is not valid in its context.).
accumulate_grad_batches = 1
Setting learning rate to 1.35e-05 = 1 (accumulate_grad_batches) * 1 (num_gpus) * 3 (batchsize) * 4.50e-06 (base_lr)
Project config
model:
  base_learning_rate: 4.5e-06
  target: specvqgan.models.vqgan.VQModel
  params:
    embed_dim: 256
    n_embed: 1024
    ddconfig:
      double_z: false
      z_channels: 256
      resolution: 848
      in_channels: 1
      out_ch: 1
      ch: 128
      ch_mult:
      - 1
      - 1
      - 2
      - 2
      - 4
      num_res_blocks: 2
      attn_resolutions:
      - 53
      dropout: 0.0
    lossconfig:
      target: specvqgan.modules.losses.vqperceptual.VQLPAPSWithDiscriminator
      params:
        disc_conditional: false
        disc_in_channels: 1
        disc_start: 30001
        disc_weight: 0.8
        codebook_weight: 1.0
        min_adapt_weight: 1.0
        max_adapt_weight: 1.0
        perceptual_weight: 1.0
data:
  target: train.SpectrogramDataModuleFromConfig
  params:
    batch_size: 3
    num_workers: 8
    spec_dir_path: ./data/LJSpeech
    sample_rate: 22050
    mel_num: 80
    spec_len: 860
    spec_crop_len: 848
    random_crop: false
    train:
      target: specvqgan.data.LJSpeech_data.LJSpeechSpecsTrain
      params:
        specs_dataset_cfg: null
    validation:
      target: specvqgan.data.LJSpeech_data.LJSpeechSpecsValidation
      params:
        specs_dataset_cfg: null
    test:
      target: specvqgan.data.LJSpeech_data.LJSpeechSpecsTest
      params:
        specs_dataset_cfg: null
python: null
train:
  py: null

Lightning config
callbacks:
  image_logger:
    target: train.ImageLogger
    params:
      for_specs: true
      vocoder_cfg:
        target: train.VocoderMelGan
        params:
          ckpt_vocoder: ./vocoder/logs/vggsound/
trainer:
  sync_batchnorm: true
  distributed_backend: ddp
  gpus: 0,

E:\ProgramData\anaconda3\envs\py39\lib\site-packages\omegaconf\basecontainer.py:225: UserWarning: cfg.pretty() is deprecated and will be removed in a future version.
Use OmegaConf.to_yaml(cfg)

warnings.warn(

| Name | Type | Params

0 | encoder | Encoder | 29.3 M
1 | decoder | Decoder | 42.4 M
2 | loss | VQLPAPSWithDiscriminator | 17.5 M
3 | quantize | VectorQuantizer | 262 K
4 | quant_conv | Conv2d | 65.8 K
5 | post_quant_conv | Conv2d | 65.8 K

74.9 M Trainable params
14.7 M Non-trainable params
89.6 M Total params
358.463 Total estimated model params size (MB)
Epoch 0: 0%| | 0/151 [00:00<?, ?it/s] E:\ProgramData\anaconda3\envs\py39\lib\site-packages\pytorch_lightning\utilities\distributed.py:68: RuntimeWarning: You are using LearningRateMonitor callback with models that have no learning rate schedulers. Please see documentation for configure_optimizers method.
warnings.warn(*args, **kwargs)
```

Hi @v-iashin
Did I succeed?

a897456 (Author) commented Mar 6, 2024

[screenshot: per-code codebook usage counts]

Hi @v-iashin
I am so sorry, I'm a novice and may ask a lot of silly questions, so please forgive me. Thank you.
Can you tell me what these numbers mean?

v-iashin (Owner) commented Mar 6, 2024

These are the number of times each codebook code was used during the previous epoch, e.g. code No.4 was used 48 times. You want these counts to be as uniform as possible (fewer zeros).
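
If you want to recompute such a histogram yourself, it is just a bincount over the code indices the quantizer assigns (a sketch with dummy data; during training the indices come from the vector quantizer's forward pass):

```python
import torch

n_embed = 1024  # codebook size from the config
# dummy stand-in for all code indices collected over one epoch
epoch_indices = torch.randint(0, n_embed, (50_000,))

counts = torch.bincount(epoch_indices.flatten(), minlength=n_embed)
print('code No.4 used', counts[4].item(), 'times')
print('unused codes:', (counts == 0).sum().item(), 'out of', n_embed)
```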

a897456 (Author) commented Mar 6, 2024

Hi @v-iashin
Thank you for your reply.

  1. Validating shows 0%. Is this because I did not add a validation set?
  2. Now epoch = 12, and there seem to be more and more zeros. What can I do to reduce them?
  3. What are the loss and epoch numbers of your pre-trained model?
    [screenshot]

a897456 (Author) commented Mar 8, 2024

> Hi @v-iashin Thank you for your reply. I extracted mel spectrograms from the LJSpeech dataset with tacotron2 and generated 13100 .npy files, but the spec_len of these spectrograms varies a lot, and only about 1/3 of them, roughly 470 files, have a spec_len larger than 848 (the length your code requires). What should I do to keep going? Please give me some hints. Thank you.

Hi @v-iashin
Can you give me some guidance on this problem?
