
How to replace the new dataset #41

Open · a897456 opened this issue Mar 5, 2024 · 7 comments

a897456 commented Mar 5, 2024

Hi @v-iashin
If I want to retrain the model with a new dataset, such as LJSpeech, which .py file should I start with?

v-iashin (Owner) commented Mar 5, 2024

Hi, thank you for your question.

Assuming you are only considering training the first stage (autoencoder), here are a few hints:

  1. Create a dataset module similar to VGGSound in specvqgan/data/vggsound.py (see the sketch after this list).
  2. Adapt the config configs/vggsound_codebook.yaml to your dataset.
  3. Then follow the instructions in "Training a spectrogram codebook".
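
For example, a bare-bones dataset module could look roughly like the sketch below. This is only an illustration: the file name, class name, and the 'input' key are my placeholders, so mirror whatever specvqgan/data/vggsound.py actually does (split handling, returned dict keys, normalization).

```python
# specvqgan/data/ljspeech.py (hypothetical) -- a minimal stand-in for the
# VGGSound dataset module: one .npy mel spectrogram per utterance.
from pathlib import Path

import numpy as np
import torch


class LJSpeechSpecs(torch.utils.data.Dataset):
    def __init__(self, spec_dir_path='./data/LJSpeech', mel_num=80, spec_crop_len=848):
        super().__init__()
        self.mel_num = mel_num
        self.spec_crop_len = spec_crop_len
        self.files = sorted(Path(spec_dir_path).glob('*.npy'))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        spec = np.load(self.files[idx]).astype(np.float32)  # (mel_num, spec_len)
        assert spec.shape[0] == self.mel_num
        spec = spec[:, :self.spec_crop_len]  # crop to a fixed number of frames
        return {'input': spec, 'file_path_': str(self.files[idx])}
```

You would then wrap or subclass this for the train/validation/test splits and point the corresponding targets in your config at those classes.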

a897456 (Author) commented Mar 6, 2024

Hi @v-iashin
Thank you for your reply.
I extracted mel spectrograms from the LJSpeech dataset with tacotron2 and generated 13100 .npy files, but the spec_len of these spectrograms varies a lot, and only about 1/3 of them, roughly 470 files, have a spec_len larger than 848 (the length your code requires).
What should I do to keep going? Please give me some hints. Thank you.
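
Would padding the shorter files out to 848 frames be a reasonable way forward? Something like this sketch is what I have in mind (padding with each file's minimum value as a stand-in for log-mel silence is my own guess):

```python
# Pad every mel spectrogram of shape (mel_num, spec_len) to >= 848 frames.
from pathlib import Path

import numpy as np

TARGET_LEN = 848

for f in Path('./data/LJSpeech').glob('*.npy'):
    spec = np.load(f)
    if spec.shape[1] < TARGET_LEN:
        pad = TARGET_LEN - spec.shape[1]
        # log-mel "silence" is roughly the spectrogram's minimum value
        spec = np.pad(spec, ((0, 0), (0, pad)), constant_values=spec.min())
        np.save(f, spec)
```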

a897456 (Author) commented Mar 6, 2024

```
E:\ProgramData\anaconda3\envs\py39\python.exe C:\Users\User1\Downloads\SpecVQGAN-main\train.py python train.py --base configs/LJSpeech_codebook.yaml -t True --gpus 0,
2024-03-06T20-39-39_LJSpeech_codebook
Global seed set to 23
Running on GPUs 0,
loaded pretrained LPAPS loss from specvqgan/modules/autoencoder/lpaps\vggishish16.pt
VQLPAPSWithDiscriminator running with hinge loss.
E:\ProgramData\anaconda3\envs\py39\lib\site-packages\pytorch_lightning\utilities\distributed.py:68: UserWarning: ModelCheckpoint(save_last=True, monitor=None) is a redundant configuration. You can save the last checkpoint with ModelCheckpoint(save_top_k=None, monitor=None).
warnings.warn(*args, **kwargs)
We will not save audio for conditioning and conditioning_rec
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Global seed set to 23
initializing ddp: GLOBAL_RANK: 0, MEMBER: 1/1
[W C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:558] [c10d] The client socket has failed to connect to [localhost.sangfor.com.cn]:64851 (system error: 10049 - The requested address is not valid in its context.).
[W C:\actions-runner_work\pytorch\pytorch\builder\windows\pytorch\torch\csrc\distributed\c10d\socket.cpp:558] [c10d] The client socket has failed to connect to [localhost.sangfor.com.cn]:64851 (system error: 10049 - The requested address is not valid in its context.).
accumulate_grad_batches = 1
Setting learning rate to 1.35e-05 = 1 (accumulate_grad_batches) * 1 (num_gpus) * 3 (batchsize) * 4.50e-06 (base_lr)
Project config
model:
  base_learning_rate: 4.5e-06
  target: specvqgan.models.vqgan.VQModel
  params:
    embed_dim: 256
    n_embed: 1024
    ddconfig:
      double_z: false
      z_channels: 256
      resolution: 848
      in_channels: 1
      out_ch: 1
      ch: 128
      ch_mult:
      - 1
      - 1
      - 2
      - 2
      - 4
      num_res_blocks: 2
      attn_resolutions:
      - 53
      dropout: 0.0
    lossconfig:
      target: specvqgan.modules.losses.vqperceptual.VQLPAPSWithDiscriminator
      params:
        disc_conditional: false
        disc_in_channels: 1
        disc_start: 30001
        disc_weight: 0.8
        codebook_weight: 1.0
        min_adapt_weight: 1.0
        max_adapt_weight: 1.0
        perceptual_weight: 1.0
data:
  target: train.SpectrogramDataModuleFromConfig
  params:
    batch_size: 3
    num_workers: 8
    spec_dir_path: ./data/LJSpeech
    sample_rate: 22050
    mel_num: 80
    spec_len: 860
    spec_crop_len: 848
    random_crop: false
    train:
      target: specvqgan.data.LJSpeech_data.LJSpeechSpecsTrain
      params:
        specs_dataset_cfg: null
    validation:
      target: specvqgan.data.LJSpeech_data.LJSpeechSpecsValidation
      params:
        specs_dataset_cfg: null
    test:
      target: specvqgan.data.LJSpeech_data.LJSpeechSpecsTest
      params:
        specs_dataset_cfg: null
python: null
train:
  py: null

Lightning config
callbacks:
  image_logger:
    target: train.ImageLogger
    params:
      for_specs: true
      vocoder_cfg:
        target: train.VocoderMelGan
        params:
          ckpt_vocoder: ./vocoder/logs/vggsound/
trainer:
  sync_batchnorm: true
  distributed_backend: ddp
  gpus: 0,

E:\ProgramData\anaconda3\envs\py39\lib\site-packages\omegaconf\basecontainer.py:225: UserWarning: cfg.pretty() is deprecated and will be removed in a future version.
Use OmegaConf.to_yaml(cfg)

warnings.warn(

| Name | Type | Params

0 | encoder | Encoder | 29.3 M
1 | decoder | Decoder | 42.4 M
2 | loss | VQLPAPSWithDiscriminator | 17.5 M
3 | quantize | VectorQuantizer | 262 K
4 | quant_conv | Conv2d | 65.8 K
5 | post_quant_conv | Conv2d | 65.8 K

74.9 M Trainable params
14.7 M Non-trainable params
89.6 M Total params
358.463 Total estimated model params size (MB)
Epoch 0: 0%| | 0/151 [00:00<?, ?it/s] E:\ProgramData\anaconda3\envs\py39\lib\site-packages\pytorch_lightning\utilities\distributed.py:68: RuntimeWarning: You are using LearningRateMonitor callback with models that have no learning rate schedulers. Please see documentation for configure_optimizers method.
warnings.warn(*args, **kwargs)
```

Hi @v-iashin
Did I succeed?

a897456 (Author) commented Mar 6, 2024

[screenshot: per-code codebook usage counts]

Hi @v-iashin
I am so sorry, I'm a novice and may ask a lot of silly questions, so please forgive me. Thank you.
Can you tell me what these numbers mean?

v-iashin (Owner) commented Mar 6, 2024

These are the number of times each codebook code was used during the previous epoch, e.g. code No.4 was used 48 times. You want these counts to be as uniform as possible (fewer zeros).
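
If you want to recompute such a histogram yourself, it is just a bincount over the code indices the quantizer assigns (a sketch with dummy data; during training the indices come from the vector quantizer's forward pass):

```python
import torch

n_embed = 1024  # codebook size from the config
# dummy stand-in for all code indices collected over one epoch
epoch_indices = torch.randint(0, n_embed, (50_000,))

counts = torch.bincount(epoch_indices.flatten(), minlength=n_embed)
print('code No.4 used', counts[4].item(), 'times')
print('unused codes:', (counts == 0).sum().item(), 'out of', n_embed)
```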

a897456 (Author) commented Mar 6, 2024

Hi @v-iashin
Thank you for your reply.

  1. Validating shows 0%. Is this because I did not add a validation set?
  2. Now epoch = 12, and there seem to be more and more zeros. What can I do to reduce them?
  3. What are the loss and epoch numbers of your pre-trained model?
    [screenshot]

a897456 (Author) commented Mar 8, 2024

> Hi @v-iashin Thank you for your reply. I extracted mel spectrograms from the LJSpeech dataset with tacotron2 and generated 13100 .npy files, but the spec_len of these spectrograms varies a lot, and only about 1/3 of them, roughly 470 files, have a spec_len larger than 848 (the length your code requires). What should I do to keep going? Please give me some hints. Thank you.

Hi @v-iashin
Can you give me some guidance on this problem?
