
split and process valid set #25

Merged · 7 commits · Dec 14, 2023
add default dataset_types
lmxue committed Dec 14, 2023
commit a8a4e9804dabb51ce240b6560bad05268c45ea41
2 changes: 1 addition & 1 deletion preprocessors/metadata.py
Collaborator

What will happen when processing datasets other than libritts and ljspeech? I suspect that line 39 will cause a bug, since there is no valid.json for the others.

Besides, is there any corresponding design for the valid and test datasets in the trainer?

Collaborator Author

@lmxue lmxue Dec 12, 2023

metadata.py has been rewritten to automatically adapt to the different JSON files. The valid set is split off to distinguish it from the test set, and it is also used to compute the validation loss in the trainer.
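The adaptive behavior described in this reply could be sketched as below. Note this is an illustration only, not the actual metadata.py code: the helper name `get_available_splits` and the assumption of one `<split>.json` file per split directory are hypothetical.

```python
import os

def get_available_splits(dataset_dir, candidates=("train", "valid", "test")):
    """Return only the splits whose JSON file exists on disk.

    Hypothetical sketch: datasets that ship without a valid.json
    (anything other than libritts or ljspeech in this discussion)
    are skipped gracefully instead of raising a missing-file error.
    """
    return [
        split for split in candidates
        if os.path.isfile(os.path.join(dataset_dir, f"{split}.json"))
    ]
```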

@@ -8,7 +8,7 @@
from tqdm import tqdm


-def cal_metadata(cfg, dataset_types):
+def cal_metadata(cfg, dataset_types=["train", "test"]):
"""
Dump metadata (singers.json, meta_info.json, utt2singer) for singer dataset or multi-datasets.
"""
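One caveat worth noting about the new signature: in Python, a list used as a default argument is created once at definition time and shared across all calls, so mutating it inside the function would leak between calls. A common alternative uses a `None` sentinel; the sketch below reduces the function body to just returning the resolved split list, since the real metadata-dumping logic is not shown in this hunk.

```python
def cal_metadata(cfg, dataset_types=None):
    # A None sentinel avoids sharing one mutable list object
    # across every call to the function.
    if dataset_types is None:
        dataset_types = ["train", "test"]
    # ... the real metadata dumping would go here; this sketch
    # just returns the resolved split list.
    return dataset_types
```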