Support On-the-fly Features Extraction #145

RMSnow · 2024-02-26T11:32:28Z

✨ Description

Support on-the-fly feature extraction for large-scale data preprocessing. Its strengths are:

- Save the disk space otherwise needed to store features
- Avoid stalls in feature extraction caused by compute-platform issues
- Simplify the preprocessing pipeline so users can focus on training

How to use?

With on-the-fly feature extraction, the workflow for future Amphion models is:

Data preprocessing, as before
- Split the dataset into train/val
- Generate the metadata file (.json) as before. utt["Path"] and utt["Duration"] are the two key fields.
- Compute the metadata statistics as before.
~~Features Preprocess~~ (No feature preprocessing any more!)
Training
- In the config file, set preprocess.features_extraction_mode to online
- Implement your [Task]OnlineDataset and [Model]Trainer
Inference, as before
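
As a rough illustration of the workflow above, the sketch below builds a config fragment with preprocess.features_extraction_mode set to online, plus a metadata list whose entries carry Path and Duration. Only those three keys come from this PR. The file paths, the durations, and the helpers `is_online` and `total_duration` are assumptions made up for illustration, not Amphion's actual code.

```python
import json

# Config fragment: only `preprocess.features_extraction_mode` is named in this PR.
config = {"preprocess": {"features_extraction_mode": "online"}}

# Metadata entries: `Path` and `Duration` are the two key fields.
# The paths and durations below are made-up placeholders.
metadata = [
    {"Path": "data/example/utt_0001.wav", "Duration": 3.2},
    {"Path": "data/example/utt_0002.wav", "Duration": 5.0},
]


def is_online(cfg: dict) -> bool:
    """Hypothetical helper: True when features should be extracted on the fly."""
    return cfg.get("preprocess", {}).get("features_extraction_mode") == "online"


def total_duration(utts: list) -> float:
    """Sum the `Duration` field (in seconds) over all utterances."""
    return sum(utt["Duration"] for utt in utts)


if __name__ == "__main__":
    print(json.dumps(config, indent=2))
    print("online mode:", is_online(config))
    print("total duration (s):", total_duration(metadata))
```

A trainer could branch on a check like `is_online(config)` to choose between an offline and an online dataset class; how Amphion does this internally is not shown here.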

So far, I have added on-the-fly feature extraction support for DiffWaveNetSVC. See the two main classes: SVCOnlineDataset and DiffusionTrainer.
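
To make the dataset side of this concrete, here is a minimal sketch of an online dataset that returns only the raw waveform and its duration, plus a collator that zero-pads a batch. It is not the SVCOnlineDataset from this PR: `ToyOnlineDataset`, `collate`, and `fake_loader` are hypothetical names, waveforms are plain Python lists so the example needs no audio library, and a real implementation would more likely subclass a PyTorch dataset and return tensors.

```python
from typing import Callable, Dict, List, Sequence

Waveform = List[float]


class ToyOnlineDataset:
    """Hypothetical stand-in for a [Task]OnlineDataset.

    It returns only the raw waveform and its duration; no precomputed
    features are read from disk.
    """

    def __init__(self, metadata: Sequence[Dict], load_fn: Callable[[str], Waveform]):
        self.metadata = list(metadata)
        self.load_fn = load_fn  # injected so the sketch needs no audio library

    def __len__(self) -> int:
        return len(self.metadata)

    def __getitem__(self, index: int) -> Dict:
        utt = self.metadata[index]
        wav = self.load_fn(utt["Path"])
        return {"wav": wav, "wav_len": len(wav), "duration": utt["Duration"]}


def collate(batch: List[Dict]) -> Dict:
    """Zero-pad every waveform to the longest one in the batch."""
    max_len = max(item["wav_len"] for item in batch)
    return {
        "wav": [item["wav"] + [0.0] * (max_len - item["wav_len"]) for item in batch],
        "wav_len": [item["wav_len"] for item in batch],
        "duration": [item["duration"] for item in batch],
    }


def fake_loader(path: str) -> Waveform:
    """Placeholder loader: returns a short synthetic waveform instead of reading `path`."""
    n = 4 if path.endswith("1.wav") else 6
    return [0.1 * i for i in range(n)]


# Made-up metadata entries using the two key fields, Path and Duration.
metadata = [
    {"Path": "utt_0001.wav", "Duration": 0.4},
    {"Path": "utt_0002.wav", "Duration": 0.6},
]
dataset = ToyOnlineDataset(metadata, fake_loader)
batch = collate([dataset[i] for i in range(len(dataset))])
print(batch["wav_len"])  # original lengths, kept so padding can be masked later
```

Keeping the original lengths next to the padded waveforms lets a trainer mask the padding when it extracts features from the batch.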

👨‍💻 Main Changes

model.base.base_dataset.py:
- Rename the original BaseDataset and BaseCollator to BaseOfflineDataset and BaseOfflineCollator
- Implement BaseOnlineDataset and BaseOnlineCollator. The __getitem__ method returns only the minimal elements (such as the raw waveform and its duration)
processors.audio_features_extractor.py:
- In Amphion's latest technical report, we group audio generation tasks into three categories: Text to Waveform, Descriptive Text to Waveform, and Waveform to Waveform. Accordingly, we can implement three kinds of feature extraction: Text Features, Descriptive Text Features, and Waveform Features.
- In audio_features_extractor.py, I have integrated the common waveform feature extraction operations (such as Mel Spectrogram, F0, Energy, and Semantic Features). Note that I have not integrated some features that only vocoders require. @VocodexElysium
- I have created text_features_extractor.py and descriptive_text_features_extractor.py for the future refactoring, integration, and extension of TTS, TTA, and TTM. @HeCheng0625 @lmxue @HarryHe11 @viewfinder-annn
Support for DiffWaveNetSVC
- Two main classes: SVCOnlineDataset and DiffusionTrainer.
Refactor and improve some code
- For example, reorganize the config folder as Amphion/config/[Task]/[Model].json.
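
As an illustration of what one waveform feature extraction operation looks like when it runs on the fly, the sketch below computes frame-level RMS energy directly from a raw waveform. It is a simplified pure-Python stand-in, not the code in audio_features_extractor.py: the function names, the frame and hop sizes, and the choice to drop incomplete trailing frames are all assumptions, and a real extractor would likely operate on batched tensors.

```python
import math
from typing import Dict, List


def frame_energy(wav: List[float], frame_size: int, hop_size: int) -> List[float]:
    """Root-mean-square energy per frame.

    No padding is applied: trailing samples that do not fill a whole
    frame are dropped.
    """
    if len(wav) < frame_size:
        return []
    energies = []
    for start in range(0, len(wav) - frame_size + 1, hop_size):
        frame = wav[start:start + frame_size]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_size))
    return energies


def extract_features(wav: List[float], frame_size: int = 4, hop_size: int = 2) -> Dict:
    """Hypothetical entry point that gathers waveform features in one dict."""
    return {"energy": frame_energy(wav, frame_size, hop_size)}


# A tiny synthetic waveform: a loud half followed by silence.
wav = [1.0, -1.0, 1.0, -1.0, 0.0, 0.0, 0.0, 0.0]
feats = extract_features(wav)
print(feats["energy"])
```

Because the function takes only the raw waveform, it can be called inside the training loop on each batch instead of reading a precomputed feature file.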

✅ Checklist

Code has been reviewed
Code complies with the project's code standards and best practices
Code has passed all tests
Code does not affect the normal use of existing features
Code has been commented properly
Documentation has been updated (if applicable)
Demo/checkpoint has been attached (if applicable)

RMSnow

Self reviewed.

viewfinder-annn

lgtm, a lot of features incoming XD

lmxue

The recipe should be updated to provide instructions for online feature extraction.

config/base.json

RMSnow · 2024-02-29T08:24:23Z

> The recipe should be updated to provide instructions for online feature extraction.

@lmxue Good advice. I plan to update the recipe in the future. This PR is to prepare a codebase for our recent internal research.

lmxue

Nice work on supporting online feature extraction.

RMSnow added 8 commits February 24, 2024 16:00

on-the-fly features extraction: plan

dd43848

audio features extractor for on-the-fly extraction

3e89e89

svc dataset for on-the-fly feature extraction

e6efd85

svc trainer for online extraction

0b3506b

f0: config for interpolate and return uv

a53398c

debug for training

ac4932a

Merge branch 'main' into online_feature_extractor

65364dc

complete the comments

e6e940b

RMSnow commented Feb 26, 2024

RMSnow requested review from viewfinder-annn, lmxue, Adorable-Qin, HarryHe11, HeCheng0625 and VocodexElysium February 26, 2024 11:41

RMSnow added 2 commits February 28, 2024 14:20

mel min-max normalization debug

66039f1

fix bugs for singer id

21548cc

viewfinder-annn approved these changes Feb 29, 2024

RMSnow and others added 2 commits February 29, 2024 14:24

vctk's channel mel extrema to conduct normalization

81b2087

Update comments

57ac6a9

lmxue requested changes Feb 29, 2024

config/base.json (resolved)

RMSnow requested a review from lmxue February 29, 2024 08:25

lmxue approved these changes Feb 29, 2024

RMSnow added 2 commits February 29, 2024 17:08

clean some codes

b6a5723

Fix bugs for SVC inference

004aab0

RMSnow merged commit b2102dc into open-mmlab:main Feb 29, 2024

ArkhamImp pushed a commit to ArkhamImp/Amphion that referenced this pull request Apr 17, 2024

Support On-the-fly Features Extraction (open-mmlab#145)

2b86827

Support on-the-fly features extraction for the large-scale data preprocessing

ArkhamImp pushed a commit to ArkhamImp/Amphion that referenced this pull request Apr 17, 2024

Support On-the-fly Features Extraction (open-mmlab#145)

d4dc30b

Support on-the-fly features extraction for the large-scale data preprocessing
