
Support On-the-fly Features Extraction #145

Merged: 14 commits merged on Feb 29, 2024

@RMSnow (Collaborator) commented Feb 26, 2024

✨ Description

Support on-the-fly feature extraction for large-scale data preprocessing. Its advantages are:

  • Saves the disk space otherwise needed to store extracted features
  • Avoids getting stuck at a separate feature-extraction stage because of issues on the computing platform
  • Simplifies the preprocessing pipeline so that users can focus on training

How to use?

With on-the-fly feature extraction, the workflow for future Amphion models is:

  1. Data preprocessing (as before)
    • Split the dataset into train/val sets.
    • Generate the metadata file (.json). utt["Path"] and utt["Duration"] are the two key fields.
    • Compute the metadata statistics.
  2. Feature preprocessing (no longer needed!)
  3. Training
    • In the config file, set preprocess.features_extraction_mode to online.
    • Implement your [Task]OnlineDataset and [Model]Trainer.
  4. Inference (as before)
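As a rough illustration of steps 1 and 3, the sketch below builds one metadata entry per utterance, keeping the two key fields utt["Path"] and utt["Duration"], performs a reproducible train/val split, and shows the single config switch for training. The helper names, the extra "Uid" key, the WAV-only input, and the default 10% validation ratio are assumptions for illustration, not Amphion's actual preprocessing scripts.

```python
import json
import random
import wave
from pathlib import Path


def wav_duration_seconds(path):
    # Duration of a PCM WAV file, computed from its header with the stdlib `wave` module
    with wave.open(str(path), "rb") as f:
        return f.getnframes() / f.getframerate()


def build_metadata(wav_dir, val_ratio=0.1, seed=0):
    # Hypothetical helper: one entry per utterance, then a reproducible train/val split.
    # Only "Path" and "Duration" come from the PR; "Uid" is an illustrative extra key.
    utts = [
        {
            "Uid": wav_path.stem,
            "Path": str(wav_path),
            "Duration": wav_duration_seconds(wav_path),
        }
        for wav_path in sorted(Path(wav_dir).glob("*.wav"))
    ]
    random.Random(seed).shuffle(utts)
    n_val = int(len(utts) * val_ratio)
    return utts[n_val:], utts[:n_val]


def save_metadata(utts, json_path):
    # Write a metadata list as a .json file
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(utts, f, indent=2)


# Step 3: the config switch named in the PR (shown here as a Python dict)
config = {"preprocess": {"features_extraction_mode": "online"}}
```

Note that no feature files are produced anywhere in this sketch; step 2 disappears because only paths and durations are recorded.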

Currently, DiffWaveNetSVC supports on-the-fly feature extraction. The two main classes involved are SVCOnlineDataset and DiffusionTrainer.
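The online dataset and collator pair can be pictured with the following dependency-light sketch: __getitem__ returns only the raw waveform and its length, and the collator pads a batch and builds a mask, leaving all feature computation to the training step. The class names ToyOnlineDataset and ToyOnlineCollator, the mono 16-bit WAV assumption, and the returned keys are hypothetical; they are not SVCOnlineDataset's actual interface.

```python
import wave

import numpy as np


def load_wav_int16(path):
    # Read a mono 16-bit PCM WAV into float32 samples in [-1, 1)
    with wave.open(str(path), "rb") as f:
        assert f.getsampwidth() == 2 and f.getnchannels() == 1
        frames = f.readframes(f.getnframes())
    return np.frombuffer(frames, dtype="<i2").astype(np.float32) / 32768.0


class ToyOnlineDataset:
    """Hypothetical [Task]OnlineDataset stand-in: no precomputed features, only raw audio."""

    def __init__(self, metadata):
        self.metadata = metadata  # list of dicts with "Path" and "Duration"

    def __len__(self):
        return len(self.metadata)

    def __getitem__(self, index):
        utt = self.metadata[index]
        wav = load_wav_int16(utt["Path"])
        return {"wav": wav, "wav_len": len(wav), "duration": utt["Duration"]}


class ToyOnlineCollator:
    """Pads the raw waveforms of a batch to a common length and builds a sample mask."""

    def __call__(self, batch):
        max_len = max(item["wav_len"] for item in batch)
        wavs = np.zeros((len(batch), max_len), dtype=np.float32)
        mask = np.zeros((len(batch), max_len), dtype=bool)
        for i, item in enumerate(batch):
            n = item["wav_len"]
            wavs[i, :n] = item["wav"]
            mask[i, :n] = True
        wav_lens = np.array([item["wav_len"] for item in batch])
        return {"wav": wavs, "wav_len": wav_lens, "mask": mask}
```

In a PyTorch pipeline the dataset would typically subclass torch.utils.data.Dataset and the collator would be passed to a DataLoader as its collate_fn; plain classes are used here only to keep the sketch self-contained.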

👨‍💻 Main Changes

  1. model.base.base_dataset.py:
    • Rename the original BaseDataset and BaseCollator to BaseOfflineDataset and BaseOfflineCollator
    • Implement BaseOnlineDataset and BaseOnlineCollator. The dataset's __getitem__ function returns only the minimal elements (such as the raw waveform and its duration)
  2. processors.audio_features_extractor.py:
    • In Amphion's latest technical report, we group audio generation tasks into three categories: Text to Waveform, Descriptive Text to Waveform, and Waveform to Waveform. Accordingly, we can implement three kinds of feature extraction: Text Features, Descriptive Text Features, and Waveform Features.
    • In audio_features_extractor.py, I have integrated the common waveform feature extraction operations (such as Mel Spectrogram, F0, Energy, and Semantic Features). Note that some features required by vocoders are not integrated yet. @VocodexElysium
    • I have created text_features_extractor.py and descriptive_text_features_extractor.py for the future refactoring, integration, and extension of TTS, TTA, and TTM. @HeCheng0625 @lmxue @HarryHe11 @viewfinder-annn
  3. Support for DiffWaveNetSVC
  4. Refactor and improve some code
    • For example, reorganize the config folder as Amphion/config/[Task]/[Model].json.
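Since features are now computed inside the training loop, a waveform feature has to be extracted from a padded batch and masked to each utterance's true length. The NumPy sketch below does this for one simple feature, frame-level RMS energy with centered, zero-padded frames. The function name, the window/hop defaults, and the RMS definition are assumptions for illustration; this is not the API or the energy formula of audio_features_extractor.py.

```python
import numpy as np


def frame_rms_energy(wavs, wav_lens, win_length=1024, hop_length=256):
    """Frame-level RMS energy for a zero-padded batch of waveforms.

    wavs: float array of shape (batch, samples); wav_lens: valid samples per row.
    Returns (energy, frame_mask), both of shape (batch, 1 + samples // hop_length).
    """
    assert win_length % 2 == 0, "an even window keeps the frame-count formula exact"
    half = win_length // 2
    # Center each frame on its hop position by zero-padding both ends (like center=True in an STFT)
    padded = np.pad(wavs, ((0, 0), (half, half)))
    # All windows of length win_length, keeping one every hop_length samples
    frames = np.lib.stride_tricks.sliding_window_view(padded, win_length, axis=-1)[:, ::hop_length]
    energy = np.sqrt(np.mean(frames ** 2, axis=-1))
    # Frame i is valid if its center index i * hop_length is at most the true length
    n_valid = 1 + np.asarray(wav_lens) // hop_length
    frame_mask = np.arange(energy.shape[1])[None, :] < n_valid[:, None]
    return energy * frame_mask, frame_mask
```

The frame mask plays the same role at frame level that the collator's sample mask plays at sample level, so padded regions never contribute to the loss. A GPU implementation would do the equivalent framing on torch tensors so extraction runs next to the model.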

✅ Checklist

  • Code has been reviewed
  • Code complies with the project's code standards and best practices
  • Code has passed all tests
  • Code does not affect the normal use of existing features
  • Code has been commented properly
  • Documentation has been updated (if applicable)
  • Demo/checkpoint has been attached (if applicable)

@RMSnow (Collaborator, Author) left a comment

Self reviewed.

@viewfinder-annn (Collaborator) left a comment

LGTM, a lot of features incoming XD

@lmxue (Collaborator) left a comment

The recipe should be updated to provide instructions for online feature extraction.

config/base.json (review thread resolved)

@RMSnow (Collaborator, Author) commented Feb 29, 2024

> The recipe should be updated to provide instructions for online feature extraction.

@lmxue Good advice. I plan to update the recipe in the future. This PR prepares the codebase for our recent internal research.

@RMSnow requested a review from @lmxue on Feb 29, 2024, 08:25

@lmxue (Collaborator) left a comment

Nice work on supporting online feature extraction.

@RMSnow merged commit b2102dc into open-mmlab:main on Feb 29, 2024
1 check passed
ArkhamImp pushed a commit to ArkhamImp/Amphion that referenced this pull request Apr 17, 2024
Support on-the-fly features extraction for the large-scale data preprocessing