Add a documentation page for data quality required for fine-tuning #598

Closed as not planned
@Aml-Hassan-Abd-El-hamid

Description

Self Checks

  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

1. Is this request related to a challenge you're experiencing? Tell me about your story.

I'm trying to fine-tune the model so that it can pronounce the Egyptian dialect.

I currently have a number of long videos, each between 6 and 8 hours, that contain Egyptian books together with audio of different people reading them. I'm cutting the audio into segments on silence and matching each segment to the corresponding text from the books, but I'm missing some information I need to do this well, such as:

  1. How long should the audio/text segments ideally be to get the best results?
  2. Should I keep the audio in stereo, or should I convert it to mono?
  3. Should I resample the audio, or keep its original sampling rate?
  4. Should I delete the audio segments that have slight background music, or keep them?
  5. Should I keep the punctuation in the text, or remove it?
  6. Is there any cleaning of the text or the audio that should be done before fine-tuning?
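
For context, the silence-based cutting described above can be sketched roughly as follows. This is a minimal pure-Python illustration and not the actual script in use: the function name `split_on_silence`, the amplitude threshold, and the minimum silence duration are all hypothetical values chosen for illustration, and the sketch assumes a single (mono) channel of samples normalized to the range [-1, 1].

```python
from typing import List, Tuple


def split_on_silence(
    samples: List[float],
    sample_rate: int,
    silence_threshold: float = 0.01,  # assumed value, tune per recording
    min_silence_sec: float = 0.3,     # assumed value, tune per speaker
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of the non-silent segments.

    A run of at least `min_silence_sec` seconds in which every |sample|
    stays below `silence_threshold` is treated as a cut point. `end` is
    exclusive, so samples[start:end] is the segment.
    """
    min_silence = max(1, int(min_silence_sec * sample_rate))
    segments: List[Tuple[int, int]] = []
    seg_start = None    # index where the current voiced segment began
    last_voiced = None  # index of the most recent non-silent sample

    for i, sample in enumerate(samples):
        if abs(sample) >= silence_threshold:
            if seg_start is None:
                seg_start = i
            last_voiced = i
        elif seg_start is not None and i - last_voiced >= min_silence:
            # Silence has lasted long enough: close the current segment.
            segments.append((seg_start, last_voiced + 1))
            seg_start = None

    # Close a segment that runs up to the end of the audio.
    if seg_start is not None:
        segments.append((seg_start, last_voiced + 1))
    return segments


if __name__ == "__main__":
    # Toy signal at 10 Hz: speech, a long pause, then more speech.
    toy = [0.5] * 4 + [0.0] * 6 + [0.4] * 3 + [0.0] * 2
    print(split_on_silence(toy, sample_rate=10, min_silence_sec=0.5))
```

Each returned index pair would then be matched against the corresponding passage of book text. Pauses shorter than the minimum silence duration stay inside a segment, so that setting indirectly controls how long the segments end up being, which is part of what question 1 is asking about.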

2. Additional context or comments

No response

3. Can you help us with this feature?

  • I am interested in contributing to this feature.
