Add a documentation page for data quality required for fine-tuning #598
Closed as not planned
Description
Self Checks
- I have searched for existing issues, including closed ones.
- I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
- [FOR CHINESE USERS] Please be sure to submit issues in English, otherwise they will be closed. Thank you! :)
- Please do not modify this template :) and fill in all the required fields.
1. Is this request related to a challenge you're experiencing? Tell me about your story.
I'm trying to fine-tune the model so that it can pronounce the Egyptian dialect.
I currently have a number of long videos (between 6 and 8 hours each) that contain Egyptian books and the corresponding audio of different people reading those books. I'm cutting the audio into segments at silences and matching the segments to the text from the books, but I'm missing some information I need to do this well, such as:
- How long should the ideal audio/text segments be to get the best results?
- Should I keep the audio in stereo or convert it to mono?
- Should I resample the audio or keep its original sample rate?
- Should I delete the audio segments with slight background music or keep them?
- Should I keep the punctuation in the text or remove it?
- Is there any cleaning for the text or the audio that should be done before fine-tuning?
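
For context, the kind of preprocessing described above can be sketched roughly as follows. This is only an illustration, not the actual script in use and not something taken from the project: the helper names `to_mono` and `split_on_silence`, the RMS-based silence test, and every default value (`silence_threshold`, `min_silence_sec`, `window_sec`) are assumptions chosen for the example, and they are exactly the values this issue is asking guidance on. Samples are assumed to be floats in the range -1.0 to 1.0, and resampling is not covered.

```python
import math


def to_mono(frames):
    """Downmix multi-channel frames (tuples of samples) by averaging the channels."""
    return [sum(frame) / len(frame) for frame in frames]


def split_on_silence(samples, sample_rate, silence_threshold=0.01,
                     min_silence_sec=0.3, window_sec=0.02):
    """Return (start, end) sample-index pairs for the non-silent regions.

    A window counts as silent when its RMS is below silence_threshold.
    A segment is closed once at least min_silence_sec of consecutive
    silent windows has been seen. All defaults are placeholders.
    """
    window = max(1, round(sample_rate * window_sec))
    min_silent_windows = max(1, round(sample_rate * min_silence_sec) // window)

    segments = []
    seg_start = None        # start index of the segment being built
    last_voiced_end = None  # end index of the last non-silent window
    silent_run = 0          # consecutive silent windows inside a segment

    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        if rms >= silence_threshold:
            if seg_start is None:
                seg_start = start
            last_voiced_end = start + len(chunk)
            silent_run = 0
        elif seg_start is not None:
            silent_run += 1
            if silent_run >= min_silent_windows:
                segments.append((seg_start, last_voiced_end))
                seg_start = None
                silent_run = 0

    # Close a segment that runs to the end of the audio.
    if seg_start is not None:
        segments.append((seg_start, last_voiced_end))
    return segments
```

Dividing each returned index by `sample_rate` gives the segment boundaries in seconds, which is where a recommended minimum and maximum segment length would be applied before matching segments to the book text.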
2. Additional context or comments
No response
3. Can you help us with this feature?
- I am interested in contributing to this feature.