Fine Tuning Enhancements #242

Closed
wants to merge 5 commits into from

Conversation

@bmalaski bmalaski commented Jun 5, 2024

Hello,

I am adding some tweaks to how the training works. I know you are in the middle of 2.0, so I will most likely need to rework these when that is released. Anyway, for your consideration:

  • Git-Ignore: Adds the default Python .gitignore, with some changes to ignore the conda env.

  • Learning Rate Scheduler: The scheduler is currently broken. This change lets the user pick the scheduler and adds detailed explanations. The default is now Cosine Annealing Warm Restarts, which I have been using with great results. I tried to add sensible defaults for all the other options, but I have yet to test all of them. Mostly just using

  • Continue Training: I lost power while training a model, and it was annoying to have to set everything up again manually. There is actually a bug in the original coqui trainer.py that prevented this, and since they won't be making any more updates, I added the training file locally and fixed the bug. The short version: when continuing, it attempted to reload the config as Xtts, but alltalk is using GPT, which caused a serialization failure. It should be working now.

  • Multiple Training Projects: I am training several voices, so I want to train one for some epochs, move to the next, and continue as needed. To do this, I changed the "person name" to a project name and made out_path depend on that name. All files are generated in this project directory. This lets the user train multiple models and continue training whichever model they like at any time.

  • Metrics Logging: Pretty graphs, using Matlab because I have used it before, to save a pretty training image. I extend the base console logger from coqui and use it to track the metrics, which are mostly just arrays held in memory. Systems with limited memory may have an issue with this, so I can always limit the step data later if needed. While TensorBoard does exist, this seems like a more approachable option and it mirrors the logs.

  • Estimated Completion Time: A weighted average of epoch completion times, where the most recent epochs have a stronger influence on the estimate. It needs 2 completed epochs before it will give an ETA.

  • Limit Shared Memory: I would rather get an out-of-memory error than have the training use shared memory. To accomplish this, I limited the overall GPU memory to 95%. This should stop training from spilling into shared memory, although torch will still use some.

  • BPE Tokenizer: Allows the user to create a custom tokenizer from their own data during training. This can lead to better training results for data with unique words and vocab. I found it especially useful when processing voices that use fictional words.

  • Dataset Progress: Simple progress tracking and an estimated time for dataset creation.
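
For reference, the Cosine Annealing Warm Restarts schedule mentioned in the Learning Rate Scheduler item follows a simple closed-form rule. The sketch below is a minimal pure-Python illustration of that rule, not the code from this PR: the function name `cosine_warm_restart_lr` and its parameters are hypothetical, chosen to mirror the `T_0`, `T_mult` and `eta_min` arguments of PyTorch's `CosineAnnealingWarmRestarts`.

```python
import math


def cosine_warm_restart_lr(epoch, base_lr, t_0, t_mult=1, eta_min=0.0):
    """Learning rate at `epoch` under cosine annealing with warm restarts.

    Hypothetical helper for illustration only:
      base_lr - rate at the start of every cycle
      t_0     - length of the first cycle, in epochs
      t_mult  - integer factor by which each following cycle grows
      eta_min - lowest rate the schedule decays towards
    """
    if t_0 <= 0 or t_mult < 1:
        raise ValueError("t_0 must be positive and t_mult must be >= 1")

    # Find the position inside the current cycle (t_cur) and its length (t_i).
    t_i = t_0
    t_cur = epoch
    while t_cur >= t_i:
        t_cur -= t_i
        t_i *= t_mult

    # Half-cosine decay from base_lr towards eta_min within the cycle.
    cosine = 1 + math.cos(math.pi * t_cur / t_i)
    return eta_min + 0.5 * (base_lr - eta_min) * cosine


if __name__ == "__main__":
    for epoch in range(10):
        print(epoch, cosine_warm_restart_lr(epoch, base_lr=1e-3, t_0=4))
```

With `t_0=4`, the rate starts at `base_lr`, decays towards `eta_min` over four epochs, then jumps back to `base_lr` at epoch 4 and repeats.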

Take a look and let me know what you think.

[Screenshots attached: eta, ui, image]

bmalaski added 5 commits June 5, 2024 00:25
small tweaks

formatting changes

Update finetune.py

Update metrics_logger.py

use weighted average

Update finetune.py

Create metrics_logger.py

Update .gitignore
@erew123
Owner

erew123 commented Jun 6, 2024

Hey @bmalaski

This looks really awesome! Apologies for my very slow reply to you, but I have been cramming hard to get the BETA out, which it now is: AllTalk v2 BETA Download Details & Discussion

I will take a proper look at what you've done and I'll be happy to import it into the V1 build of Finetuning.

Good news/bad news section: I have updated the V2 build of Finetuning; however, it's mostly visual, plus a couple of improvements to file locations/handling etc. So it's probably about 90-95% the same code base, which means it's more than likely a reasonably easy copy/paste to get the new code over from what you have done here. I'm happy to do it, or if you're very keen and wish to try the BETA, you're welcome to. I'm hoping to take a day or so off from coding and then I will get back and look through/test/import your PR properly!

Thanks so much though. It's great when other people help out and do something like you've done!

I will get back to you shortly!

@bmalaski
Author

bmalaski commented Jun 7, 2024

Hey man, yeah, take some time to relax. I will look at the v2 beta and look at creating a new PR for that branch.

@erew123
Owner

erew123 commented Jun 10, 2024

Hi @bmalaski, I've just managed to sit down and take a proper look at this. It's great, really great!

The only thing I did note is that if you set the epochs lower than 2, it never finishes training, which may come back to "Estimated Completion Time" never being able to complete. It does actually train a model; it just never gets to saying it finished that step. Not much of an issue though, just something I noted.
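
To illustrate the kind of behaviour that would avoid this, here is a minimal sketch of a weighted-average ETA that simply reports nothing until two epochs have completed, instead of waiting on them. It is an assumption-laden illustration, not the code from this PR: the function name `estimate_remaining_seconds` is hypothetical, and the linear weights (1 for the oldest epoch, n for the newest) are only one possible way to give recent epochs more influence.

```python
def estimate_remaining_seconds(epoch_durations, total_epochs):
    """Estimate the remaining training time from completed epoch durations.

    Hypothetical helper for illustration only. Returns None until at least
    two epochs have completed, so callers never block waiting for an ETA.
    """
    completed = len(epoch_durations)
    if completed < 2:
        return None  # Not enough data yet; training itself is unaffected.

    # Assumed weighting: linear, so the newest epoch counts the most.
    weights = range(1, completed + 1)
    weighted_sum = sum(w * d for w, d in zip(weights, epoch_durations))
    weighted_avg = weighted_sum / sum(weights)

    remaining_epochs = max(total_epochs - completed, 0)
    return weighted_avg * remaining_epochs


if __name__ == "__main__":
    # One epoch done: no estimate yet.
    print(estimate_remaining_seconds([100.0], total_epochs=1))
    # Two of four epochs done: estimate for the remaining two.
    print(estimate_remaining_seconds([100.0, 130.0], total_epochs=4))
```

With this shape, a run configured for a single epoch would just never show an ETA, and the "finished" message would not depend on one being available.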

I didn't see any complaints about imports. I know Matlab, but I've not used it in Python. I'm guessing there are no extra imports required other than "pip install trainer"?

Finetune on v2

So if you are willing to give the v2 of finetuning a go, I probably should mention the changes I've made to that code:

  1. It's now running Gradio 4.xx rather than 3.52, which does help with things that can be done in the interface.
  2. Tidied up the Gradio interface generally, which you may find beneficial as it leaves plenty of space on screen for checkboxes etc. I have also moved most of the explainers/guides onto separate tabs.
  3. As the models folder shifted down 1x level in the models folder, I changed the whole process for finding/selecting models, as well as the compaction/move model to folder at the end of training.
  4. I have done a thing with the step 1 whisper model to force a maximum length of wav files. The reason for this is that whisper was sometimes pretty bad at splitting up audio, so you could get 2+ minute long wav files, or when you had short input audio it sometimes just didn't split it down.
  5. Finally, moved to 0.24.1 of the Coqui TTS engine/scripts. Err, yes, the last one Coqui wrote/published was 0.22.0, but I found someone who was working on updating the engine/requirements, so I've decided to move up to that and also send in a few PRs there.
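
As a rough illustration of the maximum-length idea in item 4, the sketch below caps segment lengths by cutting any over-long segment into fixed-size pieces. It is not the code used in finetuning: the function name `split_long_segments` and the `(start, end)` tuple layout in seconds are assumptions made for this example, and a real implementation would more likely cut at pauses than at fixed offsets.

```python
def split_long_segments(segments, max_seconds):
    """Cap each (start, end) segment at max_seconds by cutting it into pieces.

    Hypothetical helper for illustration only; times are in seconds.
    """
    if max_seconds <= 0:
        raise ValueError("max_seconds must be positive")

    capped = []
    for start, end in segments:
        # Peel off full-length pieces while the segment is still too long.
        while end - start > max_seconds:
            capped.append((start, start + max_seconds))
            start += max_seconds
        # Keep whatever remains, skipping empty leftovers.
        if end > start:
            capped.append((start, end))
    return capped


if __name__ == "__main__":
    # A 25 second segment is cut into 10 + 10 + 5; the short one is untouched.
    print(split_long_segments([(0.0, 25.0), (30.0, 34.0)], max_seconds=10.0))
```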

I don't think any code you've made here will impact/interfere with anything I've changed or vice versa, so I'm guessing it shouldn't be too much more than a copy/paste job. If you get time to give it a shot, that would be awesome!

Let me know the import requirements for the version you have done and I'll get it imported :)

Thanks so much!

@bmalaski bmalaski closed this Jun 21, 2024