Hey guys,
I have been trying to run a local language model by putting a laptop motherboard in a custom 3D necklace.
The idea is that I will run Whisper and have it with me at all times. In practice, however, I found that the language model gets confused about who said what. Whisper transcribes everything it hears from everyone without marking which speaker said each part, so the model processes one big lump of text and gets confused.
This brings me here. Whisper does provide timestamps for when each sentence was said. Now I want to add a complementary voiceprint detector that can tell the voices of different people apart and split the audio into parts labeled with the speaker's name, so the language model can better understand the context of the conversation.
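A rough sketch of how the two outputs could be combined, under some assumptions. Whisper segments are assumed to look like the `"segments"` entries returned by openai-whisper's `model.transcribe()` (dicts with `start`, `end`, `text`). Speaker turns are assumed to come from a separate diarization tool (for example pyannote.audio) as `(start, end, speaker_label)` tuples, and voiceprint embeddings from some speaker-embedding model. The `Turn` type, the function names, and the `0.7` similarity threshold are all hypothetical. Each Whisper segment gets the speaker whose turn overlaps it the most, and anonymous labels are mapped to names by cosine similarity against enrolled voiceprints.

```python
import math
from dataclasses import dataclass


@dataclass
class Turn:
    """Hypothetical diarization output: one speaker talking from start to end (seconds)."""
    start: float
    end: float
    speaker: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns, names=None):
    """Assign each Whisper segment to the diarization speaker with the largest time overlap."""
    names = names or {}
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn in turns:
            ov = overlap(seg["start"], seg["end"], turn.start, turn.end)
            if ov > best_overlap:
                best_speaker, best_overlap = turn.speaker, ov
        # Fall back to the anonymous label if no name is known for this speaker.
        labeled.append((names.get(best_speaker, best_speaker), seg["text"].strip()))
    return labeled


def to_transcript(labeled):
    """Merge consecutive lines from the same speaker into 'Name: text' lines for the LLM."""
    merged = []
    for name, text in labeled:
        if merged and merged[-1][0] == name:
            merged[-1] = (name, merged[-1][1] + " " + text)
        else:
            merged.append((name, text))
    return "\n".join(f"{name}: {text}" for name, text in merged)


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_voiceprint(embedding, enrolled, threshold=0.7):
    """Return the enrolled name most similar to `embedding`, or None if nobody clears the threshold."""
    best_name, best_score = None, threshold
    for name, reference in enrolled.items():
        score = cosine_similarity(embedding, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    segments = [
        {"start": 0.0, "end": 2.0, "text": " Hi there."},
        {"start": 2.0, "end": 4.5, "text": " How are you?"},
        {"start": 4.5, "end": 6.0, "text": " Good, thanks."},
    ]
    turns = [Turn(0.0, 4.6, "SPEAKER_00"), Turn(4.6, 6.0, "SPEAKER_01")]
    names = {"SPEAKER_00": "Alice", "SPEAKER_01": "Bob"}
    print(to_transcript(label_segments(segments, turns, names)))
```

With the toy data above this prints `Alice: Hi there. How are you?` followed by `Bob: Good, thanks.`. In a real pipeline, the `names` dict could be built by running `match_voiceprint` on an embedding averaged over each anonymous speaker's audio. Any speaker who doesn't match an enrolled voiceprint keeps the generic label, so the model still sees that it is a different person.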
I just need you guys to follow along with me as I build on what you've built, in case I need help.