feature #2

@Abdulrahman392011

hey guys,

I have been trying to use a local language model by putting a laptop motherboard in a custom 3D necklace.
The idea is that I will use Whisper and have it with me at all times. In practice, however, I found that the language model gets confused about who said what. Whisper transcribes everything it hears from everyone without indicating that different people are speaking, so the model processes one big lump of text and gets confused.

This brings me here. Whisper does provide timestamps for when each sentence was said. Now I want to use a complementary voiceprint detector that can detect the voices of different people and split the audio into parts labeled with the speaker's name, so the language model can better understand the context of the conversation.
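
To make the idea concrete, here is a rough sketch of how the two outputs could be combined. It assumes the Whisper segments are dicts with `start`, `end`, and `text` keys, and that the voiceprint detector returns speaker turns as `(start, end, speaker)` tuples. That turn format and the helper names `label_segments` and `to_transcript` are made up for illustration, not taken from any library. Each segment gets the speaker whose turns overlap it the most.

```python
from typing import Dict, List, Tuple

# Hypothetical turn format: (start_sec, end_sec, speaker_label).
Turn = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments: List[Dict], turns: List[Turn],
                   unknown: str = "UNKNOWN") -> List[Dict]:
    """Attach a 'speaker' key to each transcript segment.

    The speaker is the one whose turns overlap the segment the longest.
    Segments with no overlapping turn get the `unknown` label.
    """
    labeled = []
    for seg in segments:
        totals: Dict[str, float] = {}
        for t_start, t_end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], t_start, t_end)
            if shared > 0:
                totals[speaker] = totals.get(speaker, 0.0) + shared
        best = max(totals, key=totals.get) if totals else unknown
        labeled.append({**seg, "speaker": best})
    return labeled


def to_transcript(labeled: List[Dict]) -> str:
    """Render labeled segments as 'Speaker: text' lines for the language model."""
    return "\n".join(f"{s['speaker']}: {s['text'].strip()}" for s in labeled)


if __name__ == "__main__":
    # Made-up example data, not real output.
    segments = [
        {"start": 0.0, "end": 2.0, "text": " Hello there."},
        {"start": 2.0, "end": 4.5, "text": " Hi, how are you?"},
    ]
    turns = [(0.0, 2.1, "Alice"), (2.1, 5.0, "Bob")]
    print(to_transcript(label_segments(segments, turns)))
```

Picking the speaker by longest overlap is only one possible rule. It will mislabel a segment in which two people talk over each other, so splitting segments at turn boundaries may be needed later.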

I just need you guys to follow along with me as I build on what you have built, in case I need help.
