Description
Summary
Add support for plugging voice-based gender detection models into the LiveKit Agents pipeline, allowing agents to infer speaker gender from audio input in real time.
Motivation
Voice-based gender detection is essential for languages with grammatical gender (gender inflection/agreement). Many languages require verbs, adjectives, and participles to agree with the speaker's gender:
- Polish: "Zrobiłem" (male) vs "Zrobiłam" (female) - "I did"
- Russian: "Я сказал" (male) vs "Я сказала" (female) - "I said"
- German: "Ich bin gegangen" vs adjective endings based on gender
- French: "Je suis allé" (male) vs "Je suis allée" (female) - "I went"
- Spanish: "Estoy cansado" (male) vs "Estoy cansada" (female) - "I am tired"
- Italian, Portuguese, Hebrew, Arabic, and many others
For voice AI agents operating in these languages, using incorrect gender forms sounds unnatural and can be confusing or even disrespectful to users. Currently, agents have no way to detect the caller's gender from voice to generate grammatically correct responses.
Proposed Solution
Option 1: Built-in Audio Classification Node
Add a new optional audio_classification_node to the pipeline that runs in parallel with STT (a rough streaming sketch follows the proposed signature below):
```python
async def audio_classification_node(
    self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings
) -> Optional[AsyncIterable[GenderClassificationEvent]]:
    # Returns gender classification events
    ...
```
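To make the real-time aspect concrete, here is a minimal sketch of what a body for this node might look like, assuming a simple sliding-window approach. Everything below is hypothetical: GenderClassificationEvent does not exist in livekit-agents today, and classify_window is a stand-in for whatever model performs the inference.

```python
import numpy as np
from dataclasses import dataclass
from typing import AsyncIterable

from livekit import rtc


@dataclass
class GenderClassificationEvent:
    gender: str        # "male", "female", "unknown"
    confidence: float


def classify_window(waveform: np.ndarray, sample_rate: int) -> tuple[str, float]:
    # Placeholder for real inference (Pyannote, SpeechBrain, custom PyTorch model, ...)
    raise NotImplementedError


async def audio_classification_node(  # would live on the Agent subclass, hence `self`
    self, audio: AsyncIterable[rtc.AudioFrame], model_settings
) -> AsyncIterable[GenderClassificationEvent]:
    window: list[np.ndarray] = []
    buffered_samples = 0

    async for frame in audio:
        # Convert 16-bit PCM frames to a float32 mono waveform
        pcm = np.frombuffer(frame.data, dtype=np.int16).astype(np.float32) / 32768.0
        if frame.num_channels > 1:
            pcm = pcm.reshape(-1, frame.num_channels).mean(axis=1)
        window.append(pcm)
        buffered_samples += len(pcm)

        # Re-classify roughly every 2 seconds of audio so results stream in real time
        if buffered_samples >= frame.sample_rate * 2:
            gender, confidence = classify_window(np.concatenate(window), frame.sample_rate)
            yield GenderClassificationEvent(gender=gender, confidence=confidence)
            window, buffered_samples = [], 0
```

A fresh window per classification keeps the sketch short; overlapping windows or running inference in a thread pool would be obvious refinements.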
Option 2: Plugin System for Gender Classifiers
Create a plugin interface similar to STT/TTS plugins:
```python
from livekit.plugins import gender_classifier

class VoiceGenderClassifier(gender_classifier.GenderClassifier):
    async def classify(self, audio: AsyncIterable[rtc.AudioFrame]) -> GenderResult:
        ...
```
Potential model integrations (an implementation sketch follows the list below):
- Pyannote Audio - Speaker diarization with gender inference
- SpeechBrain - Open-source speech models
- Resemblyzer - Speaker embeddings
- Custom TensorFlow/PyTorch models
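For illustration, here is a minimal sketch of what a plugin built on the proposed interface might look like when wrapping a custom PyTorch model. It is only a sketch: GenderResult and the GenderClassifier base class are the proposed API (so TorchGenderClassifier does not subclass anything real yet), the label ordering depends on how the model was trained, and resampling/multi-channel handling are omitted.

```python
import numpy as np
import torch
from dataclasses import dataclass
from typing import AsyncIterable

from livekit import rtc


@dataclass
class GenderResult:
    gender: str        # "male", "female", "unknown"
    confidence: float


LABELS = ["female", "male"]  # order depends on how the model was trained


class TorchGenderClassifier:  # would subclass the proposed gender_classifier.GenderClassifier
    def __init__(self, model: torch.nn.Module) -> None:
        # `model` is any PyTorch module mapping (1, num_samples) float32 audio
        # to per-class logits; Pyannote/SpeechBrain models could be wrapped similarly
        self._model = model.eval()

    async def classify(self, audio: AsyncIterable[rtc.AudioFrame]) -> GenderResult:
        # Same 16-bit PCM to float32 conversion as the Option 1 sketch, collapsed for brevity
        chunks = [
            np.frombuffer(f.data, dtype=np.int16).astype(np.float32) / 32768.0
            async for f in audio
        ]
        if not chunks:
            return GenderResult(gender="unknown", confidence=0.0)

        waveform = torch.from_numpy(np.concatenate(chunks)).unsqueeze(0)
        with torch.inference_mode():
            probs = torch.softmax(self._model(waveform), dim=-1).squeeze(0)
        confidence, index = torch.max(probs, dim=-1)
        return GenderResult(gender=LABELS[int(index)], confidence=float(confidence))
```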
Option 3: Enhanced STT with Gender Metadata
Enhance the STT interface to optionally return gender metadata alongside transcription:
```python
class SpeechEvent:
    text: str
    speaker_gender: Optional[str]   # "male", "female", "unknown"
    gender_confidence: Optional[float]
```
Acceptance Criteria
- Ability to plug in custom gender detection models
- Access to detected gender within agent logic (for prompt construction, TTS voice selection; see the sketch after this list)
- Support for real-time streaming classification
- Documentation and example implementation
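To illustrate the second criterion, one possible shape for the agent-side hook is sketched below. The confidence threshold, prompt text, and voice IDs are all placeholders, and GenderResult is the type from the Option 2 sketch, not an existing livekit-agents type.

```python
from typing import Optional

# GenderResult as defined in the Option 2 sketch above

GENDER_PROMPTS = {
    "male": "The caller is male; use masculine agreement forms where the language requires them.",
    "female": "The caller is female; use feminine agreement forms where the language requires them.",
}

TTS_VOICES = {"male": "voice-id-a", "female": "voice-id-b"}  # placeholder voice IDs


def apply_detected_gender(result: GenderResult, instructions: str) -> tuple[str, Optional[str]]:
    """Return updated LLM instructions and an optional TTS voice override."""
    if result.gender not in GENDER_PROMPTS or result.confidence < 0.8:
        # Below the confidence threshold, stay neutral rather than risk the wrong form
        return instructions, None
    return f"{instructions}\n{GENDER_PROMPTS[result.gender]}", TTS_VOICES[result.gender]
```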
Alternatives Considered
- Ask the user directly: Works but creates friction and feels unnatural in voice conversations
- Use neutral forms where possible: Not always grammatically correct or natural in gendered languages
- Custom STT node override: Currently possible, but requires users to implement their own audio buffering and model integration (a rough sketch of this workaround follows this list)
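For completeness, the last alternative roughly looks like the sketch below today: the agent overrides stt_node, tees the audio into the default STT implementation and a classifier, and stores the result for later prompt construction. This assumes the overridable stt_node hook and Agent.default.stt_node available in recent livekit-agents releases (import paths may differ by version); the classifier is the hypothetical one sketched under Option 2.

```python
import asyncio
from typing import AsyncIterable, Optional

from livekit import rtc
from livekit.agents import Agent, ModelSettings  # import paths may vary by version


class GenderAwareAgent(Agent):
    def __init__(self, *, instructions: str, classifier) -> None:
        super().__init__(instructions=instructions)
        self._classifier = classifier            # e.g. the TorchGenderClassifier sketch above
        self.detected_gender: Optional[str] = None

    async def stt_node(
        self, audio: AsyncIterable[rtc.AudioFrame], model_settings: ModelSettings
    ):
        # Tee the audio: one copy feeds the default STT, the other feeds the classifier
        queue: asyncio.Queue[Optional[rtc.AudioFrame]] = asyncio.Queue()

        async def tee() -> AsyncIterable[rtc.AudioFrame]:
            async for frame in audio:
                await queue.put(frame)
                yield frame
            await queue.put(None)  # signal end of stream to the classifier side

        async def drain() -> AsyncIterable[rtc.AudioFrame]:
            while (frame := await queue.get()) is not None:
                yield frame

        async def run_classifier() -> None:
            result = await self._classifier.classify(drain())
            self.detected_gender = result.gender

        classify_task = asyncio.create_task(run_classifier())
        try:
            # Forward the teed stream into the built-in STT implementation
            async for event in Agent.default.stt_node(self, tee(), model_settings):
                yield event
        finally:
            classify_task.cancel()
```

Because the classifier consumes its own copy of the frames from an unbounded queue, the transcription path is never blocked; the cost of this workaround is exactly the buffering and model plumbing that a first-class plugin interface would absorb.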
Additional Context
This feature would enable LiveKit Agents to properly serve users in languages with grammatical gender, which represent a significant portion of the world's languages and speakers.
Related documentation: