feat(Azure STT): add an option to use the lexical form of the transcription #4350

tarekasishm · 2025-12-21T17:52:29Z

This PR introduces optional support for returning Azure Speech-to-Text results in lexical format.

✨ What’s new

A new flag has been added to STTOptions to control whether the Azure STT plugin returns lexical or normalized text.
The option defaults to false, so the current behavior remains unchanged.
When enabled, the STT plugin will return Azure’s lexical form directly in the transcription result.

🔄 Backward compatibility

This change is fully backward-compatible.
Existing users will see no behavior change unless the new option is explicitly enabled.

🧩 Motivation

Some downstream use cases (e.g. custom NLU pipelines, post-processing, or domain-specific text handling) require access to the raw lexical transcription provided by Azure, rather than the normalized output. This change makes that possible without affecting existing integrations.

⚙️ Usage

The new option is exposed as an additional field in STTOptions.
Default behavior remains identical to the current implementation.
Enabling the option switches the Azure STT response to lexical format.
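
For illustration, here is a minimal sketch of what the switch amounts to. It assumes Azure's detailed result format, in which each `NBest` entry carries a `Lexical` field alongside a top-level `DisplayText`. The helper `pick_text` and the sample payload are hypothetical and are not the plugin's code.

```python
import json

# Hand-written sample shaped like Azure's detailed recognition result
# (illustrative values, not captured from a real request).
SAMPLE_JSON = json.dumps(
    {
        "DisplayText": "I paid $25 on May 3rd.",
        "NBest": [
            {
                "Confidence": 0.93,
                "Lexical": "i paid twenty five dollars on may third",
                "Display": "I paid $25 on May 3rd.",
            }
        ],
    }
)


def pick_text(result_json: str, lexical_output: bool = False) -> str:
    """Return the lexical form when requested, else the normalized text."""
    payload = json.loads(result_json)
    display = payload.get("DisplayText", "")
    if not lexical_output:
        return display
    nbest = payload.get("NBest") or [{}]
    # Fall back to the normalized text if no lexical form is present.
    return nbest[0].get("Lexical") or display


print(pick_text(SAMPLE_JSON))
print(pick_text(SAMPLE_JSON, lexical_output=True))
```

The lexical form keeps the words as recognized (no casing, punctuation, or number formatting), which is what custom NLU or post-processing steps typically want to normalize themselves.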

davidzhao · 2025-12-22T08:00:03Z

thanks for the PR, could you run ruff format . && ruff check --fix . and rebase from main? it'd be great to get all the CI passing

davidzhao · 2025-12-22T08:00:34Z

livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/stt.py

+                detailed_result = json.loads(evt.result.json)
+                lexical = detailed_result.get("NBest", [{}])[0].get("Lexical", None)
+            except Exception as e:
+                pass


exception should be logged at a minimum

tarekasishm

Done! Thank you for the comment.
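
A minimal sketch of the requested change, assuming the standard `logging` module and the `exc_info` approach named in the later "use exc_info" commit. The function name `extract_lexical`, the logger name, and the log message are hypothetical. The merged code may differ.

```python
import json
import logging
from typing import Optional

logger = logging.getLogger("azure_stt_sketch")  # hypothetical logger name


def extract_lexical(result_json: str) -> Optional[str]:
    """Return the top hypothesis' lexical form, or None if it cannot be read."""
    try:
        detailed_result = json.loads(result_json)
        return detailed_result.get("NBest", [{}])[0].get("Lexical", None)
    except Exception:
        # Log with the traceback instead of silently swallowing the error.
        logger.warning("failed to parse lexical transcription", exc_info=True)
        return None
```

Note that an empty `NBest` list raises `IndexError` on the `[0]` lookup, so that case also ends up in the logged branch rather than failing silently.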

davidzhao · 2025-12-22T08:00:50Z

livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/stt.py

+                detailed_result = json.loads(evt.result.json)
+                lexical = detailed_result.get("NBest", [{}])[0].get("Lexical", None)
+            except Exception as e:
+                pass


same comment here

theomonnom

thanks

theomonnom · 2025-12-23T20:55:42Z

livekit-plugins/livekit-plugins-azure/livekit/plugins/azure/stt.py

    profanity: NotGivenOr[speechsdk.enums.ProfanityOption] = NOT_GIVEN
    phrase_list: NotGivenOr[list[str] | None] = NOT_GIVEN
    explicit_punctuation: bool = False
+    lexical_output: bool = False


lexical_output isn't exposed since STTOptions is private.

Can you add it to the constructor?
How did you test the PR?
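
One way the request could be addressed, sketched with stand-in classes: the constructor accepts a public keyword argument and forwards it into the private options object. The `STT` class below is a hypothetical, trimmed-down stand-in, not the plugin's real constructor, and the parameter name simply mirrors the field in the diff.

```python
from dataclasses import dataclass


@dataclass
class STTOptions:
    """Stand-in for the plugin's private options (fields reduced for brevity)."""

    explicit_punctuation: bool = False
    lexical_output: bool = False


class STT:
    """Hypothetical, trimmed-down constructor; not the real plugin class."""

    def __init__(
        self,
        *,
        explicit_punctuation: bool = False,
        lexical_output: bool = False,
    ) -> None:
        # Forward the public keyword argument into the private options object.
        self._opts = STTOptions(
            explicit_punctuation=explicit_punctuation,
            lexical_output=lexical_output,
        )


stt = STT(lexical_output=True)
print(stt._opts.lexical_output)
```

Keeping the default at `False` in both places preserves the backward compatibility described in the PR.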

tarekasishm added 2 commits December 21, 2025 17:55

feat: add an option to use the lexical form of the transcription

acad986

Merge branch 'main' into feat/azure-add-lexical-result-option

cb5ddf9

davidzhao reviewed Dec 22, 2025


tarekasishm and others added 4 commits December 22, 2025 09:05

chore: fix ruff issues

6f8e31a

chore: format file

4795f50

chore: add expection logs

788c81d

use exc_info

1130ada

theomonnom approved these changes Dec 23, 2025


theomonnom reviewed Dec 23, 2025
