@tarekasishm

This PR introduces optional support for returning Azure Speech-to-Text results in lexical format.

What’s new

  • A new flag, lexical_output, has been added to STTOptions to control whether the Azure STT plugin returns lexical or normalized text.
  • The option defaults to False, so the current behavior remains unchanged.
  • When enabled, the STT plugin will return Azure’s lexical form directly in the transcription result.
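
To make the difference concrete, here is a minimal sketch, assuming Azure's detailed output format, in which each NBest entry carries both a Lexical and a Display field. The payload is hand-written for illustration rather than captured from a real session, and pick_text is a hypothetical helper, not the plugin's actual code.

```python
import json

# Hand-written sample shaped like Azure's detailed recognition result
# (illustrative values only, not real service output).
SAMPLE_RESULT_JSON = json.dumps(
    {
        "RecognitionStatus": "Success",
        "DisplayText": "I have $23.",
        "NBest": [
            {
                "Confidence": 0.95,
                "Lexical": "i have twenty three dollars",
                "ITN": "i have $23",
                "MaskedITN": "i have $23",
                "Display": "I have $23.",
            }
        ],
    }
)


def pick_text(result_json: str, lexical_output: bool = False) -> str:
    """Return the lexical or display form of the top hypothesis.

    Hypothetical helper: mirrors the idea of the new flag, not its code.
    """
    best = json.loads(result_json)["NBest"][0]
    return best["Lexical"] if lexical_output else best["Display"]
```

With the flag left at its default, pick_text returns the normalized Display string; with lexical_output=True it returns the raw Lexical string.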

🔄 Backward compatibility

  • This change is fully backward-compatible.
  • Existing users will see no behavior change unless the new option is explicitly enabled.

🧩 Motivation

Some downstream use cases (e.g. custom NLU pipelines, post-processing, or domain-specific text handling) require access to the raw lexical transcription provided by Azure, rather than the normalized output. This change makes that possible without affecting existing integrations.

⚙️ Usage

  • The new option is exposed as an additional field in STTOptions.
  • Default behavior remains identical to the current implementation.
  • Enabling the option switches the Azure STT response to lexical format.

@davidzhao
Member

thanks for the PR, could you run ruff format . && ruff check --fix . and rebase from main? it'd be great to get all the CI passing

    detailed_result = json.loads(evt.result.json)
    lexical = detailed_result.get("NBest", [{}])[0].get("Lexical", None)
except Exception as e:
    pass
Member

exception should be logged at a minimum
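
Something along these lines would do, as a sketch only: extract_lexical and the logger name are placeholders, and falling back to the normalized text when parsing fails is an assumption about the desired behavior rather than what the plugin does.

```python
import json
import logging

logger = logging.getLogger("azure-stt-example")  # placeholder logger name


def extract_lexical(result_json: str, fallback: str) -> str:
    """Return NBest[0].Lexical, or `fallback` if it cannot be extracted."""
    try:
        detailed_result = json.loads(result_json)
        lexical = detailed_result.get("NBest", [{}])[0].get("Lexical", None)
    except Exception:
        # Record the failure with its traceback instead of swallowing it.
        logger.exception("failed to extract lexical text from Azure result")
        return fallback
    # A payload without a Lexical field also falls back to the normalized text.
    return lexical if lexical is not None else fallback
```

logger.exception logs at ERROR level and attaches the traceback, so a malformed payload or an empty NBest list shows up in the logs while the caller still gets usable text.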

Author

Done! Thank you for the comment

    detailed_result = json.loads(evt.result.json)
    lexical = detailed_result.get("NBest", [{}])[0].get("Lexical", None)
except Exception as e:
    pass
Member

same comment here

Member

@theomonnom left a comment

thanks

profanity: NotGivenOr[speechsdk.enums.ProfanityOption] = NOT_GIVEN
phrase_list: NotGivenOr[list[str] | None] = NOT_GIVEN
explicit_punctuation: bool = False
lexical_output: bool = False
Member

lexical_output isn't exposed since STTOptions is private.

Can you add it to the constructor?
How did you test the PR?
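
For reference, a rough sketch of what exposing the option could look like. The two classes below are simplified stand-ins written for this example (the plugin's real constructor takes many more parameters), so treat the shape, not the details, as the suggestion.

```python
from dataclasses import dataclass


@dataclass
class _STTOptions:
    """Stand-in for the private options dataclass (only two fields shown)."""

    explicit_punctuation: bool = False
    lexical_output: bool = False


class STT:
    """Stand-in for the public STT class that users construct."""

    def __init__(
        self,
        *,
        explicit_punctuation: bool = False,
        lexical_output: bool = False,
    ) -> None:
        # Forward the public keyword argument into the private options so
        # callers can enable lexical output without touching _STTOptions.
        self._opts = _STTOptions(
            explicit_punctuation=explicit_punctuation,
            lexical_output=lexical_output,
        )
```

Keeping the default at False in the constructor preserves the backward compatibility described in the PR: STT() behaves as before, and STT(lexical_output=True) opts in.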

3 participants