fix:handle empty string transcriptions #150

Merged
merged 3 commits into from
Oct 21, 2024
40 changes: 29 additions & 11 deletions ovos_dinkum_listener/service.py
@@ -659,20 +659,38 @@ def _record_end_signal(self):
)
self.bus.emit(Message("recognizer_loop:record_end"))

-    def _stt_text(self, transcripts: List[Tuple[str, float]],
-                  stt_context: dict):
-        # Report utterance to intent service
-        if transcripts:
-            utts = [u[0] for u in transcripts]  # filter confidence
+    def __normtranscripts(self, transcripts: List[Tuple[str, float]]) -> List[str]:
+        # unfortunately common enough when using whisper to deserve a setting
+        # mainly happens on silent audio, not as a mistranscription
+        default_hallucinations = [
+            "thanks for watching!",
+            'thank you for watching!',
+            "so",
+            "beep!"
+            # "Thank you"  # this one can also be valid!!
+        ]
+        hallucinations = self.config.get("hallucination_list", default_hallucinations) \
+            if self.config.get("filter_hallucinations", True) else []
Member Author

@j1nx @goldyfruit @builderjer what do you think about this? Is it a good thing to enable by default, or should I make it False unless changed by the user?

The list above was made from just saying the wake word and not asking anything afterwards. I sometimes also get a "please subscribe", but that is far less common.

Member

@goldyfruit, Oct 21, 2024


Yes IMO this should be enabled by default.
Maybe adding "Did you say something?" or "Not sure I heard you" could be nice.

Member Author


I mean the filtering of hallucinations.

A bus message, "recognizer_loop:speech.recognition.unknown", is emitted; a skill could listen for it and speak those notifications if desired.

Member


Ohh! Yeah works for me too.
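
To illustrate the point made in this thread, here is a minimal sketch of a listener reacting to "recognizer_loop:speech.recognition.unknown". It does not use the real message bus: `FakeBus`, `dispatch`, `demo`, and the two-member `ListeningMode` are hypothetical stand-ins written for this example, and the spoken phrase is just the one suggested above. Only the two message names come from the diff.

```python
from collections import defaultdict
from enum import Enum, auto
from typing import Callable, Dict, List, Optional

# Message names taken from the diff in this PR.
UNKNOWN = "recognizer_loop:speech.recognition.unknown"
UTTERANCE = "recognizer_loop:utterance"


class ListeningMode(Enum):
    """Local stand-in for the listener's enum (only two modes modelled)."""
    WAKEWORD = auto()
    CONTINUOUS = auto()


class FakeBus:
    """Tiny in-process bus, for illustration only (not the real bus client)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def on(self, msg_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers[msg_type].append(handler)

    def emit(self, msg_type: str, data: Optional[dict] = None) -> None:
        for handler in list(self._handlers[msg_type]):
            handler(data or {})


def dispatch(bus: FakeBus, utts: List[str], mode: ListeningMode,
             lang: str = "en-us") -> None:
    """Mirror the branching of the new _stt_text (hypothetical helper)."""
    if utts:
        bus.emit(UTTERANCE, {"utterances": utts, "lang": lang})
    elif mode != ListeningMode.CONTINUOUS:
        bus.emit(UNKNOWN)
    # In continuous mode an empty result is ignored.


def demo() -> List[str]:
    spoken: List[str] = []
    bus = FakeBus()
    # A skill-like handler that "speaks" when nothing usable was transcribed.
    bus.on(UNKNOWN, lambda _data: spoken.append("Not sure I heard you"))
    dispatch(bus, [], ListeningMode.WAKEWORD)      # handler fires
    dispatch(bus, [], ListeningMode.CONTINUOUS)    # ignored
    dispatch(bus, ["what time is it"], ListeningMode.WAKEWORD)  # normal utterance
    return spoken


if __name__ == "__main__":
    print(demo())
```

In this sketch the handler fires only for the empty result outside continuous mode, which matches the behaviour described in the comment above.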

+        utts = [u[0].lstrip(" \"'").strip(" \"'") for u in transcripts]
+        filtered_hutts = [u for u in utts if u and u.lower() not in hallucinations]
+        hutts = [u for u in utts if u and u not in filtered_hutts]
+        if hutts:
+            LOG.debug(f"Filtered hallucinations: {hutts}")
+        return filtered_hutts
+
+    def _stt_text(self, transcripts: List[Tuple[str, float]], stt_context: dict):
+        utts = self.__normtranscripts(transcripts)
+        LOG.debug(f"STT: {utts}")
+        if utts:
             lang = stt_context.get("lang") or Configuration().get("lang", "en-us")
-            LOG.debug(f"STT: {utts}")
-            payload = {"utterances": utts,
-                       "lang": lang}
+            payload = {"utterances": utts, "lang": lang}
             self.bus.emit(Message("recognizer_loop:utterance", payload, stt_context))
-        elif self.voice_loop.listen_mode == ListeningMode.CONTINUOUS:
-            LOG.debug("ignoring transcription failure")
         else:
-            self.bus.emit(Message("recognizer_loop:speech.recognition.unknown", context=stt_context))
+            if self.voice_loop.listen_mode != ListeningMode.CONTINUOUS:
+                LOG.error("Empty transcription, either recorded silence or STT failed!")
+                self.bus.emit(Message("recognizer_loop:speech.recognition.unknown", context=stt_context))
+            else:
+                LOG.debug("Ignoring empty transcription in continuous listening mode")

     def _save_stt(self, audio_bytes, stt_meta, save_path=None):
         LOG.info("Saving Utterance Recording")
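
The normalization and filtering added in this hunk can be exercised in isolation. The following is a rough standalone sketch, not the listener's actual code: `norm_transcripts` and `DEFAULT_HALLUCINATIONS` are hypothetical names, and the config is passed as a plain dict instead of being read from `self.config`. The config keys and default entries are copied from the diff.

```python
from typing import List, Tuple

# Defaults copied from the diff. Entries are lowercase because each
# candidate is lowercased before the membership check.
DEFAULT_HALLUCINATIONS = [
    "thanks for watching!",
    "thank you for watching!",
    "so",
    "beep!",
]


def norm_transcripts(transcripts: List[Tuple[str, float]],
                     config: dict) -> List[str]:
    """Standalone stand-in for the private method in the diff (hypothetical)."""
    if config.get("filter_hallucinations", True):
        hallucinations = config.get("hallucination_list", DEFAULT_HALLUCINATIONS)
    else:
        hallucinations = []
    # Drop the confidence score, then trim spaces and quote characters.
    # A single strip() trims both ends, so it covers the lstrip().strip() pair.
    utts = [text.strip(" \"'") for text, _conf in transcripts]
    # Keep non-empty utterances that are not known hallucinations.
    return [u for u in utts if u and u.lower() not in hallucinations]


if __name__ == "__main__":
    sample = [(" Thanks for watching! ", 0.9), ("", 0.0), ('"what time is it"', 0.8)]
    print(norm_transcripts(sample, {}))
    print(norm_transcripts(sample, {"filter_hallucinations": False}))
```

Because the comparison lowercases only the transcription, a user-supplied "hallucination_list" would presumably need lowercase entries to match; that is an inference from the diff, not something stated in the PR.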
6 changes: 2 additions & 4 deletions ovos_dinkum_listener/voice_loop/voice_loop.py
@@ -781,12 +781,10 @@ def _after_cmd(self, chunk: bytes):
self._vad_remove_silence()

utts, stt_context = self._get_tx(stt_context)

LOG.info(f"Raw transcription: {utts}")
if utts:
LOG.debug(f"transformers metadata: {stt_context}")
LOG.info(f"transcribed: {utts}")
else:
LOG.info("nothing transcribed")

JarbasAl marked this conversation as resolved.
# Voice command has finished recording
if self.stt_audio_callback is not None:
self.stt_audio_callback(self.stt_audio_bytes, stt_context)