Commit

Use UTF-8 encoding to save the txt and vtt files (openai#37)
Explicitly set the text encoding to UTF-8 in order to avoid UnicodeEncodeErrors

Co-authored-by: Jong Wook Kim <jongwook@nyu.edu>
hanacchi and jongwook authored Sep 23, 2022
1 parent 759e8d4 commit c85eaaa
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions whisper/transcribe.py
@@ -289,11 +289,11 @@ def cli():
 audio_basename = os.path.basename(audio_path)

 # save TXT
-with open(os.path.join(output_dir, audio_basename + ".txt"), "w") as txt:
+with open(os.path.join(output_dir, audio_basename + ".txt"), "w", encoding="utf-8") as txt:
     print(result["text"], file=txt)

 # save VTT
-with open(os.path.join(output_dir, audio_basename + ".vtt"), "w") as vtt:
+with open(os.path.join(output_dir, audio_basename + ".vtt"), "w", encoding="utf-8") as vtt:
     write_vtt(result["segments"], file=vtt)
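
To illustrate why the explicit `encoding="utf-8"` matters, here is a minimal sketch, not code from the repository. When `open()` is called in text mode without an `encoding` argument, Python falls back to a locale-dependent default, which on some systems cannot represent every character a transcript may contain. The helper `save_text`, the sample text, and the use of `cp1252` as a stand-in for such a default are all assumptions made for this example.

```python
import os
import tempfile

# Sample transcript text with non-ASCII characters (illustrative only).
text = "こんにちは、世界"


def save_text(path, text, encoding):
    """Hypothetical helper: write text the same way the diff does, via print(..., file=...)."""
    with open(path, "w", encoding=encoding) as f:
        print(text, file=f)


with tempfile.TemporaryDirectory() as output_dir:
    path = os.path.join(output_dir, "audio.wav.txt")

    # cp1252 stands in for a non-UTF-8 locale default (an assumption);
    # it has no mapping for Japanese characters, so the write fails.
    try:
        save_text(path, text, encoding="cp1252")
        legacy_failed = False
    except UnicodeEncodeError:
        legacy_failed = True

    # With an explicit UTF-8 encoding the same text is written and read back intact.
    save_text(path, text, encoding="utf-8")
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()

print("legacy encoding failed:", legacy_failed)
print("UTF-8 round trip ok:", round_trip == text + "\n")
```

Passing the encoding explicitly makes the output independent of the machine's locale settings, which is the behavior this commit aims for.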

