Releases · mkiol/dsnote

17 Aug 08:47

mkiol

v4.6.1

f062ac5

Speech Note 4.6.1 Latest

Linux Desktop

Changes:

General
- Fix: The application failed to start when the processor did not support the required CPU extension.
User Interface
- Swedish translation has been updated.
Accessibility
- Fix: Special keyboard keys could not be used as keyboard shortcuts. Examples: 'Favorites', 'Launch Mail', 'Refresh', 'Home Page', 'Calculator' and many more.
Translator
- New models: English to Latvian, English to Danish, English to Croatian, English to Slovenian, Indonesian to English, Romanian to English
- Updated models: English to Hungarian, Czech to English, Greek to English

Sailfish OS

Changes:

User Interface
- Swedish translation has been updated.
Translator
- New models: English to Latvian, English to Danish, English to Croatian, English to Slovenian, Indonesian to English, Romanian to English
- Updated models: English to Hungarian, Czech to English, Greek to English

03 Aug 13:05

mkiol

v4.6.0

09afd78

Speech Note 4.6.0

Linux Desktop

Changes:

User Interface
- Speech Note has been translated into Norwegian.
- Grouped models. Models that provide multiple sub-models (for example, TTS models that provide different voices) are shown in groups. This makes it easier to find models in the model browser.
Speech to Text
- The names of all Whisper models have been changed to WhisperCpp to better reflect the engine behind them.
- Automatic language detection in STT. To automatically detect the language during STT, select one of the models that is in the Auto detected category in the language list.
- Separate settings for engines. The configuration of each engine has been separated in the settings. You can separately set the parameters for WhisperCpp and FasterWhisper. The new configuration parameters that have been added to the settings are: Number of simultaneous threads, Beam search width, Audio context size, Use Flash Attention.
- Quicker decoding with WhisperCpp. Optimization for short sentences has been added to WhisperCpp. With it, the speed of STT has doubled!
- Support for OpenVINO hardware acceleration in the WhisperCpp engine. With OpenVINO, decoding on the CPU is much quicker. If you are not using GPU acceleration, it is recommended to enable OpenVINO in the WhisperCpp engine settings. Currently, OpenVINO is enabled only for CPU acceleration.
- Option for inserting processing statistics. A new settings option allows inserting processing-related information, such as processing time and audio length, into the text after decoding. This can be useful for comparing the performance of different models, engines and their parameters.
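
One common way to compare engines using the two values mentioned above (processing time and audio length) is the real-time factor, that is, processing time divided by audio length. The sketch below is only an illustration: the function name and the sample numbers are hypothetical and do not come from Speech Note.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Return processing time divided by audio length (lower means faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio length must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # Placeholder measurements for illustration only, not real benchmark results.
    runs = {"engine A": (4.0, 20.0), "engine B": (10.0, 20.0)}
    ranked = sorted(runs.items(), key=lambda item: real_time_factor(*item[1]))
    for name, (processing, audio) in ranked:
        print(f"{name}: RTF = {real_time_factor(processing, audio):.2f}")
```

A value below 1.0 means the audio was decoded faster than its playback length.
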
Text to Speech
- Control tags for advanced TTS processing. Control tags allow you to dynamically change the speed of synthesized text or add silence between sentences. To use control tags, insert {speed: 0.5} or {silence: 1s} into the text. For convenience, you can also insert predefined control tags using the text context menu option Insert control tag.
- Welsh language. New language is enabled with Piper voice.
- New Piper voices for Spanish, Italian and English
- New RHVoice voices for Slovak and Croatian
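
To make the control tag syntax above more concrete, here is a minimal sketch of how text containing {speed: 0.5} and {silence: 1s} tags could be split into plain text and tag items. It is not Speech Note's parser: the regular expression, the function name and the output layout are assumptions made for illustration, and the grammar the application actually accepts may differ.

```python
import re

# Matches tags written like {speed: 0.5} or {silence: 1s}.
# Assumption: only these two tag names and an optional "s" unit are handled.
TAG_RE = re.compile(r"\{\s*(speed|silence)\s*:\s*([0-9]*\.?[0-9]+)\s*(s)?\s*\}")


def split_control_tags(text):
    """Return a list of ("text", str) and (tag_name, float) items in order."""
    items = []
    pos = 0
    for match in TAG_RE.finditer(text):
        chunk = text[pos:match.start()].strip()
        if chunk:
            items.append(("text", chunk))
        items.append((match.group(1), float(match.group(2))))
        pos = match.end()
    tail = text[pos:].strip()
    if tail:
        items.append(("text", tail))
    return items


if __name__ == "__main__":
    sample = "Hello. {silence: 1s} {speed: 0.5} This part is read slowly."
    for item in split_control_tags(sample):
        print(item)
```
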
Translator
- Improved Translator UI. The Translate, Switch languages and Add buttons have been placed between text areas which is more convenient.
- Support for older hardware. Until now, the translator did not work on older processors without the AVX CPU extension. This restriction has been removed.
- New models: English to Lithuanian, Croatian to English, Latvian to English, Danish to English, Serbian to English, Slovak to English, Bosnian to English, Vietnamese to English
- Updated models: Lithuanian to English, Slovenian to English, Russian to English, Ukrainian to English
Flatpak
- New library: OpenVINO version 2024.1.0.15008
- whisper.cpp update to version 1.6.2
- CTranslate2 update to version 4.3.1

Video presentation of all new features: https://www.youtube.com/watch?v=AVW5OY63wjg

Sailfish OS

Changes:

User Interface
- Speech Note has been translated into Norwegian.
- Grouped models. Models that provide multiple sub-models (for example, TTS models that provide different voices) are shown in groups. This makes it easier to find models in the model browser.
- Option to enable/disable support for subtitles. Subtitle support is a niche functionality. To simplify the user interface, the subtitle options are not visible by default. To enable them, use the Subtitles support option in the settings.
Speech to Text
- The names of all Whisper models have been changed to WhisperCpp to better reflect the engine behind them.
- Automatic language detection in STT. To automatically detect the language during STT, select one of the models that is in the Auto detected category in the language list.
- Quicker decoding with WhisperCpp. Optimization for short sentences has been added to WhisperCpp. With it, the speed of STT has doubled!
- Translate to English option for WhisperCpp models. When enabled, speech is automatically translated into English.
- Option for inserting processing statistics. A new settings option allows inserting processing-related information, such as processing time and audio length, into the text after decoding. This can be useful for comparing the performance of different models, engines and their parameters.
Text to Speech
- Welsh language. New language is enabled with Piper voice.
- New Piper voices for Spanish, Italian and English
- New RHVoice voices for Slovak and Croatian
Translator
- New button for switching languages.
- New models: English to Lithuanian, Croatian to English, Latvian to English, Danish to English, Serbian to English, Slovak to English, Bosnian to English, Vietnamese to English
- Updated models: Lithuanian to English, Slovenian to English, Russian to English, Ukrainian to English

18 May 16:49

mkiol

v4.5.0

40c322f

Speech Note 4.5.0

Linux Desktop

Changes:

User Interface
- Import subtitles embedded in a video file. If your video file contains one or more subtitle streams, you can import the selected subtitles into the notepad.
- Support for more subtitles formats. You can import and export subtitles in SRT, WebVTT and ASS formats.
- Unified file importing and exporting. Text, subtitles, audio and video files can be imported or exported using unified menu bar option.
- Settings option to enable/disable remembering the last note. If the option is disabled, the last note will not be available after restarting the app.
- Settings option for default action when importing note from a file. You can set Ask whether to add or replace, Add to an existing note or Replace an existing note.
- Enhanced text editor font settings. You can set the font family, style and size of the font used in the text editor.
- Text to Text repair options. With these options you can directly fix diacritical marks and punctuation in the text.
- Text context menu with additional options: Read selection and Translate selection. To open the context menu, right-click in the text.
- New text appending style: After empty line
- System tray menu for changing active STT/TTS model
- User friendly names of audio input devices
- Simplified model filtering. It is now less flexible, but much easier to understand and use.
- Speech Note has been translated into Ukrainian and Russian languages.
- Fix: Cancellation was blocking the user interface.
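
Of the subtitle formats listed above, SRT and WebVTT are closely related: a WebVTT file starts with a WEBVTT header line and uses a dot instead of a comma before the milliseconds in cue timings. The sketch below illustrates that relationship with a simple converter. It is not the conversion code used by Speech Note, the function name is hypothetical, and it ignores styling and other format-specific features.

```python
import re

# SRT timings look like 00:00:01,500 while WebVTT uses 00:00:01.500.
TIMING_RE = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_webvtt(srt_text):
    """Convert simple SRT text to WebVTT, keeping cue numbers as identifiers."""
    lines = []
    for line in srt_text.strip().splitlines():
        if "-->" in line:
            # Only timing lines are rewritten, so commas in cue text survive.
            line = TIMING_RE.sub(r"\1.\2", line)
        lines.append(line)
    return "WEBVTT\n\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(srt_to_webvtt("1\n00:00:00,000 --> 00:00:01,500\nHello, world\n"))
```
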
Speech to Text
- Updated Distil model for English: Distil Large-v3. New model is enabled for Whisper and Faster Whisper engines.
- New Fine-Tuned Whisper models for Slovenian and Polish
- Fix: Punctuation model could not be downloaded.
Text to Speech
- WhisperSpeech engine that generates voice with exceptional naturalness. The new engine comes with models for English and Polish languages. All models support voice cloning.
- New voice cloning model for Vietnamese: viXTTS. Model is a fine-tuned version of the phenomenal Coqui XTTS.
- New Piper voices for English, Persian, Slovenian, Turkish, French and Spanish
- New RHVoice voice for Czech
- Settings option to enable/disable speech synchronization with subtitle timestamps. This may be useful for creating voice overs.
- Mixing speech with audio from an existing file. When exporting to a file, you can overlay speech with audio from an existing media file. This can be useful when creating voice overs from subtitles.
- Context menu option to read from the cursor position or read only the selected text. To open the context menu, right-click in the text.
- Speech audio is always normalized after TTS processing.
- Fix: Mimic3 models could not be downloaded.
Translator
- New models: Greek to English, Maltese to English, Slovenian to English, Turkish to English, English to Catalan
- Updated models: Czech and Lithuanian
- Handy buttons to quickly add translated text to the note or to replace it and switch languages
- Context menu option to translate from the cursor position or translate only the selected text. To open the context menu, right-click in the text.
Accessibility
- New Actions for STT/TTS models switching: switch-to-next-stt-model, switch-to-prev-stt-model, switch-to-next-tts-model, switch-to-prev-tts-model, set-stt-model, set-tts-model
- New global keyboard shortcuts for STT/TTS models switching (X11 only): Switch to next STT model, Switch to prev STT model, Switch to next TTS model, Switch to prev TTS model
- Toggle option for keyboard shortcuts (X11 only). When Toggle behavior is enabled, Start listening/reading shortcuts will also stop listening/reading if they are triggered while listening/reading is active.
- Fix: Accented characters (e.g.: ã, ê) were not transferred correctly to the active window.
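
The Toggle behavior described above can be pictured as a small state machine. The sketch below is a hypothetical model, not code from Speech Note: the class and method names are invented, and the choice to keep listening when the shortcut is pressed again with Toggle disabled is an assumption.

```python
class ShortcutState:
    """Models a 'Start listening' shortcut with optional toggle behavior."""

    def __init__(self, toggle_enabled):
        self.toggle_enabled = toggle_enabled
        self.listening = False

    def trigger_start(self):
        """Handle the shortcut and return whether listening is now active."""
        if self.listening and self.toggle_enabled:
            # With Toggle enabled, the start shortcut also stops listening.
            self.listening = False
        else:
            # Assumption: without Toggle, a repeated press keeps listening.
            self.listening = True
        return self.listening


if __name__ == "__main__":
    state = ShortcutState(toggle_enabled=True)
    print(state.trigger_start())
    print(state.trigger_start())
```
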
Flatpak
- Flatpak runtime update to version 5.15-23.08
- AMD ROCm update to version 5.7.3
- PyTorch update to version 2.2.1
- CTranslate2 update to version 4.2.1
- Faster-Whisper update to version 1.0.2

A video demonstration of all the changes in 4.5.0: https://www.youtube.com/watch?v=S9MJ7y8-bcw

Sailfish OS

Changes:

User Interface
- Import subtitles in many formats and subtitles embedded in a video file. You can import and export subtitles in SRT, WebVTT and ASS formats. If your video file contains one or more subtitle streams, you can import the selected subtitles into the notepad.
- Unified file importing and exporting. Text, subtitles, audio and video files can be imported or exported using unified pull-down menu option.
- Settings option to enable/disable remembering the last note. If the option is disabled, the last note will not be available after restarting the app.
- Settings option for default action when importing note from a file. You can set Ask whether to add or replace, Add to an existing note or Replace an existing note.
- New text appending style: After empty line
- Speech Note has been translated into Ukrainian and Russian languages.
- Fix: Cancellation was blocking the user interface.
Speech to Text
- Subtitles support in STT. To generate timestamped text in SRT format, change the text format to SRT Subtitles using the button at the bottom of the text area. Check the settings to find more subtitle options.
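
For readers unfamiliar with SRT, each cue consists of a running number, a timing line in the form HH:MM:SS,mmm --> HH:MM:SS,mmm, and the cue text, followed by an empty line. The sketch below shows how timestamped segments could be written in that layout. The function names and the (start, end, text) tuple shape are assumptions for illustration and do not describe how Speech Note generates subtitles internally.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS,mmm (the SRT timestamp layout)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments):
    """Build SRT text from (start_seconds, end_seconds, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}\n")
    # Cues are separated by one empty line.
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(0.0, 1.5, "Hello"), (1.5, 3.0, "World")]))
```
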
Text to Speech
- Speech synchronized with subtitle timestamps in TTS. When the text format is set to SRT Subtitles, the generated speech will be synchronized with the subtitle timestamps. This can be useful if you want to create a voice over.
- New Piper voices for English, Persian, Slovenian, Turkish, French and Spanish
- New RHVoice voice for Czech
- Settings option to enable/disable speech synchronization with subtitle timestamps.
- Speech audio is always normalized after TTS processing.
Translator
- New models: Greek to English, Maltese to English, Slovenian to English, Turkish to English, English to Catalan
- Updated models: Czech and Lithuanian

26 Jan 09:54

mkiol

v4.4.0

0aeee1a

Speech Note 4.4.0

Linux Desktop

Changes:

Flatpak
- Modular Flatpak package (Base package and Add-ons)
- NVIDIA CUDA runtime update to version 12.2
- AMD ROCm runtime update to version 5.6
- PyTorch update to version 2.1.1
User Interface
- Improvements to the model browser
- Model filtering options
- Setting option to minimize to the system tray
- Setting option to enable/disable text in desktop notifications
Speech to Text
- Marathi language. New language is enabled with Whisper and Faster Whisper models.
- New version of Faster Whisper Large model: 'FasterWhisper Large-v3'
- 'Distil' versions of Faster Whisper models
- Whisper and Faster Whisper enabled for Chinese-Cantonese language
- Support for Speex audio codec in 'Transcribe a file'
- Translate to English option for Whisper and Faster Whisper models
- More effective GPU acceleration for Whisper models with AMD graphics cards
- Subtitles generation (SRT format)
- Support for multiple audio streams in a video file
Text to Speech
- Marathi language. New language is enabled with Coqui MMS model.
- Voice cloning with Coqui XTTS and YourTTS models.
  - Coqui XTTS models are enabled for: Arabic, Brazilian Portuguese, Chinese, Czech, Dutch, English, French, German, Hungarian, Italian, Japanese, Korean, Polish, Russian, Spanish and Turkish.
  - YourTTS model is enabled for: English, French and Brazilian Portuguese.
- Voice samples creator
- New voices for Serbian and Uzbek languages (RHVoice model)
- GPU acceleration for Coqui models with AMD graphics cards (in Flatpak version)
- Speech synchronized with subtitle timestamps
Translator
- New model: Lithuanian to English
- Option to force text cleaning before translation
- Text formatting support
- Translation progress indicator
Other
- Setting option to override GPU version (AMD graphics cards)
- Setting option to limit number of simultaneous CPU threads
- Setting option to set Python libraries directory (in non-Flatpak version)

Sailfish OS

Speech to Text
- Marathi language. New language is enabled with Whisper models.
- Whisper enabled for Chinese-Cantonese language
- Support for Speex audio codec in 'Transcribe a file'
- Support for multiple audio streams in a video file
Text to Speech
- New voices for Serbian and Uzbek languages (RHVoice model)
Translator
- New model: Lithuanian to English
- Translation progress indicator

13 Nov 08:53

mkiol

v4.3.0

d883c54

Speech Note 4.3.0

Linux Desktop

Changes:

Accessibility
- Global keyboard shortcuts (X11 only)
- Support for Actions
User Interface
- Desktop notifications
- Speech speed control in the main app window
- Opening files with Drag and Drop gesture
- Fix: Application did not use native widgets on some platforms
Translator
- New model: English to Hungarian
Speech to Text
- New languages: Afrikaans, Gujarati, Hausa, Telugu, Tswana, Javanese, Hebrew
- New engine: Faster Whisper
- New engine: April-ASR. Models for: English, French and Polish.
- Inserting text to any active window (X11 only)
- Copy decoded text directly to the clipboard
- Stop listening button
- Support for Opus audio codec in Transcribe a file
- More effective GPU acceleration for Whisper models (NVIDIA CUDA only)
- New smaller and quicker Whisper models for English: Distil-Whisper
- New version of Whisper Large model: Whisper Large-v3
- Fix: CUDA acceleration for Whisper models did not work on NVIDIA video cards with Maxwell architecture
Text to Speech
- New languages: Afrikaans, Gujarati, Hausa, Telugu, Tswana, Javanese, Hebrew
- New engine: Mimic 3
- Reading text from the clipboard
- New Piper voices: Arabic, English, Hungarian, Polish, Czech, German, Ukrainian, Vietnamese, Serbian, French, Spanish, Nepali
- More steps in Speech speed option
- Diacritical marks restoration before speech synthesis for Arabic and Hebrew
- Support for GPU acceleration for Coqui models (NVIDIA CUDA only)
- Fix: Coqui Chinese MMS Hakka and MinNan voices were broken
- Fix: Exporting to audio file was not possible when text was very long
Other
- Setting option to disable support for certain graphic cards
- Setting option Clear cache on close
- Cache compression (Opus format instead of raw audio)
- Detecting the availability of the optional features

Sailfish OS

Changes:

Translator
- New model: English to Hungarian
Speech to Text
- New languages: Afrikaans, Gujarati, Hausa, Telugu, Tswana, Javanese, Hebrew
- New engine: April-ASR. Models for: English, French and Polish.
- Stop listening button
- Support for Opus audio codec in Transcribe a file
Text to Speech
- New Piper voices: Arabic, English, Hungarian, Polish, Czech, German, Ukrainian, Vietnamese, Serbian, French, Spanish, Nepali
- More steps in Speech speed option
- Diacritical marks restoration before speech synthesis for Arabic
- Fix: Exporting to audio file was not possible when text was very long
Other
- Setting option Clear cache on close
- Cache compression (Opus format instead of raw audio)

29 Sep 18:01

mkiol

v4.2.1

8259fa7

Speech Note 4.2.1

Linux Desktop

Changes:

Speech to Text
- Improved AMD GPU acceleration support for Whisper models

25 Sep 13:03

mkiol

v4.2.0

3040b60

Speech Note 4.2.0

Linux Desktop

Changes:

Translator
- New models: Hungarian to English, Finnish to English
Speech to Text
- Support for video files transcription
- Option 'Audio source' to select preferred audio source
- Whisper engine update and increase in performance.
  Processing time has been reduced by an average of 50%.
- Improved Nvidia GPU acceleration support for Whisper models
Text to Speech
- Save audio in compressed formats (MP3 or Ogg Vorbis).
  You can also save metadata tags to the audio file, such as track number, title, artist or album.
- Pause option. You can pause or resume speech reading.
- New MMS models: Hungarian, Catalan, German,
  Spanish, Romanian, Russian and Swedish
- Update of RHVoice voice for Uzbek
- Fix: Many Coqui models couldn't read the numbers or the reading wasn't correct.
- Fix: Piper models could not be downloaded
User Interface
- Menu options: 'Open a text file' and 'Save to a text file'
- Command line option to open files
- Improved UI colors when app is running under GNOME dark theme
- Option 'Graphical style' to change Qt interface style

Sailfish OS

Changes:

Translator
- New models: Hungarian to English, Finnish to English
Speech to Text
- Support for video files transcription. With the 'Transcribe a file' menu option, you can
  convert an audio file, or the audio from a video file, to text.
- Whisper engine update and increase in performance.
  Processing time has been reduced by an average of 15% (Xperia 10 III).
Text to Speech
- Save audio in compressed formats (MP3 or Ogg Vorbis).
  You can also save metadata tags to the audio file, such as track number, title, artist or album.
- Pause option. You can pause or resume speech reading.
- Update of RHVoice voice for Uzbek
- Fix: Piper models could not be downloaded
User Interface
- Share to Speech Note. You can push text, audio or video content to Speech Note
  using share button in other apps (e.g. Notes, Gallery, Audio recorder, Browser).

23 Aug 14:53

mkiol

v4.1.0

6fbe835

Speech Note 4.1.0

Linux Desktop

Changes:

Speech to Text:
- Support for GPU acceleration for Whisper models
- Fix: Whisper wasn't able to decode short speech sentences
Text to Speech:
- Option 'Speech speed' to make synthesized speech slower or faster.
- New models from Massively Multilingual Speech (MMS) project:
  Albanian, Amharic, Arabic, Basque, Bengali, Bulgarian, Chinese,
  Greek, Hindi, Icelandic, Indonesian, Kazakh, Korean, Latin,
  Latvian, Malay, Mongolian, Polish, Portuguese, Swahili, Tagalog,
  Tatar, Thai, Turkish, Uzbek, Vietnamese, Yoruba
- New Piper voices: Czech, German, Hungarian, Portuguese, Slovak,
  English
- Update of RHVoice voices for Slovak and Czech
- New Coqui voices for Japanese, Turkish and Spanish
- Fix: Splitting text into sentences was incorrect for: Georgian,
  Japanese, Bengali, Nepali, Hindi
Interface
- Option to change font size in text editor

Sailfish OS

Changes:

Speech to Text:
- Removal of the experimental 'Restore punctuation' option
- Fix: Whisper wasn't able to decode short speech sentences
Text to Speech:
- Option 'Speech speed' to make synthesized speech slower or faster.
- New Piper voices: Czech, German, Hungarian, Portuguese, Slovak,
  English
- Update of RHVoice voices for Slovak and Czech
- Fix: Splitting text into sentences was incorrect for: Georgian,
  Japanese, Bengali, Nepali, Hindi

07 Aug 15:11

mkiol

v4.0.0

7f57ae7

Speech Note 4.0.0

Changes:

Translator:
- Support for offline translations.
Interface:
- User interface redesign
- Settings option to force specific interface style.
- App translated to new languages: Dutch and Italian
Text to Speech:
- All existing Piper models were updated.
- New Piper voices for: English, Swedish, Turkish, Polish,
  German, Spanish, Finnish, French, Ukrainian, Russian,
  Swahili, Serbian, Romanian, Luxembourgish and Georgian
- New RHVoice model for Slovak language

07 Jul 12:30

mkiol

v3.1.5

8b6a463

Speech Note 3.1.5

Changes in Linux Desktop version:

Text to Speech:
- New Coqui voice for English: Jenny
Speech to Text:
- Quicker decoding when using DeepSpeech/Coqui models (especially on ARM CPU)

Changes in Sailfish OS version:

Speech to Text:
- Quicker decoding when using DeepSpeech/Coqui models
- Re-enabled Swedish Vosk model
