Firoj Alam edited this page Sep 8, 2018 · 13 revisions

Welcome to the Katha: Bangla Text to Speech wiki!

Katha: Bangla Text to Speech is a software package for the Bangla language that can help tackle illiteracy, empower visually impaired people, and improve human-machine interaction. The project aims to develop a text-to-speech (TTS) system for Bangla using diphone and unit selection concatenative synthesis, built on the Festival speech synthesis technology (see paper).

Katha: Bangla Text to Speech was developed from 2007 to 2011 at CRBLP (which no longer exists), BRAC University. Due to a lack of resources, support for the project was discontinued. As members of the original research team, we maintain the project and host it on GitHub (it was previously hosted on SourceForge). The goal is to make it publicly available for wider research use. Since the beginning of 2017, we have been receiving support from Cognitive Insight Research Lab to extend its research work.

Have feedback on Katha: Bangla Text to Speech? Let @firojalam04 know on Twitter.

Modules of the TTS project:

Phoneme Inventory:

A well-defined phoneme inventory is important for any language. It is needed not only for phonetic analysis but also for speech processing applications such as TTS and automatic speech recognition (ASR). Several earlier studies exist, mostly based on articulatory phonetics. We concentrated instead on the acoustic characteristics of Bangla phonemes, obtained by analyzing recordings of male and female voices. The goal of this task was to determine the total number of phonemes in Bangla and their acoustic properties. For this purpose we collected text in different formats and recorded it, hiring a gender-balanced group of professional and non-professional speakers and a recording studio. We then performed acoustic analysis on the recorded speech. Finally, we arrived at 30 consonants, 14 vowels, and 21 diphthongs (see paper and paper).

Text Normalization:

We developed a rule-based text normalization system in two implementations: Java and Festival Scheme. A text normalization system converts non-standard word representations, such as digit strings and abbreviations, into their standard written-out form. The system currently handles numbers, phone numbers, ordinals, cardinals, acronyms, and abbreviations (see paper).
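
A minimal sketch of the rule-based idea, written in Python rather than the project's Java or Festival Scheme. It covers only two of the listed cases: phone numbers, read digit by digit, and abbreviations looked up in a table. The function names, the 7-digit threshold for treating a digit string as a phone number, and the single abbreviation entry are hypothetical illustrations, not Katha's actual rules. Cardinal and ordinal reading is not attempted, since Bangla number words for 1 to 99 are largely irregular and need their own tables.

```python
import re

# Spoken names of the digits 0-9 (index = digit value).
DIGIT_NAMES = ["শূন্য", "এক", "দুই", "তিন", "চার",
               "পাঁচ", "ছয়", "সাত", "আট", "নয়"]

# Hypothetical abbreviation table; a real system would load a full list.
ABBREVIATIONS = {"ড.": "ডক্টর"}


def read_digits(token: str) -> str:
    """Read a digit string one digit at a time.

    int() accepts any Unicode decimal digit, so Bangla digits (০-৯)
    and ASCII digits (0-9) are both handled.
    """
    return " ".join(DIGIT_NAMES[int(ch)] for ch in token)


def normalize(text: str) -> str:
    """Apply simple whitespace-token rules; unknown tokens pass through."""
    out = []
    for tok in text.split():
        if tok in ABBREVIATIONS:
            out.append(ABBREVIATIONS[tok])
        # Assumption: 7+ digits is treated as a phone number.
        # In Python 3, \d matches Unicode digits, including Bangla ones.
        elif re.fullmatch(r"\d{7,}", tok):
            out.append(read_digits(tok))
        else:
            out.append(tok)  # e.g. short cardinals are left for other rules
    return " ".join(out)
```

For example, `normalize("ফোন ০১৭১২৩৪৫")` reads the eight-digit string digit by digit, while a short number such as `২৫` is passed through unchanged for a separate cardinal rule to handle.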

Letter to Sound System/Lexicon:

Our team developed a rule-based pronunciation generator for Bangla words. It takes a word and determines the pronunciation of each of its graphemes. A grapheme is a unit of writing that cannot be analyzed into smaller components. The major hurdle for this Automated Pronunciation Generator (APG) is resolving the pronunciation of polyphone graphemes, i.e., graphemes that can be realized as more than one phoneme depending on context. Because Bangla is partially phonetic, rules can be defined to handle most cases. In addition, we still lack a balanced corpus that could be used to build a statistical pronunciation generator. The system is continually being extended to improve its accuracy. We also developed a pronunciation lexicon of 93K lexical entries.
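
A minimal sketch of how a lexicon and ordered letter-to-sound rules can work together. It assumes one common arrangement (not confirmed above for Katha): the lexicon is consulted first, and the rules are applied only to words missing from it. Each rule may carry a right-context condition, which is one standard way to resolve a polyphone grapheme. The lexicon entry, the phone symbols, and the context rule for এ are hypothetical placeholders, not the APG's actual rules.

```python
import re

# Hypothetical lexicon: word -> list of phones.
LEXICON = {"আমি": ["a", "m", "i"]}

# Ordered rules: (grapheme, right-context regex or None, phones).
# The first rule whose grapheme and context both match wins, so more
# specific rules (and longer graphemes) must come first.
RULES = [
    ("এ", r"ক", ["æ"]),  # toy context rule for a polyphone grapheme
    ("এ", None, ["e"]),  # default realization of the same grapheme
    ("ক", None, ["k"]),
    ("ম", None, ["m"]),
    ("ল", None, ["l"]),
    ("া", None, ["a"]),
]


def apply_rules(word: str) -> list[str]:
    """Convert a word to phones by scanning left to right with RULES."""
    phones, i = [], 0
    while i < len(word):
        for grapheme, context, out in RULES:
            if word.startswith(grapheme, i):
                rest = word[i + len(grapheme):]
                if context is None or re.match(context, rest):
                    phones.extend(out)
                    i += len(grapheme)
                    break
        else:
            # No rule matched this grapheme.
            raise ValueError(f"no rule for {word[i]!r} in {word!r}")
    return phones


def pronounce(word: str) -> list[str]:
    """Lexicon lookup first; fall back to letter-to-sound rules."""
    if word in LEXICON:
        return LEXICON[word]
    return apply_rules(word)
```

Keeping the rules as an ordered list makes the priority between a context-specific rule and its default explicit, which is the main design question for polyphone graphemes.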

Intonation Modeling:

In linguistics, intonation is the variation of pitch in speech that is not used to distinguish words. Intonation and stress are the two main elements of linguistic prosody. Since no intonation system existed for Bangla, we are building a statistical intonation model from a speech corpus. A read-speech corpus was developed for this purpose.
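
The project's own intonation model is not described here. As a hedged illustration of the simplest kind of statistical model that can be estimated from a speech corpus, the sketch fits a linear declination line, F0 = intercept + slope × time, to the voiced pitch frames of one utterance by ordinary least squares. This is a deliberately simple, swapped-in technique, not Katha's model; the function names and the convention that unvoiced frames carry F0 = 0 are assumptions.

```python
def fit_declination(times, f0):
    """Least-squares fit of f0 ~ intercept + slope * t over voiced frames.

    Assumption: unvoiced frames are marked with f0 <= 0 and are skipped.
    Returns (intercept, slope).
    """
    voiced = [(t, f) for t, f in zip(times, f0) if f > 0]
    n = len(voiced)
    if n < 2:
        raise ValueError("need at least two voiced frames")
    mean_t = sum(t for t, _ in voiced) / n
    mean_f = sum(f for _, f in voiced) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in voiced)
    if sxx == 0:
        raise ValueError("voiced frames must span more than one time point")
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in voiced)
    slope = sxy / sxx
    return mean_f - slope * mean_t, slope


def predict_f0(intercept, slope, t):
    """Baseline F0 (Hz) predicted at time t (seconds)."""
    return intercept + slope * t
```

For example, frames at 0.0, 0.5, 1.5 and 2.0 s with F0 of 200, 190, 170 and 160 Hz (plus an unvoiced frame at 1.0 s) give an intercept of 200 Hz and a slope of -20 Hz/s. Pooling such fits over the sentences of a read-speech corpus is one way to estimate typical declination; richer models would add accent and phrase-boundary targets.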

Diphone Database for TTS:

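We developed a diphone database of 4355 diphones. A diphone spans from the middle of one phone to the middle of the next, so the number of possible diphones grows roughly as the square of the number of phones. We identified 30 consonants, 14 vowels, and 21 diphthong phonemes. Building the database involved designing nonsense sentences from the diphone list, recording them with a professional speaker, and splitting and labeling the recordings. Please download the speech corpora (see paper).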
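
As a rough check of the square relationship, the sketch below enumerates every ordered phone pair. Assuming the 65 phonemes (30 + 14 + 21) plus one silence unit, with the silence-to-silence pair excluded, the count is 66 × 66 - 1 = 4355, which matches the figure above; whether the database was actually counted this way is our assumption. The silence symbol `pau` follows common Festival naming and is also an assumption, as are the placeholder phone names.

```python
from itertools import product


def diphone_list(phones, silence="pau"):
    """All ordered phone pairs, including transitions to and from silence.

    The silence-silence pair is dropped because it carries no transition.
    """
    units = list(phones) + [silence]
    return [
        f"{a}-{b}"
        for a, b in product(units, repeat=2)
        if not (a == silence and b == silence)
    ]


if __name__ == "__main__":
    # Placeholder names for 30 consonants, 14 vowels and 21 diphthongs.
    phones = ([f"c{i}" for i in range(30)]
              + [f"v{i}" for i in range(14)]
              + [f"d{i}" for i in range(21)])
    print(len(diphone_list(phones)))  # 66 * 66 - 1
```

In practice some pairs never occur in the language and may be left out, so a real diphone list can be smaller than this upper bound.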

Katha: Bangla Text To Speech is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.