
Text-to-Speech (TTS) synthesis turns written text into speech. More often than not, speech synthesis is based on natural human speech: the synthesiser takes speech sounds from prerecorded human speech and concatenates them to generate sentences as they are typed. Thanks to technological progress in recent years, speech synthesis has become quite common. You can hear it in elevators in shopping centres, on the GPS system in your car and on mobile phones, and computer operating systems usually feature it as well.

TTS synthesis is a big help for the visually impaired: screen readers use it to read out everything that is on the screen.

Synthesis systems used in abair.ie

A new speech synthesis system has to be developed for every language (and dialect). A synthesiser for Gweedore Irish has been available on the Web site abair.ie since June 2008. A synthesiser for Connaught Irish was developed in 2010, and soon there will be one for Munster Irish.

To turn written text into speech, abair.ie uses two different systems. The first system is called unit selection. The synthesiser searches a large database of recorded human speech and concatenates diphones (i.e. units that span two adjacent speech sounds) in the order in which they appear in the written input from the user (cf. illustration below). A major advantage of this system is that it produces speech that sounds quite natural. On the other hand, it tends to be slow and occasionally produces errors.
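The idea of concatenating diphones can be sketched in a few lines of Python. This is only an illustration and not the code used in abair.ie: the function names, the silence marker "_" and the tiny inventory are assumptions, and the greedy choice shown here is a simplification of the search a real unit selection synthesiser performs.

```python
from typing import Dict, List, Tuple

Diphone = Tuple[str, str]
Unit = Tuple[str, int]  # (recording id, position in that recording); hypothetical layout


def to_diphones(phones: List[str]) -> List[Diphone]:
    """Pair every phone with its right neighbour, padding both ends with silence ('_')."""
    padded = ["_"] + phones + ["_"]
    return list(zip(padded, padded[1:]))


def select_units(diphones: List[Diphone], inventory: Dict[Diphone, List[Unit]]) -> List[Unit]:
    """Greedy selection: prefer a candidate that directly follows the
    previously chosen unit in the same recording, else take the first one."""
    chosen: List[Unit] = []
    for diphone in diphones:
        candidates = inventory.get(diphone)
        if not candidates:
            raise KeyError(f"no recorded unit for diphone {diphone}")
        pick = candidates[0]
        if chosen:
            prev_rec, prev_pos = chosen[-1]
            for rec, pos in candidates:
                if rec == prev_rec and pos == prev_pos + 1:
                    pick = (rec, pos)
                    break
        chosen.append(pick)
    return chosen


# Toy inventory for a two-phone input; the recording ids are made up.
toy_inventory = {
    ("_", "a"): [("rec1", 0)],
    ("a", "b"): [("rec2", 5), ("rec1", 1)],
    ("b", "_"): [("rec1", 2)],
}
units = select_units(to_diphones(["a", "b"]), toy_inventory)
```

Preferring units that were adjacent in the original recording is one simple way to keep the joins between units inconspicuous, which is part of why this approach can sound natural.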

The second speech synthesis system used by abair.ie is called HTS. For this system to work, it is sufficient to have a small speech database that contains all speech sounds of a given language in all possible contexts. HTS calculates the average of every speech sound in every context. Once this is done, the computer can generate speech sounds without having to refer to the recordings. This system is very fast and stable. On the other hand, the speech generated by HTS does not sound very natural. Nevertheless, it is functional and handy for anybody who wants to get through their daily tasks quickly and efficiently, even if the quality of the speech is not the best. This system is still in its infancy, but people are looking for ways of improving it, that is, of making it sound more natural. In a couple of years, HTS could be the most widely used system in speech synthesis.
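The averaging step can be pictured with a toy example. The sketch below is not HTS itself, which relies on statistical models rather than plain averages; it only shows the principle described above, assuming each recorded sound is reduced to a short list of numbers and labelled with its left and right neighbours. All names and values are hypothetical.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, Iterable, List, Tuple

Context = Tuple[str, str, str]  # (left neighbour, sound, right neighbour)


def train_averages(observations: Iterable[Tuple[Context, List[float]]]) -> Dict[Context, List[float]]:
    """Average the feature vectors of all recordings that share a context."""
    grouped: Dict[Context, List[List[float]]] = defaultdict(list)
    for context, features in observations:
        grouped[context].append(features)
    return {
        context: [fmean(column) for column in zip(*vectors)]
        for context, vectors in grouped.items()
    }


def generate(contexts: List[Context], model: Dict[Context, List[float]]) -> List[List[float]]:
    """Produce one averaged vector per context without consulting the recordings."""
    return [model[context] for context in contexts]


# Two made-up recordings of the same sound in the same context.
toy_observations = [
    (("_", "a", "b"), [1.0, 2.0]),
    (("_", "a", "b"), [3.0, 4.0]),
]
toy_model = train_averages(toy_observations)
```

Once the averages are stored, generation is a simple look-up, which fits the observation that this kind of system is fast and stable; averaging also smooths away detail, which is one plausible reason the result sounds less natural.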

Additional Benefits

For this project, a large corpus of spoken Irish (read by two native speakers) was built. The database that is used inside abair.ie consists of over 15 hours of recorded speech. The speech has been transcribed and annotated and, as a consequence, is now an invaluable resource for those who want to analyse spoken Irish. Soon data will be collected to add Munster Irish to abair.ie.


Illustration: What happens inside abair.ie when you press “Synthesise”?
Step: Input (the text you type or paste)
Example: Tús maith, 1/2 na hoibre.

Step: Normalisation (turns numbers and symbols into regular words)
Example: tús maith , leath na hoibre

Step: Dictionary look-up (gets pronunciations for words in the system's dictionary)
Example:
tús → tˠ uː sˠ
maith → mˠ a hʲ
leath → lʲ a h
na → nˠ ə

Step: Letter-to-sound rules (creates pronunciations for words not in the dictionary)
Example: hoibre → h i bʲ ɾʲ ə

Step: Unit selection (gets units from the recorded corpus that match phone strings in the input text)
Example: [image: unit selection]

Step: Synthesis (puts the selected units together to create a sound file)
Example: [image: synthesis]

Step: Output (plays the sound file for the input "Tús maith, 1/2 na hoibre.")
Example: [image: output]
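The first steps of the illustration (normalisation, dictionary look-up and letter-to-sound rules) can be sketched in Python as follows. This is a minimal sketch and not the abair.ie implementation: the function names are assumptions, the tables contain only the entries shown in the illustration, and the letter-to-sound function is a hypothetical stand-in for a real set of spelling rules.

```python
import re
import unicodedata
from typing import Callable, List, Tuple

# Only the expansion shown in the illustration; a real normaliser covers far more.
NUMBER_WORDS = {"1/2": "leath"}

# Pronunciations copied from the illustration.
LEXICON = {
    "tús": ["tˠ", "uː", "sˠ"],
    "maith": ["mˠ", "a", "hʲ"],
    "leath": ["lʲ", "a", "h"],
    "na": ["nˠ", "ə"],
}


def normalise(text: str) -> List[str]:
    """Lower-case the text, drop sentence-final punctuation, split it into
    tokens and replace known numbers and symbols with words."""
    text = unicodedata.normalize("NFC", text).lower().rstrip(" .!?")
    tokens = re.findall(r"\d+/\d+|\w+|[^\w\s]", text)
    return [NUMBER_WORDS.get(token, token) for token in tokens]


def toy_letter_to_sound(word: str) -> List[str]:
    """Hypothetical placeholder: it knows only the single example from the
    illustration, where a real system would apply spelling rules."""
    known = {"hoibre": ["h", "i", "bʲ", "ɾʲ", "ə"]}
    if word not in known:
        raise NotImplementedError(f"no rule for {word!r} in this sketch")
    return known[word]


def pronounce(tokens: List[str],
              letter_to_sound: Callable[[str], List[str]]) -> List[Tuple[str, List[str]]]:
    """Look each word up in the dictionary and fall back to the rules."""
    result = []
    for token in tokens:
        if not any(ch.isalnum() for ch in token):
            continue  # punctuation gets no pronunciation in this sketch
        phones = LEXICON.get(token)
        if phones is None:
            phones = letter_to_sound(token)
        result.append((token, phones))
    return result


tokens = normalise("Tús maith, 1/2 na hoibre.")
pronunciations = pronounce(tokens, toy_letter_to_sound)
```

The phone strings produced this way are what the unit selection step would then match against the recorded corpus.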