Audio samples produced by TubeTalker
The audio files below accompany this paper:
Story (2013), Phrase-level speech simulation with an airway modulation model of speech production, Computer Speech and Language.
The speech samples were produced by a speech simulation system called TubeTalker. TubeTalker operates at the level of the vocal tract area function, on the theoretical view that speech is produced by multiple levels of airway structure and modulation. A “neutral” vocal tract shape is the base structure on which all other modulation is superimposed. The first level of modulation consists of time-dependent shaping of the neutral tract shape over most of its length; this produces transitions from one vowel to another. The second level of modulation imposes spatially localized perturbations that momentarily disrupt the underlying vowel substrate. The examples below demonstrate the use of TubeTalker to generate speech at the word and phrase levels.
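The layering described above can be pictured with a toy sketch. This is not TubeTalker's actual formulation: the function name, the additive vowel offset, and the Gaussian-shaped constriction are all assumptions chosen only to show a broad first-tier shaping followed by a localized second-tier perturbation of a neutral area function.

```python
import math

def toy_area_function(neutral, vowel_offset, center, depth, width=2.0):
    """Toy two-tier modulation of a neutral area function.

    All names and functional forms here are hypothetical illustrations,
    not the equations used by TubeTalker.
    neutral      : cross-sectional areas (cm^2) of the neutral tract
    vowel_offset : per-section offsets (cm^2), the broad first-tier shaping
    center       : section index of the localized perturbation
    depth        : 0.0 (no constriction) to 1.0 (full closure at the center)
    width        : spread of the perturbation, in sections
    """
    areas = []
    for i, base in enumerate(neutral):
        # Tier 1: shape the neutral tract over its whole length.
        shaped = max(base + vowel_offset[i], 0.0)
        # Tier 2: a spatially localized reduction centered on one section.
        bump = math.exp(-(((i - center) / width) ** 2))
        areas.append(shaped * (1.0 - depth * bump))
    return areas
```

With `depth=0.0` the result is the vowel-shaped tract alone; with `depth=1.0` the area at `center` drops to zero while distant sections are almost unchanged.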
Neutral vocal tract
Word: “Ohio”
Word: “Abracadabra” (Vowels only)
Word: “Abracadabra”
Phrase: “He had a rabbit” (Vowels only)
Phrase: “He had a rabbit”
Phrase: “The brown cow” (Vowels only)
Phrase: “The brown cow”
Modifications to the timing of the control parameters
In the following two phrases, the timing of all control parameters was altered such that the first half of each phrase was increased in duration by 25 percent and the latter half decreased by 25 percent. The total duration of each phrase is the same as the original.
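The arithmetic behind the unchanged total duration can be checked with a small sketch. It assumes the alteration is a piecewise-linear time warp split at the midpoint of the phrase; the function name and that interpretation are illustrative assumptions, not details taken from the paper.

```python
def warp_time(t, total, change=0.25):
    """Map an original time t (seconds) to its warped time.

    Assumes a piecewise-linear warp split at the phrase midpoint:
    the first half is stretched by (1 + change) and the second half
    is compressed by (1 - change). Hypothetical helper for illustration.
    """
    half = total / 2.0
    if t <= half:
        return (1.0 + change) * t
    return (1.0 + change) * half + (1.0 - change) * (t - half)
```

For a phrase of duration T, the first half lasts 1.25 × T/2 = 0.625 T after warping and the second half lasts 0.75 × T/2 = 0.375 T, so the two halves still sum to T.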
Modifications to the voice source
In the following two phrases, the baseline separation of the vocal processes was increased from 0.1 cm to 0.15 cm. This change has the effect of allowing a greater non-oscillatory component of the glottal flow during voicing, and results in increased glottal turbulence. The perceptual effect is a breathier voice quality.
Modifications to the nasal coupling parameters (hypernasal)
In the following two phrases, the nasal coupling area was maintained at a minimum value of 0.2 cm² throughout the duration of each phrase. The effect is to nasalize all portions of the phrases, resulting in a hypernasal quality.
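One way to read "maintained at a minimum value" is as a floor applied to the nasal coupling area trajectory. The sketch below assumes that reading; the function name and the list-of-samples representation are hypothetical.

```python
def apply_nasal_floor(coupling_areas_cm2, floor_cm2=0.2):
    """Raise every sample of a nasal coupling trajectory to at least floor_cm2.

    Assumes "minimum value" means a lower bound on the coupling area;
    values already above the floor are left unchanged. Hypothetical helper.
    """
    return [max(area, floor_cm2) for area in coupling_areas_cm2]
```

Under this reading, segments that would normally have a closed velopharyngeal port (area 0) stay coupled to the nasal tract, which is why every portion of the phrase is nasalized.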
Modifications to the epilaryngeal tube
In the following two phrases, the entry area to the vocal tract was increased to effectively widen the epilaryngeal tube. This modification alters the voice quality in two ways: the first three formants are shifted slightly downward in frequency, and the glottal flow waveform is altered. The perceptual effect is a darker voice quality.
Modifications to the epilaryngeal tube and increase in vocal tract length
In the following two phrases, the entry area to the vocal tract was increased as in the previous example. In addition, the vocal tract length was increased to 18.5 cm.
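The general effect of lengthening the tract can be estimated with the textbook quarter-wavelength approximation for a uniform tube closed at the glottis and open at the lips, F_n = (2n - 1)c / (4L). This is only a rough sketch: TubeTalker's area functions are not uniform, the speed of sound of 35000 cm/s is an assumed value, and the 17.5 cm comparison length is an illustrative assumption rather than a figure given here.

```python
def uniform_tube_formants(length_cm, count=3, speed_of_sound_cm_s=35000.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Uses F_n = (2n - 1) * c / (4 * L). The speed of sound is an assumed
    value, and a real vocal tract is not a uniform tube.
    """
    return [
        (2 * n - 1) * speed_of_sound_cm_s / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]

shorter = uniform_tube_formants(17.5)  # assumed comparison length
longer = uniform_tube_formants(18.5)   # length used in these samples
```

In this idealization the 17.5 cm tube resonates at 500, 1500, and 2500 Hz, and the 18.5 cm tube at roughly 473, 1419, and 2365 Hz, so every formant moves down by the same proportion.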
Extra modifications not in the published paper
This version of “abracadabra” has an increased duration, a decreased fundamental frequency, a widened epilarynx, and a vocal tract length increased to 18.5 cm.