Audio samples produced by TubeTalker

The audio files below accompany this paper:

Story (2013), Phrase-level speech simulation with an airway modulation model of speech production, Computer Speech and Language.

The speech samples were produced by a speech simulation system called TubeTalker. TubeTalker operates at the level of the vocal tract area function, on the theoretical view that speech is produced by multiple levels of airway structure and modulation. A “neutral” vocal tract shape is the base structure on which all other modulation is superimposed. The first level of modulation consists of time-dependent shaping of the neutral tract shape over most of its length; this produces transitions from one vowel to another. In the second level of modulation, spatially localized consonantal perturbations are imposed that momentarily disrupt the underlying vowel substrate. The examples below demonstrate the use of TubeTalker to generate speech at the word and phrase levels.
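
As a rough illustration of the layered view described above, the following Python sketch builds a toy area function in three steps: a neutral shape, a tract-wide vowel-like modulation, and a localized constriction. It is not TubeTalker's implementation; the function names, the uniform neutral tube, the half-cosine shaping term, the Gaussian constriction, and all numeric values are assumptions chosen only for illustration.

```python
import math

N_SECTIONS = 44  # assumed number of tube sections, illustrative only


def neutral_area(n=N_SECTIONS, area_cm2=3.0):
    """Toy neutral tract: a uniform tube (an assumption, not measured data)."""
    return [area_cm2] * n


def vowel_modulation(areas, q):
    """Level 1: smooth shaping over the whole tract length.

    q > 0 narrows the glottal end and widens the lip end; q < 0 does the
    opposite. The half-cosine shape is a hypothetical stand-in.
    """
    n = len(areas)
    out = []
    for i, a in enumerate(areas):
        x = i / (n - 1)  # 0 at the glottis, 1 at the lips
        scale = 1.0 - q * math.cos(math.pi * x)
        out.append(a * scale)
    return out


def consonant_perturbation(areas, location, width, magnitude):
    """Level 2: a localized constriction superimposed on the vowel shape.

    magnitude = 1.0 gives full occlusion at `location` (0..1 along the tract);
    magnitude < 1.0 leaves the tract partly open.
    """
    n = len(areas)
    out = []
    for i, a in enumerate(areas):
        x = i / (n - 1)
        bump = math.exp(-((x - location) / width) ** 2)
        out.append(a * (1.0 - magnitude * bump))
    return out


neutral = neutral_area()
vowel = vowel_modulation(neutral, q=0.5)
stop = consonant_perturbation(vowel, location=1.0, width=0.05, magnitude=1.0)
```

In this toy version the constriction closes the tract at the lips while leaving the rest of the vowel shape essentially unchanged, which mirrors the idea of a perturbation riding on a vowel substrate.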

Neutral vocal tract 

This sample uses the neutral vocal tract shape only. The voice source nonetheless produces a fundamental frequency (F0) contour to give the samples a more natural quality. The F0 contour is identical for all samples.

Word: “Ohio”

With regard to the vocal tract, “Ohio” is an all-vowel utterance. It was generated by modulating the neutral vocal tract shape such that it produced the acoustic characteristics of the vowels. The glottal aspiration for the “h” sound was created by an abductory maneuver of the vocal folds.

Word: “Abracadabra” (Vowels only)

This word requires modulation at the level of vowel transitions and consonantal perturbations. The sample below, however, is of only the vowel transitions that underlie production of the word.

Word: “Abracadabra”

Now the consonantal perturbations are imposed on the vowel substrate to produce the full word “Abracadabra.”

Phrase: “He had a rabbit” (Vowels only)

This sample is more complex because it is a phrase rather than a word. The audio file, however, contains only the vowel substrate on which the phrase is built.

Phrase: “He had a rabbit”

The consonantal perturbations are now imposed. Note that an “r” is present in this phrase, which requires that the consonantal perturbation not occlude the vocal tract.

Phrase: “The brown cow” (Vowels only)

This audio file demonstrates the vowel substrate for the phrase.

Phrase: “The brown cow”

The unique component of this example is that it includes a nasal consonant. This requires that the area of the nasal port that couples the main vocal tract to the nasal passages/sinuses be precisely timed to allow nasalization, but also to terminate quickly for adequate production of the “k” sound in the following word (“cow”).

Modifications to the timing of the control parameters

In the following two phrases, the timing of all control parameters was altered such that the first half of each phrase was increased in duration by 25 percent and the latter half decreased by 25 percent. The total duration of each phrase is the same as the original.

Phrase: “He had a rabbit” – Modified timing #1
Phrase: “The brown cow” – Modified timing #1
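
The timing change described above can be read as a piecewise-linear time warp. The sketch below is one interpretation, assuming the split falls at the temporal midpoint of the original phrase; `warp_time` and its parameters are hypothetical names, not part of TubeTalker. Stretching the first half by 25 percent and compressing the second half by 25 percent changes the two halves by equal and opposite amounts, which is why the total duration is unchanged.

```python
def warp_time(t, total, first_scale=1.25, second_scale=0.75):
    """Map a time t in the original phrase to its time in the modified phrase.

    Assumes the phrase is split at its midpoint (an interpretation of the
    description, not a documented detail).
    """
    half = total / 2.0
    if t <= half:
        return t * first_scale
    return half * first_scale + (t - half) * second_scale


total = 2.0  # hypothetical phrase duration in seconds
midpoint_new = warp_time(total / 2.0, total)  # 1.25 s: the first half now lasts longer
end_new = warp_time(total, total)             # 2.0 s: the total duration is preserved
```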

Modifications to the timing of the control parameters

In the following two phrases, the timing of all control parameters was altered such that the first half of each phrase was increased in duration by 25 percent and the latter half decreased by 25 percent. The total duration of each phrase is the same as the original.

Phrase: “He had a rabbit” – Modified timing #2
Phrase: “The brown cow” – Modified timing #2

Modifications to the voice source

In the following two phrases, the baseline separation of the vocal processes was increased from 0.1 cm to 0.15 cm. This change has the effect of allowing a greater non-oscillatory component of the glottal flow during voicing, and results in increased glottal turbulence. The perceptual effect is a breathier voice quality.

Phrase: “He had a rabbit” – Modified voice source
Phrase: “The brown cow” – Modified voice source

Modifications to the nasal coupling parameters (hypernasal)

In the following two phrases, the nasal coupling area was maintained at a minimum value of 0.2 cm² throughout the duration of each phrase. The effect is to nasalize all portions of the phrases, resulting in a hypernasal quality.

Phrase: “He had a rabbit” – Modified nasal coupling
Phrase: “The brown cow” – Modified nasal coupling
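
Holding the coupling area at or above a minimum amounts to applying a floor to the nasal port area over time. The sketch below illustrates this under that reading; the example track and the function name are hypothetical, and only the 0.2 cm² floor comes from the description above.

```python
def apply_nasal_floor(port_area_cm2, floor_cm2=0.2):
    """Return the port-area track with every value raised to at least floor_cm2."""
    return [max(a, floor_cm2) for a in port_area_cm2]


# Hypothetical port-area track (cm^2): closed, briefly open for a nasal, closed again.
original = [0.0, 0.0, 0.1, 0.3, 0.1, 0.0]
hypernasal = apply_nasal_floor(original)
# The port never closes, so every portion of the phrase is nasalized to some degree.
```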

Modifications to the epilaryngeal tube

In the following two phrases, the entry area to the vocal tract was increased to effectively widen the epilaryngeal tube. This modification alters the voice quality in two ways – the first three formants are shifted slightly downward in frequency and the glottal flow waveform is altered. The perceptual effect is a darker voice quality. 

Phrase: “He had a rabbit” – Modified epilarynx
Phrase: “The brown cow” – Modified epilarynx

Modifications to the epilaryngeal tube and increase in vocal tract length

In the following two phrases, the entry area to the vocal tract was increased as in the previous example. In addition, the vocal tract length was increased to 18.5 cm. 

Phrase: “He had a rabbit” – Modified epilarynx + increased VT length

Phrase: “The brown cow” – Modified epilarynx + increased VT length
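
To see why a longer vocal tract lowers the formants, a uniform tube closed at the glottis and open at the lips offers a convenient approximation: its resonances are F_n = (2n - 1) c / (4L), so they scale inversely with the length L. The sketch below compares 18.5 cm with an assumed reference length of 17.5 cm. The reference length and the speed of sound (35,000 cm/s) are assumptions for illustration, and the simulated tract is not actually a uniform tube.

```python
def uniform_tube_formants(length_cm, n_formants=3, c_cm_per_s=35000.0):
    """Quarter-wavelength resonances (Hz) of a uniform closed-open tube."""
    return [(2 * n - 1) * c_cm_per_s / (4.0 * length_cm)
            for n in range(1, n_formants + 1)]


reference = uniform_tube_formants(17.5)  # assumed reference length
longer = uniform_tube_formants(18.5)     # length used in these samples
ratio = longer[0] / reference[0]         # equals 17.5 / 18.5, about 0.946
```

Under this approximation every resonance drops by the same factor, so the whole formant pattern shifts downward as the tract is lengthened.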

Extra modifications not in the published paper

This version of “Abracadabra” has an increased duration, a decreased fundamental frequency, a widened epilarynx, and a vocal tract length increased to 18.5 cm.

Word: “Abracadabra”

This is the same as the sample above, but with an added vocal tremor.
 
Word: “Abracadabra”
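
One simple way to approximate a tremor is a slow sinusoidal modulation of the F0 contour. The sketch below shows that idea only; it is not a description of how the tremor in this sample was generated, and the function name, the modulation rate (5 Hz), and the extent (5 percent) are illustrative assumptions.

```python
import math


def f0_with_tremor(base_f0_hz, t, rate_hz=5.0, extent=0.05):
    """F0 at time t (s) with a sinusoidal modulation around base_f0_hz.

    rate_hz and extent are illustrative assumptions, not values from the samples.
    """
    return base_f0_hz * (1.0 + extent * math.sin(2.0 * math.pi * rate_hz * t))


# One second of a flat 100 Hz contour sampled at 1 ms steps (hypothetical values).
contour = [f0_with_tremor(100.0, i / 1000.0) for i in range(1000)]
```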