I was told the other day that vowel formants don't exist, or that they're not real, or something to that effect: that measuring F1 and F2 is like trying to measure the diameter of an irregular, non-spherical object, like a comet.
I can see two ways in which this is true. First, a model of the human vocal tract consisting of two (or even more) formant values is far from accurate: the vocal tract can generate anti-formants, and even without that complication, the shape and material of the mouth and pharynx are not very much like those of a flute or clarinet; the vocal tract is a much more complex resonator. However, most of us are not interested in modeling this physical, biological object as such, nor, unless we are very much oriented towards articulatory phonetics, the movements of the tongue for their own sake. We are more interested in representing the aspects of vowel sounds that are important linguistically.
But here too, the conception that "vowels are things with two formants and a midpoint" is a substantial oversimplification. Most obviously, vowels extend over a (longer or shorter) stretch of time, and their properties can change (more or less) during it. Both these aspects - duration and glide direction/extent - can be linguistically distinctive, although they often are not. So to fully convey the properties of a vowel, we will need a dynamic representation. While this creates challenges for measurement and especially for analysis, these problems are orthogonal to the question of how best to describe a vowel at a single point - more precisely, a short window - in time.
As mentioned, the "point" spectrum of a vowel sound, a function mapping frequency to amplitude, is complex. For one thing, it contains some information that is relatively speaker-specific. This data is of great importance in forensic and other speaker-recognition tasks, but in sociophonetic/LVC work we are usually happy to ignore it (if we even can). Other aspects of a vowel's spectrum, not necessarily easily separable from the first category, convey information about pitch, nasality, and the umbrella term "voice quality". These things may or may not interest us, depending on the language involved and the particular research being conducted.
And then there is "vowel quality" proper, which linguists generally describe using four pairs of terms: high-low (or close-open), back-front, rounded-unrounded, and tense-lax. Originally, these parameters were thought of as straightforwardly articulatory, reflecting the position of the jaw, tongue, lips, and pharynx. As we will see, this articulatory view has shifted, but the IPA still uses the same basic terms (though no longer tense-lax), and for the most part Ladefoged & Johnson (2014) classify English vowels along the same dimensions as A. M. Bell (1867), who used primary-wide instead of tense-lax.
Equally long ago, scientists were beginning to realize that the vocal tract's first two resonances (or formants, as we now say) are related to the high-low and back-front dimensions of vowel quality, respectively. The higher the first formant (F1), the lower the vowel, and the higher the second formant (F2), the fronter the vowel. Helmholtz (1862/1877) partially demonstrated this with experiments in which he held a series of tuning forks up to the mouth, and he also took the first steps in vowel synthesis. But he found it difficult - like present-day researchers, sometimes - to distinguish F1 and F2 in back vowels, which he considered to have a single resonance.
In contrast to Helmholtz, A. G. Bell (1879) and Hermann (1894) both emphasized that the resonance frequencies characterizing specific vowels were independent of the fundamental frequency (F0) of the voice. Bell provided an early description of formant bandwidth, while Hermann coined the term "formant" itself.
In the early twentieth century, experiments with the phonautograph, the phonograph, the Phonodeik, and other machines led most investigators to realize that all vowel sounds consisted of several formants (not just one for back vowels and two for front vowels, as Helmholtz had influentially claimed); advances were also made in vowel synthesis using electrical devices. In particular, Paget (1930) argued for the importance of two formants for each vowel. But the greatest progress in acoustic phonetics came after the invention of the sound spectrograph and its declassification following World War II.
Four significant publications from the immediate postwar years - Essner (1947), Delattre (1948), Joos (1948), and Potter & Peterson (1948) - all use spectrographic data to analyze vowels, and all four explicitly identify the high-low dimension with F1 (or "bar 1" for P & P) and the back-front dimension with F2 (or "bar 2" for P & P). Higher formants are variously recognized, but considered less important. In Delattre's analysis, the French front rounded vowels are seen to have lower F2 (and F3) values than the corresponding unrounded vowels.
Jakobson, Fant and Halle (1952) is somewhat different: it defines the acoustic correlates of distinctive phonological features, aiming to cover vowels and consonants with the same definitions. So the pair compact-diffuse refers to the relative prominence of a central spectral peak, while grave-acute reflects the skewness of the spectrum. In vowels, compact is associated with a more open tongue position and higher F1, and diffuse with a more closed tongue position and lower F1. Grave (back) vowels have F2 closer to F1; acute (front) vowels have F2 closer to F3. (Indeed, F2 − F1 has been suggested as an alternative to F2 for measuring the back-front dimension, e.g. in the first three editions of Ladefoged's A Course in Phonetics.)
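To see how the choice between F2 and F2 − F1 can matter, here is a minimal sketch that ranks three vowels from front to back under each measure. The (F1, F2) values are hypothetical round numbers chosen only for illustration, not measurements from any speaker or study, and the function names are likewise made up for this example.

```python
# Hypothetical (F1, F2) pairs in Hz, chosen only to illustrate; NOT measured data.
VOWELS = {
    "i": (300, 2300),
    "a": (750, 1300),
    "u": (320, 900),
}


def frontness_f2(f1: float, f2: float) -> float:
    """Front-back index as raw F2 (higher value = fronter vowel)."""
    return f2


def frontness_f2_minus_f1(f1: float, f2: float) -> float:
    """Front-back index as F2 - F1, the alternative mentioned in the text (higher = fronter)."""
    return f2 - f1


def rank_front_to_back(measure) -> list:
    """Order the vowel labels from frontest to backest under the given measure."""
    return sorted(VOWELS, key=lambda v: measure(*VOWELS[v]), reverse=True)


if __name__ == "__main__":
    print(rank_front_to_back(frontness_f2))           # ['i', 'a', 'u']
    print(rank_front_to_back(frontness_f2_minus_f1))  # ['i', 'u', 'a']
```

With these made-up values both measures put [i] at the front, but they disagree about [a] and [u]: because [a] has a high F1, subtracting it pulls [a] behind [u] on the F2 − F1 scale, while raw F2 places it between [i] and [u].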
These authors all more or less privilege an acoustic view of vowel differences over an articulatory view, noting that acoustic measurements may correspond better to the traditional "articulatory" vowel quadrilateral than the actual position of the tongue during articulation. The true relationship between articulation and acoustics was not well understood until Fant (1960). But the discrepancy had been suspected at least since Russell (1928), whose X-ray study showed that the highest point of the tongue was not directly related to vowel quality; it was noticeably higher in [i] than in [u], for example. In addition, certain vowel qualities, like [a], could be produced with several different configurations of the tongue. Observations like these led Russell to make the well-known claim that "phoneticians are thinking in terms of acoustic fact, and using physiological fantasy to express the idea".
Outside of a motor theory of speech perception, the idea that the vowel space is defined by acoustic resonances rather than articulatory configurations should be acceptable, although the integration of continuous acoustic parameters into phonological theory has never been smoothly accomplished. While we don't know that formant frequencies are directly tracked by our perceptual apparatus, it seems that they are far from fictional, at least from the point of view of how they function in language.
[To be continued in Part 2, with less history and more practical suggestions!]
References (the works by Harrington, Jacobi, and Lindsey, though not cited above, were useful in learning about the topic):
Bell, Alexander Graham. 1879. Vowel theories. American Journal of Otology 1: 163-180.
Bell, Alexander Melville. 1867. Visible Speech: The Science of Universal Alphabetics. London: Simpkin, Marshall & Co.
Delattre, Pierre. 1948. Un triangle acoustique des voyelles orales du français. The French Review 21(6): 477-484.
Essner, C. 1947. Recherches sur la structure des voyelles orales. Archives Néerlandaises de Phonétique Expérimentale 20: 40-77.
Fant, Gunnar. 1960. Acoustic theory of speech production. The Hague: Mouton.
Harrington, Jonathan. 2010. Acoustic phonetics. In W. Hardcastle et al. (eds.), The Handbook of Phonetic Sciences, 2nd edition. Oxford: Blackwell.
Helmholtz, Hermann von. 1877. Die Lehre von den Tonempfindungen. 4th edition. Braunschweig: Friedrich Vieweg.
Hermann, Ludimar. 1894. Beiträge zur Lehre von der Klangwahrnehmung. Pflügers Archiv 56: 467-499.
Jacobi, I. 2009. On variation and change in diphthongs and long vowels of spoken Dutch. Ph.D. dissertation, University of Amsterdam.
Jakobson, Roman, Fant, C. Gunnar M., and Halle, Morris. 1952. Preliminaries to speech analysis: the distinctive features and their correlates. Cambridge: MIT Press.
Joos, Martin. 1948. Acoustic phonetics. Language Monograph 23. Baltimore: Waverly Press.
Ladefoged, Peter. A course in phonetics. First edition, 1975. Second edition, 1982. Third edition, 1993. Fourth edition, 2001. Fifth edition, 2006. Sixth edition, 2011.
Ladefoged, Peter & Johnson, Keith. 2014. A course in phonetics. Seventh edition. Stamford, CT: Cengage Learning.
Lindsey, Geoff. 2013. The vowel space. http://englishspeechservices.com/blog/the-vowel-space/
Mattingly, Ignatius G. 1999. A short history of acoustic phonetics in the U.S. Proceedings of the XIVth International Congress of Phonetic Sciences. San Francisco, CA. 1-6.
Paget, Richard. 1930. Human speech. London: Routledge & Kegan Paul.
Potter, Ralph K. & Peterson, Gordon E. 1948. The representation of vowels and their movements. Journal of the Acoustical Society of America 20(4): 528-535.
Russell, George Oscar. 1928. The vowel, its physiological mechanism as shown by X-ray. Columbus: OSU Press.