Pronouncing consonants

Speech sounds can be examined from two different perspectives. You can look at the physical properties of speech sounds (this is what phonetics is concerned with) and you can also look at how speech sounds behave in a given linguistic system (=language), how they function, where they occur and where they do not, how they turn into each other (this is what phonology is concerned with).

Human speech typically involves a speaker and a listener. The speaker pushes air out of their lungs and manipulates the airflow with their speech organs (glottis, velum, tongue, teeth, lips, etc). How speech sounds are created in this process is the topic of articulatory phonetics. Once the air leaves the speaker’s lips it travels through the air as a wave: in some areas between the speaker’s lips and the listener’s ears the pressure of the air is higher, in others it is lower. What these air waves look like for different speech sounds is the subject of acoustic phonetics. Finally, the air waves reach the listener’s eardrums and get converted into neural impulses that their brain can then decipher. The branch of phonetics concerned with such issues is called auditory phonetics. We will only deal with articulation here.

In articulatory terms the main difference between consonants and vowels is that a consonant typically involves an obstruction somewhere in the vocal tract. It is the sound that the movement of the air creates as it passes around or across this obstruction that we call the given consonant.

The production (aka articulation) of a consonant is characterized by the area where this obstruction is formed — this is the consonant’s place of articulation (often abbreviated as POA or simply place) — and the degree of the obstruction (complete or partial, in the latter case narrow or wide, etc) — this is the consonant’s manner of articulation (or simply manner). Some consonants have further characteristic properties, which we will introduce later.

A very precise description of how a given sound is articulated is often unnecessary. It is useful, for example, when a sound is compared to a similar sound in some other dialect or language. In most cases small (and sometimes even larger) differences in the articulation of sounds have no linguistic consequence. Such differences may go unnoticed, or they may be taken as an indication that the speaker has a different social, geographical, etc background.

Places of articulation

Places of articulation are anatomical terms, names of organs along the vocal tract from the larynx (where the windpipe branches off towards the lungs) to the lips. Some of these organs are active, others are passive: the active articulators move towards the passive ones. You are encouraged to read this detailed description of the articulators.

The three places most commonly referred to in phonology are labial, coronal, and velar. Labial and coronal are in fact groups of places, the differences between which are often irrelevant. We will nevertheless distinguish them here.

Labial

bilabial

Labial sounds involve one or both of the lips. (Labia is Latin for ‘lips’; labialis means ‘lippy, connected to the lips’.) The well-known consonants p, b, and m are each formed by pressing the two lips together firmly. The precise phonetic term for this category is bilabial, ie involving both lips.

labiodental

The consonants f and v are also labial, but it is only the lower lip that takes part in their production: the upper teeth are pressed against this articulator. These sounds are therefore labiodental, ie lippy-toothy. (Dentes is Latin for ‘teeth’, cf dentist.)

A third kind of labial consonant is w, which, in fact, has two simultaneous places: when pronouncing this sound the lips are approaching each other (= labial), but at the same time the back of the tongue is also raised towards the velum (= velar, more on which below). The place of w is thus labiovelar. In the sound system of English, w clearly behaves as a labial consonant, so we here take labiovelar to be a subcategory of labials and not of velars.

Coronal

Coronal is the most general term among places of articulation. (Corona is the name of the front part of the tongue.) Practically any consonant that is not labial or velar belongs to this group.

alveolar

The most common coronal consonants are t, d, n, s, z, and l. If a more precise term is needed, these consonants are alveolar, because the front part of the tongue touches or approaches the alveolar ridge during their articulation. (The alveolar ridge is the gum behind your upper incisors.)

Up to this point we have used well-known symbols. This is because the consonants discussed so far are very common, and so the designers of the IPA have chosen to represent these sounds by letters from the Roman alphabet. There are less common sounds too, for which the Roman alphabet has no symbols (because Latin did not have these sounds). In these cases Greek letters or other solutions are adopted.

dental

The two “th” sounds serve as our first example of such sounds. These sounds are also coronal; more precisely, they are dental, sometimes referred to as interdental: the tip of the tongue reaches out to the upper incisors (front teeth) while pronouncing these consonants. The first one occurs in thief θɪjf, the second in the definite article the ðɪj (which often occurs as ðə in connected speech). As you can see, English spelling does not distinguish θ and ð at all.

postalveolar

While θ and ð are more front (= closer to the lips) in their place than the alveolars, another set of coronal consonants is more back (= closer to the throat). They are pronounced with the tongue touching behind the alveolar ridge, hence their phonetic category is postalveolar (aka palatoalveolar). Let us exemplify these too: cheap ʧɪjp, jeep ʤɪjp, sheep ʃɪjp. ʒ is the least frequent consonant of English; it mostly occurs between vowels, eg in vision vɪʒən.

The sound represented in English by the letter R is often pronounced as a retroflex consonant, during the production of which the tip of the tongue is curled back a bit behind the alveolar ridge. The “R sound” of dialects and languages is in fact very variable, ranging from labial w to velar x, including the alveolar r too. Where precision is not necessary, the symbol r is used instead of the odd-looking ɻ: red rɛd (or ɹɛd, or ɻɛd).

The last place of articulation discussed among coronals is palatal. (Palatum is Latin for the ‘roof of the mouth’. It is also called the hard palate, to distinguish it from the velum.) In palatal consonants the back of the tongue is raised to(wards) the palate. j (as in yet, not as in jet!) is the only such consonant in English. It may be worth noting that Hungarian (among other languages) has three further palatals: c (eg tyúk ‘hen’; note that the IPA uses c differently from the spelling systems of European languages), ɟ (eg gyík ‘lizard’), and ɲ (eg nyár ‘summer’).

Velar

Behind the (hard) palate the roof of the mouth continues with the soft palate, also known as the velum. (Velum is Latin for ‘veil’. Strictly speaking the membrane we are talking about is the palatal velum.) Sounds in which the back of the tongue is raised against the velum are called velar.

velar

The consonants k and g are both velar, as is ŋ, the sound represented by the letter N in eg sing sɪŋ. The last sound in the German or Hungarian pronunciation of the name Bach bax is also velar; in English this is often replaced by k: bak.

Glottal

In this brief survey we will not discuss uvular, pharyngeal, or epiglottal as places of articulation, because English does not provide us with examples for these sounds. The last (or, if looked at from the lungs, first) place of articulation is glottal. (Glottis is the name of the opening between the vocal folds, which are located in the larynx.)

The sound represented by the letter H in English (of course only when it is not preceded by eg S, as in ship), ie h, is often called glottal: it has no specific obstacle in the oral tract, nor, in fact, at the glottis. (Phonetically, h is more like a vowel than like a consonant, but it is a voiceless vowel.) The glottal stop, ʔ, however, is truly glottal: it is pronounced by closing the glottis. This sound is rather typical of London English, where many t’s are replaced by it: city sɪʔɪj. Other accents of English may also have ʔ in place of t’s, but much less extensively.
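The grouping of places surveyed so far can be summarized as a small lookup table. The following Python sketch is only an illustration: the dictionary names and the function `major_place` are made up for this example, and the table contains only the consonants mentioned in the text above.

```python
# Major place classes and the specific places grouped under them in the text.
# The layout of these dictionaries is an illustrative assumption.
MAJOR_PLACE = {
    "labial": ["bilabial", "labiodental", "labiovelar"],
    "coronal": ["dental", "alveolar", "postalveolar", "retroflex", "palatal"],
    "velar": ["velar"],
    "glottal": ["glottal"],
}

# Example consonants given in the text for each specific place.
EXAMPLES = {
    "bilabial": ["p", "b", "m"],
    "labiodental": ["f", "v"],
    "labiovelar": ["w"],
    "dental": ["θ", "ð"],
    "alveolar": ["t", "d", "n", "s", "z", "l"],
    "postalveolar": ["ʧ", "ʤ", "ʃ", "ʒ"],
    "retroflex": ["ɻ"],
    "palatal": ["j"],
    "velar": ["k", "g", "ŋ", "x"],
    "glottal": ["h", "ʔ"],
}


def major_place(consonant):
    """Return (major class, specific place), or None for an unlisted symbol."""
    for major, places in MAJOR_PLACE.items():
        for place in places:
            if consonant in EXAMPLES[place]:
                return major, place
    return None


print(major_place("w"))  # labiovelar is treated as a kind of labial here
print(major_place("ʃ"))
```

Listing labiovelar under labial mirrors the choice made above for English w; a description of another language might group it differently.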

Manner

We have seen that several different consonants share their place of articulation. These differ in other characteristics, eg their manner of articulation. Both t and s are alveolar, but the former is a momentary sound, while the latter can be maintained for a very long time, until one’s lungs are emptied.

The most common consonants are stops. In these the articulators (the lips, or the front part of the tongue and the alveolar ridge, or the back part of the tongue and the velum, or the vocal folds) are pressed together firmly so that no air can escape through this obstacle.

Stops are of two kinds: in p, b, t, d, k, g (as well as c, ɟ) the nasal tract is closed too, so that no air can come out of the lungs either through the mouth or through the nose. Therefore pressure builds up until the articulators suddenly separate and the air bursts out. These sounds are called oral stops, or, because of the explosion so typical of them, explosives, or just plosives. The other kind of stops includes m, n, ɲ, and ŋ. These are different from plosives in that the velum is lowered during their articulation, hence air can escape through the nasal tract. (Nasality is impossible with the glottal stop, in which the vocal folds are pressed together.) Because air can escape through the nose, these are not momentary sounds: they can be pronounced for a prolonged time. These sounds are called nasal stops, or just nasals for short. (In fact, many phonologists use the term “stop” for plosives only, which may be a bit misleading for uninitiated students.)

As mentioned, the closure in the mouth is complete in stops (the nose may be open though, in nasal stops). In another group of sounds there is a significant closure: the articulators come into contact, but there are gaps between them through which the air can escape. Because of the narrowness of the passage between the articulators, the airflow is turbulent, which we perceive as noise. Sounds produced in this manner are called fricatives because of the friction of the air characterizing them. As already hinted at, s is a fricative sound together with z, ʃ, ʒ, as well as f, v, θ, ð, and x. The set of possible fricatives is larger than that of plosives, because there appear to be fewer places where complete closure can comfortably be formed in the human vocal tract. Nevertheless, there are languages with very few fricatives: Ancient Greek had only one, s; Latin two, s and f; Hawaiian and Dinka have no fricatives at all.

It is useful to distinguish two types of fricatives: the more noisy sibilants (ie s, z, ʃ, ʒ) and the less noisy nonsibilants (ie f, v, θ, ð, x). (Note that other groupings of these two sets are also found in the phonological literature.)

The affricates of English are ʧ and ʤ. The symbols for these two consonants may lead one to think that they are clusters of a plosive followed by a fricative. However, there is good reason to believe that they are unitary segments. (The reasons will be discussed elsewhere.) Some researchers argue that affricates are just plosives the release of which is slower than usual. In this view English has not only labial, alveolar, and velar plosives, but also two postalveolar ones, ʧ and ʤ.

Fricatives are produced by a fairly narrow constriction between the articulators. By pulling the articulators more apart, a larger gap remains between them. Sounds created in this way are called approximants. Of the consonants we have discussed so far, w, l, ɻ, j, and h are approximants, although h is often (probably mistakenly) labelled a fricative.

You may have noticed that the difference between fricatives and approximants is not very great: it depends on the degree of the constriction between the articulators. In fact, v, ð, j (as well as ɣ, which has not been mentioned yet; it is the pair of the velar x) may symbolize both fricatives and approximants. For example, the English v is a fricative, while the Hungarian v, especially before a vowel, is an approximant. (There is a designated symbol for this sound, ʋ, but linguists usually just use v for it.) This is important to bear in mind, because using the Hungarian type approximant v in English may sound weird to natives.

If the constriction between the articulators is maximally wide, vowels are pronounced. In this sense, vowel is also a manner of articulation, but we will discuss vowels separately.

A category partly cross-cutting the manners of articulation above is lateral. (Lateralis is Latin for ‘connected with the sides of something’.) In these sounds the tip of the tongue touches against the top of the oral tract, but there is a hole at the two sides. If this hole is large enough, we get a lateral approximant, like l; if it is narrower, the result is a lateral fricative, like the sound written as ll in Welsh (the IPA symbol is ɬ).

It has already been mentioned that “R-type” sounds are very diverse, ie may be pronounced in several different ways, yet these differences are rarely distinctive in a given language. The most common English rhotic is an alveolar or retroflex approximant, ɹ or ɻ. Earlier many speakers pronounced a flap, ɾ, where the spelling has r. Scottish speakers may have a trill, r. In standard Hungarian this sound is either a flap or a trill, but note that w or x can also be used to replace it. This though is considered a speech defect by many.

The lateral l and the rhotic r are often referred to together as liquids. As we are going to see, these two consonants do exhibit some common behaviour.

The two approximants j and w are often referred to as glides. These sounds can occur at the end of diphthongs (eg bye baj, bow baw), in this position they are said to be the offglides of these diphthongs. Some analysts group r and/or h with glides too.

Two very basic sound categories have to be mentioned here. One of them, the best known one, is consonant vs vowel. These two types of sound cannot always be distinguished physically. That is to say, if you hear a sound, you cannot always tell whether it is a vowel or a consonant. In English (like in many other languages) the glides j and w may hover between the two states: eg minion may be mɪnɪjən or mɪnjən.

The other distinction is between obstruents and sonorants. Plosives, affricates, and fricatives are obstruents; nasals, approximants, and vowels are sonorants. One common feature of obstruents in English (and in Hungarian) is that they come in pairs: there are two contrasting consonants at each place for each manner. That is, p and b are both (bi)labial plosives, s and z are both alveolar fricatives, etc. Sonorants, on the other hand, do not form such pairs: there is only one labial nasal (m) and only one palatal approximant (j), etc. In Hungarian only vowels can form the core of a syllable, with consonants surrounding it (eg bal ‘left’, bank ‘bank’, etc). In English any sonorant may occur in this role, although sonorant consonants may form only an unstressed syllable (eg bottle bɔtḷ, prison prɪzṇ, prism prɪzṃ). (The symbols with a dot underneath are syllabic consonants; each of these words has two syllables.)
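The pairing of obstruents can be checked mechanically on a small inventory. The Python sketch below is a hedged illustration rather than a complete analysis: the table `CONSONANTS` and the helper names are invented for this example, h and ʔ are left out, and the voicing values follow the conventional voiceless–voiced labels, which is an assumption of the sketch.

```python
from collections import defaultdict

# (place, manner, voiced) for English consonants named in the text.
# h and ʔ are omitted; voicing values are the conventional labels (assumption).
CONSONANTS = {
    "p": ("bilabial", "plosive", False),
    "b": ("bilabial", "plosive", True),
    "t": ("alveolar", "plosive", False),
    "d": ("alveolar", "plosive", True),
    "k": ("velar", "plosive", False),
    "g": ("velar", "plosive", True),
    "ʧ": ("postalveolar", "affricate", False),
    "ʤ": ("postalveolar", "affricate", True),
    "f": ("labiodental", "fricative", False),
    "v": ("labiodental", "fricative", True),
    "θ": ("dental", "fricative", False),
    "ð": ("dental", "fricative", True),
    "s": ("alveolar", "fricative", False),
    "z": ("alveolar", "fricative", True),
    "ʃ": ("postalveolar", "fricative", False),
    "ʒ": ("postalveolar", "fricative", True),
    "m": ("bilabial", "nasal", True),
    "n": ("alveolar", "nasal", True),
    "ŋ": ("velar", "nasal", True),
    "w": ("labiovelar", "approximant", True),
    "l": ("alveolar", "lateral approximant", True),
    "ɻ": ("retroflex", "approximant", True),
    "j": ("palatal", "approximant", True),
}

OBSTRUENT_MANNERS = {"plosive", "affricate", "fricative"}


def is_obstruent(symbol):
    """Plosives, affricates, and fricatives count as obstruents."""
    return CONSONANTS[symbol][1] in OBSTRUENT_MANNERS


def group_by_place_and_manner(symbols):
    """Collect the symbols that share both place and manner."""
    groups = defaultdict(list)
    for symbol in symbols:
        place, manner, _voiced = CONSONANTS[symbol]
        groups[(place, manner)].append(symbol)
    return dict(groups)


obstruent_groups = group_by_place_and_manner(
    s for s in CONSONANTS if is_obstruent(s)
)
sonorant_groups = group_by_place_and_manner(
    s for s in CONSONANTS if not is_obstruent(s)
)

for key, members in obstruent_groups.items():
    print(key, members)
```

In this inventory every obstruent group has two members and every sonorant group has one. The lateral is given its own manner label here so that l and ɻ do not fall into the same group.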

Phonation types

We already know that the larynx is at the top of the windpipe, containing the vocal folds. The vocal folds can block off the windpipe so that food or liquids intended for the stomach do not get diverted into the lungs. (Talking opens your vocal folds. If you talk while eating or drinking, you may choke.) The vocal folds are capable of vibrating, which makes speech more audible.

seven positions of the glottis

In the diagrams above you can see schematized images of the glottis. The two black triangles are the arytenoid cartilages, which govern the vocal folds (the edges of which are represented by the vertical line(s) in the middle).

A glottal stop (ʔ) is produced by closing the vocal folds firmly, thereby blocking the airflow from the lungs (figure A). If you force the air strongly enough, it can get through even the closed vocal folds, making them vibrate; this is creaky voice (figure B). In modal (normal) voicing the arytenoids do not press the vocal folds together, therefore air can go through them without extra pressure, again making them vibrate (figure C). If the vocal folds are pulled apart, a high rate of airflow is necessary to make them vibrate; this is called breathy voice (figure D). Pulling the vocal folds further apart inhibits their vibration; the result is a voiceless sound (figure E). Finally, you can pull the vocal folds so far from each other that they cannot get close enough for voicing even for the next sound; this is aspiration (figure F).

The pairs pb, td, ʧʤ, kg, etc are commonly labelled voiceless–voiced. However, in English these pairs are often distinguished not by voicing but by other means, eg aspiration.

Airstream mechanisms

Airstream mechanisms are discussed here only for completeness’s sake. During speech production air must be moving through at least part of the vocal tract. The movement of air is commonly egressive (ie outward), but it may also marginally be ingressive (ie inward).

The initiator of the movement may be the lungs (this is the pulmonic airstream; pulmo is Latin for ‘lung’), or the larynx (glottalic airstream), or the velum (velaric airstream).

          out                  in
lungs     pulmonic egressive   -
larynx    ejectives            implosives
velum     -                    clicks

The three initiators may combine with the two airstream directions in a limited way, as shown in the chart above. The pulmonic airstream can only be egressive, that is, air can only move out of the lungs. It is true that we can speak while inhaling (try it!), but this mechanism is not normally used by any human language. English and Hungarian have only pulmonic egressive sounds (lungs, out).

Pressure can also be created by closing the glottis and moving it upwards: this results in so-called ejectives (larynx, out). A vacuum can be created by moving the glottis downwards, in which case it is vibrating; the sounds so produced are implosives (larynx, in). Finally, humans may rarefy the air by moving their tongue downwards while closing the gap at the velum with the back of the tongue. These sounds are velaric ingressive, more commonly known as clicks (velum, in).
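The combinations of initiator and direction described in this section can be written down as a small lookup. This Python sketch is only illustrative: the names `AIRSTREAM` and `sound_class` are hypothetical, and combinations that the text does not describe as being used simply return `None`.

```python
# (initiator, direction) -> the kind of sound described in the text.
# Combinations not listed here are not described as used by languages.
AIRSTREAM = {
    ("lungs", "out"): "pulmonic egressive",
    ("larynx", "out"): "ejective",
    ("larynx", "in"): "implosive",
    ("velum", "in"): "click",
}


def sound_class(initiator, direction):
    """Return the sound class for a combination, or None if it is not listed."""
    return AIRSTREAM.get((initiator, direction))


for initiator in ("lungs", "larynx", "velum"):
    for direction in ("out", "in"):
        print(initiator, direction, sound_class(initiator, direction))
```

The loop prints all six combinations, so the gaps (lungs in, velum out) show up as `None` next to the four listed ones.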

clicks are fun!

Here is a chart of the nonpulmonic consonants, you can click on the symbols and listen to how these consonants sound. You can also find the IPA symbols for these consonants here. (You don’t have to learn these symbols!)

Multiple articulatory gestures

Up to this point we have discussed almost exclusively consonants that had a single place and a single manner of articulation. There also exist consonants that have two places, ie two simultaneous closures in the oral tract. The best known example for such double places is labiovelar. We have already got to know the labiovelar approximant, w, but stops with these two places also exist: k͡p, g͡b, ŋ͡m.

In some cases the two places are not simultaneous, but a secondary articulation accompanies, follows, or precedes the primary place. For example, velars may be labialized or palatalized, and other places may be velarized. In many varieties of English, l may be velarized (aka “dark”); the symbol for this sound is ɫ.

Affricates may also be considered as having two articulatory gestures: two manners of articulation, stop+fricative. But, as we have seen, affricates can also be taken to be slowly released plosives.

Appendix

IPA chart with audio
