Speech production is a process that runs through a series of very heterogeneous levels: it starts from an idea and ends with a sound wave. Focussing on the production of a single word, the steps may be schematized as follows:¹
level | operations/processes | brain area | example |
Pragmatics | a cognitive and communicative idea is formed | tertiary association cortices | involves a tiger |
Conceptualization | idea is analyzed in terms of constitutive notions | | notion of ‘tiger’ is activated |
Lexical selection | notions are mapped onto items of the mental lexicon | [various regions of left hemisphere] | the word tiger is selected |
Morphological adaptation | lexeme is specified as a word form according to its function in the sentence | Broca's area | (nothing in this example) |
Symbolization | combination of morphs is mapped onto a phonological representation | Wernicke's area | /tajgər/ |
Phonetics | phonological representation is converted into a plan for the execution of phonatory and articulatory movements² | Broca's area + premotor cortex + basal ganglia³ | [tʰɑɪ•gɚ] |
Articulatory representation | phonetic plan is temporarily kept in working memory | prefrontal cortex | |
Motor commands | sequentially, each component of the plan is sent to speech apparatus | primary motor cortex | |
Phonation and articulation | speech apparatus executes motor commands (with proprioceptive and auditory feedback) | phonatory and articulatory organs | [Röntgen video here] |
Acoustics | sound wave is produced | air | [see next section] |
Although this model is schematic and simplified in some respects, its steps can tentatively be associated with the passage of a neural impulse through the brain regions indicated in the table.
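The serial character of the model can be made explicit in a small Python sketch. It is only an illustration: the names `Level`, `LEVELS` and `trace` are invented here, the entries merely restate the first and third columns of the table, and nothing is claimed about how the brain actually implements these steps.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Level:
    """One row of the table: a level and the place where it is processed."""
    name: str
    location: Optional[str]  # None where the table names no location


# Rows of the table, in the order in which the model passes through them.
LEVELS = [
    Level("Pragmatics", "tertiary association cortices"),
    Level("Conceptualization", None),
    Level("Lexical selection", "various regions of left hemisphere"),
    Level("Morphological adaptation", "Broca's area"),
    Level("Symbolization", "Wernicke's area"),
    Level("Phonetics", "Broca's area + premotor cortex + basal ganglia"),
    Level("Articulatory representation", "prefrontal cortex"),
    Level("Motor commands", "primary motor cortex"),
    Level("Phonation and articulation", "phonatory and articulatory organs"),
    Level("Acoustics", "air"),
]


def trace(levels: List[Level]) -> List[str]:
    """Return the locations passed through, skipping levels without one."""
    return [lv.location for lv in levels if lv.location is not None]


# Print the path from idea to sound wave, one level per line.
for step, lv in enumerate(LEVELS, start=1):
    print(f"{step:2d}. {lv.name}: {lv.location or '(not specified)'}")
```

Keeping the levels in an ordered list reflects the assumption that each step takes the output of the preceding one as its input; a model with parallel or interacting levels would need a different data structure.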
The model incorporates self-monitoring (Levelt et al. 1999). This implies that the speaker checks the result of each step executed, compares it with what was intended and, if an error is found, may take corrective measures. The last monitoring step consists in the auditory feedback that speakers receive from their own utterance.
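The monitoring idea, check each result against the intention and repair if necessary, may be sketched as a loop in Python. Everything specific in this sketch is an assumption made for illustration: the function names, the lookup table `EXPECTED` that stands in for the speaker's intention, the invented slip from tiger to lion, and the choice of a simple retry as the "corrective measure". The source does not specify how the comparison or the repair is carried out.

```python
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[str], str]]


def produce(intention: str, steps: List[Step],
            monitor: Callable[[str, str, str], bool],
            max_repairs: int = 1) -> Tuple[str, List[str]]:
    """Run the steps in order, checking each result before moving on."""
    log: List[str] = []
    current = intention
    for name, step in steps:
        result = step(current)
        repairs = 0
        while not monitor(name, intention, result) and repairs < max_repairs:
            log.append(f"error at {name}: {result!r}")
            result = step(current)  # assumed repair strategy: simply retry
            repairs += 1
        current = result
    return current, log


def make_lexical_selection() -> Callable[[str], str]:
    """Toy lexical selection that slips once (hypothetical), then succeeds."""
    calls = {"n": 0}
    lexicon = {"TIGER": "tiger"}

    def select(concept: str) -> str:
        calls["n"] += 1
        return "lion" if calls["n"] == 1 else lexicon[concept]

    return select


def symbolize(word: str) -> str:
    """Toy mapping from word to phonological representation."""
    return {"tiger": "/tajgər/"}.get(word, "?")


# Stand-in for the speaker's intention at each monitored step (assumption).
EXPECTED = {
    "lexical selection": {"TIGER": "tiger"},
    "symbolization": {"TIGER": "/tajgər/"},
}


def monitor(step_name: str, intention: str, result: str) -> bool:
    """Return True if the result of a step matches what was intended."""
    return EXPECTED[step_name][intention] == result


steps = [("lexical selection", make_lexical_selection()),
         ("symbolization", symbolize)]
form, log = produce("TIGER", steps, monitor)
print(form)  # /tajgər/
print(log)   # ["error at lexical selection: 'lion'"]
```

In this toy run the slip is caught immediately after lexical selection, so the wrong word never reaches the later steps. Auditory feedback, the last monitoring step, would correspond to a check applied only after the final output.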
¹ Cf. the “Standard Model of Word-Form Encoding” (Meyer 2000) and Stille et al. 2020.
² The assumption that a relatively abstract phonological representation is converted step by step into an individual phonetic representation is flatly rejected in Port 2007.
³ The three components mentioned are linked by a closed loop in which motor programs are selected and refined.