ABSTRACT
Human infants spontaneously and effortlessly learn the language(s) spoken in their environment, despite the extraordinary complexity of the task. Here, I present an overview of the early phases of language acquisition and focus on one area where a modeling approach is currently being pursued using tools from signal processing and automatic speech recognition: the unsupervised acquisition of phonetic categories. During their first year of life, infants construct a detailed representation of the phonemes of their native language and lose the ability to distinguish nonnative phonemic contrasts. Unsupervised statistical clustering alone is not sufficient to account for this: it does not converge on the inventory of phonemes, but rather on contextual allophonic units or subunits. I present an information-theoretic algorithm that groups together allophonic variants based on three sources of information that can be acquired independently: the statistical distribution of their contexts, the phonetic plausibility of the grouping, and the existence of lexical minimal pairs. This algorithm is tested on several natural speech corpora. We find that these three sources of information are probably not language-specific. What is presumably unique to language is the way in which they are combined to optimize the emergence of linguistic categories.
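To make the first of these three cues concrete, here is a minimal sketch of one way a "statistical distribution of contexts" could be compared between two segments. It is an illustration under stated assumptions, not the algorithm presented in the talk: the choice of a symmetrized Kullback-Leibler divergence over following contexts, the add-one smoothing, the toy corpus, and every name in it (`context_distribution`, `symmetric_kl`, the segment labels `r1`, `r2`, `t`) are hypothetical.

```python
import math
from collections import Counter

def context_distribution(corpus, segment, contexts, alpha=1.0):
    """Smoothed probability of each following segment, given `segment`.

    Assumption: the "context" is just the next segment, with add-alpha smoothing.
    """
    counts = Counter()
    for utterance in corpus:
        for current, following in zip(utterance, utterance[1:]):
            if current == segment:
                counts[following] += 1
    total = sum(counts.values()) + alpha * len(contexts)
    return {c: (counts[c] + alpha) / total for c in contexts}

def symmetric_kl(p, q):
    """Symmetrized Kullback-Leibler divergence between two context distributions."""
    return sum(
        p[c] * math.log(p[c] / q[c]) + q[c] * math.log(q[c] / p[c])
        for c in p
    )

# Hypothetical toy corpus: "r1" occurs only before "a", "r2" only before "i",
# while "t" occurs before both vowels.
corpus = [
    ["r1", "a", "t", "a"],
    ["r2", "i", "t", "i"],
    ["r1", "a", "r2", "i"],
    ["t", "a", "t", "i"],
]
contexts = sorted({seg for utt in corpus for seg in utt})

p_r1 = context_distribution(corpus, "r1", contexts)
p_r2 = context_distribution(corpus, "r2", contexts)
p_t = context_distribution(corpus, "t", contexts)

d_r1_r2 = symmetric_kl(p_r1, p_r2)  # near-complementary contexts
d_r1_t = symmetric_kl(p_r1, p_t)    # overlapping contexts
print(d_r1_r2 > d_r1_t)
```

In this toy setting, the pair with near-complementary contexts (`r1`, `r2`) receives a larger divergence than the pair with overlapping contexts (`r1`, `t`), which would make it a candidate for grouping into a single category. A distributional score of this kind over-generates on its own, which is presumably where the other two cues (phonetic plausibility and lexical minimal pairs) come in.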
ABOUT EMMANUEL DUPOUX
Emmanuel Dupoux is the director of the Laboratoire de Sciences Cognitives et Psycholinguistique in Paris. His research focuses on the processes and representations specific to the human brain that allow infants to acquire one or more languages. It relies on standard brain-imaging techniques in newborns and on modeling in adults. He teaches at the Ecole des Hautes Etudes en Sciences Sociales, where he set up an interdisciplinary graduate program in Cognitive Science.