[PhD] Consideration of inter-rater variability in the perceptual assessment of speech and voice disorders and its integration into an automatic decision support system.

Although communication methods have changed significantly over the last twenty years with the digital age, and can replace speech in many situations, speech remains essential for successful integration into our society. Given the importance of oral communication, losing speech or language can be felt as a loss of humanity. By definition, communication disorders include any impairment of voice, speech, language or hearing that hinders communication.

In this thesis, we will focus on voice and speech disorders that may be related to damage to the phonatory apparatus (dysphonia), to neurological damage leading to motor disorders (dysarthria), or to a malformation or cancer located in the speech production apparatus.

These different diseases can lead to a deterioration in the functioning of the speech production system, with a highly variable impact on communication in affected patients. The resulting communication disorders can significantly reduce patients' quality of life: they can interfere with daily life, professional activities, and social and family relationships, and can lead to other illnesses such as depression.

Improving the quality of life of patients is a central objective of the care pathway, in which the maintenance of oral communication is one of the points to be considered by clinicians. In this context, the assessment of voice and speech disorders and of their evolution over time plays an important role at the time of diagnosis, in the choice of the patient’s therapeutic management (including the rehabilitation phase), and during follow-up.

The perceptual assessment of speech and voice disorders, also known as assessment “by ear”, is currently the most widely used method in clinical practice, even though the literature widely emphasises that it is highly subjective, variable and difficult to reproduce.

The LIA has been working for about fifteen years on voice and speech disorders, and more particularly on the way in which automatic processing tools can help clinical experts in their acoustic-phonetic analysis of the speech signal and their perceptual analysis of the subjects’ productions (controls and patients). The aim of this work is to gain a better understanding of the impairments inherent to the disorders and, ultimately, to provide objective approaches for assessing these impairments that are usable in clinical practice. This thesis continues this work and addresses the problem of perceptual evaluation specific to this context and its implications for the development of supervised automatic approaches for the objective evaluation of speech and voice disorders.

Perceptual assessment is acknowledged to be subjective and hard to reproduce, and intra- and inter-rater differences have been observed both in our work and in the literature. Given this, can we use these perceptual measures as a “gold standard” for training models on speech impairment assessment tasks and for evaluating their performance?
In parallel, it would be interesting to identify which perceptual information, extracted from speech productions, these experts base their assessments on. Is this information similar across experts in terms of quality and quantity? Is it possible to find “alteration patterns” in the speech productions that could explain the divergent or convergent behaviours of these experts?
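As an illustration of how the inter-rater differences mentioned above could be quantified, the following minimal sketch computes Cohen's kappa (chance-corrected agreement) for each pair of raters. It is only one possible measure and is not presented as the method adopted in this thesis. The rater names, the 0 to 3 severity scale and the scores are hypothetical toy values invented for the example, not data from any study.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa between two raters who scored the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("both raters must score the same non-empty item list")
    n = len(a)
    # Proportion of items on which the two raters give the same score.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Agreement expected by chance, from each rater's score distribution.
    count_a, count_b = Counter(a), Counter(b)
    labels = count_a.keys() | count_b.keys()
    expected = sum(count_a[k] * count_b[k] for k in labels) / (n * n)
    if expected == 1.0:
        # Both raters used one single identical score everywhere.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy data: severity scores (0 = normal, 3 = severe)
# given by three imaginary raters to the same eight recordings.
ratings = {
    "rater_A": [0, 1, 2, 3, 1, 0, 2, 3],
    "rater_B": [0, 1, 2, 2, 1, 0, 3, 3],
    "rater_C": [1, 1, 2, 3, 0, 0, 2, 2],
}

# Kappa for every pair of raters.
pairwise = {
    (r1, r2): cohen_kappa(ratings[r1], ratings[r2])
    for r1, r2 in combinations(ratings, 2)
}

for (r1, r2), kappa in pairwise.items():
    print(f"{r1} vs {r2}: kappa = {kappa:.2f}")
```

Unweighted kappa treats all disagreements as equally serious. For ordinal severity scales, a weighted kappa or an intraclass correlation coefficient may be a better fit, since a disagreement of one grade is then penalised less than a disagreement of three.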

On the basis of the answers to these questions, the next issue addressed by this work will be how to take the variability observed in the perceptual evaluations produced by a panel of experts into account in an automatic system that delivers a rapid, objective and explainable evaluation usable in clinical practice.


