Classification of laryngeal pathologies using audio, bioimpedance measurements and deep learning
Julia Zofia Tomaszewska
School of Computing and Engineering
Supervisor:
Dr Apostolos Georgakis
School of Computing and Engineering
Vocal tract pathologies encompass a wide spectrum of disorders, from functional impairments to structural abnormalities. Their early detection and classification are critical for effective treatment and recovery. In this study, we design and implement a digital classification system for vocal tract pathologies based on audio signals and bio-impedance (electroglottographic – EGG) measurements. In doing so, we hope to contribute towards the development of an accurate and reliable diagnostic tool.
For the development of the envisaged system, three classifiers are implemented. A Random Forest (RF) classifier is employed to assess the effectiveness of various feature extraction methods. Additionally, the RF is evaluated on binary classification of control signals (obtained from participants unaffected by the investigated pathologies) versus pathological signals, achieving a maximum overall accuracy of 99.85% when using Mel-Frequency Cepstral Coefficients (MFCCs) derived from audio recordings. For the classification of vocal tract pathologies, two types of deep learning classifiers are developed and tested – the Convolutional Neural Network (CNN) and the Long Short-Term Memory network (LSTM).
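The RF-based binary screening step described above can be sketched as follows. This is a minimal illustration only, not the study's actual pipeline: the feature values here are synthetic stand-ins for the per-recording MFCC vectors, and the class sizes, tree count, and train/test split are assumed for the example.

```python
# Sketch: binary control-vs-pathological classification with a Random Forest
# over per-recording feature vectors (synthetic stand-ins for MFCC features).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Stand-ins for 13 averaged MFCCs per recording; in the real system these
# would be extracted from the audio (or EGG) signals.
n_per_class = 200
control = rng.normal(loc=0.0, scale=1.0, size=(n_per_class, 13))
pathological = rng.normal(loc=1.5, scale=1.0, size=(n_per_class, 13))

X = np.vstack([control, pathological])
y = np.array([0] * n_per_class + [1] * n_per_class)  # 0 = control, 1 = pathological

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"held-out accuracy: {acc:.2%}")
```

In practice the same fit/predict loop would simply be re-run per feature set (e.g. MFCCs vs. GTCCs) to compare extraction methods, as the RF is used for in the study.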
The CNNs emerged as the most effective, achieving a maximum accuracy of 86.61% for Gammatone Cepstral Coefficients (GTCCs) derived from audio speech data, and 84.91% for GTCCs derived from bio-impedance speech data. These findings underscore the potential of approaches tailored to specific data modalities and pathologies. Future work aims to explore a multi-modal approach, merging the audio- and EGG-based systems to achieve more accurate and reliable laryngeal pathology classification, thereby enabling the development of a potential vocal tract disorder diagnostic tool.