Article

Enhancing ASR systems for dysarthric speech through deep learning approaches and dataset expansion

Author: Leon Turner (University of West London)

Abstract

Presented at the UWL Annual Doctoral Students' Conference, Friday 12 July 2024.

Keywords: dysarthric speech, deep learning

How to Cite:

Turner, L., (2025) “Enhancing ASR systems for dysarthric speech through deep learning approaches and dataset expansion”, New Vistas 11(1). doi: https://doi.org/10.36828/newvistas.277


Published on
2025-02-19

Peer Reviewed



Leon Turner

School of Computing and Engineering

Supervisors:

Dr Eugenio Donati

School of Computing and Engineering

Dr Gerard Roma

School of Computing and Engineering

Dysarthria is a neurological motor speech disorder that impairs control of the muscles used in speech production, reducing speech intelligibility. As a result, individuals with dysarthria often struggle to communicate in daily life. The current state of Automatic Speech Recognition (ASR) for dysarthric speech is limited, especially for severe cases of dysarthria. This research aims to expand the available data and explore novel approaches to developing robust ASR systems tailored to dysarthric speech.

The data expansion phase will involve collaboration with specialist organisations and speech therapy centres to recruit volunteers with dysarthria and collect a substantial amount of dysarthric speech data. The dataset will include recordings from individuals with varying degrees of severity, covering a wide range of speech patterns and characteristics. The data will then pass through a pre-processing stage that applies signal processing techniques to extract relevant speech features and enhance the quality of the recordings, ensuring the data is suitable for ASR training.

The model development phase will use deep learning techniques such as convolutional neural networks and recurrent neural networks to create ASR models tailored to dysarthric speech. It will explore both enhancing existing model architectures and creating new ones that can capture the distinctive acoustic and phonetic features of dysarthric speech. Transfer learning will be used to adapt existing models trained on large-scale datasets of non-dysarthric speech, refining them for dysarthric speech. These models will be optimised for real-world application by addressing computational efficiency and real-time processing constraints. The performance of the developed ASR models will be evaluated using standard metrics, including Word Error Rate (WER), Character Error Rate (CER), Word Recognition Rate, and accuracy.
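To illustrate the kind of signal processing used in such a pre-processing stage, the sketch below shows three common front-end steps for speech feature extraction: pre-emphasis filtering, splitting the signal into overlapping frames, and Hamming windowing. This is a minimal illustrative example in pure Python, not the project's actual pipeline; the function names and parameter defaults are assumptions.

```python
import math

def pre_emphasis(signal, alpha=0.97):
    """Boost high frequencies: y[n] = x[n] - alpha * x[n-1]."""
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def frame_signal(signal, frame_len, hop_len):
    """Split a signal into overlapping frames, zero-padding the last one."""
    frames = []
    for start in range(0, len(signal), hop_len):
        frame = list(signal[start:start + frame_len])
        if len(frame) < frame_len:
            frame += [0.0] * (frame_len - len(frame))
        frames.append(frame)
        if start + frame_len >= len(signal):
            break
    return frames

def hamming(frame):
    """Apply a Hamming window to reduce spectral leakage at frame edges."""
    n = len(frame)
    return [frame[i] * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
            for i in range(n)]
```

In practice these windowed frames would then feed a spectral transform (e.g. a mel filterbank or MFCC computation) before being passed to the acoustic model.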
Preliminary results showed potential for improving accuracy; however, these results are limited by a shortage of data. This research is expected to have significant impact across domains such as healthcare, education, and accessibility.
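The evaluation metrics mentioned above can be sketched concretely. WER and CER are both ratios of the Levenshtein edit distance (substitutions, insertions, and deletions) to the length of the reference, computed over words and characters respectively. The following is a minimal pure-Python illustration; production systems typically use an established library rather than hand-rolled code.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[m][n]

def wer(reference, hypothesis):
    """Word Error Rate: edit distance over reference word count."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    """Character Error Rate: edit distance over reference character count."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)
```

For example, comparing the reference "the cat sat" against the hypothesis "the hat sat" yields one substitution in three words, i.e. a WER of about 0.33.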