Document Type
Article
Publication Date
1-1-2024
Publication Title
IEEE Access
Volume
12
First Page
59198
Keywords
Classifier, CNN, cochleagram, dysphonia, gammatone filters, voice pathology
Last Page
59210
Abstract
Spectral images capture the dynamic characteristics of a voice signal in both the time and frequency domains. However, extracting the predominant spectral features from voice samples remains challenging. This work generates cochleagram images to reveal the detailed spectral content of voice samples and thereby recognize dysphonic voices. Both sustained vowel ('/a/') and sentence voice samples are considered so that phonation, respiration, and resonance of the vocal tone are all represented. Gender bias is eliminated by considering male and female voice samples separately, as they have structurally different vocal tracts, pharynx, and oral cavities. The simulation results show that the cochleagram, combined with a designed pre-trained convolutional neural network (CNN), can achieve 95% accuracy in identifying dysphonic voices with sentence samples. A robust, noninvasive, and automated voice pathology detection system is thus developed through perceptual analysis of voice signals. The proposed automated pathological voice detection system can objectively correlate the clinical findings and assist in monitoring the treatment progress of dysphonic voices, in addition to subjective assessment by clinicians.
DOI
10.1109/ACCESS.2024.3392808
E-ISSN
2169-3536
Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Recommended Citation
Islam, Rumana; Abdel-Raheem, Esam; and Tarique, Mohammed. (2024). Cochleagram to Recognize Dysphonia: Auditory Perceptual Analysis for Health Informatics. IEEE Access, 12, 59198-59210.
https://scholar.uwindsor.ca/electricalengpub/485