Document Type

Article

Publication Date

1-1-2024

Publication Title

IEEE Access

Volume

12

First Page

59198

Keywords

Classifier, CNN, cochleagram, dysphonia, gammatone filters, voice pathology

Last Page

59210

Abstract

Spectral images capture the dynamic characteristics of a voice signal in both the time and frequency domains. However, extracting the predominant spectral features from voice samples remains challenging. This work generates cochleagram images to reveal the detailed spectral content of voice samples and thereby recognize dysphonic voices. Both sustained vowel ('/a/') and sentence voice samples are considered, so that phonation, respiration, and resonance of the vocal tone are all represented. Gender bias is eliminated by analyzing male and female voice samples separately, because their vocal tracts, pharynges, and oral cavities differ structurally. The simulation results show that the cochleagram, combined with a designed pre-trained convolutional neural network (CNN), can achieve 95% accuracy in identifying dysphonic voices from sentence samples. The result is a robust, noninvasive, and automated voice pathology detection system based on perceptual analysis of voice signals. The proposed system can objectively correlate with clinical findings and assist in monitoring the treatment progress of dysphonic voices, complementing subjective assessment by clinicians.

DOI

10.1109/ACCESS.2024.3392808

E-ISSN

2169-3536

Creative Commons License

Creative Commons Attribution 4.0 International License
