Speech variability: A cross-language study on acoustic variations of speaking versus untrained singing.

John H L Hansen, Marigona Bokshi, Soheil Khorram

Journal of the Acoustical Society of America 2020 August

Speech production variability introduces significant challenges for existing speech technologies such as speaker identification (SID), speaker diarization, speech recognition, and language identification (ID). There has been limited research analyzing changes in acoustic characteristics for speech produced by untrained singing versus speaking. To better understand changes in speech production of the untrained singing voice, this study presents the first cross-language comparison between normal speaking and untrained karaoke singing of the same text content. Previous studies comparing professional singing versus speaking have shown deviations in both prosodic and spectral features. Some investigations also considered assigning the intrinsic activity of the singing. Motivated by these studies, a series of experiments investigating both prosodic and spectral variations of untrained karaoke singers is considered for three languages: American English, Hindi, and Farsi. A comprehensive comparison of common prosodic features, including phoneme duration, mean fundamental frequency (F0), and formant center frequencies of vowels, was performed. Collective changes in the corresponding overall acoustic spaces were analyzed based on the Kullback-Leibler distance, using Gaussian probability distribution models trained on spectral features. Finally, these models were used in a Gaussian mixture model with universal background model SID evaluation to quantify speaker changes between speaking and singing when the audio text content is the same. The experiments showed that many acoustic characteristics of untrained singing are considerably different from speaking when the text content is the same. It is suggested that these results would help advance automatic speech production normalization/compensation to improve performance of speech processing applications (e.g., speaker ID, speech recognition, and language ID).
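
As an illustration of the Kullback-Leibler comparison mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes each speaking style is summarized by a single full-covariance Gaussian fitted to spectral feature vectors, and that the "distance" is the symmetrized KL divergence; the abstract states neither detail. The function names (`fit_gaussian`, `gaussian_kl`, `symmetric_kl`) and the synthetic feature matrices are hypothetical.

```python
import numpy as np

def fit_gaussian(features):
    """Fit one full-covariance Gaussian to an (n_frames, n_dims) feature matrix."""
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    cov = np.atleast_2d(np.cov(features, rowvar=False))
    return mean, cov

def gaussian_kl(mu_p, cov_p, mu_q, cov_q):
    """Closed-form KL(P || Q) between two multivariate Gaussians."""
    mu_p = np.asarray(mu_p, dtype=float)
    mu_q = np.asarray(mu_q, dtype=float)
    cov_p = np.asarray(cov_p, dtype=float)
    cov_q = np.asarray(cov_q, dtype=float)
    d = mu_p.shape[0]
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    # slogdet is numerically safer than log(det(...)) for covariance matrices.
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    return 0.5 * (
        np.trace(cov_q_inv @ cov_p)
        + diff @ cov_q_inv @ diff
        - d
        + logdet_q
        - logdet_p
    )

def symmetric_kl(mu_p, cov_p, mu_q, cov_q):
    """Symmetrized KL (assumed here to be the 'KL distance')."""
    return gaussian_kl(mu_p, cov_p, mu_q, cov_q) + gaussian_kl(mu_q, cov_q, mu_p, cov_p)

# Synthetic stand-ins for spectral features (e.g., cepstral vectors); not real data.
rng = np.random.default_rng(0)
speaking = rng.normal(loc=0.0, scale=1.0, size=(500, 4))
singing = rng.normal(loc=0.5, scale=1.5, size=(500, 4))

distance = symmetric_kl(*fit_gaussian(speaking), *fit_gaussian(singing))
print(f"symmetric KL between speaking and singing models: {distance:.3f}")
```

A larger value indicates that the two fitted acoustic spaces differ more; a value near zero indicates nearly identical distributions.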
