EconPapers    
 

Using a forced aligner for prosody research

Hongchen Wu, Jiwon Yun, Xiang Li, Huiyi Huang and Chuandong Liu
Additional contact information
Hongchen Wu: Georgia Institute of Technology
Jiwon Yun: Stony Brook University
Xiang Li: Georgia Institute of Technology
Huiyi Huang: Georgia Institute of Technology
Chuandong Liu: Georgia Institute of Technology

Palgrave Communications, 2023, vol. 10, issue 1, 1-13

Abstract: Forced alignment is a speech technique that automatically aligns audio files with their transcripts. Forced alignment tools have made annotating audio files and building annotated speech databases much more accessible and efficient. Researchers have recently begun to evaluate the benefits and accuracy of forced aligners in speech research and have offered insightful suggestions for improvement. However, little work so far has evaluated forced aligners in prosody research, which focuses on suprasegmental features. In this paper, we use ambiguous sentence-level audio input in Mandarin Chinese, which can be disambiguated prosodically, to evaluate the alignment accuracy of the Montreal Forced Aligner (MFA). Having obtained satisfactory results for syllable-by-syllable alignment, we further explore the possibility and benefits of using the forced alignment tool to generate phrase-by-phrase alignment, a topic that has barely been studied in previous research on forced alignment. Our paper demonstrates that the forced alignment tool can effectively generate accurate alignment at both the syllable and phrase levels for tonal languages such as Mandarin. We found that the average differences between human annotators and MFA were smaller than the gold standard, indicating a satisfactory level of performance by the tool. Moreover, MFA-assisted annotation by human transcribers was at least 20 times faster than previously reported manual annotation rates, providing significant time and resource savings for prosody researchers. Our results also suggest that the phrase-level alignment accuracy of MFA can be affected by recording quality, which calls on prosody researchers to control audio quality during recording. Finally, the finding that de-stressed words and phrases pose challenges for MFA offers a reference point for improving forced aligners.

Date: 2023

Downloads: (external link)
http://link.springer.com/10.1057/s41599-023-01931-4 Abstract (text/html)
Access to full text is restricted to subscribers.

Persistent link: https://EconPapers.repec.org/RePEc:pal:palcom:v:10:y:2023:i:1:d:10.1057_s41599-023-01931-4

Ordering information: This journal article can be ordered from
https://www.nature.com/palcomms/about

DOI: 10.1057/s41599-023-01931-4

More articles in Palgrave Communications from Palgrave Macmillan
Bibliographic data for series maintained by Sonal Shukla and Springer Nature Abstracting and Indexing.

 
Page updated 2025-03-19
Handle: RePEc:pal:palcom:v:10:y:2023:i:1:d:10.1057_s41599-023-01931-4