How ready is speech-to-text for psychological language research? Evaluating the validity of AI-generated English transcripts for analyzing free-spoken responses in younger and older adults
Affiliation
Department of Psychology, University of Arizona
Issue Date
2024-05-21
Publisher
Springer Science and Business Media LLC
Citation
Pfeifer, V.A., Chilton, T.D., Grilli, M.D. et al. How ready is speech-to-text for psychological language research? Evaluating the validity of AI-generated English transcripts for analyzing free-spoken responses in younger and older adults. Behav Res (2024). https://doi.org/10.3758/s13428-024-02440-1
Journal
Behavior Research Methods
Rights
© The Psychonomic Society, Inc. 2024.
Collection Information
This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries. If you have questions, please contact us at repository@u.library.arizona.edu.
Abstract
For the longest time, the gold standard in preparing spoken language corpora for text analysis in psychology was using human transcription. However, such a standard comes at extensive cost and creates barriers to quantitative spoken language analysis that recent advances in speech-to-text technology could address. The current study quantifies the accuracy of AI-generated transcripts compared to human-corrected transcripts across younger (n = 100) and older (n = 92) adults and two spoken language tasks. Further, it evaluates the validity of Linguistic Inquiry and Word Count (LIWC) features extracted from these two kinds of transcripts, as well as transcripts specifically prepared for LIWC analyses via tagging. We find that overall, AI-generated transcripts are highly accurate with a word error rate of 2.50% to 3.36%, albeit slightly less accurate for younger compared to older adults. LIWC features extracted from either type of transcript are highly correlated, while the tagging procedure significantly alters filler word categories. Based on these results, automatic speech-to-text appears to be ready for psychological language research when using spoken language tasks in relatively quiet environments, unless filler words are of interest to researchers.
Note
12 month embargo; first published 21 May 2024
EISSN
1554-3528
Version
Final accepted manuscript
Sponsors
National Institutes of Health
DOI
10.3758/s13428-024-02440-1
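
Illustration
The abstract reports accuracy as a word error rate (WER). This record does not describe how the authors computed it, so the following is only a minimal sketch of the conventional definition: the number of word substitutions, deletions, and insertions needed to turn the AI-generated transcript into the reference transcript, divided by the number of words in the reference. The function name `word_error_rate` and the normalization used here (lowercasing and whitespace splitting) are assumptions made for illustration, not the study's procedure.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: word-level edit distance divided by reference length.

    Assumption: transcripts are lowercased and split on whitespace. The study
    may have normalized its transcripts differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # reference word missing from hypothesis
                curr[j - 1] + 1,                  # extra word in hypothesis
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    human = "the cat sat on the mat"
    machine = "the cat sat on mat"
    print(f"WER: {word_error_rate(human, machine):.2%}")
```

Under this definition, a WER of 2.50% would correspond to roughly one word-level error per 40 words of the reference transcript.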
