Effects of language redundancy and prosodic structure on syllable duration in Mandarin Chinese

LIU Xiaowang, HAO Yun, ZHANG Jinsong

Journal of Tsinghua University (Science and Technology), 2024, Vol. 64, Issue 11: 1911-1918.

DOI: 10.16511/j.cnki.qhdxxb.2024.26.025
Special section: Human-Machine Speech Communication


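Summary

Information-theoretic phonetics mainly studies the relationship between linguistic information redundancy and acoustic features. Current findings come mostly from Indo-European languages. Mandarin Chinese has received little attention, and the correspondence between language redundancy and prosodic structure (boundaries and stress) in particular remains under-studied. This paper uses the information-theoretic notion of surprisal to represent the degree of language redundancy. By computing character-level unigram surprisal and bigram surprisal for the Annotated Speech Corpus of Chinese Discourse (ASCCD), it examines the relationships among language redundancy, prosodic structure, and syllable duration in Mandarin. The results show that bigram surprisal given the preceding character is related to the degree of stress, and that bigram surprisal given the following character corresponds to boundary level. After controlling for prosodic structure factors, language redundancy independently explains variation in syllable duration, which supports a weak version of the smooth signal redundancy hypothesis.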
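
As a rough illustration of the surprisal measures named above, the following Python sketch computes character-level unigram surprisal and the two bigram surprisals (conditioned on the preceding or on the following character) from raw counts. The add-one smoothing, the toy corpus, and all function names are assumptions made for illustration. They are not the authors' language model or data.

```python
import math
from collections import Counter


def count_ngrams(sentences):
    """Count characters and adjacent character pairs within each sentence."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        chars = list(sentence)
        unigrams.update(chars)
        bigrams.update(zip(chars, chars[1:]))
    return unigrams, bigrams


def unigram_surprisal(char, unigrams):
    """-log2 P(char), with add-one smoothing (an illustrative assumption)."""
    total = sum(unigrams.values())
    vocab = len(unigrams)
    return -math.log2((unigrams[char] + 1) / (total + vocab))


def forward_surprisal(prev_char, char, unigrams, bigrams):
    """-log2 P(char | preceding character), add-one smoothed."""
    vocab = len(unigrams)
    # Number of bigram tokens whose left element is prev_char.
    context = sum(n for (left, _), n in bigrams.items() if left == prev_char)
    return -math.log2((bigrams[(prev_char, char)] + 1) / (context + vocab))


def backward_surprisal(char, next_char, unigrams, bigrams):
    """-log2 P(char | following character), add-one smoothed."""
    vocab = len(unigrams)
    # Number of bigram tokens whose right element is next_char.
    context = sum(n for (_, right), n in bigrams.items() if right == next_char)
    return -math.log2((bigrams[(char, next_char)] + 1) / (context + vocab))


# Toy corpus of three two-character "sentences" (made up for illustration).
unigrams, bigrams = count_ngrams(["ab", "ab", "ac"])
print(forward_surprisal("a", "b", unigrams, bigrams))   # 1.0 bit
print(backward_surprisal("a", "b", unigrams, bigrams))
```

Lower surprisal means a more predictable, and therefore more redundant, character. In practice the counts would come from a large text corpus and a properly smoothed language model rather than from add-one estimates.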

Abstract

[Objective] Information theory in phonetics primarily investigates the relationship between language redundancy and acoustic features. Language redundancy refers to the predictability of linguistic information, which arises from lexical, syntactic, and semantic context. The more predictable the information, the higher its redundancy. Numerous studies suggest that linguistic units with higher redundancy tend to be shorter in duration when spoken. The smooth signal redundancy hypothesis posits that the influence of language redundancy on duration is modulated by prosodic structure: stress and boundaries are assigned to elements with lower redundancy, and these in turn adjust acoustic features, yielding an inverse relationship between language redundancy and duration. However, these conclusions are based predominantly on Indo-European languages, leaving a research gap for Mandarin Chinese. In particular, the correspondence between language redundancy and prosodic structure has received little study. This study therefore investigates the relationships among language redundancy, prosodic structure, and syllable duration in Mandarin Chinese.
[Methods] Language redundancy is quantified using surprisal, a concept from information theory. A large-scale text corpus was used to train a character-level 2-gram Chinese language model, which was then used to estimate unigram and bigram surprisal. The speech data come from the Annotated Speech Corpus of Chinese Discourse (ASCCD). The Chinese Tone and Break Index (C-ToBI) annotation system was used to represent prosodic structure in terms of boundaries and stress, and the duration of each syllable was recorded together with its stress and boundary levels. A linear mixed-effects model was used to examine the effects of language redundancy and prosodic structure on syllable duration. To test whether language redundancy directly explains changes in syllable duration, prosodic structure factors were first entered as control variables in a baseline model, and the language redundancy factors were then added. Comparing the log-likelihood values of the models shows whether language redundancy has a substantial effect on syllable duration.
[Results] The relationship between language redundancy and syllable duration was consistent across Mandarin speakers. A moderate correspondence between language redundancy and prosodic structure was also observed, with different redundancy factors associated with different aspects of prosodic structure: forward surprisal (bigram surprisal given the preceding character) correlated with stress level, whereas backward surprisal (bigram surprisal given the following character) correlated with boundary level. Specifically, higher forward surprisal indicates lower redundancy and was associated with more salient syllables in speech production, while higher backward surprisal corresponded to higher boundary levels. Adding prosodic structure factors and then language redundancy factors to the model of syllable duration improved the fit at each step, indicating that, once prosodic structure is controlled for, language redundancy independently accounts for changes in syllable duration.
[Conclusions] The results support a weak version of the smooth signal redundancy hypothesis. Prosodic structure is confirmed to modulate the effect of language redundancy, while language redundancy also directly accounts for changes in syllable duration. Because this study relies on read speech, future work should examine spontaneous speech. It would also be worthwhile to explore how different measures of language redundancy relate to prosodic structure, and how they affect other acoustic features such as fundamental frequency.
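
One common way to carry out the log-likelihood comparison described in the Methods is a likelihood-ratio test between the nested models. The sketch below shows only that arithmetic, using the Python standard library. The log-likelihood values and the number of added parameters are made-up placeholders, not results from the paper. The function names are hypothetical, and the abstract does not state that the authors used exactly this test.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed-form starting points: df = 1 (a = 0.5) or df = 2 (a = 1).
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def likelihood_ratio_test(loglik_baseline, loglik_full, extra_params):
    """Return (statistic, p-value) for two nested models fit by maximum likelihood."""
    statistic = 2.0 * (loglik_full - loglik_baseline)
    return statistic, chi2_sf(statistic, extra_params)


# Placeholder numbers: a baseline with prosodic factors only, and a full
# model that adds three hypothetical redundancy predictors.
stat, p_value = likelihood_ratio_test(-1000.0, -990.0, extra_params=3)
print(f"chi-square = {stat:.1f}, p = {p_value:.2g}")
```

For mixed-effects models that differ in their fixed effects, both models should be fit by maximum likelihood rather than REML before their log-likelihoods are compared this way.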

Key words

language redundancy / syllable duration / prosodic structure / smooth signal redundancy hypothesis

Cite this article

LIU Xiaowang, HAO Yun, ZHANG Jinsong. Effects of language redundancy and prosodic structure on syllable duration in Mandarin Chinese [J]. Journal of Tsinghua University (Science and Technology), 2024, 64(11): 1911-1918. https://doi.org/10.16511/j.cnki.qhdxxb.2024.26.025


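Funding

Fundamental Research Funds for the Central Universities (23YBT18); Humanities and Social Sciences Research Planning Project of the Ministry of Education (23YJA740012); Key Research Project on International Chinese Language Education (22YH49B)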
