Journal of Tsinghua University(Science and Technology), 2018, Vol. 58, Issue 1: 61-66, 74. DOI: 10.16511/j.cnki.qhdxxb.2018.21.003
AUTOMATION
Automatic prosodic boundary labeling based on fusing the silence duration with the lexical features
FU Ruibo (1, 2), TAO Jianhua (1, 2, 3), LI Ya (1), WEN Zhengqi (1)
1. National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China;
2. School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100190, China;
3. CAS Center for Excellence in Brain Science and Intelligence Technology, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Abstract  Automatic prosodic boundary labeling is important for constructing speech corpora for speech synthesis. Manual labeling of prosodic boundaries is time consuming and inconsistent, whereas automatic labeling gives more consistent results. Here the manual labeling process is modelled with a recurrent neural network: two sub-models are trained, one using lexical features and one using acoustic features, and each labels the prosodic boundaries. Model fusion then combines the outputs of the two sub-models to obtain the optimal labeling results. The silence duration for each word has a clearer physical meaning and correlates better with the prosodic boundaries than the frame-by-frame acoustic features used in traditional methods. Tests show that using the silence durations as the acoustic features, together with the model fusion method, improves prosodic boundary labeling compared with previous feature fusion methods.
Keywords: prosodic boundary labeling; ensemble strategy; silence duration; corpus construction; speech synthesis
CLC number (ZTFLH): H116.4; TP181
Issue Date: 15 January 2018
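The two ideas in the abstract, a word-level silence duration feature and fusion of two sub-models' outputs, can be illustrated with a minimal Python sketch. Nothing here is taken from the paper itself: the alignment format `(word, start, end)`, the choice to measure the silent gap after each word, the weighted-average fusion rule, and the names `silence_durations`, `fuse_posteriors`, and `label_boundaries` are all assumptions made for illustration. The abstract does not state which fusion rule or silence definition the authors use.

```python
from typing import List, Sequence, Tuple

# Hypothetical alignment entry: (word, start_sec, end_sec),
# e.g. as produced by some forced aligner (an assumption).
Word = Tuple[str, float, float]


def silence_durations(words: Sequence[Word], utterance_end: float) -> List[float]:
    """Return the silent gap in seconds after each word.

    Assumption: silence is the time between the end of a word and the
    start of the next one (or the end of the utterance for the last word).
    """
    gaps = []
    for i, (_, _, end) in enumerate(words):
        next_start = words[i + 1][1] if i + 1 < len(words) else utterance_end
        gaps.append(max(0.0, next_start - end))
    return gaps


def fuse_posteriors(
    lexical: Sequence[Sequence[float]],
    acoustic: Sequence[Sequence[float]],
    weight: float = 0.5,
) -> List[List[float]]:
    """Fuse per-word class posteriors from two sub-models.

    Assumption: a simple weighted average, with `weight` applied to the
    lexical sub-model and `1 - weight` to the acoustic sub-model.
    """
    if len(lexical) != len(acoustic):
        raise ValueError("both sub-models must score the same number of words")
    return [
        [weight * p_lex + (1.0 - weight) * p_ac for p_lex, p_ac in zip(lex, ac)]
        for lex, ac in zip(lexical, acoustic)
    ]


def label_boundaries(fused: Sequence[Sequence[float]]) -> List[int]:
    """Pick the highest-scoring boundary class index for each word."""
    return [max(range(len(p)), key=p.__getitem__) for p in fused]
```

In this sketch the class indices are arbitrary (for example, 0 for "no boundary" and 1 for "boundary"). Fusing at the output level keeps the two sub-models independent, so either one can be retrained or replaced without touching the other, which is one common motivation for model fusion over feature fusion.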
Cite this article:   
FU Ruibo, TAO Jianhua, LI Ya, et al. Automatic prosodic boundary labeling based on fusing the silence duration with the lexical features[J]. Journal of Tsinghua University(Science and Technology), 2018, 58(1): 61-66, 74.
URL:  
http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2018.21.003     OR     http://jst.tsinghuajournals.com/EN/Y2018/V58/I1/61
  
  
  
  
  
  
Copyright © Journal of Tsinghua University(Science and Technology), All Rights Reserved.