"[1] XIE Lei, SUN Naicai, FAN Bo. A statistical parametric approach to video-realistic text-driven talking avatar[J]. Multimedia Tools and Applications, 2014, 73(1):377-396. [2] Berger M A, Hofer G, Shimodaira H. Carnival-combining speech technology and computer animation[J]. Computer Graphics and Applications, IEEE, 2011, 31(5):80-89. [3] YANG Minghao, TAO Jianhua, MU Kaihui, et al. A multimodal approach of generating 3D human-like talking agent[J]. Journal on Multimodal User Interfaces, 2012, 5(1-2):61-68. [4] Bregler C, Covell M, Slaney M. Video rewrite:Driving visual speech with audio[C]//Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques. Los Angeles, CA, USA:ACM Press, 1997:353-360. [5] Huang F J, Cosatto E, Graf H P. Triphone based unit selection for concatenative visual speech synthesis[C]//Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. Orlando, FL, USA:IEEE, 2002:2037-2040. [6] Ezzat T, Geiger G, Poggio T. Trainable videorealistic speech animation[J]. Acm Transactions on Graphics, 2004, 3(3):57-64. [7] TAO Jianhua, YIN Panrong. Speech driven face animation based on dynamic concatenation model[J]. J Inf Computat Sci, 2007, 4(1):271-280. [8] JIA Jia, WU Zhiyong, ZHANG Shen, et al. Head and facial gestures synthesis using PAD model for an expressive talking avatar[J]. Multimedia Tools and Applications, 2014, 73(1):439-461. [9] ZHAO Kai, WU Zhiyong, JIA Jia, et al. An online speech driven talking head system[C]//Proceedings of the Global High Tech Congress on Electronics. Shenzhen, China:IEEE Press, 2012:186-187. [10] Sako S, Tokuda K, Masuko T, et al. HMM-based text-to-audio-visual speech synthesis[C]//Proceedings of the International Conference on Spoken Language Processing. Beijing, China:IEEE Press, 2000:25-28 [11] Eddy S R. Hidden markov models[J]. Current Opinion in Structural Biology, 1996, 6(3):361-365. [12] WANG Lijuan, QIAN Xiaojun, HAN Wei et al. Synthesizing photo-real talking head via trajectory-guided sample selection[C]//Proceedings of the International Speech Communication Association. Makuhari, Japan:IEEE Press, 2010:446-449. [13] Ze H, Senior A, Schuster M. Statistical parametric speech synthesis using deep neural networks[C]//Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. Vancouver, Canada:IEEE Press, 2013:7962-7966. [14] Hinton G, DENG Li, YU Dong, et al. Deep neural networks for acoustic modeling in speech recognition:The shared views of four research groups[J]. Signal Processing Magazine, IEEE, 2012, 29(6):82-97. [15] FAN Yuchen, QIAN Yao, XIE Fenglong, et al. TTS synthesis with bidirectional LSTM based recurrent neural networks[C]//Proceedings of the International Speech Communication Association. Singapore:IEEE Press, 2014:1964-1968. [16] Kang S Y, Qian X J, Meng H. Multi-distribution deep belief network for speech synthesis[C]//Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. Vancouver, Canada:IEEE Press, 2013:8012-8016. [17] Schuster M, Paliwal K K. Bidirectional recurrent neural networks[J]. IEEE Transactions on Signal Processing, 1997, 45(11):2673-2681. [18] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural computation, 1997, 9(8):1735-1780. [19] FAN Bo, WANG Lijuan, Song F K, et al. Photo-real talking head with deep bidirectional LSTM[C]//Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. Brisbane, Australia:IEEE Press, 2015:4884-4888. [20] Cootes T F, Edwards G J, Taylor C J. Active appearance models[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, 23(6):681-685. [21] Werbos P J. Backpropagation through time:What it does and how to do it[J]. Proceedings of the IEEE, 1990, 78(10):1550-1560. [22] Williams R J, Zipser D. Gradient-based learning algorithms for recurrent networks and their computational complexity[J]. Back-propagation:Theory, Architectures and Applications, 1995:433-486. [23] Pérez P, Gangnet M, Blake A. Poisson image editing[C]//Proceedings of the ACM Transactions on Graphics. New York, NY, USA:ACM, 2003:313-318. [24] WANG Qiang, ZHANG Weiwei, TANG Xiaoou, et al. Real-time bayesian 3-D pose tracking[J]. Circuits and Systems for Video Technology, IEEE Transactions on, 2006, 16(12):1533-1541. [25] Jolliffe I T. Principal component analysis[J]. Springer Berlin, 1986, 87(100):41-64. [26] Stegmann M B. Active appearance models:Theory extensions and cases[J]. Informatics & Mathematical Modelling, 2000, 1(6):748-754. [27] Roweis S. EM algorithms for PCA and SPCA[J]. Advances in Neural Information Processing Systems, 1999, 10:626-632. [28] Cootes T F, Kittipanya-ngam P. Comparing variations on the active appearance model algorithm[C]//Proceedings of the 13th British Machine Vision Conference. Cardiff, Wales, UK:BMVA, 2002:1-10. [29] Graves A, Mohamed A R, Hinton G. Speech recognition with deep recurrent neural networks[C]//Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing. Vancouver, Canada:IEEE Press, 2013:6645-6649. [30] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 1997, 9(8):1735-1780. [31] Schuster M, Paliwal K K. Bidirectional recurrent neural networks[J]. Signal Processing, IEEE Transactions on, 1997, 45(11):2673-2681. [32] Theobald B J, Fagel S, Bailly G, et al. LIPS2008:Visual speech synthesis challenge[C]//Proceedings of the International Speech Communication Association. Brisbane, Australia:IEEE Press, 2008:2310-2313. [33] Young S, Evermann G, Gales M, et al. The HTK book[M]. Cambridge:Cambridge University Engineering Department, 2002."