Journal of Tsinghua University (Science and Technology), 2017, Vol. 57, Issue 1: 1-6. DOI: 10.16511/j.cnki.qhdxxb.2017.21.001
COMPUTER SCIENCE AND TECHNOLOGY
Uyghur morphological segmentation with bidirectional GRU neural networks
ABUDUKELIMU Halidanmu, CHENG Yong, LIU Yang, SUN Maosong
State Key Laboratory of Intelligent Technology and Systems, Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Abstract  Information processing for low-resource, morphologically rich languages such as Uyghur is critical for addressing the language barrier faced by China's One Belt and One Road (B&R) program. In such languages, a single word encodes rich grammatical and semantic information by attaching morphemes to a root form, which leads to severe data sparsity in language processing. This paper introduces an approach to Uyghur morphological segmentation, which divides Uyghur words into sequences of morphemes, based on bidirectional gated recurrent unit (GRU) neural networks. The bidirectional GRU uses context from both directions to resolve ambiguities, and its gating mechanism models long-distance dependencies. Tests show that this approach significantly outperforms conditional random fields and unidirectional GRUs. The approach is language-independent and can be applied to all morphologically rich languages.
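The idea in the abstract can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that segmentation is cast as per-character labeling with a hypothetical B/I scheme (B marks the first character of a morpheme), uses random untrained weights, omits biases and the output layer, and takes the Latin-script word "kitablar" (kitab + lar) purely as an illustrative input. The class and function names are invented for this example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """A single GRU cell with update gate z and reset gate r (biases omitted)."""

    def __init__(self, n_in, n_hid, rng):
        self.n_hid = n_hid
        # W* act on the input, U* act on the previous hidden state.
        self.Wz, self.Uz = rand_matrix(n_hid, n_in, rng), rand_matrix(n_hid, n_hid, rng)
        self.Wr, self.Ur = rand_matrix(n_hid, n_in, rng), rand_matrix(n_hid, n_hid, rng)
        self.Wh, self.Uh = rand_matrix(n_hid, n_in, rng), rand_matrix(n_hid, n_hid, rng)

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.Wh, x), matvec(self.Uh, gated))]
        # The update gate interpolates between the old state and the candidate.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def run(self, xs):
        h, out = [0.0] * self.n_hid, []
        for x in xs:
            h = self.step(x, h)
            out.append(h)
        return out


def bidirectional_states(xs, fwd, bwd):
    """Concatenate left-to-right and right-to-left states for every position."""
    left = fwd.run(xs)
    right = bwd.run(xs[::-1])[::-1]
    return [l + r for l, r in zip(left, right)]


def to_labels(morphemes):
    """Assumed B/I scheme: B for a morpheme's first character, I for the rest."""
    return [tag for m in morphemes for tag in "B" + "I" * (len(m) - 1)]


def from_labels(word, labels):
    morphemes = []
    for ch, tag in zip(word, labels):
        if tag == "B" or not morphemes:
            morphemes.append(ch)
        else:
            morphemes[-1] += ch
    return morphemes


rng = random.Random(0)
word = "kitablar"  # illustrative input only
alphabet = sorted(set(word))
one_hot = [[1.0 if c == a else 0.0 for a in alphabet] for c in word]
fwd, bwd = GRUCell(len(alphabet), 4, rng), GRUCell(len(alphabet), 4, rng)
states = bidirectional_states(one_hot, fwd, bwd)  # one 8-dim vector per character
labels = to_labels(["kitab", "lar"])
```

In a trained tagger, a classification layer over each concatenated state would predict that character's label, and `from_labels` would turn the predicted labels back into morphemes. Each position sees both its left and right context, which is what distinguishes the bidirectional model from a unidirectional GRU.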
Keywords: bidirectional gated recurrent unit; neural network; Uyghur; morphological segmentation
CLC number: TP391.2
Issue Date: 15 January 2017
Cite this article:
ABUDUKELIMU Halidanmu, CHENG Yong, LIU Yang, SUN Maosong. Uyghur morphological segmentation with bidirectional GRU neural networks[J]. Journal of Tsinghua University (Science and Technology), 2017, 57(1): 1-6.
URL:
http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2017.21.001 or http://jst.tsinghuajournals.com/EN/Y2017/V57/I1/1