Abstract: Chinese word segmentation is a prerequisite for information retrieval. With the arrival of big data, information retrieval demands higher segmentation precision and recall. This paper presents a word segmentation method for short Chinese texts. The method first uses a conditional random field (CRF) model to label the words with special tags, which yields a preliminary segmentation. It then applies the traditional dictionary-based method to refine this preliminary result and complete the segmentation. Compared with the traditional method, this approach improves the recognition of out-of-vocabulary words and the handling of overlap ambiguities, with F-scores above 0.95 on the 4 corpora of the SIGHAN 2005 bakeoff. Tests show that the method is better suited to segmenting short Chinese texts for information retrieval.
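The two-stage idea described in the abstract can be illustrated with a minimal sketch. Several things here are assumptions rather than details from the paper: the B/M/E/S tag set (the abstract only says "special tags"), the hand-written tag sequence standing in for real CRF output, the greedy merge rule standing in for the dictionary-based step, and the helper names `tags_to_words`, `merge_with_dictionary`, and `f_score`. The F-score is computed over word spans, which is the usual convention for segmentation evaluation.

```python
from typing import List, Set, Tuple


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Turn per-character tags (assumed B/M/E/S scheme) into words."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends on E (end) or S (single)
            words.append(current)
            current = ""
    if current:  # flush a dangling partial word
        words.append(current)
    return words


def merge_with_dictionary(words: List[str], lexicon: Set[str],
                          max_span: int = 4) -> List[str]:
    """Hypothetical refinement: greedily merge adjacent segments whose
    concatenation appears in the lexicon, trying the longest span first."""
    out, i = [], 0
    while i < len(words):
        for span in range(min(max_span, len(words) - i), 1, -1):
            candidate = "".join(words[i:i + span])
            if candidate in lexicon:
                out.append(candidate)
                i += span
                break
        else:  # no merge found: keep the segment unchanged
            out.append(words[i])
            i += 1
    return out


def spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Character-offset (start, end) span of every word."""
    result, start = set(), 0
    for w in words:
        result.add((start, start + len(w)))
        start += len(w)
    return result


def f_score(gold: List[str], pred: List[str]) -> float:
    """Harmonic mean of word-level precision and recall."""
    correct = len(spans(gold) & spans(pred))
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


text = "清华大学学报"
crf_tags = ["B", "E", "B", "E", "B", "E"]  # hand-written, not real CRF output
preliminary = tags_to_words(text, crf_tags)
refined = merge_with_dictionary(preliminary, {"清华大学"})
gold = ["清华大学", "学报"]
print(preliminary, f_score(gold, preliminary))
print(refined, f_score(gold, refined))
```

In this toy case the dictionary pass merges two preliminary segments into one lexicon entry, which raises the F-score against the assumed gold segmentation. A real system would take the tags from a trained CRF and would also need rules for splitting segments, which this sketch omits.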
刘泽文, 丁冬, 李春文. 基于条件随机场的中文短文本分词方法[J]. 清华大学学报（自然科学版）, 2015, 55(8): 906-910,915.
LIU Zewen, DING Dong, LI Chunwen. Chinese word segmentation method for short Chinese text based on conditional random fields [J]. Journal of Tsinghua University (Science and Technology), 2015, 55(8): 906-910,915.