1. School of Computer and Electronic Information, Nanjing Normal University, Nanjing 210023, China; 2. School of Chinese Language and Literature, Nanjing Normal University, Nanjing 210097, China; 3. International College for Chinese Studies, Nanjing Normal University, Nanjing 210097, China
Abstract: Concurrent structures, in which a predicate-object phrase and a subject-predicate phrase overlap within one sentence, are common Chinese verb structures. Their complexity makes them difficult to analyze, so recognizing them is important for semantic analysis and downstream tasks. However, few concurrent corpora exist, and none targets the Chinese AMR annotation system. This study summarizes a set of annotation specifications for concurrent structures and builds a concurrent corpus for the Chinese AMR annotation system that contains 4,760 concurrent sentences. An LA-BiLSTM-CRF model is then used to recognize concurrent structures, achieving an F1 score of 86.06%. The recognition results are analyzed to identify where improvements are needed.
侯文惠, 曲维光, 魏庭新, 李斌, 顾彦慧, 周俊生. 面向中文AMR标注体系的兼语语料库构建及兼语结构识别[J]. 清华大学学报(自然科学版), 2021, 61(9): 920-926.
HOU Wenhui, QU Weiguang, WEI Tingxin, LI Bin, GU Yanhui, ZHOU Junsheng. Construction of a concurrent corpus for a Chinese AMR annotation system and recognition of concurrent structures [J]. Journal of Tsinghua University (Science and Technology), 2021, 61(9): 920-926.