Computational Linguistics

Construction of a concurrent corpus for a Chinese AMR annotation system and recognition of concurrent structures

  • HOU Wenhui,
  • QU Weiguang,
  • WEI Tingxin,
  • LI Bin,
  • GU Yanhui,
  • ZHOU Junsheng
  • 1. School of Computer and Electronic Information, Nanjing Normal University, Nanjing 210023, China;
    2. School of Chinese Language and Literature, Nanjing Normal University, Nanjing 210097, China;
    3. International College for Chinese Studies, Nanjing Normal University, Nanjing 210097, China

Received: 2020-11-30

Published online: 2021-08-21

Funding

National Natural Science Foundation of China, General Program (61772278); General Project of the Philosophy and Social Science Foundation of Jiangsu Higher Education Institutions (2019JSA0220); National Social Science Fund of China, General Program (18BYY127)

Cite this article

HOU Wenhui, QU Weiguang, WEI Tingxin, LI Bin, GU Yanhui, ZHOU Junsheng. Construction of a concurrent corpus for a Chinese AMR annotation system and recognition of concurrent structures[J]. Journal of Tsinghua University (Science and Technology), 2021, 61(9): 920-926. DOI: 10.16511/j.cnki.qhdxxb.2021.21.007

Abstract

The concurrent structure (兼语, also called the pivotal construction) is a common verb structure in Chinese in which a predicate-object phrase and a subject-predicate phrase share one constituent: the shared element is the object of the first verb and the subject of the second. This complexity makes syntactic analysis difficult, so recognizing concurrent structures matters for semantic parsing and downstream tasks. Few concurrent corpora exist, however, and none has been built for the Chinese Abstract Meaning Representation (AMR) annotation system. This study formulates a set of annotation specifications for concurrent structures and builds a concurrent corpus for the Chinese AMR annotation system containing 4 760 concurrent sentences. On this corpus, an LA-BiLSTM-CRF model recognizes concurrent structures with an F1 score of 86.06%. The recognition results are analyzed and directions for improvement are proposed.
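The abstract reports recognition quality as an F1 score but does not state the tag scheme or the matching criterion. As a minimal sketch only, assuming BIO tags with a single hypothetical label `JY` and exact span matching (neither is confirmed by the abstract), span-level precision, recall, and F1 for a sequence labeler could be computed as follows:

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect spans from BIO tags. The label name "JY" is hypothetical."""
    spans: Set[Span] = set()
    start = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing span
        if start is not None and tag != "I-JY":
            spans.add((start, i))  # the open span ends before position i
            start = None
        if tag == "B-JY":
            start = i  # a new span opens here
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1); a span counts only on an exact match."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data invented for illustration; it is not taken from the corpus.
gold = [["O", "B-JY", "I-JY", "O"], ["B-JY", "O", "O"]]
pred = [["O", "B-JY", "I-JY", "O"], ["O", "B-JY", "O"]]
print(span_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

In the toy data, one of the two predicted spans matches a gold span exactly, so precision, recall, and F1 are all 0.5. A token-level or partial-match criterion would give different numbers, so the reported 86.06% should be read against the evaluation protocol described in the full paper.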
