Journal of Tsinghua University (Science and Technology)  2021, Vol. 61 Issue (9): 920-926    DOI: 10.16511/j.cnki.qhdxxb.2021.21.007
Computational Linguistics
Construction of a concurrent corpus for a Chinese AMR annotation system and recognition of concurrent structures
HOU Wenhui¹, QU Weiguang¹,², WEI Tingxin²,³, LI Bin², GU Yanhui¹, ZHOU Junsheng¹
1. School of Computer and Electronic Information, Nanjing Normal University, Nanjing 210023, China;
2. School of Chinese Language and Literature, Nanjing Normal University, Nanjing 210097, China;
3. International College for Chinese Studies, Nanjing Normal University, Nanjing 210097, China
Abstract: The concurrent structure is a common Chinese verb construction in which a predicate-object phrase and a subject-predicate phrase share one constituent, the pivot (兼语). This structural complexity makes such sentences difficult to parse syntactically, so recognizing concurrent structures is important for semantic parsing and downstream tasks. However, few concurrent-structure corpora exist, and none has yet been built for the Chinese Abstract Meaning Representation (AMR) annotation system. To address this gap, this study summarizes a set of annotation specifications for concurrent structures and builds a corpus for the Chinese AMR annotation system that contains 4 760 concurrent sentences. On this corpus, an LA-BiLSTM-CRF model recognizes concurrent structures with an F1 score of 86.06%. The recognition results are then analyzed and directions for improvement are proposed.
Key words: abstract meaning representation; concurrent structure; recognition
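As an illustration of how an F1 score for structure recognition can be computed, the following minimal sketch scores predicted spans against gold spans by exact match. This page does not describe the tagging scheme or the scoring protocol used in the paper, so the B/I/O tags, the exact-match criterion, and the names `bio_to_spans` and `span_f1` are assumptions made here for illustration only.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int]  # (start, end) token indices, end exclusive


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect spans from a sequence of "B", "I", "O" tags (assumed scheme)."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:      # a new span closes the open one
                spans.add((start, i))
            start = i
        elif tag == "I":
            if start is None:          # tolerate an "I" with no preceding "B"
                start = i
        else:                          # "O" closes any open span
            if start is not None:
                spans.add((start, i))
                start = None
    if start is not None:              # span running to the end of the sentence
        spans.add((start, len(tags)))
    return spans


def span_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) over exactly matching spans."""
    tp = n_gold = n_pred = 0
    for g, p in zip(gold, pred):
        gold_spans, pred_spans = bio_to_spans(g), bio_to_spans(p)
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data: the second predicted span is too short, so it does not count.
gold = [["O", "B", "I", "I", "O"], ["B", "I", "O", "O"]]
pred = [["O", "B", "I", "I", "O"], ["B", "O", "O", "O"]]
print(span_f1(gold, pred))  # (0.5, 0.5, 0.5)
```

Exact-match scoring is strict: a predicted span that overlaps the gold span but misses a boundary earns no credit. Other protocols, such as token-level scoring, would give different numbers for the same predictions.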
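Received: 2020-11-30      Published: 2021-08-21
Funding: General Program of the National Natural Science Foundation of China (61772278); General Project of the Philosophy and Social Science Foundation of Jiangsu Higher Education Institutions (2019JSA0220); General Program of the National Social Science Fund of China (18BYY127)
Corresponding author: QU Weiguang, Professor, E-mail: wgqu_nj@163.com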
Cite this article:
HOU Wenhui, QU Weiguang, WEI Tingxin, LI Bin, GU Yanhui, ZHOU Junsheng. Construction of a concurrent corpus for a Chinese AMR annotation system and recognition of concurrent structures [J]. Journal of Tsinghua University (Science and Technology), 2021, 61(9): 920-926.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2021.21.007  or  http://jst.tsinghuajournals.com/CN/Y2021/V61/I9/920
Copyright © Editorial Office of Journal of Tsinghua University (Science and Technology)