Journal of Tsinghua University (Science and Technology), 2019, Vol. 59, Issue 4: 256-261    DOI: 10.16511/j.cnki.qhdxxb.2019.21.007
Computer Science and Technology
Keyphrase extraction for legal questions based on a sequence to sequence model
ZENG Daojian1,2, TONG Guowei1,2, DAI Yuan1,2, LI Feng1,2, HAN Bing3, XIE Songxian3
1. School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410114, China;
2. Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation, Changsha University of Science and Technology, Changsha 410114, China;
3. Hunan Date-driven AI Technology Co. Ltd., Changsha 410113, China
Abstract: Traditional keyphrase extraction algorithms cannot produce keyphrases that do not appear in the source text, so they perform poorly on legal questions, which are short texts. This paper presents a sequence-to-sequence (seq2seq) model trained with reinforcement learning to extract keyphrases from legal questions. First, the encoder compresses the semantic information of a given legal question into a dense vector; then, the decoder automatically generates the keyphrases. Since the order of the generated keyphrases does not matter in this task, reinforcement learning is used to train the model. The method combines the decision-making strengths of reinforcement learning with the long-term memory strengths of the seq2seq model. Experiments on a real-world dataset show that the model performs well on keyphrase extraction.
Key words: keyphrase extraction; sequence-to-sequence model; reinforcement learning
Received: 2018-11-14      Published: 2019-04-09
Funding: National Natural Science Foundation of China Youth Fund (61602059); Hunan Provincial Natural Science Foundation Youth Fund (2017JJ3334); Scientific Research Project of the Hunan Provincial Department of Education (16C0045); Open Project Fund of the National Laboratory of Pattern Recognition (20170007)
Cite this article:
ZENG Daojian, TONG Guowei, DAI Yuan, LI Feng, HAN Bing, XIE Songxian. Keyphrase extraction for legal questions based on a sequence to sequence model. Journal of Tsinghua University (Science and Technology), 2019, 59(4): 256-261.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2019.21.007  or  http://jst.tsinghuajournals.com/CN/Y2019/V59/I4/256
Fig. 1 An example of keyphrase extraction
Fig. 2 Block diagram of keyphrase extraction based on the sequence-to-sequence model
Table 1 Experimental results of the keyphrase extraction methods
Table 2 Examples of generated keyphrases
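The abstract and Fig. 2 describe the pipeline only at a high level: an encoder compresses the question into a dense vector, a decoder emits keyphrase tokens, and training uses a policy-gradient objective whose reward ignores the order of the generated keyphrases. The following PyTorch sketch is a rough illustration of that idea, not the authors' implementation; every class name, dimension, and the toy question/keyphrase data are assumptions made for this example, and keyphrases are simplified to single token ids.

import torch
import torch.nn as nn

class Seq2SeqKeyphrase(nn.Module):
    def __init__(self, vocab_size, emb_size=128, hidden_size=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_size)
        self.encoder = nn.GRU(emb_size, hidden_size, batch_first=True)
        self.decoder = nn.GRUCell(emb_size, hidden_size)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, src_ids, max_len=20, bos_id=1):
        # Encoder: compress the question into a dense vector (the final hidden state).
        _, h = self.encoder(self.embed(src_ids))      # h: (1, batch, hidden)
        h = h.squeeze(0)
        # Decoder: sample one token at a time, keeping log-probs for the policy gradient.
        token = torch.full((src_ids.size(0),), bos_id, dtype=torch.long)
        tokens, log_probs = [], []
        for _ in range(max_len):
            h = self.decoder(self.embed(token), h)
            dist = torch.distributions.Categorical(logits=self.out(h))
            token = dist.sample()
            tokens.append(token)
            log_probs.append(dist.log_prob(token))
        return torch.stack(tokens, dim=1), torch.stack(log_probs, dim=1)

def order_insensitive_reward(pred, gold):
    # F1 over the *sets* of keyphrases: the reward does not depend on generation order.
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    p, r = overlap / len(pred), overlap / len(gold)
    return 2 * p * r / (p + r)

# One REINFORCE-style update on a toy example (illustrative only).
model = Seq2SeqKeyphrase(vocab_size=5000)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
src = torch.randint(2, 5000, (1, 30))                 # a toy tokenized legal question
gold = {17, 42, 99}                                   # toy gold keyphrase token ids
tokens, log_probs = model(src)
reward = order_insensitive_reward(set(tokens[0].tolist()), gold)
loss = -(reward * log_probs[0]).sum()                 # policy-gradient loss: -R * sum(log pi)
optimizer.zero_grad()
loss.backward()
optimizer.step()

The point of the sketch is the reward function: because it scores the generated keyphrases as a set (an F1 score against the gold set), the training signal is unaffected by the order in which the decoder emits them, which is the motivation the abstract gives for training with reinforcement learning rather than ordinary cross-entropy.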