Multi-task learning model for legal judgment predictions with charge keywords

doi:10.16511/j.cnki.qhdxxb.2019.21.020


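Abstract: As an emerging smart-court technology, legal judgment prediction based on case description texts is attracting growing attention in the natural language processing community. Charge prediction and law article recommendation are two important subtasks of legal judgment prediction. The two subtasks are closely related and influence each other, yet they are often handled separately as independent tasks. In addition, both charge prediction and law article recommendation face the problem of confusing charges. To address these problems, this paper proposes a multi-task learning model that jointly models the two tasks, uses a statistical method to extract from case descriptions indicative charge keywords that help distinguish confusing charges, and integrates these keywords into the multi-task learning model. Experimental results on the CAIL2018 legal dataset show that the multi-task learning model with charge keyword information effectively resolves the confusing-charge problem and significantly improves performance on both charge prediction and law article recommendation.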
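The abstract says only that charge keywords are extracted with a statistical method; it does not name the statistic. As a minimal sketch of what such a step could look like, the snippet below pools the tokens of all cases that share a charge and ranks words with a TF-IDF-style score, so words that appear under many charges score low and words concentrated under one charge score high. The function name `charge_keywords`, the choice of TF-IDF, and the toy English tokens are assumptions made for illustration, not the authors' procedure or data.

```python
import math
from collections import Counter, defaultdict


def charge_keywords(cases, top_k=3):
    """Rank indicative words per charge with a TF-IDF-style score.

    cases: iterable of (charge_label, list_of_tokens) pairs.
    Hypothetical helper: the scoring rule is an assumption, not the paper's.
    """
    # Pool term counts over all cases that share a charge.
    term_counts = defaultdict(Counter)
    for charge, tokens in cases:
        term_counts[charge].update(tokens)

    # Count in how many charges each word occurs at least once.
    charge_freq = Counter()
    for counts in term_counts.values():
        charge_freq.update(counts.keys())

    n_charges = len(term_counts)
    keywords = {}
    for charge, counts in term_counts.items():
        total = sum(counts.values())
        scores = {
            word: (count / total) * math.log(n_charges / charge_freq[word])
            for word, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[charge] = ranked[:top_k]
    return keywords


# Toy, made-up cases for two easily confused charges.
toy_cases = [
    ("theft", ["defendant", "secretly", "took", "phone"]),
    ("theft", ["defendant", "secretly", "stole", "wallet"]),
    ("robbery", ["defendant", "violence", "took", "phone"]),
    ("robbery", ["defendant", "knife", "violence", "wallet"]),
]
keywords = charge_keywords(toy_cases, top_k=1)
```

In this toy input, words shared by both charges ("defendant", "took", "phone", "wallet") receive a score of zero, which is the property that makes the surviving words useful for telling confusing charges apart.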


Keywords: legal judgment prediction, multi-task learning, charge keywords

Abstract: The legal field is increasingly adopting artificial intelligence methods such as legal judgment prediction (LJP), which applies natural language processing to case description texts. Charge prediction and law article recommendation are two important LJP sub-tasks that are closely related and interact with each other. However, previous studies have usually treated them as two independent tasks. Furthermore, charge prediction and law article recommendation both face the problem of confusing charges. To this end, this paper presents a multi-task learning model for joint modeling of charge prediction and law article recommendation. Confusing charges are handled by extracting a set of charge keywords from case description texts using statistical techniques and integrating them into the multi-task learning model. The method was evaluated on the CAIL2018 legal dataset. The results show that incorporating the charge keywords into the multi-task learning model effectively resolves the confusing charge problem and significantly improves both the charge prediction and the law article recommendation results.
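The joint modeling described in the abstract can be illustrated with a small sketch: one shared representation of the case feeds two classification heads, and the training objective adds the two cross-entropy losses. Everything specific here is an assumption for illustration: the abstract does not give the encoder, how keywords enter the model, or the loss weighting, so the concatenation of a text vector with a keyword vector, the weight `alpha`, and all function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, features):
    """One output per weight row: dot product of the row with the features."""
    return [sum(w * f for w, f in zip(row, features)) for row in weights]


def cross_entropy(probs, gold):
    """Negative log-likelihood of the gold class index."""
    return -math.log(probs[gold])


def joint_loss(shared, w_charge, w_article, gold_charge, gold_article, alpha=1.0):
    """Sum of the two task losses computed from the same shared features.

    alpha weights the law article loss; its use here is an assumption.
    """
    charge_probs = softmax(linear(w_charge, shared))
    article_probs = softmax(linear(w_article, shared))
    return (cross_entropy(charge_probs, gold_charge)
            + alpha * cross_entropy(article_probs, gold_article))


# Assumed fusion: concatenate a text vector with a keyword indicator vector.
text_vec = [0.2, -0.1]
keyword_vec = [1.0]
shared = text_vec + keyword_vec

# Untrained heads (all-zero weights): two charges, two law articles.
w_charge = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
w_article = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
loss = joint_loss(shared, w_charge, w_article, gold_charge=0, gold_article=1)
```

Because both heads read the same `shared` vector, a gradient step on this summed loss would update the shared features using signal from both tasks, which is the basic reason joint training can help related tasks.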

Key words: legal judgment prediction, multi-task learning, charge keywords

Received: 2018-12-30 Published: 2019-06-21

Funding: National Natural Science Foundation of China (61672211, 61602160, U1836222); Natural Science Foundation of Heilongjiang Province (F2016036)

Corresponding author: FU Guohong, Professor, E-mail: ghfu@hotmail.com

Cite this article:

LIU Zonglin, ZHANG Meishan, ZHEN Ranran, GONG Zuoquan, YU Nan, FU Guohong. Multi-task learning model for legal judgment predictions with charge keywords[J]. Journal of Tsinghua University (Science and Technology), 2019, 59(7): 497-504.

Link to this article:

http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2019.21.020 or http://jst.tsinghuajournals.com/CN/Y2019/V59/I7/497

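Fig. 1 Relationship between charges and law articles

Table 1 Charge keywords

Fig. 2 Example cases of confusing charges

Fig. 3 (color online) Overall model architecture

Fig. 4 Imbalanced distribution of charges

Table 2 Experimental results on CAIL2018

Fig. 5 Accuracy by charge

Table 3 Macro-averaged metrics for the charge prediction task

Fig. 6 Misjudgment rate for confusing charges

Fig. 7 (color online) Case analysis of confusing charges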

