Multi-task learning model for legal judgment predictions with charge keywords
LIU Zonglin1, ZHANG Meishan1, ZHEN Ranran1, GONG Zuoquan2, YU Nan1, FU Guohong1
1. School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China; 2. School of Information, Guizhou University of Finance and Economics, Guiyang 550025, China
Abstract: Artificial intelligence methods are increasingly used in the legal field. One example is legal judgment prediction (LJP), which applies natural language processing to case description texts. Charge prediction and law article recommendation are two important LJP sub-tasks that are closely related and interact with each other. However, previous studies have usually treated them as independent tasks and analyzed them separately. Furthermore, both sub-tasks face the problem of confusing charges. To this end, this paper presents a multi-task learning model that jointly models charge prediction and law article recommendation. To handle confusing charges, a set of charge keywords is extracted from the case description texts using statistical techniques and integrated into the multi-task learning model. The method was evaluated on the CAIL2018 legal dataset. The results show that incorporating the charge keywords into the multi-task learning model effectively resolves the confusing charge problem and significantly improves both the charge prediction and the law article recommendation results.
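The abstract does not say which statistical technique is used to extract the charge keywords. As a rough illustration only, the sketch below assumes a TF-IDF-style score in which all case descriptions sharing a charge are pooled into one "document", so that words frequent under one charge but rare under others rank highest. The function name `charge_keywords`, the `top_k` parameter, and the toy English tokens are hypothetical; real CAIL2018 case descriptions are Chinese and would need word segmentation first.

```python
import math
from collections import Counter, defaultdict


def charge_keywords(cases, top_k=3):
    """Rank words per charge with a TF-IDF-style score (an assumed technique).

    cases: list of (charge_label, list_of_tokens) pairs.
    Returns a dict mapping each charge to its top_k highest-scoring words.
    """
    # Pool token counts over all cases that share a charge.
    tf = defaultdict(Counter)
    for charge, tokens in cases:
        tf[charge].update(tokens)

    # Count in how many charges each word occurs.
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())

    n_charges = len(tf)
    keywords = {}
    for charge, counts in tf.items():
        total = sum(counts.values())
        # Words that occur under every charge get a score of zero.
        scores = {
            word: (count / total) * math.log(n_charges / df[word])
            for word, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[charge] = ranked[:top_k]
    return keywords


# Toy, already-tokenized examples (hypothetical data).
cases = [
    ("theft", ["defendant", "secretly", "took", "wallet", "secretly"]),
    ("robbery", ["defendant", "violence", "took", "wallet", "knife"]),
    ("theft", ["defendant", "stole", "phone", "secretly"]),
]
print(charge_keywords(cases, top_k=2))
```

In this toy data, words shared by both charges (such as "defendant" or "wallet") score zero, which is the property that makes such keywords useful for separating charges with similar case descriptions. How the keywords are then fed into the multi-task model is not described in the abstract and is not sketched here.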