With the rapid accumulation of healthcare data, data-driven medical analysis is receiving increasing attention, and a suitable representation of medical activities is crucial for such analysis. However, most existing representation methods do not account for the temporality and numerical sensitivity of medical data, which limits both the performance and the interpretability of downstream analyses. For inpatient cases, this paper proposes a representation learning method for medical activities that is enhanced by topic modeling. The method uses the temporal relations between activities and their topic assignments to build a multilayer perceptron model trained without supervision. Tests on a large-scale real-world inpatient data set show that the learned representations effectively improve three medical analysis tasks (disease clustering, subsequent-activity prediction, and remaining length-of-stay prediction) and that the representations have good medical interpretability.
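The abstract does not give the architecture, loss, or training details, so the following is only a minimal sketch of the general idea, under stated assumptions: each inpatient visit is an ordered list of activity IDs with one topic ID already assigned (for example by a topic model), and a small perceptron with one hidden layer is trained to predict the activities that follow within a short window. The function names `build_pairs` and `train`, the window size, the tanh hidden layer, and the toy data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def build_pairs(visits, topics, window=2):
    """Pair each activity (plus its visit's topic) with the activities
    that follow it within `window` steps. Assumed training signal."""
    pairs = []
    for seq, topic in zip(visits, topics):
        for i, act in enumerate(seq):
            for j in range(i + 1, min(i + 1 + window, len(seq))):
                pairs.append((act, topic, seq[j]))
    return pairs

def train(pairs, n_acts, n_topics, dim=8, epochs=50, lr=0.1, seed=0):
    """Plain SGD on softmax cross-entropy; returns activity embeddings
    and the mean loss per epoch."""
    rng = np.random.default_rng(seed)
    E = rng.normal(0, 0.1, (n_acts, dim))    # activity embeddings (the representation)
    T = rng.normal(0, 0.1, (n_topics, dim))  # topic embeddings
    W = rng.normal(0, 0.1, (dim, n_acts))    # output layer weights
    losses = []
    for _ in range(epochs):
        total = 0.0
        for a, t, target in pairs:
            h = np.tanh(E[a] + T[t])          # hidden layer: activity plus topic
            logits = h @ W
            logits -= logits.max()            # numerical stability
            p = np.exp(logits)
            p /= p.sum()                      # softmax over following activities
            total += -np.log(p[target])
            dlogits = p.copy()
            dlogits[target] -= 1.0            # gradient of cross-entropy w.r.t. logits
            dpre = (W @ dlogits) * (1.0 - h ** 2)  # backprop through tanh
            W -= lr * np.outer(h, dlogits)
            E[a] -= lr * dpre
            T[t] -= lr * dpre
        losses.append(total / len(pairs))
    return E, losses

# Toy data (hypothetical): four visits, two topics, seven distinct activities.
visits = [[0, 1, 2, 3], [0, 1, 2, 3], [4, 5, 6], [4, 5, 6]]
topics = [0, 0, 1, 1]
E, losses = train(build_pairs(visits, topics), n_acts=7, n_topics=2)
print(E.shape, losses[0] > losses[-1])
```

In this sketch the rows of `E` play the role of the activity representations; they could then be fed to downstream tasks such as clustering or prediction. Adding the topic embedding to the activity embedding is one simple way to inject topic information, chosen here for brevity rather than because the paper does so.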