[1] SCHEET P, STEPHENS M. A fast and flexible statistical model for large-scale population genotype data:Applications to inferring missing genotypes and haplotypic phase[J]. The American Journal of Human Genetics, 2006, 78(4):629-644.
[2] JEREZ J M, MOLINA I, GARCÍA-LAENCINA P J, et al. Missing data imputation using statistical and machine learning methods in a real breast cancer problem[J]. Artificial Intelligence in Medicine, 2010, 50(2):105-115.
[3] YUAN H W, XU G M, YAO Z J, et al. Imputation of missing data in time series for air pollutants using long short-term memory recurrent neural networks[C]//Proceedings of the 2018 ACM International Joint Conference and 2018 International Symposium on Pervasive and Ubiquitous Computing and Wearable Computers. Singapore:ACM, 2018:1293-1300.
[4] VAN BUUREN S, GROOTHUIS-OUDSHOORN K. MICE:Multivariate imputation by chained equations in R[J]. Journal of Statistical Software, 2011, 45(3):1-67.
[5] NGUYEN C D, CARLIN J B, LEE K J. Model checking in multiple imputation:An overview and case study[J]. Emerging Themes in Epidemiology, 2017, 14(1):1-12.
[6] ZHANG Z H. Missing data imputation:Focusing on single imputation[J]. Annals of Translational Medicine, 2016, 4(1):9.
[7] KHOSRAVI P, LIANG Y T, CHOI Y J, et al. What to expect of classifiers? Reasoning about logistic regression with missing features[C]//Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19). Macao, China:IJCAI, 2019:2716-2724.
[8] SEFIDIAN A M, DANESHPOUR N. Missing value imputation using a novel grey based fuzzy c-means, mutual information based feature selection, and regression model[J]. Expert Systems with Applications, 2019, 115:68-94.
[9] ZHU X F, ZHANG S C, JIN Z, et al. Missing value estimation for mixed-attribute data sets[J]. IEEE Transactions on Knowledge and Data Engineering, 2011, 23(1):110-121.
[10] LIU X W, ZHU X Z, LI M M, et al. Multiple kernel k-means with incomplete kernels[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, 42(5):1191-1204.
[11] CHEN X Y, YANG J M, SUN L J. A nonconvex low-rank tensor completion model for spatiotemporal traffic data imputation[J]. Transportation Research Part C:Emerging Technologies, 2020, 117:102673.
[12] ZHU Y M, LI Z, ZHU H Z, et al. A compressive sensing approach to urban traffic estimation with probe vehicles[J]. IEEE Transactions on Mobile Computing, 2013, 12(11):2289-2302.
[13] YU J R, STETTLER M E J, ANGELOUDIS P, et al. Urban network-wide traffic speed estimation with massive ride-sourcing GPS traces[J]. Transportation Research Part C:Emerging Technologies, 2020, 112:136-152.
[14] HUANG J L, KEUNG J W, SARRO F, et al. Cross-validation based K nearest neighbor imputation for software quality datasets:An empirical study[J]. Journal of Systems and Software, 2017, 132:226-252.
[15] AL-HELALI B, CHEN Q, XUE B, et al. A hybrid GP-KNN imputation for symbolic regression with missing values[C]//Australasian Joint Conference on Artificial Intelligence. Cham:Springer, 2018:345-357.
[16] CHOUDHURY S J, PAL N R. Imputation of missing data with neural networks for classification[J]. Knowledge-Based Systems, 2019, 182:104838.
[17] YOON J, JORDON J, VAN DER SCHAAR M. GAIN:Missing data imputation using generative adversarial nets[C]//Proceeding of the 35th International Conference on Machine Learning. Stockholm, Sweden:PMLR, 2018:5689-5698.
[18] MA Q, LEE W C, FU T Y, et al. MIDIA:Exploring denoising autoencoders for missing data imputation[J]. Data Mining and Knowledge Discovery, 2020, 34(6):1859-1897.
[19] LIN W C, TSAI C F, ZHONG J R. Deep learning for missing value imputation of continuous data and the effect of data discretization[J]. Knowledge-Based Systems, 2022, 239:108079.
[20] HE K M, CHEN X L, XIE S N, et al. Masked autoencoders are scalable vision learners[C]//2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans, USA:IEEE, 2022:16000-16009.
[21] BADSHA M B, LI R, LIU B X, et al. Imputation of single-cell gene expression with an autoencoder neural network[J]. Quantitative Biology, 2020, 8(1):78-94.
[22] BANSAL P, DESHPANDE P, SARAWAGI S. Missing value imputation on multidimensional time series[J]. Proceedings of the VLDB Endowment, 2021, 14(11):2533-2545.
[23] 孟小峰, 慈祥. 大数据管理:概念、技术与挑战[J]. 计算机研究与发展, 2013, 50(1):146-169. MENG X F, CI X. Big data management:Concept, techniques and challenge[J]. Journal of Computer Research and Development, 2013, 50(1):146-169. (in Chinese)
[24] CHEN M, MAO S, ZHANG Y, et al. Big data storage[M]. Cham:Springer, 2014:33-49.
[25] SHVACHKO K, KUANG H R, RADIA S, et al. The hadoop distributed file system[C]//2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST). Incline Village, USA:IEEE, 2010:1-10.
[26] DEAN J, GHEMAWAT S. MapReduce:Simplified data processing on large clusters[C]//Proceedings of the 6th Conference on Symposium on Operating Systems Design & Implementation. San Francisco, USA:ACM, 2004:137-150.
[27] 黄哲学, 何玉林, 魏丞昊, 等. 大数据随机样本划分模型及相关分析计算技术[J]. 数据采集与处理, 2019, 34(3):373-385. HUANG Z X, HE Y L, WEI C H, et al. Random sample partition data model and related technologies for big data analysis[J]. Journal of Data Acquisition and Processing, 2019, 34(3):373-385. (in Chinese)
[28] STRIKE K, EL-EMAM K, MADHAVJI N. Software cost estimation with incomplete data[J]. IEEE Transactions on Software Engineering, 2001, 27(10):890-908.
[29] ZHANG S C, QIN Z, LING C X, et al. "Missing is useful":Missing values in cost-sensitive decision trees[J]. IEEE Transactions on Knowledge and Data Engineering, 2005, 17(12):1689-1693.
[30] LITTLE R J A, RUBIN D B. Statistical analysis with missing data[M]. Hoboken:John Wiley & Sons, 2019.
[31] MORRIS T P, WHITE I R, ROYSTON P. Tuning multiple imputation by predictive mean matching and local residual draws[J]. BMC Medical Research Methodology, 2014, 14(1):75.
[32] LI D W, ZHANG H Q, LI T R, et al. Hybrid missing value imputation algorithms using fuzzy C-means and vaguely quantified rough set[J]. IEEE Transactions on Fuzzy Systems, 2022, 30(5):1396-1408.
[33] REZVAN P H, LEE K J, SIMPSON J A. The rise of multiple imputation:A review of the reporting and implementation of the method in medical research[J]. BMC Medical Research Methodology, 2015, 15(1):30.
[34] HUGHES R A, HERON J, STERNE J A C, et al. Accounting for missing data in statistical analyses:Multiple imputation is not always the answer[J]. International Journal of Epidemiology, 2019, 48(4):1294-1304.
[35] ZHANG S C. Nearest neighbor selection for iteratively KNN imputation[J]. Journal of Systems and Software, 2012, 85(11):2541-2552.
[36] KEERIN P, KURUTACH W, BOONGOEN T. Cluster-based KNN missing value imputation for DNA microarray data[C]//2012 IEEE International Conference on Systems, Man, and Cybernetics (SMC). Korea:IEEE, 2012:445-450.
[37] VINCENT P, LAROCHELLE H, BENGIO Y, et al. Extracting and composing robust features with denoising autoencoders[C]//Proceedings of the 25th International Conference on Machine Learning. Helsinki, Finland:ACM, 2008:1096-1103.
[38] GONDARA L, WANG K. MIDA:Multiple imputation using denoising autoencoders[C]//Proceeding of the 22nd Pacific-Asia Conference on Knowledge Discovery and Data Mining. Melbourne, Australia:Springer, 2018:260-272.
[39] SOVILJ D, EIROLA E, MICHE Y, et al. Extreme learning machine for missing data using multiple imputations[J]. Neurocomputing, 2016, 174:220-231.
[40] LU C B, MEI Y. An imputation method for missing data based on an extreme learning machine auto-encoder[J]. IEEE Access, 2018, 6:52930-52935.
[41] GARCÍA J C F, KALENATIC D, BELLO C A L. Missing data imputation in multivariate data by evolutionary algorithms[J]. Computers in Human Behavior, 2011, 27(5):1468-1474.
[42] ZHANG Y, ZHOU B H, CAI X R, et al. Missing value imputation in multivariate time series with end-to-end generative adversarial networks[J]. Information Sciences, 2021, 551:67-82.
[43] DEMPSTER A P, LAIRD N M, RUBIN D B. Maximum likelihood from incomplete data via theEM algorithm[J]. Journal of the Royal Statistical Society:Series B (Methodological), 1977, 39(1):1-22.
[44] RAHMAN M G, ISLAM M Z. Missing value imputation using a fuzzy clustering-based EM approach[J]. Knowledge and Information Systems, 2016, 46(2):389-422.
[45] GOLD M S, BENTLER P M. Treatments of missing data:A Monte Carlo comparison of RBHDI, iterative stochastic regression imputation, and expectation-maximization[J]. Structural Equation Modeling:A Multidisciplinary Journal, 2000, 7(3):319-355.
[46] KURUCZ M, BENCZỨR A A, CSALOGÁNY K. Methods for large scale SVD with missing values[C]//Proceedings of KDD Cup and Workshop. San Jose, USA:ACM, 2007, 12:31-38.
[47] MAZUMDER R, HASTIE T, TIBSHIRANI R. Spectral regularization algorithms for learning large incomplete matrices[J]. The Journal of Machine Learning Research, 2010, 11:2287-2322.
[48] ZHU X S, WANG J Y, SUN B, et al. An efficient ensemble method for missing value imputation in microarray gene expression data[J]. BMC Bioinformatics, 2021, 22(1):188.
[49] PETROZZIELLO A, JORDANOV I, SOMMEREGGER C.Distributed neural networks for missing big data imputation[C]//2018 International Joint Conference on Neural Networks (IJCNN). Rio de Janeiro, Brazil:IEEE, 2018:1-8.
[50] 何玉林, 黄德发, 戴德鑫, 等. 最大均方差异统计量的一般界[J]. 应用数学, 2021, 34(2):284-288. HE Y L, HUANG D F, DAI D X, et al. General bounds for maximum mean discrepancy statistics[J]. Mathematica Applicata, 2021, 34(2):284-288. (in Chinese)
[51] GRETTON A, BORGWARDT K M, RASCH M J, et al. A kernel two-sample test[J]. The Journal of Machine Learning Research, 2012, 13(1):723-773.
[52] TANG F, ISHWARAN H. Random forest missing data algorithms[J]. Statistical Analysis and Data Mining:The ASA Data Science Journal, 2017, 10(6):363-377.
[53] OpenIDEA-Yunan University, Ycimpute[EB/OL]. (2020-08-23)[2022-09-25]. https://github.com/OpenIDEA-YunanUniversity/ycimpute.
[54] RUBINSTEYN A, FELDMAN S. Fancyimpute:An imputation library for python[EB/OL].(2021-10-22)[2022-09-25]. https://github.com/iskandr/fancyimpute.
[55] UC Irvine Machine Leaning Repository. UCI machine learning repository[EB/OL]. (2021-05-02)[2022-09-25]. http://archive.ics.uci.edu/ml.
[56] TRIGUERO I, GONZÁLEZ S, MOYANO J M, et al. KEEL 3.0:An open source software for multi-stage analysis in data mining[J]. International Journal of Computational Intelligence Systems, 2017, 10(1):1238-1249.
[57] WHITE I R, ROYSTON P, WOOD A M. Multiple imputation using chained equations:Issues and guidance for practice[J]. Statistics in Medicine, 2011, 30(4):377-399.