Machine learning algorithm for a homomorphic encrypted data set
JIA Chunfu (1,2), WANG Yafei (1,2), CHEN Yang (1), SUN Mengjie (1), GE Fengyi (1)
1. College of Cyberspace Security, Nankai University, Tianjin 300350, China; 2. Tianjin Key Laboratory of Network and Data Security Technology, Tianjin 300350, China
Abstract: The continuous development of big data requires that data be stored and analyzed in the cloud, which creates a risk of leaking sensitive data. This paper presents a machine learning classification algorithm for homomorphic encrypted data sets. First, the data set is preprocessed to meet the requirements of homomorphic encryption. The encrypted data set is then sorted and classified by means of protocols, and the classification results are obtained. The client can thus upload encrypted data while ensuring that the server does not obtain any sensitive information, and the homomorphic encryption algorithm ensures that the server can still perform the required operations on the ciphertext. Tests show that this scheme gives accurate, useful results with Bayes, hyperplane, and decision tree classifiers.
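The abstract does not spell out the preprocessing or the ciphertext operations. As a rough illustration only, the Python sketch below assumes an additively homomorphic scheme (a toy Paillier implementation built on two fixed, publicly known Mersenne primes, so it is completely insecure), a hypothetical fixed-point scale factor `SCALE` for the preprocessing step, and a hyperplane classifier whose plaintext weights are held by the server. All function names are hypothetical. This is not the authors' protocol: here the client simply decrypts the raw score and takes its sign, whereas a real scheme would hide the score behind a secure comparison step.

```python
import math
import secrets

SCALE = 1000  # hypothetical fixed-point scale used to turn real features into integers


def keygen(p=2**31 - 1, q=2**61 - 1):
    """Toy Paillier key pair from two known Mersenne primes (NOT secure)."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    g = n + 1
    n2 = n * n
    # mu = L(g^lam mod n^2)^(-1) mod n, where L(u) = (u - 1) // n
    mu = pow((pow(g, lam, n2) - 1) // n, -1, n)
    return (n, g), (n, lam, mu)


def encrypt(pub, m):
    """Enc(m) = g^m * r^n mod n^2; negative m is represented as m mod n."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m % n, n2) * pow(r, n, n2)) % n2


def decrypt(priv, c):
    """Recover m, mapping values above n/2 back to negative integers."""
    n, lam, mu = priv
    m = ((pow(c, lam, n * n) - 1) // n) * mu % n
    return m - n if m > n // 2 else m


def add(pub, c1, c2):
    """Enc(a) * Enc(b) decrypts to a + b."""
    n, _ = pub
    return (c1 * c2) % (n * n)


def mul_const(pub, c, k):
    """Enc(a)^k decrypts to k * a for a plaintext integer k."""
    n, _ = pub
    return pow(c, k % n, n * n)


def encode(x):
    """Preprocessing: fixed-point encoding of a real number."""
    return round(x * SCALE)


def encrypted_hyperplane_score(pub, enc_x, w, b):
    """Server side: Enc(SCALE^2 * (w . x + b)) from encrypted features."""
    score = encrypt(pub, encode(b) * SCALE)  # bias needs SCALE^2 to match w_i * x_i
    for c, wi in zip(enc_x, w):
        score = add(pub, score, mul_const(pub, c, encode(wi)))
    return score


if __name__ == "__main__":
    pub, priv = keygen()
    x = [0.5, -1.2, 3.0]                             # client's private features
    enc_x = [encrypt(pub, encode(v)) for v in x]     # uploaded ciphertexts
    w, b = [0.8, 0.1, -0.4], 0.25                    # server's plaintext model
    s = decrypt(priv, encrypted_hyperplane_score(pub, enc_x, w, b))
    print(s / SCALE**2, 1 if s >= 0 else -1)
```

The fixed-point encoding is the kind of step that "preprocessing to meet the requirements of homomorphic encryption" may involve, since the scheme only operates on integers modulo n; the bias is scaled by `SCALE**2` so that it lines up with the products of scaled weights and scaled features.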
JIA Chunfu, WANG Yafei, CHEN Yang, SUN Mengjie, GE Fengyi. Machine learning algorithm for a homomorphic encrypted data set. Journal of Tsinghua University (Science and Technology), 2020, 60(6): 456-463.