Comparison and integration of machine learning based ethylene cracking process models
ZHAO Qiming1,2, BI Kexin1,2,3, QIU Tong1,2
1. Department of Chemical Engineering, Tsinghua University, Beijing 100084, China; 2. Beijing Key Laboratory of Industrial Big Data System and Application, Beijing 100084, China; 3. School of Chemical Engineering, Sichuan University, Chengdu 610065, China
Abstract: Ethylene is an essential petrochemical product that is produced by a complex steam cracking process. Fast, accurate prediction of the ethylene cracking depth depends on an accurate naphtha cracking model. This paper compares three machine learning models for predicting the ethylene cracking depth: support vector regression (SVR), k-nearest neighbor (kNN) regression, and extreme gradient boosting (XGBoost). Several industrial datasets are screened to identify the critical variables controlling the process using density-based spatial clustering of applications with noise (DBSCAN) and the local outlier factor (LOF) detection algorithm. The three models are then trained and combined into an ensemble model. The ensemble combines the advantages of the three models and reduces their overfitting, sensitivity to noise, and other shortcomings, which gives it better prediction stability and generalization ability. The ensemble model predictions have R²=0.955 and a mean absolute percentage error of about 0.23%, which is sufficient for process research and industrial applications.
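The two reported metrics and the idea of combining three regressors can be illustrated with a minimal sketch. The abstract does not state how the ensemble weights its members, so the equal-weight average below is an assumption, and the function names (`r2_score`, `mape`, `average_ensemble`) and all numbers are hypothetical, chosen only to make the arithmetic visible.

```python
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape(y_true, y_pred):
    """Mean absolute percentage error, returned in percent."""
    return 100.0 * mean(abs((t - p) / t) for t, p in zip(y_true, y_pred))


def average_ensemble(*model_preds):
    """Equal-weight average of per-model predictions.

    Assumption: the paper's actual combination rule is not given in the
    abstract; simple averaging is used here only as an illustration.
    """
    return [mean(sample) for sample in zip(*model_preds)]


# Made-up target values and predictions, for illustration only
# (not data or results from the paper).
y_true = [0.50, 0.52, 0.54, 0.56]
svr_pred = [0.51, 0.52, 0.53, 0.56]  # stand-in for SVR output
knn_pred = [0.50, 0.53, 0.54, 0.55]  # stand-in for kNN output
xgb_pred = [0.50, 0.51, 0.55, 0.58]  # stand-in for XGBoost output

y_ens = average_ensemble(svr_pred, knn_pred, xgb_pred)
for name, pred in [("SVR", svr_pred), ("kNN", knn_pred),
                   ("XGBoost", xgb_pred), ("ensemble", y_ens)]:
    print(f"{name}: R2 = {r2_score(y_true, pred):.3f}, "
          f"MAPE = {mape(y_true, pred):.2f}%")
```

Averaging tends to help when the members' errors partly cancel, which is one plausible reading of the abstract's claim that the ensemble reduces the individual models' sensitivity to noise.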
赵祺铭, 毕可鑫, 邱彤. 基于机器学习的乙烯裂解过程模型比较与集成[J]. 清华大学学报(自然科学版), 2022, 62(9): 1450-1457.
ZHAO Qiming, BI Kexin, QIU Tong. Comparison and integration of machine learning based ethylene cracking process models[J]. Journal of Tsinghua University (Science and Technology), 2022, 62(9): 1450-1457.