Comparison and integration of machine learning based ethylene cracking process models

doi:10.16511/j.cnki.qhdxxb.2022.22.028

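Abstract (translated from the Chinese): Ethylene is a major petrochemical product, and producing it by steam cracking is a highly complex process. An accurate naphtha cracking model enables fast, accurate prediction of the cracking depth in naphtha-to-ethylene cracking. This paper compares three machine learning models: support vector regression (SVR), k-nearest neighbor regression, and extreme gradient boosting (XGBoost). Density-based spatial clustering of applications with noise (DBSCAN) and the local outlier factor (LOF) detection algorithm are used to select important variables and screen samples from an industrial dataset; the three sub-models are then trained and combined into an ensemble model to improve prediction. The ensemble combines the strengths of the sub-models, mitigates shortcomings such as overfitting and sensitivity to noise, and improves stability and generalization. In tests, the ensemble model's predictions reach R² = 0.955 with a mean absolute percentage error of about 0.23%, which meets the practical needs of process research and industrial application.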

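Keywords: machine learning, support vector regression, k-nearest neighbor regression, extreme gradient boosting (XGBoost), ensemble learning, ethylene cracking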
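
The workflow summarized in the abstract (screen the samples, train three sub-models, combine them) can be sketched with scikit-learn. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the data are synthetic, only the LOF screening step is shown (DBSCAN and variable selection are omitted), `GradientBoostingRegressor` stands in for XGBoost, all hyperparameters are arbitrary, and unweighted averaging via `VotingRegressor` is an assumed combination rule that the abstract does not specify.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, VotingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor, LocalOutlierFactor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the industrial dataset (assumption: 6 process
# variables and one positive target playing the role of cracking depth).
rng = np.random.default_rng(0)
X = rng.normal(size=(600, 6))
y = 1.0 + 0.05 * X[:, 0] - 0.03 * X[:, 1] ** 2 + 0.02 * np.sin(X[:, 2])
y += rng.normal(scale=0.005, size=y.shape)

# Sample screening: keep only points that LOF labels as inliers (+1).
inlier = LocalOutlierFactor(n_neighbors=20).fit_predict(X) == 1
X, y = X[inlier], y[inlier]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Three sub-models. SVR and k-NN are distance-based, so inputs are scaled.
sub_models = [
    ("svr", make_pipeline(StandardScaler(),
                          SVR(kernel="rbf", C=10.0, epsilon=0.001))),
    ("knn", make_pipeline(StandardScaler(),
                          KNeighborsRegressor(n_neighbors=5, weights="distance"))),
    ("gbr", GradientBoostingRegressor(random_state=0)),  # stand-in for XGBoost
]

# Ensemble by unweighted averaging (assumption; the paper's rule may differ).
ensemble = VotingRegressor(sub_models).fit(X_train, y_train)
y_pred = ensemble.predict(X_test)

r2 = r2_score(y_test, y_pred)
mape = float(np.mean(np.abs((y_test - y_pred) / y_test))) * 100.0  # percent
print(f"R2 = {r2:.3f}, MAPE = {mape:.2f}%")
```

Averaging is one of the simplest ways to let sub-models with different failure modes offset one another; a weighted or stacked combination would be a natural variation.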

Abstract: Ethylene is an essential petrochemical product, produced by a complex steam cracking process. Fast, accurate prediction of the cracking depth in naphtha-to-ethylene cracking depends on an accurate naphtha cracking model. This paper compares three machine learning models for predicting the cracking depth: support vector regression (SVR), k-nearest neighbor regression, and extreme gradient boosting (XGBoost). Industrial data are screened to select the important variables and to filter samples using density-based spatial clustering of applications with noise (DBSCAN) and the local outlier factor (LOF) detection algorithm. The three sub-models are then trained and combined into an ensemble model to improve the predictions. The ensemble combines the advantages of the three sub-models and mitigates shortcomings such as overfitting and sensitivity to noise, giving better stability and generalization ability. The ensemble model predictions have R²=0.955 and a mean absolute percentage error of about 0.23%, which is sufficient for process research and industrial applications.

Key words: machine learning, support vector regression, k-nearest neighbor regression, extreme gradient boosting (XGBoost), ensemble learning, ethylene cracking
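
The two figures of merit quoted in the abstract, R² and the mean absolute percentage error (MAPE), follow standard definitions. The sketch below writes them out in plain Python. The function names and the sample values are illustrative assumptions and are not taken from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def mape_percent(y_true, y_pred):
    """Mean absolute percentage error in percent (y_true must be nonzero)."""
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


# Made-up values for illustration only (not data from the paper).
measured = [0.50, 0.52, 0.54, 0.56]
predicted = [0.501, 0.519, 0.541, 0.559]
print(r_squared(measured, predicted), mape_percent(measured, predicted))
```

For these made-up values the script gives R² of about 0.998 and a MAPE of about 0.19%. Because MAPE divides by the measured value, it is only meaningful when the target stays well away from zero.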

Received: 2022-01-10 Published: 2022-08-18

Corresponding author: QIU Tong, Professor, E-mail: qiutong@tsinghua.edu.cn

Cite this article:

赵祺铭, 毕可鑫, 邱彤. 基于机器学习的乙烯裂解过程模型比较与集成[J]. 清华大学学报（自然科学版）, 2022, 62(9): 1450-1457.
ZHAO Qiming, BI Kexin, QIU Tong. Comparison and integration of machine learning based ethylene cracking process models[J]. Journal of Tsinghua University (Science and Technology), 2022, 62(9): 1450-1457.

Link to this article:

http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2022.22.028 or http://jst.tsinghuajournals.com/CN/Y2022/V62/I9/1450
