Journal of Tsinghua University (Science and Technology), 2022, Vol. 62, Issue 9: 1450-1457. DOI: 10.16511/j.cnki.qhdxxb.2022.22.028
Process Systems Engineering
Comparison and integration of machine learning based ethylene cracking process models
ZHAO Qiming1,2, BI Kexin1,2,3, QIU Tong1,2
1. Department of Chemical Engineering, Tsinghua University, Beijing 100084, China;
2. Beijing Key Laboratory of Industrial Big Data System and Application, Beijing 100084, China;
3. School of Chemical Engineering, Sichuan University, Chengdu 610065, China
Abstract: Ethylene is an essential petrochemical product, and its production by steam cracking is a complex process. An accurate naphtha cracking model enables fast, accurate prediction of the cracking depth in the naphtha cracking process for ethylene production. This paper compares three machine learning models: support vector regression (SVR), k-nearest neighbor (KNN) regression, and extreme gradient boosting (XGBoost). Important variables and samples in the industrial dataset are screened using density-based spatial clustering of applications with noise (DBSCAN) and the local outlier factor (LOF) detection algorithm. The three sub-models are then trained on the screened data and combined into an ensemble model to improve prediction. The ensemble model combines the strengths of the sub-models while mitigating their individual weaknesses, such as overfitting and sensitivity to noise, which improves its stability and generalization ability. On measured plant data, the ensemble model achieves R² = 0.955 and a mean absolute percentage error (MAPE) of about 0.23%, which meets the practical requirements of process research and industrial applications.
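The abstract states that DBSCAN and LOF are used to screen the industrial data but does not specify how each is applied. The Python sketch below shows one plausible sample-screening step with scikit-learn, assuming a sample is discarded when DBSCAN labels it as noise or LOF flags it as an outlier. The function name `screen_samples`, all parameter values, and the synthetic data are hypothetical, and variable screening is not shown.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler


def screen_samples(X, eps=0.5, min_samples=5, n_neighbors=20, contamination=0.05):
    """Return a boolean mask of samples kept after density-based screening.

    Assumption (not taken from the paper): a sample is dropped if DBSCAN
    labels it as noise (-1) or LOF predicts it as an outlier (-1).
    """
    # Standardize so distances (and eps) are comparable across variables
    Z = StandardScaler().fit_transform(X)
    # DBSCAN assigns label -1 to points in low-density regions (noise)
    db_labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(Z)
    # LOF compares each point's local density with that of its neighbors;
    # contamination sets the expected fraction of outliers
    lof_labels = LocalOutlierFactor(
        n_neighbors=n_neighbors, contamination=contamination
    ).fit_predict(Z)
    return (db_labels != -1) & (lof_labels != -1)


# Hypothetical data: 200 "normal" operating points plus 5 injected anomalies
rng = np.random.default_rng(0)
X_normal = rng.normal(size=(200, 4))
X_anomalies = rng.uniform(-8, 8, size=(5, 4))
X = np.vstack([X_normal, X_anomalies])

keep = screen_samples(X, eps=1.5)
print(keep.sum(), "of", len(X), "samples kept")
X_screened = X[keep]
```

Taking the union of both detectors' flags is a conservative choice; requiring both to agree would discard fewer samples.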
Key words: machine learning; support vector regression; k-nearest neighbor regression; extreme gradient boosting (XGBoost); ensemble learning; ethylene cracking
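The abstract likewise does not say how the three sub-models are combined. The following sketch assumes the simplest scheme, an unweighted average of the SVR, KNN and XGBoost predictions, and evaluates it with R² and MAPE as reported in the abstract. The hyperparameters, the helper `mape` and the synthetic target are hypothetical. If the `xgboost` package is not installed, scikit-learn's `GradientBoostingRegressor` is swapped in as a stand-in; it is a different gradient boosting implementation, not XGBoost.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

try:
    from xgboost import XGBRegressor
    booster = XGBRegressor(n_estimators=300, max_depth=3, learning_rate=0.05)
except ImportError:
    # Stand-in when xgboost is unavailable (not the same algorithm)
    booster = GradientBoostingRegressor(n_estimators=300, max_depth=3, learning_rate=0.05)

# Sub-models; SVR and KNN are kernel/distance based, so inputs are standardized
sub_models = {
    "SVR": make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.01)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5, weights="distance")),
    "XGBoost": booster,
}


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (hypothetical helper)."""
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))


# Hypothetical synthetic data standing in for screened plant samples
rng = np.random.default_rng(42)
X = rng.uniform(-1, 1, size=(400, 5))
y = 10 + 2 * X[:, 0] + np.sin(3 * X[:, 1]) + X[:, 2] * X[:, 3] + rng.normal(0, 0.05, 400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

preds = {}
for name, model in sub_models.items():
    model.fit(X_tr, y_tr)
    preds[name] = model.predict(X_te)

# Ensemble (assumption): unweighted mean of the three sub-model predictions
preds["Ensemble"] = np.mean([preds[k] for k in ("SVR", "KNN", "XGBoost")], axis=0)

for name, p in preds.items():
    print(f"{name:8s} R2={r2_score(y_te, p):.3f}  MAPE={mape(y_te, p):.2f}%")
```

Averaging is only one way to build the ensemble; weighted averaging or stacking (a meta-model trained on the sub-model predictions) are common alternatives, and the paper's actual scheme may differ.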
Received: 2022-01-10      Published: 2022-08-18
Corresponding author: QIU Tong, professor, E-mail: qiutong@tsinghua.edu.cn
Cite this article:
ZHAO Qiming, BI Kexin, QIU Tong. Comparison and integration of machine learning based ethylene cracking process models[J]. Journal of Tsinghua University (Science and Technology), 2022, 62(9): 1450-1457.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2022.22.028 or http://jst.tsinghuajournals.com/CN/Y2022/V62/I9/1450
Copyright © Editorial Office of Journal of Tsinghua University (Science and Technology)