Prediction method based on machine learning and data augmentation for population relocation demand during floods

Mulin WANG (王沐林), Wei LÜ (吕伟), Xiaoting YANG (杨晓婷), Ting YANG (杨婷), Yajing ZHANG (张雅静)

Journal of Tsinghua University (Science and Technology), 2026, Vol. 66, Issue 1: 160-168. DOI: 10.16511/j.cnki.qhdxxb.2025.22.034

Public Safety


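Summary

In the emergency management of flood disasters, accurately predicting the number of people to be relocated and resettled is crucial for improving post-disaster response efficiency. Based on nationwide rainstorm and flood disaster data from 2014 to 2018, this paper builds an extreme gradient boosting (XGBoost) model for predicting relocation numbers that combines feature selection with data augmentation. The data cover multidimensional features of historical flood disasters, including meteorological information and geographical factors. Using the Shapley additive explanations (SHAP) method together with recursive feature addition, the model captures the key influencing factors at the time of a disaster; using Gaussian noise injection based on weighted k-nearest neighbors, the model's generalization ability and robustness improve markedly. Experimental results show that data augmentation significantly improves performance on the test set: R² rises from 0.854 to 0.967 and RMSE falls from 0.296 to 0.123, so the model achieves higher predictive accuracy and lower error. The study can serve as a decision-making reference for the scientific allocation and efficient dispatch of emergency relief supplies during rainstorm and flood disasters.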
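The recursive feature addition step named above is not specified in detail here. One common reading is to rank features by importance (for example, mean absolute SHAP value) and add them one at a time, keeping a feature only if a validation score improves. The sketch below follows that reading and is an assumption, not the authors' procedure. The names `importance`, `score_fn`, `min_gain`, the feature `elevation`, and all toy numbers are hypothetical stand-ins; in practice `score_fn` would train and validate an XGBoost model on the candidate feature subset.

```python
from typing import Callable, Dict, List, Sequence


def recursive_feature_addition(
    importance: Dict[str, float],
    score_fn: Callable[[Sequence[str]], float],
    min_gain: float = 0.0,
) -> List[str]:
    """Add features in descending importance; keep those that raise the score."""
    # Rank feature names from most to least important.
    ranked = sorted(importance, key=importance.get, reverse=True)
    selected: List[str] = []
    best = float("-inf")
    for name in ranked:
        candidate = selected + [name]
        score = score_fn(candidate)
        # Keep the feature only if the score improves by more than min_gain.
        if score > best + min_gain:
            selected, best = candidate, score
    return selected


# Toy example with made-up importances and a made-up additive score
# (hypothetical values, not results from the paper).
toy_importance = {"MCR": 0.9, "MRPE": 0.7, "elevation": 0.3, "noise": 0.05}
toy_gain = {"MCR": 0.5, "MRPE": 0.3, "elevation": 0.1, "noise": -0.05}


def toy_score(features: Sequence[str]) -> float:
    return sum(toy_gain[f] for f in features)


chosen = recursive_feature_addition(toy_importance, toy_score)
print(chosen)  # ['MCR', 'MRPE', 'elevation']
```

In the toy run the `noise` feature is rejected because adding it lowers the score, which is the behavior a feature addition loop of this kind is meant to provide.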

Abstract

Objective: This study addresses the task of predicting the number of people to be evacuated (the relocation number) during flood disasters. Accurate predictions of the relocation number are vital for timely resource allocation and efficient disaster management, particularly in flood-prone areas where rapid decision-making can substantially mitigate a disaster's adverse impacts.

Methods: The study develops a relocation number prediction framework that combines feature selection and data augmentation with the extreme gradient boosting (XGBoost) model, a widely used gradient-boosting machine learning algorithm. The model is built on historical data from flood events across China between 2014 and 2018. Each event record includes meteorological and geographical features and the relocation number for that disaster. Feature selection uses Shapley additive explanations (SHAP), a game-theoretic method for measuring the contribution of each feature to the model's predictions. The selected features are then fed into the XGBoost model for training. To address the limited number of training samples, a data augmentation strategy injects Gaussian noise using a weighted k-nearest neighbors method, generating synthetic data points that preserve the local structure of the data and thereby enhancing the model's robustness and generalization ability.

Results: The XGBoost model performs well with the selected features and augmented data. Trained on the small original dataset alone, the model reaches satisfactory accuracy but shows limited generalization ability. After data augmentation, performance improves significantly, especially for extreme values in the data. On the test set, R² improves from 0.854 to 0.967, a substantial increase in predictive accuracy, and the root mean square error (RMSE) decreases from 0.296 to 0.123, a considerable reduction in prediction error. These results highlight the effectiveness of combining feature selection and data augmentation. The SHAP-guided feature selection identifies several key predictors of population relocation demand. The most influential are the maximum 3-day cumulative rainfall (MCR) and the maximum cumulative rainfall over the 15 days prior to the event (MRPE).

Conclusions: The proposed framework, which integrates SHAP-based feature selection with data augmentation, is an effective tool for forecasting the relocation number during flood disasters. After Bayesian hyperparameter tuning and data augmentation, the XGBoost model shows significantly improved prediction accuracy and robustness. The approach can give disaster management teams more reliable forecasts, allowing better planning and more timely deployment of resources. Better generalization to unseen data helps keep predictions accurate even in regions with limited historical data. The study thus provides a decision-support tool for emergency response teams, helping to streamline resource allocation and evacuation planning during flood disasters and thereby reducing the impact on human lives and infrastructure.
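The abstract names weighted k-nearest-neighbor Gaussian noise injection but does not give its exact form. A minimal sketch of one plausible variant follows, under these assumptions: each row holds numeric columns (features, optionally with the target), neighbors are weighted by inverse Euclidean distance, and the noise standard deviation in each column is a fraction (`noise_level`) of the weighted spread of the neighbors around the row. The function name and all parameters are hypothetical and the scheme may differ from the authors' method.

```python
import math
import random
from typing import List, Sequence


def knn_gaussian_augment(
    rows: Sequence[Sequence[float]],
    k: int = 3,
    noise_level: float = 0.1,
    copies: int = 1,
    seed: int = 0,
) -> List[List[float]]:
    """Return synthetic rows: each original row plus locally scaled Gaussian noise."""
    if len(rows) < 2:
        raise ValueError("need at least two rows to find neighbours")
    rng = random.Random(seed)
    synthetic: List[List[float]] = []
    for i, row in enumerate(rows):
        # Distances from this row to every other row, nearest k kept.
        neighbours = sorted(
            (math.dist(row, other), j) for j, other in enumerate(rows) if j != i
        )[:k]
        # Inverse-distance weights: closer neighbours count more.
        weights = [1.0 / (d + 1e-9) for d, _ in neighbours]
        total = sum(weights)
        # Weighted per-column spread of the neighbours around this row.
        sigma = [
            math.sqrt(
                sum(
                    w * (rows[j][c] - row[c]) ** 2
                    for w, (_, j) in zip(weights, neighbours)
                )
                / total
            )
            for c in range(len(row))
        ]
        # Noise is small in dense regions and larger in sparse ones.
        for _ in range(copies):
            synthetic.append(
                [v + rng.gauss(0.0, noise_level * s) for v, s in zip(row, sigma)]
            )
    return synthetic


data = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
augmented = knn_gaussian_augment(data, k=2, noise_level=0.1, copies=2)
print(len(augmented))  # 8 synthetic rows: 4 originals x 2 copies
```

Scaling the noise by the local neighbor spread is what keeps synthetic points close to the original data's local structure; a single global noise scale would blur dense clusters and barely move isolated extreme events.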

Keywords

heavy rainfall and flooding disaster / extreme gradient boosting (XGBoost) / relocation number prediction / feature selection / data augmentation

Cite this article

Mulin WANG, Wei LÜ, Xiaoting YANG, et al. Prediction method based on machine learning and data augmentation for population relocation demand during floods[J]. Journal of Tsinghua University (Science and Technology), 2026, 66(1): 160-168. https://doi.org/10.16511/j.cnki.qhdxxb.2025.22.034
CLC number: X915.5


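Footnotes

Data availability statement

All data in this paper are available from the first author upon reasonable request.

Funding

General Program of the National Natural Science Foundation of China (52072286)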
