Prediction method based on machine learning and data augmentation for population relocation demand during floods
Mulin WANG, Wei LÜ, Xiaoting YANG, Ting YANG, Yajing ZHANG
Journal of Tsinghua University (Science and Technology), 2026, Vol. 66, Issue (1): 160-168.
Objective: This study addresses the prediction of the number of people who must be relocated (the relocation number) during flood disasters. Accurate predictions of relocation numbers are vital for timely resource allocation and efficient disaster management, particularly in flood-prone areas, where rapid decisions can substantially mitigate a disaster's adverse impacts. Methods: This research developed a relocation number prediction framework that combines feature selection and data augmentation with the extreme gradient boosting (XGBoost) model, a widely used gradient-boosting machine learning algorithm. The model was built using historical data on flood events across China from 2014 to 2018, comprising meteorological and geographical features and the relocation number recorded for each event. Feature selection used Shapley additive explanations (SHAP), a game-theoretic method that quantifies each feature's contribution to the model's predictions, and the selected features were then used to train the XGBoost model. To address the limited number of training samples, a data augmentation strategy was introduced: Gaussian noise was injected using a weighted k-nearest neighbors method to generate synthetic data points that preserve the local structure of the data, thereby improving the model's robustness and generalization ability. Results: The XGBoost model performs well with the selected features and augmented data. When trained only on the small original dataset, the model achieves satisfactory accuracy but limited generalization ability. After data augmentation, its performance improves significantly, especially for extreme values in the data. On the test set, R² increases from 0.854 to 0.967, a substantial gain in predictive accuracy.
Additionally, the root mean square error (RMSE) decreases from 0.296 to 0.123, a considerable reduction in prediction error. These results show that combining feature selection with data augmentation enhances the model's predictive power. The SHAP-guided feature selection identifies several key predictors of population relocation demand, the most influential of which are the maximum 3-day cumulative rainfall (MCR) and the maximum cumulative rainfall over the 15 days prior to the event (MRPE). Conclusions: The proposed framework, which integrates SHAP-based feature selection with data augmentation, is an effective tool for forecasting the relocation number during flood disasters. After Bayesian hyperparameter optimization and data augmentation, the XGBoost model shows markedly improved prediction accuracy and robustness. This approach can support disaster management teams with more reliable forecasts, enabling better planning and more timely deployment of resources. Its improved ability to generalize to unseen data also makes it better suited to regions with limited historical data. The study therefore provides a decision-support tool for emergency response teams that can streamline resource allocation and evacuation planning during flood disasters, helping to reduce the impact on human lives and infrastructure.
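The abstract does not define how neighbour weights, noise variance, or the SHAP selection threshold are chosen. The Python sketch below shows one plausible reading of the two preprocessing steps, under stated assumptions: features are ranked by mean absolute SHAP value (a common global importance measure; the SHAP matrix itself would come from an explainer run on the fitted XGBoost model and is taken as given here), and each synthetic row adds zero-mean Gaussian noise whose per-column standard deviation is proportional to the inverse-distance-weighted spread of its k nearest neighbours. The function names and the parameters `k`, `copies`, `noise_scale`, and `top_k` are hypothetical and are not taken from the paper.

```python
import math
import random
from typing import List, Sequence


def rank_features_by_shap(shap_values: Sequence[Sequence[float]],
                          feature_names: Sequence[str],
                          top_k: int) -> List[str]:
    """Return the top_k feature names ranked by mean absolute SHAP value.

    shap_values is an (n_samples x n_features) matrix, assumed to come from an
    explainer run on a fitted XGBoost model; it is not computed here.
    """
    n = len(shap_values)
    importance = [sum(abs(row[j]) for row in shap_values) / n
                  for j in range(len(feature_names))]
    order = sorted(range(len(feature_names)),
                   key=lambda j: importance[j], reverse=True)
    return [feature_names[j] for j in order[:top_k]]


def augment_weighted_knn(rows: Sequence[Sequence[float]], k: int = 3,
                         copies: int = 2, noise_scale: float = 0.1,
                         seed: int = 0) -> List[List[float]]:
    """Hypothetical weighted-kNN Gaussian-noise augmentation.

    Each row is [feature_1, ..., feature_d, target]; features and target are
    perturbed together so that synthetic rows stay near the original data.
    """
    if len(rows) < 2:
        raise ValueError("need at least two rows to find neighbours")
    rng = random.Random(seed)
    synthetic: List[List[float]] = []
    for i, row in enumerate(rows):
        # The k nearest other rows by Euclidean distance, as (distance, index).
        neighbours = sorted((math.dist(row, other), j)
                            for j, other in enumerate(rows) if j != i)[:k]
        # Inverse-distance weights; the epsilon avoids division by zero.
        weights = [1.0 / (d + 1e-9) for d, _ in neighbours]
        total = sum(weights)
        # Per-column weighted RMS deviation of the neighbours from this row.
        spread = [math.sqrt(sum(w * (rows[j][c] - row[c]) ** 2
                                for w, (_, j) in zip(weights, neighbours)) / total)
                  for c in range(len(row))]
        # Each copy adds zero-mean Gaussian noise with sd = noise_scale * spread.
        for _ in range(copies):
            synthetic.append([v + rng.gauss(0.0, noise_scale * s)
                              for v, s in zip(row, spread)])
    return synthetic


if __name__ == "__main__":
    # Toy SHAP matrix (3 samples x 3 features), not data from the study.
    toy_shap = [[0.5, -0.1, 0.02], [-0.7, 0.2, 0.01], [0.6, -0.3, 0.0]]
    print(rank_features_by_shap(toy_shap, ["x1", "x2", "x3"], top_k=2))
    # prints ['x1', 'x2']
    toy_rows = [[1.0, 2.0, 10.0], [1.5, 2.5, 12.0],
                [0.8, 1.9, 9.0], [3.0, 4.0, 30.0]]
    augmented = augment_weighted_knn(toy_rows, k=2, copies=2, seed=42)
    print(len(augmented))  # 8 synthetic rows (2 per original row)
```

Scaling the noise by the local neighbourhood spread keeps synthetic rows in dense regions close to their source, while rows in sparse regions (such as rare extreme events) receive larger perturbations. Whether the paper weights neighbours or scales the noise this way is not stated in the abstract.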
Keywords: heavy rainfall and flooding disaster; extreme gradient boosting (XGBoost); relocation number prediction; feature selection; data augmentation
Data availability statement
All data in this article are available from the first author upon reasonable request.