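Effective accident causation analysis is an effective way to prevent coal mine accidents. Manual analysis is strongly affected by the subjectivity of the analysts and is limited when faced with massive volumes of accident and risk text data. For the coal mining domain, this paper therefore proposes an intelligent method for analyzing the causes of coal mine accidents that integrates named entity recognition (NER), semantic dependency parsing (SDP), text classification (TC), and the accident causation “2-4” model (24Model). First, NER is used to identify the main entity information in accident texts, and SDP is used to identify the semantic relationships among the entities, from which text representation patterns of individual unsafe actions and organizational causes are extracted. Second, TC is used to build a classification model for individual capability causes, which identifies factors related to individual capability. Finally, an application was developed to apply the proposed method to the analysis and learning of onsite accident cases. The results show that the NER and TC models built in this paper both achieve high precision and that, combined with SDP, the method can automatically analyze and organize accident causes according to the 24Model and identify information such as the actor of an action, the operational procedure, and the materials and equipment involved. The proposed method can promote the application of accident causation theory in coal mining enterprises and improve the effectiveness of accident case analysis and learning, thereby helping to prevent related accidents.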
Abstract
[Objective] Effective causal analysis of accidents is essential for learning from and preventing coal mine accidents. Manual analysis is strongly influenced by the subjectivity of the personnel involved and becomes inefficient when large volumes of accident and risk text data must be processed. Although considerable research has been conducted on accident text mining, most studies apply data mining techniques directly to extract accident information and factors from texts without considering accident causation theories, which leads to results that lack systematic and logical coherence. [Methods] To address these issues, this paper proposes a method for the intelligent identification of accident causes in the coal mining sector. The method integrates entity recognition, semantic dependency analysis, text classification, and the accident causation “2-4” model (24Model), and specific implementation steps are provided. Accident causation theory is crucial for ensuring the effectiveness and scientific validity of accident analysis. This paper introduces the 24Model as the theoretical basis for accident cause identification and highlights the advantages of the model for the intelligent analysis of accident causes. Entity recognition is employed to identify key entity information in accident texts, including information on personnel, organizational structures, accidents, abnormal characteristics, values, safety management, building facilities, environments, equipment and materials, safety policies, procedural documents, and operational processes. To identify this information effectively, this paper combines bidirectional encoder representations from transformers (BERT), bidirectional long short-term memory (BiLSTM), and conditional random fields (CRF) into a BERT-BiLSTM-CRF model and trains the combined model on 660 accident texts.
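As a minimal sketch of the last step of entity recognition, the snippet below groups per-character tags into entity spans. It assumes that the sequence labeler emits BIO tags, which is a common convention for BERT-BiLSTM-CRF models but is not stated in this abstract; the function name `decode_bio` and the labels `PER` (personnel) and `EQU` (equipment) are hypothetical placeholders, not the label set used in the paper.

```python
from typing import List, Optional, Tuple


def decode_bio(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs.

    Assumption: tags look like "B-PER", "I-PER", or "O". An "I-" tag that
    does not continue an open entity of the same type is ignored.
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Close the entity that is currently open, if any.
        nonlocal current_tokens, current_type
        if current_tokens and current_type is not None:
            entities.append(("".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            flush()
    flush()
    return entities


# Hypothetical example sentence, tagged per character:
# "瓦斯检查员未检查风机" (the gas inspector did not check the fan)
tokens = list("瓦斯检查员未检查风机")
tags = ["B-PER", "I-PER", "I-PER", "I-PER", "I-PER",
        "O", "O", "O", "B-EQU", "I-EQU"]
print(decode_bio(tokens, tags))
```

Characters are joined without spaces because the accident texts are Chinese; a word-level tokenizer would need a different join.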
This paper uses semantic dependency analysis to identify the semantic relationships among entity information. Text representation patterns were extracted according to the definitions of unsafe individual actions and organizational factors in the 24Model, and these definitions were used to determine the types of accident causes. This paper uses a BERT-based text classification model to identify individual capability-related causes, focusing on five aspects: knowledge, awareness, habits, psychology, and physiology. The accuracy of the proposed method in identifying accident causes was evaluated by comparing both the entity recognition model and the text classification model with similar models. Test cases were selected, and the results of accident cause analysis via the proposed method were compared with those from manual analysis. Additionally, this paper develops an application based on the proposed method to facilitate the analysis and learning of onsite accident cases by employees of coal mining enterprises. [Results] The results showed that the precision of the trained entity recognition model and the text classification model reached 95.42% and 96.11%, respectively. Additionally, the accuracy of the accident cause identification method, when combined with semantic dependency analysis, reached 73.09%. [Conclusions] The contributions of this paper are as follows: (1) Integration of the definitions and concepts of the 24Model and automatic identification of unsafe behaviors according to the 24Model. This approach helps avoid the strong subjectivity and inconsistency often present in accident analysis conducted by different personnel. (2) Further identification of the actors of actions, operational procedures, materials, and equipment.
(3) Fusion of multiple model algorithms to identify the causes of accidents, allowing for the rapid analysis of a large number of accidents. (4) Facilitation of the application of accident causation theory in coal mining enterprises, which enhances the effectiveness of accident case analysis and learning and thereby helps prevent related accidents.
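The combination of entity recognition and semantic dependency analysis described above can be illustrated with a small rule-based sketch. It is not the pattern set used in the paper: the `Arc` type, the function `extract_actions`, the relation labels `AGT` (agent), `PAT` (patient), and `EXP` (experiencer), and the entity types `PER`, `EQU`, and `ENV` are all assumptions chosen for illustration, and a real semantic dependency parser may use a different label inventory.

```python
from typing import Dict, List, NamedTuple, Optional


class Arc(NamedTuple):
    """One semantic dependency arc (hypothetical simplified form)."""
    head: str        # governing word, e.g. the action verb
    relation: str    # semantic role of the dependent relative to the head
    dependent: str   # dependent word, e.g. the actor or the object


def extract_actions(
    arcs: List[Arc], entity_types: Dict[str, str]
) -> List[Dict[str, Optional[str]]]:
    """Return one record per head word whose agent is a personnel entity.

    Assumed pattern: personnel entity --AGT--> action --PAT--> object.
    Heads without a personnel agent are skipped.
    """
    roles_by_head: Dict[str, Dict[str, str]] = {}
    for arc in arcs:
        roles_by_head.setdefault(arc.head, {})[arc.relation] = arc.dependent

    records: List[Dict[str, Optional[str]]] = []
    for action, roles in roles_by_head.items():
        agent = roles.get("AGT")
        if agent is None or entity_types.get(agent) != "PER":
            continue
        records.append(
            {"actor": agent, "action": action, "object": roles.get("PAT")}
        )
    return records


# Hypothetical parse of "the electrician operated the switch; gas exploded".
arcs = [
    Arc("operated", "AGT", "electrician"),
    Arc("operated", "PAT", "switch"),
    Arc("exploded", "EXP", "gas"),
]
entity_types = {"electrician": "PER", "switch": "EQU", "gas": "ENV"}
print(extract_actions(arcs, entity_types))
```

Keeping the pattern as data-independent rules over (head, relation, dependent) triples means the same matcher can be reused when the entity label set or the parser changes; only the label strings would need to be adjusted.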
Key words
accident causation analysis /
named entity recognition /
semantic dependency parsing /
text classification /
accident causation “2-4” model
Funding
National Natural Science Foundation of China (72204139); China Postdoctoral Science Foundation (2023T160371)