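Negative public opinion events on social networks have an influence that should not be underestimated. Because sentiment-analysis-based methods cannot directly provide early warning of negative online public opinion, this paper proposes a public opinion topic modeling method based on sentiment classification and topic extraction, which makes negative public opinion events countable and quantifiable by studying negative-sentiment topic words. Because existing warning methods for negative public opinion lack timeliness, an early warning model for online public opinion is constructed. It evaluates the development of each public opinion topic with three indicators (the explosion index, the sentiment index, and the dissemination index) and sets a warning trigger value on the public opinion topic arithmetic index, so that the negative public opinion event corresponding to each topic word can be warned of early. Experimental results show that, for the top 10 negative-sentiment topic words ranked by TF-IDF weight in a COVID-19-related Weibo sentiment dataset, the earliest warning precedes the outbreak day by an average of 161.01 h, with an average of 2.1 early warnings, and precedes the peak day by an average of 261.81 h, with an average of 5.8 early warnings. The proposed model thus provides effective early warning of public opinion events on social networks.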
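The lead-time figures above (hours by which the earliest warning precedes the outbreak or peak day, and the number of warnings issued before it) can be computed as in the following minimal sketch. The function name `lead_time_stats`, the input layout, and the choice to skip topics that receive no warning before the reference time are assumptions, since the averaging rule is not stated; the toy timestamps are illustrative only and are not the study's data.

```python
from datetime import datetime


def lead_time_stats(topics):
    """Average warning lead time and warning count over a set of topics.

    topics: list of (warning_times, reference_time) pairs, where reference_time
    is the topic's outbreak (or peak) time. Only warnings issued strictly before
    the reference time count as early warnings. Topics with none are skipped
    (an assumption; the paper's averaging rule is not given here).
    Returns (mean hours by which the earliest warning precedes the reference,
             mean number of early warnings per topic).
    """
    leads, counts = [], []
    for warning_times, reference_time in topics:
        early = [t for t in warning_times if t < reference_time]
        if not early:
            continue
        leads.append((reference_time - min(early)).total_seconds() / 3600)
        counts.append(len(early))
    if not leads:
        return 0.0, 0.0
    return sum(leads) / len(leads), sum(counts) / len(counts)


if __name__ == "__main__":
    # Toy timestamps for two hypothetical topics (not the study's data).
    toy = [
        ([datetime(2020, 1, 1), datetime(2020, 1, 3), datetime(2020, 1, 6)],
         datetime(2020, 1, 5)),
        ([datetime(2020, 2, 1, 12)], datetime(2020, 2, 2)),
    ]
    print(lead_time_stats(toy))  # (54.0, 1.5)
```

The same function serves both reported measures: pass outbreak times as the reference to get the outbreak-day figures, or peak times to get the peak-day figures.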
Abstract
[Objective] Negative public opinion events on social networks have an influence that should not be underestimated. Because sentiment-based methods cannot directly provide early warning of negative online public opinion, this study proposes a public opinion topic modeling approach based on sentiment classification and topic extraction. Taking negative-sentiment topics as the entry point, it shifts the object of study from negative public opinion events to negative public opinion topics, which makes such events countable and quantifiable. In addition, because existing early warning methods for negative public opinion lack timeliness, we construct a new early warning metric, the public opinion topic arithmetic index (POI), which assesses the development of public opinion topics along three dimensions: the explosion index (EI), the sentiment index (SI), and the dissemination index (DI). [Methods] This study uses the ERNIE 3.0 large-scale pre-trained language model for sentiment classification, fine-tuning it on an annotated sentiment dataset to obtain the required sentiment classifier. The classifier is then applied to a COVID-19 Weibo sentiment dataset to label the sentiment of each post. The topic extraction module uses the TF-IDF algorithm: each noun is treated as a candidate topic and each Weibo post as a document. TF-IDF favors words that occur frequently within a post while down-weighting words that appear in most posts and therefore carry little distinguishing information. Applying TF-IDF to the negative-sentiment posts yields the topics associated with negative public opinion events. POI is then computed on the extracted topics, so that early warning is performed on negative public opinion topics rather than directly on events.
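The TF-IDF topic ranking step described above can be sketched as follows, assuming the negative posts have already been segmented into noun tokens (for Chinese text this needs a word segmenter with part-of-speech tagging, which is not shown). The abstract does not say how per-post weights are aggregated into a single topic ranking; summing each word's TF-IDF over all negative posts, as done here, is an assumption, as are the function name `rank_topics` and the toy tokens.

```python
import math
from collections import Counter


def rank_topics(docs, top_k=10):
    """Rank candidate topic words by TF-IDF summed over all documents.

    docs: list of posts, each a list of noun tokens. Segmentation and noun
    filtering are assumed to have been done upstream. Aggregating by summing
    per-post TF-IDF is an illustrative choice, not the paper's stated rule.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word at most once per post

    scores = Counter()
    for doc in docs:
        if not doc:
            continue
        for term, count in Counter(doc).items():
            tf = count / len(doc)                      # frequency within this post
            idf = math.log(n_docs / doc_freq[term])    # 0 for words found in every post
            scores[term] += tf * idf
    return scores.most_common(top_k)


if __name__ == "__main__":
    negative_posts = [  # toy, already-segmented negative posts
        ["mask", "outbreak", "outbreak"],
        ["mask", "hospital"],
        ["mask", "outbreak"],
    ]
    for term, score in rank_topics(negative_posts, top_k=3):
        print(term, round(score, 3))
```

With the unsmoothed IDF log(N/df), a word that appears in every negative post ("mask" in the toy data) scores exactly zero, which is the down-weighting the method relies on; smoothed IDF variants would keep such words slightly above zero.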
POI measures the overall effect of a negative public opinion topic by combining EI, SI, and DI. EI reflects the growth rate of the current number of posts related to a negative-sentiment topic relative to the average number in a preceding period; SI mainly reflects the public emotions and sentiments that the topic triggers; and DI mainly represents the scope and speed of the topic's dissemination. A composite public opinion index for each negative-sentiment topic is obtained by computing its EI, SI, and DI from the topic and post data, and a warning is issued for every topic whose index exceeds the warning threshold. [Results] The experimental results show that the proposed early warning model is effective for public opinion events on social media. For the top ten negative-sentiment topics ranked by TF-IDF weight, the earliest warning precedes the outbreak day by an average of 161.01 hours, with an average of 2.1 early warnings, and precedes the peak day by an average of 261.81 hours, with an average of 5.8 early warnings. [Conclusions] By modeling and computing the arithmetic index of negative public opinion topics, this study sets a threshold on the index that triggers a warning. Negative topics whose index exceeds the threshold, together with their corresponding public opinion events, are thereby flagged, achieving early warning of topic-related negative public opinion events. The proposed model meets its objective of using sentiment analysis for the early detection of negative online public opinion.
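The POI computation and threshold check described above might be sketched as follows. Only EI follows the abstract's description (growth of the current post count relative to the average of a preceding period). The SI formula (share of negative posts), the DI formula (a saturating reposts-per-post rate), the equal weights (reading "arithmetic index" as an arithmetic mean), the three-window look-back, the 0.5 threshold, and all names here (`Window`, `poi`, `warning_windows`) are hypothetical choices for illustration, not the paper's published definitions.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Window:
    posts: int     # posts mentioning the topic in this time window
    negative: int  # posts among them classified as negative
    reposts: int   # reposts of those posts


def explosion_index(history, current):
    """EI: growth of the current post count over the mean of previous windows."""
    if not history:
        return 0.0
    base = mean(w.posts for w in history)
    return (current.posts - base) / base if base else 0.0


def sentiment_index(current):
    """Hypothetical SI: share of the window's topic posts that are negative."""
    return current.negative / current.posts if current.posts else 0.0


def dissemination_index(current):
    """Hypothetical DI: reposts per post, squashed into [0, 1)."""
    if not current.posts:
        return 0.0
    rate = current.reposts / current.posts
    return rate / (1.0 + rate)


def poi(history, current, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of EI, SI, DI; equal weights give their arithmetic mean."""
    w_ei, w_si, w_di = weights
    return (w_ei * explosion_index(history, current)
            + w_si * sentiment_index(current)
            + w_di * dissemination_index(current))


def warning_windows(series, threshold=0.5, lookback=3):
    """Indices of windows whose POI reaches the (hypothetical) threshold."""
    return [t for t in range(lookback, len(series))
            if poi(series[t - lookback:t], series[t]) >= threshold]


if __name__ == "__main__":
    # Toy series: four quiet windows, then a surge in volume and negativity.
    toy = [Window(10, 3, 10)] * 4 + [Window(40, 30, 120)]
    print(warning_windows(toy))  # [4]
```

Because EI is unbounded while SI and DI lie in [0, 1], an unweighted mean lets a sudden surge in volume dominate the index; a real deployment would need to normalize or weight the three components, which is why `weights` is exposed as a parameter.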
Key words
online public opinion /
sentiment classification /
topic extraction /
public opinion index /
early warning
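Funding
Western Region Project of the National Social Science Fund of China (20XTQ007); General Program of the National Natural Science Foundation of China (61572521)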