Journal of Tsinghua University (Science and Technology)  2015, Vol. 55 Issue (5): 497-502
Electronic Engineering
Sentiment classification based on multi-granularity computing and multi-criteria fusion
WANG Bingkun (王丙坤), HUANG Yongfeng (黄永峰), LI Xing (李星)
Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
Abstract: The rapid growth of online user-generated content has created a need for unsupervised sentiment classification methods. Existing unsupervised methods based on sentiment words classify poorly because they ignore the effects of sentence type and inter-sentence relations, while unsupervised methods based on self-learning introduce many errors when generating pseudo-labelled datasets. This paper addresses both limitations with an unsupervised sentiment classification method based on multi-granularity computing and multi-criteria fusion. Multi-granularity computing improves the accuracy of classification based on sentiment words, and multi-criteria fusion reduces the error rate of the pseudo-labelled data used for self-learning. Experiments on three real Chinese review datasets show that the method improves classification accuracy by 6.5% on average over existing unsupervised sentiment classification methods.
Key words: sentiment classification; unsupervised methods; multi-granularity computing; multi-criteria fusion
Received: 2014-12-25      Published: 2015-05-15
Chinese Library Classification (CLC) number: TP391.1
Corresponding author: HUANG Yongfeng, Professor, E-mail: yfhuang@tsinghua.edu.cn
Cite this article:
WANG Bingkun, HUANG Yongfeng, LI Xing. Sentiment classification based on multi-granularity computing and multi-criteria fusion[J]. Journal of Tsinghua University (Science and Technology), 2015, 55(5): 497-502.
Link to this article:
http://jst.tsinghuajournals.com/CN/  or  http://jst.tsinghuajournals.com/CN/Y2015/V55/I5/497
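  Fig. 1 Sentiment classification framework based on multi-granularity computing and multi-criteria fusion
  Table 1 Construction rules for sentiment phrases
  Table 2 Weighting coefficients for different degree adverbs
  Table 3 Processing methods for different sentence types
  Table 4 Processing methods for inter-sentence relations
  Table 5 Distribution of the review datasets
  Table 6 Performance comparison of different sentiment classification methods
  Fig. 2 Performance of the proposed method with different sizes of initial pseudo-labelled data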
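
The abstract and the captions of Tables 1 to 4 outline a pipeline that scores sentiment at several granularities (word, phrase, sentence, document) and then keeps only reliable pseudo-labels for self-learning. The following is a minimal sketch of that general idea, not the authors' implementation: the toy English lexicon, the degree-adverb weights, the sentence-type and adversative rules, the two selection criteria, and all function names and thresholds are hypothetical placeholders, since this page does not give the actual rules or values.

```python
import re

# All entries and weights below are hypothetical, for illustration only.
SENTIMENT = {"good": 1.0, "great": 2.0, "bad": -1.0, "terrible": -2.0}
DEGREE = {"very": 1.5, "slightly": 0.5}      # degree-adverb weights
NEGATION = {"not", "never"}
ADVERSATIVE = {"but", "however"}             # inter-sentence relation cue


def split_sentences(text):
    """Split text into sentences, keeping the closing punctuation."""
    return [s for s in re.findall(r"[^.!?]+[.!?]?", text) if s.strip()]


def sentence_score(sentence):
    """Word -> phrase -> sentence scoring for a single sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    total, weight, sign = 0.0, 1.0, 1.0
    for tok in tokens:
        if tok in DEGREE:
            weight *= DEGREE[tok]            # phrase level: adverb scaling
        elif tok in NEGATION:
            sign = -sign                     # phrase level: negation flip
        elif tok in SENTIMENT:
            total += sign * weight * SENTIMENT[tok]
            weight, sign = 1.0, 1.0          # modifiers apply to one word
    stripped = sentence.strip()
    if stripped.endswith("?"):
        return 0.0                           # assumed: questions are neutral
    if stripped.endswith("!"):
        total *= 1.5                         # assumed: exclamations amplify
    if tokens and tokens[0] in ADVERSATIVE:
        total *= 2.0                         # assumed: adversative emphasis
    return total


def pseudo_label(text, min_abs_score=1.0, min_agreement=0.5):
    """Return +1 / -1 only if both assumed criteria hold, else None."""
    scores = [sentence_score(s) for s in split_sentences(text)]
    doc_score = sum(scores)                  # document-level score
    if doc_score == 0:
        return None
    label = 1 if doc_score > 0 else -1
    opinionated = [x for x in scores if x != 0]
    agreement = sum(1 for x in opinionated if x * label > 0) / len(opinionated)
    # Criterion 1: strong document score. Criterion 2: sentences mostly agree.
    if abs(doc_score) >= min_abs_score and agreement >= min_agreement:
        return label
    return None


if __name__ == "__main__":
    print(pseudo_label("The screen is very good. But the battery is not good!"))
```

In a self-learning setup, documents for which `pseudo_label` returns a label would seed a supervised classifier, and documents returning `None` would be left for that classifier to label later. Requiring several criteria to hold at once trades coverage for a cleaner seed set.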