Computer Science and Technology

Ensemble weighted soft voting truth inference method for crowdsourcing

  • ZHANG Hua,
  • SHEN Fei,
  • JIANG Shihao,
  • ZHANG Lingjun,
  • XU Hong
  • 1. School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China;
    2. Key Laboratory of Network Multimedia Technology of Zhejiang Province, Zhejiang University, Hangzhou 310058, China;
    3. Key Laboratory of Brain Machine Collaborative Intelligence of Zhejiang Province, Hangzhou 310018, China
ZHANG Hua (1980-), female, associate professor

Received: 2020-12-31

  Published online: 2022-01-22

Funding

National Key Research and Development Program of China (2017YFE0118200); Key Research and Development Program of Zhejiang Province (2019C01124); National Natural Science Foundation of China, Young Scientists Fund (61802094)

Cite this article

ZHANG Hua, SHEN Fei, JIANG Shihao, ZHANG Lingjun, XU Hong. Ensemble weighted soft voting truth inference method for crowdsourcing[J]. Journal of Tsinghua University (Science and Technology), 2022, 62(2): 347-354. DOI: 10.16511/j.cnki.qhdxxb.2021.22.015

Abstract

Many truth inference methods have been proposed to improve crowdsourcing quality and to obtain high-quality annotated data. Traditional truth inference takes multiple noisy labels as input and infers the true label through an aggregation strategy. This process considers only the labels of the instances; it ignores the instance features and the fact that a worker's annotation quality varies from instance to instance. This paper introduces instance features to exploit as much of the useful information contained in the instances as possible. First, the probability that each crowdsourced instance belongs to each class is computed and used to repartition the crowdsourced dataset. Next, an ensemble classifier based on meta-learning is trained on the new dataset, and a similarity computation yields worker weights that reflect each worker's differing annotation ability on different instances. Finally, the worker weights are introduced into the voting model, giving a weighted soft voting method for inferring labels. Experiments on public and constructed datasets show that the proposed method outperforms existing truth inference algorithms.

Keywords: crowdsourcing; features; meta-learning; classification
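The abstract does not give the voting formula, so the following is only a minimal sketch of one common reading of weighted soft voting: each worker's label contributes that worker's weight for the instance, and the per-class totals are normalized into a probability distribution. The function names `majority_vote` and `weighted_soft_vote`, the worker identifiers, and the weight values are all hypothetical; in the paper the weights come from the meta-learning ensemble classifier, which is not reproduced here.

```python
from collections import defaultdict


def majority_vote(labels):
    """Baseline: every worker's label counts equally.

    labels: {worker_id: class_label} for a single instance.
    """
    counts = defaultdict(int)
    for label in labels.values():
        counts[label] += 1
    # Iterate classes in sorted order so ties resolve deterministically.
    return max(sorted(counts), key=lambda c: counts[c])


def weighted_soft_vote(labels, weights):
    """Aggregate one instance's noisy labels with per-worker weights.

    labels:  {worker_id: class_label} for a single instance.
    weights: {worker_id: non-negative weight for this instance}.
             Hypothetical input: assumed to be supplied by some
             upstream model of worker ability on this instance.
    Returns (predicted_label, {class_label: normalized score}).
    """
    scores = defaultdict(float)
    for worker, label in labels.items():
        # Workers without a known weight contribute nothing.
        scores[label] += weights.get(worker, 0.0)
    total = sum(scores.values())
    if total <= 0:
        raise ValueError("at least one worker needs a positive weight")
    probs = {c: s / total for c, s in scores.items()}
    predicted = max(sorted(probs), key=lambda c: probs[c])
    return predicted, probs


# Illustrative data only: one reliable worker disagrees with two weak ones.
labels = {"w1": "cat", "w2": "dog", "w3": "dog"}
weights = {"w1": 0.9, "w2": 0.3, "w3": 0.3}

print(majority_vote(labels))
print(weighted_soft_vote(labels, weights))
```

With these made-up numbers the unweighted vote follows the two weaker workers, while the weighted vote follows the single highly weighted worker. This illustrates why instance-level worker weights can change the inferred label; it says nothing about the accuracy reported in the paper.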
