Ensemble weighted soft voting truth inference method for crowdsourcing

ZHANG Hua, SHEN Fei, JIANG Shihao, ZHANG Lingjun, XU Hong

Journal of Tsinghua University (Science and Technology), 2022, Vol. 62, Issue (2): 347-354. DOI: 10.16511/j.cnki.qhdxxb.2021.22.015

Computer Science and Technology


Abstract

Many crowdsourced annotation methods based on truth inference have been proposed to improve crowdsourcing quality and obtain high-quality labeled data. Traditional truth inference takes multiple noisy labels as input and infers the true labels through an aggregation strategy. This process considers only the labels of each instance and ignores its features, and it also ignores that annotation quality varies across workers and across instances. This paper introduces instance features in order to extract as much of the useful information contained in the instances as possible. First, the probability that each crowdsourced instance belongs to each class is computed, yielding a newly partitioned crowdsourcing dataset. Next, an ensemble classifier based on meta-learning is proposed; it is trained on the new dataset, and worker weights are derived from a similarity computation, so that a worker can have a different annotation ability on different instances. Finally, these worker weights are introduced into the voting model, giving a weighted soft voting method for inferring labels. Experiments on public datasets and on datasets constructed by the authors show that the proposed method outperforms existing truth inference algorithms.
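The abstract describes a pipeline of per-class instance probabilities, similarity-based worker weights, and weighted soft voting. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: per-class probabilities are taken as simple vote fractions, the meta-learning ensemble classifier is replaced by a hypothetical `predicted` dictionary of class distributions, and each worker's instance-specific weight is assumed to be the cosine similarity between their one-hot label and that distribution, scaled by the worker's average similarity. All function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# Toy layout assumed by this sketch: instance id -> {worker id -> class index}.
Labels = Dict[str, Dict[str, int]]


def class_probabilities(labels: Labels, k: int) -> Dict[str, List[float]]:
    """Fraction of workers choosing each class, per instance (one assumed way
    to estimate the probability that an instance belongs to each class).
    Assumes every instance has at least one label."""
    probs = {}
    for inst, votes in labels.items():
        counts = [0] * k
        for label in votes.values():
            counts[label] += 1
        probs[inst] = [c / len(votes) for c in counts]
    return probs


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def worker_weights(labels: Labels, predicted: Dict[str, List[float]],
                   k: int) -> Dict[Tuple[str, str], float]:
    """Hypothetical instance-specific weights.

    `predicted` stands in for the class distributions output by the paper's
    meta-learning ensemble classifier, which is NOT implemented here.
    weight(w, i) = cos(one_hot(label of w on i), predicted[i]) * reliability(w).
    """
    sim = {}
    for inst, votes in labels.items():
        for worker, label in votes.items():
            one_hot = [1.0 if c == label else 0.0 for c in range(k)]
            sim[(inst, worker)] = cosine(one_hot, predicted[inst])
    # Worker-level reliability: mean similarity over all instances the worker labeled.
    per_worker: Dict[str, List[float]] = {}
    for (_, worker), s in sim.items():
        per_worker.setdefault(worker, []).append(s)
    reliability = {w: sum(v) / len(v) for w, v in per_worker.items()}
    return {(inst, w): s * reliability[w] for (inst, w), s in sim.items()}


def weighted_soft_vote(labels: Labels, weights: Dict[Tuple[str, str], float],
                       k: int) -> Tuple[Dict[str, int], Dict[str, List[float]]]:
    """Add each worker's weight to the class they chose, normalize the scores
    into a soft label distribution, and pick the most probable class."""
    truth, soft = {}, {}
    for inst, votes in labels.items():
        scores = [0.0] * k
        for worker, label in votes.items():
            scores[label] += weights[(inst, worker)]
        total = sum(scores) or 1.0
        soft[inst] = [s / total for s in scores]
        truth[inst] = max(range(k), key=lambda c: soft[inst][c])
    return truth, soft


if __name__ == "__main__":
    labels = {"x1": {"w1": 0, "w2": 0, "w3": 1},
              "x2": {"w1": 1, "w2": 0, "w3": 0}}
    predicted = {"x1": [0.8, 0.2], "x2": [0.1, 0.9]}  # made-up classifier output
    truth, soft = weighted_soft_vote(labels, worker_weights(labels, predicted, 2), 2)
    print(truth)  # {'x1': 0, 'x2': 1}
```

In this toy run, plain majority voting would label x2 as class 0 (workers w2 and w3), but the assumed classifier distribution favors class 1, so w1's instance-specific weight dominates and the inferred label becomes 1. Using the weights only to scale votes, rather than discarding low-weight workers, keeps every label in play while letting feature-based evidence shift the outcome.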

Key words

crowdsourcing / feature / meta-learning / classification

Cite this article

ZHANG Hua, SHEN Fei, JIANG Shihao, ZHANG Lingjun, XU Hong. Ensemble weighted soft voting truth inference method for crowdsourcing[J]. Journal of Tsinghua University (Science and Technology), 2022, 62(2): 347-354. https://doi.org/10.16511/j.cnki.qhdxxb.2021.22.015


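Funding

National Key Research and Development Program of China (2017YFE0118200); Key Research and Development Program of Zhejiang Province (2019C01124); Young Scientists Fund of the National Natural Science Foundation of China (61802094)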
