Journal of Tsinghua University (Science and Technology)  2022, Vol. 62 Issue (2): 347-354    DOI: 10.16511/j.cnki.qhdxxb.2021.22.015
Computer Science and Technology
Ensemble weighted soft voting truth inference method for crowdsourcing
ZHANG Hua1,2, SHEN Fei1, JIANG Shihao1, ZHANG Lingjun1,3, XU Hong1
1. School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China;
2. Key Laboratory of Network Multimedia Technology of Zhejiang Province, Zhejiang University, Hangzhou 310058, China;
3. Key Laboratory of Brain Machine Collaborative Intelligence of Zhejiang Province, Hangzhou 310018, China
Abstract: Many truth inference methods have been proposed to improve crowdsourcing quality and to obtain high-quality annotated data. Traditional truth inference takes multiple noisy labels as input and infers the true labels through an aggregation strategy. This process considers only the labels of the instances and ignores their features, and it also ignores differences in how well each worker labels different instances. This paper introduces instance features to make fuller use of the information contained in the instances. The probability that each crowdsourced instance belongs to each class is computed and used to repartition the crowdsourcing dataset. A meta-learning-based ensemble classifier is then trained on the new dataset, and a similarity computation yields worker weights that reflect each worker's annotation ability on different instances. Finally, the worker weights are added to the voting model, giving a weighted soft voting method for inferring labels. Experiments on public and constructed datasets show that the proposed method outperforms existing truth inference algorithms.
Key words: crowdsourcing; feature; meta-learning; classification
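The final aggregation step named in the abstract, weighted soft voting, can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `weighted_soft_vote`, its signature, and the example weights are all assumptions made for illustration. The paper derives the worker weights from a meta-learning ensemble classifier and a similarity computation, whereas here they are simply supplied as inputs.

```python
from typing import List, Optional, Tuple


def weighted_soft_vote(
    labels: List[Optional[int]],
    weights: List[float],
    num_classes: int,
) -> Tuple[int, List[float]]:
    """Aggregate one instance's noisy labels into a class distribution.

    labels[j] is worker j's label for this instance (None if the worker
    did not label it); weights[j] is that worker's weight for this
    instance. Both are hypothetical inputs for this sketch.
    """
    scores = [0.0] * num_classes
    for label, weight in zip(labels, weights):
        if label is not None:
            scores[label] += weight  # each vote counts in proportion to its weight
    total = sum(scores)
    if total == 0.0:
        # No usable votes: fall back to a uniform distribution.
        probs = [1.0 / num_classes] * num_classes
    else:
        probs = [s / total for s in scores]
    # Ties resolve to the lowest class index.
    inferred = max(range(num_classes), key=lambda k: probs[k])
    return inferred, probs


# Three workers label one binary instance; the weights are made-up values.
label, probs = weighted_soft_vote([1, 0, 0], [0.9, 0.3, 0.2], num_classes=2)
```

In this example, unweighted majority voting would choose class 0 (two votes to one), while the weighted vote chooses class 1 because the single dissenting worker carries most of the weight. The function also returns the normalized per-class scores, which is the sense in which the vote is soft in this sketch.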
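Received: 2020-12-31      Published: 2022-01-22
Funding: National Key Research and Development Program of China (2017YFE0118200); Key Research and Development Program of Zhejiang Province (2019C01124); National Natural Science Foundation of China, Young Scientists Fund (61802094)
Corresponding author: ZHANG Lingjun, lecturer, E-mail: zhanglingjun@hdu.edu.cn
About the author: ZHANG Hua (1980-), female, associate professor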
Cite this article:
ZHANG Hua, SHEN Fei, JIANG Shihao, ZHANG Lingjun, XU Hong. Ensemble weighted soft voting truth inference method for crowdsourcing. Journal of Tsinghua University (Science and Technology), 2022, 62(2): 347-354.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2021.22.015  or  http://jst.tsinghuajournals.com/CN/Y2022/V62/I2/347