Journal of Tsinghua University (Science and Technology), 2019, Vol. 59, Issue 4: 262-269. DOI: 10.16511/j.cnki.qhdxxb.2018.26.059
COMPUTER SCIENCE AND TECHNOLOGY
Label noise filtering based on the data distribution
CHEN Qingqiang1, WANG Wenjian2, JIANG Gaoxia1
1. School of Computer and Information Technology, Shanxi University, Taiyuan 030006, China;
2. Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, Shanxi University, Taiyuan 030006, China
Abstract: Label noise can severely degrade supervised learning models. Existing filtering methods rely mainly on model predictions or on robust predictive modeling, and they are sometimes neither effective nor efficient. This paper presents a label noise filtering method based on the data distribution. First, the area formed by each sample and its neighboring samples is classified as high density or low density according to the distribution of those neighbors. Then, a different noise filtering rule is applied to each type of area. Because the approach takes the data distribution into account, filtering concentrates on the key data and over-filtering is avoided; and because simple filtering rules replace a trained noise-prediction model, efficiency improves. Tests on 15 UCI standard multi-class data sets show that the approach is both effective and efficient.
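The abstract outlines the procedure but not the concrete rules. The sketch below illustrates one plausible reading in Python: local density is estimated from the mean distance to the k nearest neighbors, an area counts as high density when its density exceeds the median, a majority-disagreement rule filters in high-density areas, and a stricter unanimous-disagreement rule filters in low-density areas so that sparse but legitimate boundary samples are not over-filtered. The function name, the density threshold, and both rules are assumptions made for illustration, not the paper's exact method.

```python
# Illustrative sketch of distribution-aware label noise filtering.
# The density estimate, the median threshold, and the two per-area
# rules are assumptions; the paper's exact rules are not given here.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def filter_label_noise(X, y, k=5):
    """Return a boolean mask: True = keep sample, False = flag as label noise."""
    X, y = np.asarray(X), np.asarray(y)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nn.kneighbors(X)        # column 0 is each sample itself
    dist, idx = dist[:, 1:], idx[:, 1:]

    # Local density: inverse of the mean distance to the k nearest neighbors.
    density = 1.0 / (dist.mean(axis=1) + 1e-12)
    high_density = density >= np.median(density)   # assumed threshold

    keep = np.ones(len(y), dtype=bool)
    for i in range(len(y)):
        disagree = np.mean(y[idx[i]] != y[i])      # fraction of disagreeing neighbors
        if high_density[i]:
            # High-density area: majority-vote rule (stricter filtering).
            keep[i] = disagree <= 0.5
        else:
            # Low-density area: filter only on unanimous disagreement,
            # to avoid over-filtering sparse samples near class boundaries.
            keep[i] = disagree < 1.0
    return keep
```

Given features X and labels y, `mask = filter_label_noise(X, y)` would yield a cleaned training set as `X[mask], y[mask]`, ready to be passed to any supervised learner.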
Keywords: label noise; noise filtering; robust modeling; data distribution
Issue Date: 09 April 2019
Cite this article:   
CHEN Qingqiang, WANG Wenjian, JIANG Gaoxia. Label noise filtering based on the data distribution[J]. Journal of Tsinghua University (Science and Technology), 2019, 59(4): 262-269.
URL: http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2018.26.059 or http://jst.tsinghuajournals.com/EN/Y2019/V59/I4/262