Journal of Tsinghua University (Science and Technology), 2018, Vol. 58, Issue 9: 781-787    DOI: 10.16511/j.cnki.qhdxxb.2018.22.034
AUTOMATION
Image recognition and classification by deep belief-convolutional neural networks
LIU Qiong1, LI Zongxian2, SUN Fuchun3, TIAN Yonghong2, ZENG Wei2
1. School of Automation, Beijing Information Science and Technology University, Beijing 100192, China;
2. National Engineering Laboratory for Video Technology, School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China;
3. State Key Laboratory of Intelligence Technology and System, Department of Computer Science, Tsinghua University, Beijing 100084, China
Abstract  A convolutional neural network (CNN) easily converges to a poor local minimum in image classification tasks when its weights are randomly initialized. A deep belief network pre-training method was therefore developed by merging unsupervised and supervised learning. Feature sets were first learned from image patches processed by zero-phase component analysis (ZCA) whitening and deep belief pre-training, and were used to initialize the CNN weights. Convolution features were then extracted from the training samples through convolution and pooling operations and classified into a specific category by a fully connected network. Finally, the loss value was computed for global optimization of the whole network. Extensive evaluations on public datasets show that this method is simple yet effective, reducing the error rate by 0.1% on MNIST and improving the accuracy by 0.56% on Caltech101, which indicates that it outperforms comparable methods.
Keywords: deep belief networks; image recognition; convolutional neural networks
Issue Date: 19 September 2018
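The pre-training pipeline summarized in the abstract (ZCA whitening of image patches, unsupervised deep belief / restricted Boltzmann machine pre-training, and transfer of the learned weights into the first convolutional layer) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the patch size, hidden-unit count, learning rate, single-step contrastive divergence, and the binary-unit RBM below are all illustrative assumptions.

# Minimal sketch of the pre-training pipeline described in the abstract:
# ZCA-whiten image patches, pre-train an RBM with one-step contrastive
# divergence (CD-1), and reshape its weights into convolution filters that
# initialize a CNN's first layer. All hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def zca_whiten(patches, eps=1e-2):
    """ZCA (zero-phase component analysis) whitening of flattened patches."""
    mean = patches.mean(axis=0)
    x = patches - mean
    cov = x.T @ x / x.shape[0]
    u, s, _ = np.linalg.svd(cov)
    w_zca = u @ np.diag(1.0 / np.sqrt(s + eps)) @ u.T
    return x @ w_zca

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_rbm(v_data, n_hidden=64, lr=0.05, epochs=10, batch=100):
    """Pre-train an RBM with CD-1. A Gaussian-Bernoulli variant would suit
    real-valued whitened inputs better; binary units are kept for brevity."""
    n_visible = v_data.shape[1]
    w = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v, b_h = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        order = rng.permutation(len(v_data))
        for i in range(0, len(v_data), batch):
            v0 = v_data[order[i:i + batch]]
            h0 = sigmoid(v0 @ w + b_h)                       # positive phase
            h_sample = (rng.random(h0.shape) < h0).astype(float)
            v1 = sigmoid(h_sample @ w.T + b_v)               # reconstruction
            h1 = sigmoid(v1 @ w + b_h)                       # negative phase
            w += lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
            b_v += lr * (v0 - v1).mean(axis=0)
            b_h += lr * (h0 - h1).mean(axis=0)
    return w

# Random 8x8 patches stand in for patches cropped from the training images.
patch = 8
patches = rng.random((5000, patch * patch))
white = zca_whiten(patches)

# Each RBM hidden unit yields one filter used to initialize the CNN's
# first convolutional layer (shape: out_channels x kH x kW).
w_rbm = train_rbm(white, n_hidden=64)
conv_filters = w_rbm.T.reshape(-1, patch, patch)
print(conv_filters.shape)   # (64, 8, 8)

After this initialization, the CNN would be trained end-to-end with the supervised loss, corresponding to the global optimization step mentioned at the end of the abstract.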
Cite this article:
LIU Qiong, LI Zongxian, SUN Fuchun, et al. Image recognition and classification by deep belief-convolutional neural networks[J]. Journal of Tsinghua University (Science and Technology), 2018, 58(9): 781-787.
URL: http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2018.22.034 or http://jst.tsinghuajournals.com/EN/Y2018/V58/I9/781