Journal of Tsinghua University (Science and Technology), 2018, Vol. 58, Issue (9): 781-787    DOI: 10.16511/j.cnki.qhdxxb.2018.22.034
Section: Automation
Image recognition and classification by deep belief-convolutional neural networks
LIU Qiong1, LI Zongxian2, SUN Fuchun3, TIAN Yonghong2, ZENG Wei2
1. School of Automation, Beijing Information Science and Technology University, Beijing 100192, China;
2. National Engineering Laboratory for Video Technology, School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China;
3. State Key Laboratory of Intelligent Technology and Systems, Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Full text: PDF (2527 KB)
Abstract: A convolutional neural network (CNN) trained for image classification easily converges to a poor local minimum when its weights are randomly initialized. This paper presents a weight pre-training method that combines unsupervised and supervised learning. Features learned from image patches by zero component analysis (ZCA) whitening and deep belief network (DBN) pre-training are used to initialize the CNN weights. Convolutional features are then extracted from the training samples by convolution and pooling operations and assigned to categories by a fully connected network, and the classification loss is computed to optimize the network parameters. Extensive evaluations on public datasets show that the method is simple but effective: compared with the best published results, it reduces the recognition error rate on MNIST by 0.1% and improves the classification accuracy on Caltech101 by 0.56%, indicating that it outperforms comparable methods.
Key words: deep belief networks; image recognition; convolutional neural networks
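
The pre-training stage described in the abstract can be summarized in code. The following NumPy sketch uses illustrative assumptions (7x7 patches, 16 hidden units, a single Bernoulli-Bernoulli RBM layer trained with one-step contrastive divergence; the paper's actual patch size, layer sizes, and RBM variant may differ). It shows how ZCA-whitened image patches are used to pre-train one DBN layer whose learned weights then initialize the first convolution kernels in place of random values.

# Minimal NumPy sketch of the pre-training stage: ZCA whitening of image
# patches, CD-1 training of one RBM (DBN layer), and reshaping the learned
# weights into convolution kernels. All sizes below are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def zca_whiten(patches, eps=0.1):
    """Zero component analysis (ZCA) whitening of flattened patches (N, D)."""
    patches = patches - patches.mean(axis=0)
    cov = patches.T @ patches / len(patches)
    eigvals, eigvecs = np.linalg.eigh(cov)
    w_zca = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return patches @ w_zca

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(v_data, n_hidden=16, lr=0.01, epochs=20):
    """One-step contrastive divergence (CD-1) for a Bernoulli-Bernoulli RBM."""
    n_visible = v_data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v = np.zeros(n_visible)
    b_h = np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden activations driven by the data.
        h_prob = sigmoid(v_data @ W + b_h)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: one Gibbs step back to the visible units.
        v_recon = sigmoid(h_sample @ W.T + b_v)
        h_recon = sigmoid(v_recon @ W + b_h)
        # CD-1 gradient approximation and parameter update.
        W += lr * (v_data.T @ h_prob - v_recon.T @ h_recon) / len(v_data)
        b_v += lr * (v_data - v_recon).mean(axis=0)
        b_h += lr * (h_prob - h_recon).mean(axis=0)
    return W

# Toy stand-in for patches sampled from training images (e.g. 7x7 windows).
patch_size = 7
patches = rng.random((1000, patch_size * patch_size))
# For simplicity the whitened values feed a Bernoulli RBM directly; a
# Gaussian-Bernoulli RBM would be the more principled choice for real data.
whitened = zca_whiten(patches, eps=0.1)
W = train_rbm(whitened, n_hidden=16)

# Each hidden unit's weight vector becomes one convolution kernel of the
# first CNN layer, replacing random initialization.
conv_kernels = W.T.reshape(16, patch_size, patch_size)
print(conv_kernels.shape)  # (16, 7, 7)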
Received: 2018-01-15      Published online: 2018-09-19
Supported by: the National Natural Science Foundation of China (Nos. 61327809, 91420302, 61633002); the National Key Basic Research and Development (973) Program of China (No. 2015CB351806); the Beijing Municipal Universities Young Top Talent Program 2018 (No. CIT&TCD201804054)
Corresponding author: LI Zongxian, E-mail: zongxian_lee@pku.edu.cn
Cite this article:
LIU Qiong, LI Zongxian, SUN Fuchun, TIAN Yonghong, ZENG Wei. Image recognition and classification by deep belief-convolutional neural networks[J]. Journal of Tsinghua University (Science and Technology), 2018, 58(9): 781-787.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2018.22.034  or  http://jst.tsinghuajournals.com/CN/Y2018/V58/I9/781
Fig. 1 Overall flowchart of the algorithm
Fig. 2 MNIST image patches
Fig. 3 Whitening results for the MNIST image patches
Fig. 4 Flowchart of the deep belief-convolutional network algorithm
Table 1 Effect of the feature sampling window size on recognition performance
Fig. 5 Whitened feature patches with whitening coefficient ε = 0.05
Fig. 6 Whitened feature patches with whitening coefficient ε = 0.1
Fig. 7 Recognition error rate for different whitening coefficients
Table 2 Comparison of the proposed algorithm with existing algorithms, experiment 1: MNIST
Fig. 8 (Color online) Examples of categories recognized with 100% accuracy
Table 3 Comparison of the proposed algorithm with existing algorithms, experiment 2: Caltech101
Table 4 Comparison of the proposed algorithm with existing algorithms, experiment 3: GTSRB (SPEZIAL)
Fig. 9 (Color online) Examples of misrecognized samples
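
Following the flow of Figs. 1 and 4, the supervised stage can be sketched as follows. This hedged PyTorch sketch assumes a (16, 7, 7) array of pre-trained filters (a random stand-in is generated here), a single convolution layer, 2x2 max pooling, and a fully connected classifier on MNIST-sized 28x28 inputs; the paper's actual architecture and hyperparameters may differ. It only illustrates how the DBN/ZCA-derived filters replace random initialization before the whole network is optimized against the classification loss.

# Hedged sketch of the supervised stage: initialize the first convolution
# layer from the pre-trained filters, then apply convolution, pooling, a
# fully connected classifier, and the classification loss end to end.
import numpy as np
import torch
import torch.nn as nn

class DBNInitCNN(nn.Module):
    def __init__(self, pretrained_kernels, n_classes=10):
        super().__init__()
        n_filters, k, _ = pretrained_kernels.shape
        self.conv = nn.Conv2d(1, n_filters, kernel_size=k)   # 28x28 -> 22x22
        self.pool = nn.MaxPool2d(2)                           # 22x22 -> 11x11
        self.fc = nn.Linear(n_filters * 11 * 11, n_classes)
        with torch.no_grad():                                  # replace random init
            self.conv.weight.copy_(
                torch.as_tensor(pretrained_kernels, dtype=torch.float32).unsqueeze(1))

    def forward(self, x):
        x = torch.relu(self.pool(self.conv(x)))
        return self.fc(x.flatten(1))

kernels = np.random.randn(16, 7, 7)       # stand-in for the pre-trained filters
model = DBNInitCNN(kernels)
images = torch.rand(8, 1, 28, 28)          # MNIST-sized toy batch
labels = torch.randint(0, 10, (8,))
loss = nn.CrossEntropyLoss()(model(images), labels)
loss.backward()                             # optimize all weights end to end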