Hybrid computational strategy for deep learning based on AVX2

doi:10.16511/j.cnki.qhdxxb.2020.21.001

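Abstract: Because the memory capacity of graphics processing units (GPUs) is limited, the size of the deep learning network models they can hold is severely restricted. This paper proposes a hybrid computational strategy for deep learning that uses Intel's new single-instruction multiple-data (SIMD) instruction set, AVX2, to fully exploit the CPU's potential to support the GPU. To save GPU memory, network layers with large intermediate data are computed on the CPU, and AVX2 instructions are used to improve computational efficiency on the CPU side. The core techniques are partitioning and coordinating the network model and vectorizing the application code with AVX2 instructions. The strategy was implemented in Caffe. Experimental results on typical datasets, including CIFAR-10 and ImageNet, show that with the hybrid computational strategy Caffe can run larger neural network models correctly while maintaining high execution efficiency.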
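The abstract does not include the authors' CPU kernels. As a minimal sketch of what vectorizing CPU-side layer code with AVX2 can look like, the C code below (assuming GCC or Clang on x86-64) computes y = alpha * x + y, the update used when accumulating gradients or applying an SGD step with alpha = -learning_rate. The names `axpy`, `axpy_scalar` and `axpy_avx2` are hypothetical and are not Caffe's API. The 256-bit float registers come from AVX and the fused multiply-add from the FMA3 extension that Intel introduced together with AVX2 in Haswell, so the sketch checks for both at run time and otherwise falls back to a scalar loop.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define HAVE_X86 1
#else
#define HAVE_X86 0
#endif

/* Scalar reference, also used for the tail: y[i] += alpha * x[i]. */
static void axpy_scalar(size_t n, float alpha, const float *x, float *y) {
    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}

#if HAVE_X86
/* Eight floats per 256-bit register, one fused multiply-add per step.
 * The target attribute lets GCC/Clang emit AVX2/FMA code for this
 * function only, without building the whole file with -mavx2 -mfma. */
static __attribute__((target("avx2,fma")))
void axpy_avx2(size_t n, float alpha, const float *x, float *y) {
    const __m256 va = _mm256_set1_ps(alpha);  /* broadcast alpha to 8 lanes */
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);   /* unaligned load */
        __m256 vy = _mm256_loadu_ps(y + i);
        vy = _mm256_fmadd_ps(va, vx, vy);     /* va * vx + vy */
        _mm256_storeu_ps(y + i, vy);
    }
    axpy_scalar(n - i, alpha, x + i, y + i);  /* leftover n % 8 elements */
}
#endif

/* Dispatch: take the vector path only if the running CPU supports it. */
static void axpy(size_t n, float alpha, const float *x, float *y) {
    assert(n == 0 || (x != NULL && y != NULL));
#if HAVE_X86
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma")) {
        axpy_avx2(n, alpha, x, y);
        return;
    }
#endif
    axpy_scalar(n, alpha, x, y);
}
```

Calling `axpy(n, alpha, x, y)` processes eight floats per loop iteration and hands the last `n % 8` elements to the scalar loop. Unaligned loads and stores are used so the sketch makes no alignment assumption about the buffers; aligned variants may be slightly faster when alignment is guaranteed.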

Authors: 蒋文斌 (JIANG Wenbin), 王宏斌 (WANG Hongbin), 刘湃 (LIU Pai), 陈雨浩 (CHEN Yuhao)

Keywords: hybrid computation, deep learning, AVX2 instruction set, graphics processing unit (GPU) memory, Caffe

Abstract: The limited memory of graphics processing units (GPUs) restricts the size of the deep learning network models they can handle. To address this problem, a hybrid computational strategy for deep learning was developed that also exploits the CPU's potential through Intel's new SIMD instruction set, AVX2. Neural network layers that need large amounts of memory for intermediate data are migrated to the CPU to reduce GPU memory usage, and AVX2 is used to improve computational efficiency on the CPU. The key techniques are the network partitioning and coordination scheme and AVX2-based code vectorization. The hybrid strategy is implemented in Caffe. Tests on typical datasets such as CIFAR-10 and ImageNet show that the hybrid computational strategy enables Caffe to train larger neural network models with acceptable performance.
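The abstract states that layers needing much memory for intermediate data are migrated to the CPU, but not how those layers are selected. A minimal sketch, assuming a simple greedy rule that is not necessarily the paper's: start with every layer on the GPU and, while the GPU-resident intermediate data exceed a memory budget, move the layer with the largest intermediate data to the CPU. The `layer_info` struct, the `plan_placement` function and the byte counts are hypothetical, and the sketch ignores the host-device transfers at partition boundaries, which a real scheme must also coordinate.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-layer summary used only for this sketch. */
typedef struct {
    const char *name;
    size_t intermediate_bytes; /* memory for the layer's intermediate data */
    int on_cpu;                /* set by plan_placement: 1 = run on the CPU */
} layer_info;

/* Greedy placement: start with every layer on the GPU, then repeatedly move
 * the GPU-resident layer with the largest intermediate data to the CPU until
 * the remaining GPU total fits within gpu_budget. Returns how many layers
 * were moved. Ties go to the earlier layer. */
static size_t plan_placement(layer_info *layers, size_t n, size_t gpu_budget) {
    size_t gpu_total = 0, moved = 0;
    assert(n == 0 || layers != NULL);

    for (size_t i = 0; i < n; ++i) {
        layers[i].on_cpu = 0;
        gpu_total += layers[i].intermediate_bytes;
    }

    while (gpu_total > gpu_budget) {
        size_t best = n; /* n means "none found yet" */
        for (size_t i = 0; i < n; ++i) {
            if (layers[i].on_cpu)
                continue;
            if (best == n ||
                layers[i].intermediate_bytes > layers[best].intermediate_bytes)
                best = i;
        }
        /* gpu_total > 0 here, so some GPU-resident layer still holds data. */
        assert(best < n);
        layers[best].on_cpu = 1;
        gpu_total -= layers[best].intermediate_bytes;
        ++moved;
    }
    return moved;
}
```

For example, with intermediate sizes of 400, 300, 100 and 50 units and a budget of 300, the rule moves the first two layers to the CPU and leaves 150 units on the GPU. Choosing the largest layer first minimizes the number of migrated layers for this rule, but it does not weigh how expensive each layer is to compute on the CPU.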

Key words: hybrid computation, deep learning, AVX2 instruction set, GPU memory, Caffe

Received: 2019-09-26    Published: 2020-04-26

Cite this article:

JIANG Wenbin, WANG Hongbin, LIU Pai, CHEN Yuhao. Hybrid computational strategy for deep learning based on AVX2. Journal of Tsinghua University (Science and Technology), 2020, 60(5): 408-414.

Link to this article:

http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2020.21.001 or http://jst.tsinghuajournals.com/CN/Y2020/V60/I5/408
