Journal of Tsinghua University (Science and Technology), 2020, Vol. 60, Issue 5: 408-414    DOI: 10.16511/j.cnki.qhdxxb.2020.21.001
Special Topic: Big Data
Hybrid computational strategy for deep learning based on AVX2
JIANG Wenbin, WANG Hongbin, LIU Pai, CHEN Yuhao
Services Computing Technology and System Laboratory, National Engineering Research Center for Big Data Technology and System, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Full text: PDF (1443 KB)
Abstract: Limited GPU memory usually restricts the scale of the deep learning network models that a GPU can handle. To address this problem, a hybrid computational strategy for deep learning was developed that also exploits the CPU by means of Intel's new SIMD instruction set, AVX2. Neural network layers that need large amounts of memory for their intermediate data are migrated to the CPU to reduce GPU memory usage, and AVX2 is used to keep the CPU-side computations efficient. The key techniques include the network partitioning and coordination scheme and AVX2-based code vectorization. The strategy was implemented in Caffe. Tests on typical datasets, including CIFAR-10 and ImageNet, show that the hybrid computational strategy enables training of larger neural network models with acceptable performance.
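The partitioning idea described above can be illustrated with a simple greedy heuristic: move the layers with the largest intermediate data to the CPU until the rest fit in the GPU memory budget. This is a hypothetical sketch for illustration only; the paper's actual partitioning criterion in Caffe is not specified here, and the `Layer` struct and `assign_layers` function are invented names.

```c
#include <stddef.h>

/* Hypothetical layer descriptor: name, size of its intermediate data,
   and a flag marking whether it has been offloaded to the CPU. */
typedef struct {
    const char *name;
    size_t intermediate_bytes;
    int on_cpu;
} Layer;

/* Greedily offload the layer with the largest intermediate data to the CPU
   until the remaining GPU-resident layers fit in the memory budget.
   Returns the resulting GPU memory footprint in bytes. */
size_t assign_layers(Layer *layers, size_t n, size_t gpu_budget) {
    size_t gpu_bytes = 0;
    for (size_t i = 0; i < n; i++)
        gpu_bytes += layers[i].intermediate_bytes;

    while (gpu_bytes > gpu_budget) {
        size_t max_i = n, max_b = 0;
        for (size_t i = 0; i < n; i++)
            if (!layers[i].on_cpu && layers[i].intermediate_bytes > max_b) {
                max_b = layers[i].intermediate_bytes;
                max_i = i;
            }
        if (max_i == n)          /* every layer already on the CPU */
            break;
        layers[max_i].on_cpu = 1;
        gpu_bytes -= max_b;
    }
    return gpu_bytes;
}
```

In practice such a scheme would also have to weigh the CPU-GPU transfer cost of each migrated layer, which the greedy size-only heuristic ignores.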
Key words: hybrid computation; deep learning; AVX2 instruction set; GPU memory; Caffe
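The CPU-side AVX2 vectorization the abstract refers to can be sketched with a minimal element-wise kernel. The choice of ReLU and the function name are illustrative assumptions, not taken from the paper; the intrinsics shown are standard AVX operations on 8-wide float vectors, with a scalar fallback when AVX2 is not available at compile time.

```c
#include <stddef.h>
#ifdef __AVX2__
#include <immintrin.h>
#endif

/* In-place ReLU over a float buffer: x[i] = max(x[i], 0).
   Processes 8 floats per iteration with 256-bit vectors when the
   compiler targets AVX2; otherwise (and for the tail) runs scalar code. */
void relu_inplace(float *x, size_t n) {
    size_t i = 0;
#ifdef __AVX2__
    const __m256 zero = _mm256_setzero_ps();
    for (; i + 8 <= n; i += 8) {
        __m256 v = _mm256_loadu_ps(x + i);          /* unaligned load  */
        _mm256_storeu_ps(x + i, _mm256_max_ps(v, zero));
    }
#endif
    for (; i < n; i++)        /* scalar tail, or full loop without AVX2 */
        if (x[i] < 0.0f)
            x[i] = 0.0f;
}
```

Compiling such a kernel requires an AVX2-capable target (e.g. `-mavx2` with GCC/Clang); the `__AVX2__` guard keeps the code portable to older CPUs.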
Received: 2019-09-26      Published: 2020-04-26
Cite this article:
JIANG Wenbin, WANG Hongbin, LIU Pai, CHEN Yuhao. Hybrid computational strategy for deep learning based on AVX2. Journal of Tsinghua University (Science and Technology), 2020, 60(5): 408-414.
Link this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2020.21.001  or  http://jst.tsinghuajournals.com/CN/Y2020/V60/I5/408
Copyright © Editorial Office of Journal of Tsinghua University (Science and Technology)