Journal of Tsinghua University(Science and Technology)    2020, Vol. 60 Issue (5) : 408-414     DOI: 10.16511/j.cnki.qhdxxb.2020.21.001
SPECIAL SECTION: BIG DATA
Hybrid computational strategy for deep learning based on AVX2
JIANG Wenbin, WANG Hongbin, LIU Pai, CHEN Yuhao
Services Computing Technology and System Laboratory, National Engineering Research Center for Big Data Technology and System, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Abstract  The limited memory of GPUs often restricts the scale of the deep learning models they can handle. To address this problem, a hybrid computational strategy for deep learning was developed that also exploits the CPU through Intel's SIMD instruction set AVX2. Network layers whose intermediate data require large amounts of memory are migrated to the CPU to reduce GPU memory usage, and AVX2 vectorization is used to keep the CPU-side computation efficient. The key points are coordinating the network partitioning scheme with the AVX2-based vectorization of the CPU code. The hybrid strategy is implemented on Caffe. Tests on typical datasets such as CIFAR-10 and ImageNet show that the hybrid computation strategy enables training of larger neural network models on the GPU with acceptable performance.
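To give a flavor of the CPU-side vectorization the abstract describes, the following is a minimal, self-contained sketch of an AVX2-vectorized forward pass for a simple CPU-resident layer (a ReLU). It is not the authors' implementation; the function name, layer choice, and data layout are illustrative assumptions only.

// Illustrative sketch (assumed, not from the paper): AVX2-vectorized ReLU
// forward pass such as a CPU-resident layer might use.
// Compile with: g++ -O2 -mavx2 relu_avx2.cpp
#include <immintrin.h>
#include <cstddef>
#include <cstdio>
#include <vector>

// out[i] = max(in[i], 0), processing eight floats per AVX2 iteration.
void relu_forward_avx2(const float* in, float* out, std::size_t n) {
    const __m256 zero = _mm256_setzero_ps();
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 v = _mm256_loadu_ps(in + i);              // load 8 floats (unaligned load)
        _mm256_storeu_ps(out + i, _mm256_max_ps(v, zero)); // elementwise max with 0
    }
    for (; i < n; ++i)                                    // scalar tail for leftover elements
        out[i] = in[i] > 0.0f ? in[i] : 0.0f;
}

int main() {
    std::vector<float> in = {-1.5f, 2.0f, -0.25f, 3.0f, 0.0f, -4.0f, 5.5f, -6.0f, 7.0f};
    std::vector<float> out(in.size());
    relu_forward_avx2(in.data(), out.data(), in.size());
    for (float v : out) std::printf("%.2f ", v);
    std::printf("\n");
    return 0;
}

In practice, the vectorized kernels would replace the scalar CPU code paths of the layers migrated off the GPU, while memory-light layers remain on the GPU.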
Keywords: hybrid computation; deep learning; AVX2 instruction set; GPU memory; Caffe
Issue Date: 26 April 2020
Cite this article:   
JIANG Wenbin, WANG Hongbin, LIU Pai, et al. Hybrid computational strategy for deep learning based on AVX2[J]. Journal of Tsinghua University(Science and Technology), 2020, 60(5): 408-414.
URL: http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2020.21.001 or http://jst.tsinghuajournals.com/EN/Y2020/V60/I5/408