Journal of Tsinghua University (Science and Technology)  2020, Vol. 60 Issue (5): 408-414    DOI: 10.16511/j.cnki.qhdxxb.2020.21.001
Hybrid computational strategy for deep learning based on AVX2
JIANG Wenbin, WANG Hongbin, LIU Pai, CHEN Yuhao
Services Computing Technology and System Laboratory, National Engineering Research Center for Big Data Technology and System, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China
Abstract: The limited memory capacity of graphics processing units (GPUs) severely restricts the scale of the deep learning network models they can host. To address this problem, this paper proposes a hybrid computation strategy for deep learning that uses the new Intel single-instruction, multiple-data (SIMD) instruction set AVX2 to exploit the CPU's potential to assist the GPU. To save GPU memory, the network layers with large intermediate data are computed on the CPU, and AVX2 is used to improve the computational efficiency on the CPU side. The key techniques include partitioning and coordinating the network model and vectorizing the application code with AVX2 instructions. The strategy is implemented in Caffe. Experiments on typical datasets, including CIFAR-10 and ImageNet, show that with the hybrid strategy Caffe can run larger neural network models while maintaining relatively high execution efficiency.
Key words: hybrid computation    deep learning    AVX2 instruction set    graphics processing unit (GPU) memory    Caffe
Received: 2019-09-26      Published: 2020-04-26
JIANG Wenbin, WANG Hongbin, LIU Pai, CHEN Yuhao. Hybrid computational strategy for deep learning based on AVX2[J]. Journal of Tsinghua University (Science and Technology), 2020, 60(5): 408-414.
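
The abstract describes placing layers with large intermediate data on the CPU to save GPU memory. The sketch below illustrates that idea only and is not the authors' partitioning scheme: the `Layer` type, the `partition_layers` function, the greedy largest-first rule, and all sizes are hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    intermediate_mb: float  # hypothetical size of the layer's intermediate data


def partition_layers(layers, gpu_budget_mb):
    """Assign each layer to "GPU" or "CPU".

    Assumed heuristic: move the layers with the largest intermediate
    data to the CPU, one at a time, until the rest fit the GPU budget.
    """
    placement = {layer.name: "GPU" for layer in layers}
    gpu_total = sum(layer.intermediate_mb for layer in layers)
    for layer in sorted(layers, key=lambda l: l.intermediate_mb, reverse=True):
        if gpu_total <= gpu_budget_mb:
            break  # the remaining layers already fit on the GPU
        placement[layer.name] = "CPU"
        gpu_total -= layer.intermediate_mb
    return placement


# Made-up layer sizes, for illustration only.
layers = [
    Layer("conv1", 600.0),
    Layer("conv2", 300.0),
    Layer("fc1", 50.0),
    Layer("fc2", 10.0),
]
placement = partition_layers(layers, gpu_budget_mb=400.0)
print(placement)
```

With these made-up sizes only `conv1` moves to the CPU, because the remaining 360 MB fits the 400 MB budget. A real scheme would also have to account for the cost of transferring data between CPU and GPU at each partition boundary, which this sketch ignores.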
