Journal of Tsinghua University(Science and Technology)    2022, Vol. 62 Issue (9) : 1417-1425     DOI: 10.16511/j.cnki.qhdxxb.2021.22.041
DATABASE
PS-Hybrid: Hybrid communication framework for large recommendation model training
MIAO Xupeng1, ZHANG Minxu1, SHAO Yingxia2, CUI Bin1
1. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China;
2. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
Abstract  Traditional distributed deep learning training systems rely on either parameter servers or AllReduce. The parameter-server approach uses a centralized communication architecture and suffers from serious communication bottlenecks as the communication volume grows, while the AllReduce approach uses a decentralized communication architecture in which a single node cannot hold the entire model when the number of parameters is very large. This paper presents PS-Hybrid, a hybrid communication framework for training large deep learning recommendation models, which decouples the communication logic of the embedding parameters from that of the other (dense) parameters. Tests show that the PS-Hybrid prototype outperforms previous parameter-server based systems for recommendation model training, running 48% faster than TensorFlow-PS on 16 computing nodes.
Keywords: recommendation model; distributed deep learning; parameter server; AllReduce
Issue Date: 18 August 2022
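
To make the hybrid scheme described in the abstract concrete, the following minimal Python sketch shows one way such a split could work; it is an illustrative assumption, not the authors' implementation, and all names (ParameterServer, allreduce_mean) and sizes are invented for the example. Sparse embedding rows are synchronized through parameter-server style push/pull, while the dense parameters are synchronized with an AllReduce-style average across workers.

```python
# Illustrative sketch only (not the PS-Hybrid source code): sparse embedding
# parameters go through a parameter-server push/pull, dense parameters are
# averaged with an AllReduce-style reduction.
import numpy as np


class ParameterServer:
    """Holds the large embedding table; workers push gradients for the rows
    they touched and pull only those rows back."""

    def __init__(self, num_rows: int, dim: int, lr: float = 0.1):
        self.table = np.zeros((num_rows, dim))
        self.lr = lr

    def push(self, row_ids: np.ndarray, grads: np.ndarray) -> None:
        # Sparse SGD update applied only to the touched rows.
        np.add.at(self.table, row_ids, -self.lr * grads)

    def pull(self, row_ids: np.ndarray) -> np.ndarray:
        return self.table[row_ids]


def allreduce_mean(worker_grads: list) -> np.ndarray:
    """Stand-in for an AllReduce: every worker ends up with the mean gradient."""
    return np.mean(worker_grads, axis=0)


if __name__ == "__main__":
    ps = ParameterServer(num_rows=1000, dim=8)
    dense_weights = np.zeros(16)   # replicated dense parameters
    lr = 0.1

    # One simulated training step with 4 workers, each touching a few rows.
    rows_per_worker = [np.array([1, 7]), np.array([7, 9]),
                       np.array([3]), np.array([1, 3, 9])]
    for rows in rows_per_worker:
        # Embedding path (parameter server): pull the needed rows, compute a
        # placeholder gradient, push it back.
        emb = ps.pull(rows)
        ps.push(rows, np.ones_like(emb))

    # Dense path (AllReduce): average dense gradients across workers, then
    # every worker applies the same update to its local replica.
    dense_grads = [np.random.randn(16) for _ in rows_per_worker]
    dense_weights -= lr * allreduce_mean(dense_grads)
```

In a real deployment the push/pull calls would travel over the network to dedicated server nodes and the AllReduce would be provided by a collective communication library; the point of the split is that only the embedding rows a worker actually touches pass through the parameter server, while the comparatively small dense parameters can use bandwidth-efficient collectives.
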
Cite this article:   
MIAO Xupeng, ZHANG Minxu, SHAO Yingxia, et al. PS-Hybrid: Hybrid communication framework for large recommendation model training[J]. Journal of Tsinghua University (Science and Technology), 2022, 62(9): 1417-1425.
URL: http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2021.22.041 or http://jst.tsinghuajournals.com/EN/Y2022/V62/I9/1417