Journal of Tsinghua University (Science and Technology), 2022, Vol. 62, Issue 9: 1417-1425    DOI: 10.16511/j.cnki.qhdxxb.2021.22.041
PS-Hybrid: Hybrid communication framework for large recommendation model training
MIAO Xupeng1, ZHANG Minxu1, SHAO Yingxia2, CUI Bin1
1. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China;
2. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
Abstract: Most traditional distributed deep learning training systems are built on either the parameter server (PS) or the AllReduce communication framework, and their drawbacks are becoming increasingly apparent. Because recommendation models have huge numbers of parameters, the decentralized AllReduce architecture cannot be used, since no single node can store the entire model; because the communication volume is large, the centralized parameter server architecture faces a severe communication bottleneck. To address these problems, this paper proposes PS-Hybrid, a hybrid communication training framework for large-scale deep learning recommendation models, which separates the communication logic of the embedding-layer parameters from that of the other parameters, and implements a PS-Hybrid prototype system. Experimental results show that the proposed hybrid communication scheme achieves better performance than a pure parameter server scheme, with a 48% speedup over TensorFlow-PS on 16 computing nodes.
Key words: recommendation model; distributed deep learning; parameter server; AllReduce
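To make the split concrete, the following is a minimal single-process sketch of the hybrid idea described in the abstract: sparse embedding rows are pulled from and pushed to a parameter server, while the small dense parameters are synchronized with an AllReduce-style average across workers. The names used here (ToyParameterServer, allreduce_average) are hypothetical illustrations under these assumptions, not the authors' PS-Hybrid implementation.

# Minimal sketch of hybrid communication for recommendation model training.
# PS path: only the embedding rows touched by a batch are pulled/pushed.
# AllReduce path: dense parameters are replicated and their gradients averaged.
import numpy as np


class ToyParameterServer:
    """Holds the full embedding table; workers exchange only the rows they touch."""

    def __init__(self, num_embeddings, dim):
        self.table = np.zeros((num_embeddings, dim))

    def pull(self, ids):
        # Return only the requested embedding rows.
        return self.table[ids]

    def push(self, ids, grads, lr=0.1):
        # Sparse SGD update; duplicate ids accumulate correctly.
        np.subtract.at(self.table, ids, lr * grads)


def allreduce_average(grads_per_worker):
    """AllReduce stand-in: average dense gradients across all workers."""
    return sum(grads_per_worker) / len(grads_per_worker)


# One simulated training step with two workers.
ps = ToyParameterServer(num_embeddings=1000, dim=4)
dense_weights = np.ones(4)                       # replicated dense parameters

worker_batches = [np.array([3, 17, 3]),          # embedding ids touched by worker 0
                  np.array([42, 17])]            # embedding ids touched by worker 1

dense_grads = []
for ids in worker_batches:
    emb = ps.pull(ids)                           # PS path: pull needed rows only
    emb_grad = np.ones_like(emb)                 # placeholder embedding gradient
    ps.push(ids, emb_grad)                       # PS path: sparse push
    dense_grads.append(np.full(4, 0.5))          # placeholder dense-layer gradient

dense_weights -= 0.1 * allreduce_average(dense_grads)   # AllReduce path
print(dense_weights)
print(ps.table[[3, 17, 42]])

In a real deployment the PS path would transmit only the touched embedding rows over the network, while the AllReduce path would use a collective communication library such as NCCL; this division is what avoids both full replication of the embedding table and routing the dense-layer traffic through a central server.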
Received: 2021-07-22      Published: 2022-08-18
Corresponding author: CUI Bin, professor, E-mail: bin.cui@pku.edu.cn
Cite this article:
MIAO Xupeng, ZHANG Minxu, SHAO Yingxia, CUI Bin. PS-Hybrid: Hybrid communication framework for large recommendation model training[J]. Journal of Tsinghua University (Science and Technology), 2022, 62(9): 1417-1425.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2021.22.041  or  http://jst.tsinghuajournals.com/CN/Y2022/V62/I9/1417