DATABASE

PS-Hybrid: Hybrid communication framework for large recommendation model training
MIAO Xupeng1, ZHANG Minxu1, SHAO Yingxia2, CUI Bin1
1. School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China; 2. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
Abstract  Most traditional distributed deep learning training systems follow one of two communication architectures. Parameter servers use a centralized architecture and face serious communication bottlenecks as the communication volume grows, while AllReduce frameworks use a decentralized architecture and cannot hold the entire model when the number of parameters is very large. This paper presents PS-Hybrid, a hybrid communication framework for large deep learning recommendation model training, which decouples the communication logic for the embedding parameters from that for the other (dense) parameters. Experiments show that the prototype system outperforms pure parameter-server solutions for recommendation model training: with 16 computing nodes, it is 48% faster than TensorFlow-PS.
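The core idea stated in the abstract is to split synchronization by parameter type: sparse embedding parameters are exchanged with a parameter server, while dense parameters are averaged with AllReduce. The Python/PyTorch sketch below illustrates one training step under this split. It is only a minimal illustration under stated assumptions, not the authors' PS-Hybrid implementation: torch.distributed is assumed to be already initialized (e.g. via torchrun), gradients are assumed to be dense tensors, and ps_client with its pull/push methods is a hypothetical parameter-server interface introduced purely for illustration.

# Minimal sketch of the hybrid synchronization idea (not the PS-Hybrid code itself).
# Assumes torch.distributed is initialized and ps_client is a hypothetical
# parameter-server client exposing pull(keys) and push(keys, grads).
import torch
import torch.distributed as dist

def hybrid_sync_step(dense_model, embedding, batch_keys, ps_client, lr=0.01):
    # 1. Sparse side: pull only the embedding rows touched by this mini-batch
    #    from the parameter server and write them into the local table.
    with torch.no_grad():
        embedding.weight[batch_keys] = ps_client.pull(batch_keys)  # hypothetical PS API

    # ... the forward and backward passes happen here, filling the .grad fields ...

    # 2. Sparse side: push the gradients of the touched rows back to the server,
    #    which applies the update; no worker ever stores the full embedding table.
    ps_client.push(batch_keys, embedding.weight.grad[batch_keys])  # hypothetical PS API

    # 3. Dense side: average the dense gradients across all workers with
    #    AllReduce, then apply a plain local SGD update.
    world_size = dist.get_world_size()
    with torch.no_grad():
        for p in dense_model.parameters():
            if p.grad is not None:
                dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
                p.grad /= world_size
                p -= lr * p.grad

Because only the rows indexed by batch_keys ever travel between a worker and the server, the embedding table can exceed the memory of any single worker, while the dense parameters still benefit from bandwidth-efficient AllReduce, which is the trade-off the hybrid design targets.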

Keywords: recommendation model; distributed deep learning; parameter server; AllReduce
Issue Date: 18 August 2022