Abstract: Supercomputers provide enormous computing power for large applications. Traditional supercomputers have mainly targeted scientific computing problems, but other applications impose new requirements on both supercomputer software and hardware design. The New Generation Sunway supercomputer has an inefficient memory allocator when running in dynamic mode. This study develops an efficient memory allocator, SWAlloc, that reduces the memory allocation time of BaGuaLu, a brain-scale pretrained model training framework, by a factor of up to 75 839. Evaluations using the PARSEC benchmark suite also show that SWAlloc speeds up memory allocation by up to 51 times (36% on average). SWAlloc has been deployed on the New Generation Sunway supercomputer for use by various large applications, including SWPytorch and SWTensorFlow.
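The abstract does not describe how SWAlloc works internally, so the following is only a generic illustration of one common way allocators reduce allocation time: caching freed blocks in per-size-class free lists so that a later request of a similar size is served without going back to the underlying allocator. It is a minimal single-threaded sketch in C under that assumption, not SWAlloc's design; the names `pool_alloc`, `pool_free`, `NUM_CLASSES`, and `CLASS_BYTES` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical illustration only: these names are invented for this sketch
   and are not part of SWAlloc. Not thread-safe. */
#define NUM_CLASSES 8
#define CLASS_BYTES 16 /* size class i serves requests of up to (i + 1) * 16 bytes */

/* A cached (freed) block stores the link to the next cached block in place. */
typedef struct free_block {
    struct free_block *next;
} free_block;

static_assert(sizeof(free_block) <= CLASS_BYTES,
              "a cached block must be able to hold the free-list link");

/* One singly linked free list per size class. */
static free_block *free_lists[NUM_CLASSES];

/* Map a request size to its size class, or -1 if it is too large to cache. */
static int size_class(size_t size) {
    if (size == 0) size = 1;
    size_t idx = (size - 1) / CLASS_BYTES;
    return idx < (size_t)NUM_CLASSES ? (int)idx : -1;
}

void *pool_alloc(size_t size) {
    int c = size_class(size);
    if (c < 0) return malloc(size);      /* large requests bypass the cache */
    if (free_lists[c] != NULL) {         /* fast path: pop a cached block */
        free_block *b = free_lists[c];
        free_lists[c] = b->next;
        return b;
    }
    /* Slow path: round up to the full class size so the block can later
       be reused by any request in the same class. */
    return malloc((size_t)(c + 1) * CLASS_BYTES);
}

/* The caller passes the original request size so the block's class is known. */
void pool_free(void *ptr, size_t size) {
    if (ptr == NULL) return;
    int c = size_class(size);
    if (c < 0) {                         /* large blocks go straight back */
        free(ptr);
        return;
    }
    free_block *b = ptr;                 /* push the block onto its class list */
    b->next = free_lists[c];
    free_lists[c] = b;
}
```

A request that follows a free in the same size class is served from the cache without calling `malloc`. The trade-off is that cached memory is never returned to the system, and a real allocator on a many-core machine would also need per-thread caches or locking.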
WANG Haojie, MA Zixuan, ZHENG Liyan, WANG Yuanwei, WANG Fei, ZHAI Jidong. Efficient memory allocator for the New Generation Sunway supercomputer. Journal of Tsinghua University (Science and Technology), 2022, 62(5): 943-951.