Journal of Tsinghua University (Science and Technology)  2020, Vol. 60, Issue (4): 312-320    DOI: 10.16511/j.cnki.qhdxxb.2020.25.005
Electronic Engineering
Parallel algorithm for solving sparse linear equations based on variable correlation decomposition
LIU Shanxia, JIN Song, WANG Wenchen, WANG Yuhang, WANG Yu
Department of Electronics and Communication Engineering, North China Electric Power University, Baoding 071001, China
Full text: PDF (2868 KB)
Abstract: Solving sparse linear equations is at the core of many large-scale scientific computing tasks. Parallel algorithms provide powerful new tools for this problem, but existing ones have drawbacks: an optimal sub-matrix partition is hard to obtain, and synchronizing parallel tasks carries a large overhead. This paper presents a parallel algorithm for solving sparse linear equations based on a variable correlation decomposition method. The algorithm first performs an incomplete LU decomposition of the coefficient matrix to obtain upper and lower triangular systems. While solving these two systems, it uses the relation between y and x to decompose the correlation among variables and computes the independent partial values of each variable in parallel; all the partial values are then summed to obtain the variable's final value. Because a variable's partial values can be computed without waiting for all of its predecessor variables to finish, the execution time is reduced substantially, improving both the solution speed and the degree of parallelism. Experiments show that the proposed algorithm solves sparse linear equations more than 50% faster than a parallel solver implemented with cusparse library functions.
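To make the partial-value idea in the abstract concrete, the sketch below walks through a lower triangular solve L y = b in which every finalized y[j] immediately scatters its contribution into the running partial values of the rows that depend on it, instead of each y[i] waiting for all of its predecessors before being touched. This is only a minimal sequential Python illustration of the accumulation order, assuming CSC storage; the paper's method performs the independent updates concurrently in GPU threads (cf. the pseudocode of Fig. 5), and the function name forward_sub_partial and the toy matrix are invented here for illustration.

import numpy as np
import scipy.sparse as sp

def forward_sub_partial(L, b):
    # Right-looking forward substitution on a sparse lower triangular matrix
    # stored in CSC format.  Each finalized y[j] immediately scatters its
    # contribution -L[i, j] * y[j] into the partial value of every dependent
    # row i > j; y[i] is final once all of its predecessor columns have been
    # scattered.  Sequential here; the independent scatters are what a GPU
    # implementation can execute concurrently.
    L = sp.csc_matrix(L)
    n = L.shape[0]
    partial = np.asarray(b, dtype=float).copy()       # running partial values of y
    y = np.zeros(n)
    for j in range(n):
        y[j] = partial[j] / L[j, j]                   # column j is now final
        lo, hi = L.indptr[j], L.indptr[j + 1]
        rows, vals = L.indices[lo:hi], L.data[lo:hi]
        below = rows > j                              # strictly lower entries of column j
        partial[rows[below]] -= vals[below] * y[j]    # scatter to dependent rows
    return y

# Toy lower triangular factor and right-hand side, invented for the demo.
L = sp.csc_matrix(np.array([[4.0, 0.0, 0.0],
                            [1.0, 5.0, 0.0],
                            [0.0, 2.0, 6.0]]))
b = np.array([1.0, 2.0, 3.0])
y = forward_sub_partial(L, b)
print(y, np.allclose(L @ y, b))                       # residual check: True

Applying the same loop to the upper triangular factor, with the columns visited in reverse order, gives the corresponding backward substitution step.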
Key words: graphics processing unit (GPU); sparse linear equations; parallel computation; variable partial value; thread
Received: 2019-09-08      Published: 2020-04-03
Supported by the Natural Science Foundation of Hebei Province (F2017502043), the Open Project of the State Key Laboratory of Computer Architecture (CARCH201802), and the Fundamental Research Funds for the Central Universities (2017MS114)
Corresponding author: JIN Song, associate professor, E-mail: jinsong@ncepu.edu.cn
Cite this article:
LIU Shanxia, JIN Song, WANG Wenchen, WANG Yuhang, WANG Yu. Parallel algorithm for solving sparse linear equations based on variable correlation decomposition. Journal of Tsinghua University (Science and Technology), 2020, 60(4): 312-320.
Link to this article:
http://jst.tsinghuajournals.com/CN/10.16511/j.cnki.qhdxxb.2020.25.005  or  http://jst.tsinghuajournals.com/CN/Y2020/V60/I4/312
Figure 1  Dependency graph G(V, E) of the variable solution order
Figure 2  Solution subgraph of y and x
Figure 3  Partial-value computation process for y and x
Figure 4  Partial-value computation when b ≠ 0
Figure 5  Pseudocode for solving y in parallel
Figure 6  (color online) Computation times of the proposed algorithm and the cusparse library functions
Table 1  Solution time and analysis time as percentages of the total time
Figure 7  Speedup percentage of the proposed algorithm