Abstract: Solving sparse linear systems of equations lies at the core of many large-scale scientific computing tasks. Although parallel algorithms provide powerful new tools for solving sparse linear systems, existing parallel algorithms suffer from drawbacks such as the difficulty of finding an optimal sub-matrix partition and the large overhead required to synchronize parallel tasks. This paper presents a parallel algorithm for sparse linear systems based on a variable correlation decomposition method. The algorithm first performs an incomplete LU decomposition of the coefficient matrix to obtain two triangular systems, one lower and one upper. It then uses the correlation between the intermediate vector y and the solution vector x to compute, in parallel, the independent partial values of each variable in the lower and upper triangular systems; the partial values of each variable are finally summed to obtain its final value. Because the solution of a variable does not have to wait until all preceding variables have been computed, the partial-value calculations can proceed in parallel as the solve advances, which significantly reduces the execution time and improves the solution speed and the degree of parallelism. Tests show that this sparse linear equation solver is more than 50% faster than the parallel solution method in the cuSPARSE library.
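The abstract only outlines the method, so the following is a minimal sketch of the underlying idea: after an (incomplete) LU factorization, a sparse triangular system is solved by letting every finished unknown immediately push its contribution into the partial values of the unknowns that depend on it, instead of waiting for all preceding unknowns. The function name lower_solve_by_partial_values, the random test matrix, and the sequential loop standing in for concurrent GPU updates are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch (not the authors' code) of the idea described in the abstract:
# after an (incomplete) LU factorization A ~ L*U, each triangular system is solved
# by splitting every unknown into partial values that are accumulated as soon as
# the variables they depend on are available.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def lower_solve_by_partial_values(L, b):
    """Forward substitution for L*x = b written as an accumulation of partial values."""
    L = sp.csc_matrix(L)
    n = L.shape[0]
    x = b.astype(float)                       # running partial sums, one per unknown
    for j in range(n):                        # column j is used once x[j] is final
        x[j] /= L[j, j]
        start, end = L.indptr[j], L.indptr[j + 1]
        rows, vals = L.indices[start:end], L.data[start:end]
        below = rows > j                      # strictly lower entries of column j
        x[rows[below]] -= vals[below] * x[j]  # independent partial-value updates
    return x

# Demonstration on a random sparse lower-triangular system, checked against SciPy.
rng = np.random.default_rng(0)
L = sp.tril(sp.random(200, 200, density=0.05, random_state=0)) + 10 * sp.eye(200)
b = rng.standard_normal(200)
x_ref = spla.spsolve_triangular(sp.csr_matrix(L), b, lower=True)
assert np.allclose(lower_solve_by_partial_values(L, b), x_ref)
```

On a GPU, the partial-value updates contributed by different finished variables can be accumulated concurrently, which is the source of the parallelism claimed in the abstract; the upper-triangular solve proceeds analogously from the last unknown backwards.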
刘珊瑕, 靳松, 王文琛, 汪宇航, 王瑜. 基于变量相关性分解方法的稀疏线性方程组并行求解算法[J]. 清华大学学报(自然科学版), 2020, 60(4): 312-320.
LIU Shanxia, JIN Song, WANG Wenchen, WANG Yuhang, WANG Yu. Parallel algorithm for solving sparse linear equations based on variable correlation decomposition[J]. Journal of Tsinghua University (Science and Technology), 2020, 60(4): 312-320.