Please wait a minute...
 首页  期刊介绍 期刊订阅 联系我们
 
最新录用  |  预出版  |  当期目录  |  过刊浏览  |  阅读排行  |  下载排行  |  引用排行  |  百年期刊
Journal of Tsinghua University(Science and Technology)    2016, Vol. 56 Issue (9) : 963-968     DOI: 10.16511/j.cnki.qhdxxb.2016.21.060
COMPUTER SCIENCE AND TECHNOLOGY |
Cholesky decomposition and parallel structure design based on matrix triangularization decomposition
LIU Shuyong, LIN Junyu, WU Yanxia, ZHANG Bowei
Collage of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China
Download: PDF(1331 KB)  
Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks    
Abstract  Matrix computing is one of the core problems in high performance computing with matrix decomposition being an important way to improve the parallelism of matrix computations. FPGA gives a powerful environment for parallel computing. This study uses Cholesky decomposition based on a hardware-adaptive parallel sub-matrix identity updating algorithm. Its parallel structure is based on FPGA. Tests show that this structure achieves more than 10 fold speedup compared to general-purpose processors in terms of the kernel computational speed because the algorithm has better data-parallelism and pipeline-parallelism during matrix triangularization.
Keywords matrix triangularization decomposition      Cholesky decomposition      parallel structure      field programmable gate array     
ZTFLH:  TP302.1  
Issue Date: 15 September 2016
Service
E-mail this article
E-mail Alert
RSS
Articles by authors
LIU Shuyong
LIN Junyu
WU Yanxia
ZHANG Bowei
Cite this article:   
LIU Shuyong,LIN Junyu,WU Yanxia, et al. Cholesky decomposition and parallel structure design based on matrix triangularization decomposition[J]. Journal of Tsinghua University(Science and Technology), 2016, 56(9): 963-968.
URL:  
http://jst.tsinghuajournals.com/EN/10.16511/j.cnki.qhdxxb.2016.21.060     OR     http://jst.tsinghuajournals.com/EN/Y2016/V56/I9/963
  
  
  
  
  
  
  
[1] Satchidanand G H, Sotirios G Z. FPGA implementation of a Cholesky algorithm for a shared-memory multiprocessor architecture [J]. Parallel Algorithms and Applications, 2004, 19(4): 211-226.
[2] Kung H T, Leiserson C E. Systolic arrays technical report [R]. Pitsburg, PA, USA: Carneige Mellon University, 1978.
[3] YANG Depeng, Gregory D P, LI Husheng. Compressed sensing and Cholesky decomposition on FPGAs and GPUs [J]. Parallel Computing, 2012, 38(8): 421-437.
[4] Alfredo B, Julien L, Jakub K, et al. A class of parallel tiled linear algebra algorithms for multicore architectures [J]. Parallel Computing, 2009, 35(1): 38-53.
[5] DOU Yong, Vassiliadis S, Kuzmanov G, et al. 64-bit floating-point FPGA matrix multiplication [C]//Proceedings of the 13th ACM/SIGDA International Symposium on Field Programmable Gate Arrays(FPGA’05), New York, NY, USA: ACM, 2005: 86-95.
[6] Vasily V, James W D. LU, QR and Cholesky factorizations using vector capabilities of GPUs [R]. Berkeley, CA, USA: EECS Department, University of California, Berkeley, 2008.
[7] Oleg M, Volodymyr L, Anatoli S, et al. Parallel implementation of Cholesky LLT-algorithm in FPGA-based processor [J]. Parallel Processing and Applied Mathematics, 2008, 49(67): 137-147.
[8] Eunice E S, CHU Peiyun. Efficient and optimal parallel algorithms for Cholesky decomposition [J]. Journal of Mathematical Modeling and Algorithms, 2003, 2(3): 217-234.
[9] Aatonio R, George A. A high throughput FPGA-based floating point conjugate gradient implementation for dense matrices [J]. ACM Transactions on Reconfigurable Technology and Systems, 2010, 3(1): 1-19.
[10] Badia J M, Vidal A M. Parallel algorithms to computer the eigenvalues and eigenvectors of symmetric Toeplitz matrices [J]. Parallel Algorithms and Applications, 1998, 13(1): 75-93.
[11] Anderson E, BAI Z, Bischof C, et al. LAPACK Users Guide [M]. 3rd ED. Philadelphia, PA, USA: The society of industrial and applied mathematics, 1999.
[12] Blackford L S, Choi J, Cleary A, et al. ScaLAPACK Users Guide [M]. Philadelphia, PA, USA: The Society of Industrial and Applied Mathematics, 1997.
[13] WU Guiming, DOU Yong, SUN Junqing, et al. A high performance and memory efficient LU decomposer on FPGAs [J]. Computers, IEEE Transactions, 2012, 61(3): 366-378.
[14] WU Guiming, DOU Yong, Gregory D P. Blocking LU decomposition for FPGAs [C]//Proceedings of the 18th IEEE Symposium on Field-Programmable Custom Computing Machines(FCCM’10). Charlotte, NC,USA: IEEE, 2010, 109-112.
[15] Haridas S G, Sotirios G Z. FPGA implementation of a Cholesky algorithm for a shared memory multiprocessor architecture [J]. International Journal of Parallel, Emergent and Distributed Systems, 2004, 19(4): 211-226.
[16] 郭磊, 唐玉华, 周杰, 等. 基于FPGA的Cholesky分解细粒度并行结构与实现 [J]. 计算机研究与发展, 48 (Suppl), 2011: 258-265.GUO Lei, TANG Yuhua, ZHOU Jie, et al. Fine-grained architecture and implementation for Cholesky decomposition on FPGA [J]. Journal of Computer Research and Development, 2011, 48(suppl): 258-265. (in Chinese)
[17] WANG Xiaojun, Leeser M. A truly two-dimensional systolic array FPGA implementation of QR decomposition [J]. ACM Transactions on Embedded Computing Systems (TECS), 2009, 9(1): 17-18.
[18] 邬贵明, 窦勇, 王淼. Cholesky分解细粒度并行算法 [J]. 计算机工程与科学, 2010, 32(9): 102-106.WU Guiming, DOU Yong, WANG Miao. A fine-grained parallel algorithm for the Cholesky decomposition [J]. Computer Engineering & Science, 2010, 32(9): 102-106. (in Chinese)
[19] 刘书勇,吴艳霞,张博文,等. 基于可重构计算系统的矩阵三角化分解硬件并行结构研究 [J]. 电子学报, 43(8), 2015: 1642-1650. LIU Shuyong, WU Yanxia, ZHANG Bowen, et al. Research of parallel hardware architecture for matrix triangularization decomposition based on reconfigurable computing system [J]. ACTA Electronica Sinica, 2015, 43 (8): 1642-1650. (in Chinese)
[1] YU Jinpeng, ZHOU Yan, MO Ni, LIU Xingnan, ZHAO Lei. Design of a digital eddy current displacement sensor based on FPGA[J]. Journal of Tsinghua University(Science and Technology), 2018, 58(3): 330-336.
[2] ZHOU Bin, QIU Zhichang. Design of PCI express real-time measurement and control platforms for MEMS gyroscopes[J]. Journal of Tsinghua University(Science and Technology), 2017, 57(12): 1310-1316.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
Copyright © Journal of Tsinghua University(Science and Technology), All Rights Reserved.
Powered by Beijing Magtech Co. Ltd