CUDA Based LU Decomposition Solvers for CEM Applications
Abstract
The use of graphics processing units (GPUs) to perform the numerical computations required by electromagnetic analyses has been shown over the past several years to yield a significant increase in computational speed. Most previous work concentrated on electromagnetic analyses that do not require matrix inversion. This paper uses NVIDIA's compute unified device architecture (CUDA) language to develop and modify routines for matrix solution based on the LU decomposition procedure, in order to speed up a class of electromagnetic simulations. The implementation uses both the CPU and the GPU for the inversion procedure. Implementations for real and complex arithmetic, in single and double precision, are examined. Performance details of the developed LU decomposition routines, especially for complex and double precision arithmetic, are presented.
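For orientation, the LU decomposition procedure named in the abstract can be sketched on the CPU alone. The block below is an illustrative assumption, not the paper's CUDA routine: it factors a complex double precision matrix in place with partial pivoting and then solves one right-hand side by forward and back substitution. The names `cplx`, `Matrix`, `lu_factor` and `lu_solve` are hypothetical and do not come from the paper.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cplx = std::complex<double>;            // complex double precision entries
using Matrix = std::vector<std::vector<cplx>>; // dense, row-major, square

// Factor A in place so that P*A = L*U (Doolittle form, partial pivoting).
// On return, the strict lower triangle of `a` holds L (unit diagonal implied)
// and the upper triangle holds U. perm[i] is the original index of row i.
// Returns false if an exactly zero pivot is found (singular matrix).
bool lu_factor(Matrix& a, std::vector<std::size_t>& perm) {
    const std::size_t n = a.size();
    perm.resize(n);
    for (std::size_t i = 0; i < n; ++i) {
        assert(a[i].size() == n);  // matrix must be square
        perm[i] = i;
    }
    for (std::size_t k = 0; k < n; ++k) {
        // Pick the row with the largest magnitude in column k as the pivot.
        std::size_t p = k;
        for (std::size_t i = k + 1; i < n; ++i) {
            if (std::abs(a[i][k]) > std::abs(a[p][k])) p = i;
        }
        if (std::abs(a[p][k]) == 0.0) return false;
        if (p != k) {
            std::swap(a[p], a[k]);
            std::swap(perm[p], perm[k]);
        }
        // Eliminate below the pivot and store the multipliers in place.
        for (std::size_t i = k + 1; i < n; ++i) {
            a[i][k] /= a[k][k];
            for (std::size_t j = k + 1; j < n; ++j) {
                a[i][j] -= a[i][k] * a[k][j];
            }
        }
    }
    return true;
}

// Solve A*x = b using the factors produced by lu_factor.
std::vector<cplx> lu_solve(const Matrix& lu,
                           const std::vector<std::size_t>& perm,
                           const std::vector<cplx>& b) {
    const std::size_t n = lu.size();
    std::vector<cplx> x(n);
    // Forward substitution: L*y = P*b (y is stored in x).
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = b[perm[i]];
        for (std::size_t j = 0; j < i; ++j) x[i] -= lu[i][j] * x[j];
    }
    // Back substitution: U*x = y.
    for (std::size_t i = n; i-- > 0;) {
        for (std::size_t j = i + 1; j < n; ++j) x[i] -= lu[i][j] * x[j];
        x[i] /= lu[i][i];
    }
    return x;
}
```

Calling `lu_factor` once and then `lu_solve` for each right-hand side reuses the factorization, which is the usual reason to prefer LU decomposition over forming an explicit inverse. In a GPU version, the inner elimination loop is the natural candidate for parallel execution; how the paper divides the work between CPU and GPU is not specified in the abstract.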