Improved Performance of FDTD Computation Using a Thread Block Constructed as a Two-Dimensional Array with CUDA
Abstract
In a previous study, the authors proposed a finite-difference time-domain (FDTD) implementation for a compute unified device architecture (CUDA) compatible graphics processing unit (GPU) using a thread block constructed as a two-dimensional (2-D) array. However, the computational speed of the 2-D FDTD simulation on the GPU was found to decrease as the computational domain grew. In the present paper, the authors investigate the computational performance with respect to the size of a thread block constructed as a 2-D array and improve the performance of the implementation. As a result, regardless of the size of the computational domain, the computational speed using a single GPU (NVIDIA GeForce GTX 280) reached approximately 30.0 Gflops, approximately 20 times faster than a single core of a central processing unit (Intel 3.0-GHz Core 2 Duo). The improved performance was approximately 65% of the theoretical peak performance (47.23 Gflops) derived from the theoretical memory bandwidth (141.7 GB/s).
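The performance figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the constants are the rounded values from the abstract, while the helper names (`flops_per_byte`, `fraction_of_peak`) and the implied CPU figure are assumptions introduced here, not quantities reported by the authors.

```python
# Rounded figures quoted in the abstract (NVIDIA GeForce GTX 280).
BANDWIDTH_GB_S = 141.7    # theoretical memory bandwidth
PEAK_GFLOPS = 47.23       # bandwidth-limited theoretical peak
MEASURED_GFLOPS = 30.0    # approximate measured speed
SPEEDUP_VS_CPU = 20.0     # approximate speedup over one CPU core


def flops_per_byte(peak_gflops: float, bandwidth_gb_s: float) -> float:
    """Arithmetic intensity implied by a bandwidth-limited peak (hypothetical helper)."""
    return peak_gflops / bandwidth_gb_s


def fraction_of_peak(measured_gflops: float, peak_gflops: float) -> float:
    """Measured speed as a fraction of the theoretical peak (hypothetical helper)."""
    return measured_gflops / peak_gflops


intensity = flops_per_byte(PEAK_GFLOPS, BANDWIDTH_GB_S)
eff = fraction_of_peak(MEASURED_GFLOPS, PEAK_GFLOPS)
# Implied single-core CPU speed; derived here, not stated in the abstract.
cpu_gflops = MEASURED_GFLOPS / SPEEDUP_VS_CPU

print(f"implied intensity: {intensity:.3f} flop/byte")
print(f"fraction of peak:  {eff:.1%}")
print(f"implied CPU speed: {cpu_gflops:.1f} Gflops")
```

With these rounded inputs the peak corresponds to about one floating-point operation per three bytes transferred, and 30.0 Gflops comes out at roughly 64% of the peak, in line with the "approximately 65%" stated above.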
Fig. 5. Effective performance [%] of the improved GPU-FDTD computation versus the size L of the computational domain L×L.
ACES JOURNAL, VOL. 25, NO. 12, DECEMBER 2010


