Improved Performance of FDTD Computation Using a Thread Block Constructed as a Two-Dimensional Array with CUDA

Authors

  • Naoki Takada, Department of Informatics and Media Technology, Sony Institute of Higher Education, Shohoku College, 428 Nurumizu, Atsugi, Kanagawa 243-8501, Japan
  • Tomoyoshi Shimobaba, Graduate School of Engineering, Chiba University, 1-33 Yayoi-cho, Inage-ku, Chiba, Chiba 263-8522, Japan
  • Nobuyuki Masuda, Graduate School of Engineering, Chiba University, 1-33 Yayoi-cho, Inage-ku, Chiba, Chiba 263-8522, Japan
  • Tomoyoshi Ito, Graduate School of Engineering, Chiba University, 1-33 Yayoi-cho, Inage-ku, Chiba, Chiba 263-8522, Japan


Abstract

In a previous study, the authors proposed a finite-difference time-domain (FDTD) implementation for a compute unified device architecture (CUDA)-compatible graphics processing unit (GPU) using a thread block constructed as a two-dimensional (2-D) array. However, the computational speed of the 2-D FDTD simulation on the GPU was found to decrease as the computational domain grew. In the present paper, the authors investigate how the computational performance depends on the size of the thread block constructed as a 2-D array, and improve the performance of the implementation. As a result, regardless of the size of the computational domain, the computational speed on a single GPU (NVIDIA GeForce GTX 280) reached approximately 30.0 Gflops, approximately 20 times faster than a single core of a central processing unit (Intel 3.0-GHz Core 2 Duo). The improved performance is approximately 65% of the theoretical peak performance (47.23 Gflops) derived from the theoretical memory bandwidth (141.7 GB/s).
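
The relationship between the figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python, not the authors' code. The variable names are illustrative, and both the bytes-per-flop ratio and the implied single-core CPU speed are inferred from the quoted numbers rather than stated in the abstract.

```python
# Figures quoted in the abstract (GeForce GTX 280).
MEMORY_BANDWIDTH_GB_S = 141.7  # theoretical memory bandwidth
PEAK_GFLOPS = 47.23            # bandwidth-limited theoretical peak performance
ACHIEVED_GFLOPS = 30.0         # approximate measured computational speed
SPEEDUP_OVER_CPU = 20.0        # "approximately 20 times faster" than one CPU core

# Assumption: the peak is obtained by dividing the bandwidth by a fixed
# amount of memory traffic per floating-point operation. The ratio below
# is inferred from the two quoted numbers.
bytes_per_flop = MEMORY_BANDWIDTH_GB_S / PEAK_GFLOPS

# Fraction of the bandwidth-limited peak that the improved code reaches.
efficiency = ACHIEVED_GFLOPS / PEAK_GFLOPS

# Implied single-core CPU speed (inferred, approximate).
implied_cpu_gflops = ACHIEVED_GFLOPS / SPEEDUP_OVER_CPU

print(f"bytes per flop     : {bytes_per_flop:.2f}")
print(f"fraction of peak   : {efficiency:.1%}")
print(f"implied CPU Gflops : {implied_cpu_gflops:.2f}")
```

Under these assumptions, the quoted bandwidth and peak correspond to about 3 bytes of memory traffic per floating-point operation, and 30.0 Gflops is roughly 64% of the 47.23 Gflops bound, consistent with the "approximately 65%" stated above.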



Fig. 5. Effective performance of the improved GPU-FDTD computation versus computational domain L×L. (Horizontal axis: size of L, 2048 to 8192; vertical axis: effective performance [%].)

ACES JOURNAL, VOL. 25, NO. 12, DECEMBER 2010



Published

2022-06-17

Section

General Submission