Price-Performance Aspects of Accelerating the FDTD Method Using the Vector Processing Programming Paradigm on GPU and Multi-Core Clusters

Robert G. Ilgner; David B. Davidson

Authors

Robert G. Ilgner, Department of Electrical and Electronic Engineering, University of Stellenbosch, Matieland, Private Bag X1, Western Cape, South Africa
David B. Davidson, Department of Electrical and Electronic Engineering, University of Stellenbosch, Matieland, Private Bag X1, Western Cape, South Africa

Keywords:

AVX, cluster, FDTD, GPU, multicore, performance, SSE, vector processing

Abstract

The parallelization of the FDTD method on GPUs has become popular because of the low cost, low power consumption, and high compute performance of these devices. In recent years, manufacturers of multi-core processors have enhanced the vector processing capability of conventional processing cores to the extent that these cores now contribute considerably to the acceleration of the FDTD method and compete with GPUs. This paper compares power consumption and purchase cost against the performance benefits of several parallel FDTD implementations, in order to quantify the effect of parallelizing the FDTD method using various processing paradigms. Hardware purchase cost, computational performance, and power consumption are used to compare parallel FDTD deployments on the BlueGene/P, on GPU clusters, and on multi-core clusters using SSE. It is shown that deploying the parallel FDTD method with a hybrid programming paradigm achieves the best computational performance for the lowest purchase cost and power consumption.
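
As a rough illustration of why the FDTD update maps well onto vector hardware, the sketch below writes a one-dimensional free-space Yee update as whole-array operations. It is not the authors' implementation: NumPy slice arithmetic stands in for the SSE/AVX or GPU kernels discussed in the abstract, and the function name `fdtd_1d`, the grid size, the Courant number of 0.5, and the Gaussian source are all assumptions chosen for the example.

```python
import numpy as np


def fdtd_1d(num_cells=200, num_steps=100, source_cell=100):
    """Illustrative 1D free-space FDTD in normalized units (hypothetical helper)."""
    ez = np.zeros(num_cells)        # electric field at integer grid points
    hy = np.zeros(num_cells - 1)    # magnetic field at half grid points
    courant = 0.5                   # assumed Courant number (stable for <= 1 in 1D)

    for step in range(num_steps):
        # Each update is one array-wide expression: every cell applies the same
        # arithmetic to its neighbours, which is the data-parallel pattern that
        # vector units and GPU threads exploit.
        hy += courant * (ez[1:] - ez[:-1])
        ez[1:-1] += courant * (hy[1:] - hy[:-1])

        # Soft Gaussian source (assumed pulse shape); the end cells are never
        # updated, so they act as perfect electric conductor boundaries.
        ez[source_cell] += np.exp(-((step - 30.0) / 10.0) ** 2)

    return ez, hy


if __name__ == "__main__":
    ez, hy = fdtd_1d()
    print("peak |Ez| after run:", np.max(np.abs(ez)))
```

The two slice expressions carry no dependence between cells within a time step, so a compiler or a hand-written kernel can process several cells per instruction. That property is what the abstract's comparison of SSE, AVX, and GPU deployments relies on.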


