A Pre-splitting Green’s Function based Hybrid Fast Algorithmfor Multiscale Problems

Guang-Yu Zhu1,2, Wei-Dong Li2, Wei E. I. Sha1, Hou-Xing Zhou2, and Wei Hong2

1College of Information Science and Electronic Engineering
Zhejiang University, Hangzhou, 310027, China
gyzhu_china@163.com

2State Key Laboratory of Millimeter Waves
Frontiers Science Center for Mobile Information Communication and Security
Southeast University, Nanjing, 210096, China

Submitted On: February 19, 2023; Accepted On: July 26, 2023

ABSTRACT

Based on the splitting form of the Green’s function, a hybrid fast algorithm is proposed for efficient analysis of multiscale problems. In this algorithm, the Green’s function is a priori split into two parts: a spectrally band-limited part and a spatially localized part. Then, the fast Fourier transforms (FFT) utilizing the global Cartesian grid and the matrix compression method aided by an adaptive octree grouping are implemented for these two parts, respectively. Compared with the traditional methods which only employ the FFT for acceleration, the proposed hybrid fast algorithm is capable of maintaining low memory consumption in multiscale problems without compromising time cost. Moreover, the proposed algorithm does not need cumbersome geometric treatment to implement the hybridization, and can be established in a concise and straightforward manner. Several numerical examples discretized with multiscale meshes are provided to demonstrate the computational performance of proposed hybrid fast algorithm.

Index Terms: Fast algorithm, fast Fourier transform, Green’s function, matrix compression, multiscale problems.

I. INTRODUCTION

Many real-world electromagnetic (EM) problems require multiscale discretization [1, 2, 3, 4]. When the method of moments (MoM) [5], along with classical fast algorithms, is used to model the multi-scale problems, a single fast algorithm [6, 7, 8, 9, 10, 11, 12] is often insufficient to achieve satisfactory computational performance. Instead, due to capturing both the circuit physics and wave physics [1], hybrid fast algorithms [19, 20, 21, 22] generally provide a more practical path to efficiently solve EM multiscale problems.

As we know, the multilevel fast multipole algorithm (MLFMA) is widely used for efficient simulation of electrically large problems [6, 7, 8]. However, the MLFMA encounters the sub-wavelength breakdown [8] when dense mesh occurs. As a remedy, low-frequency fast algorithms [23, 24] were studied. Correspondingly, a series of hybrid algorithms were developed [25, 26, 27, 28, 29, 30], such as mixed-form FMA [19], interpolative decomposition (ID)-MLFMA [20], and adaptive cross approximation (ACA)-MLFMA [21].

Different from the MLFMA, the pre-correction-fast Fourier transforms (FFT)-based methods [9, 10, 11, 12, 13, 14, 15, 16, 17, 18] (e.g., adaptive integral method (AIM) [9], pre-corrected FFT (P-FFT) [10], and integral equation FFT (IE-FFT) [11]) are free from the sub-wavelength breakdown. Nevertheless, due to the intrinsic uniformity of the global Cartesian grids, the pre-correction-FFT-based methods do not allow for sufficient and varying spatial resolutions to handle multiscale discretizations. That is, by simply tuning the grid spacing, it is hard to achieve high computational efficiency for far interactions and maintain low storage efficiency for the near matrix at the same time [31].

Recently, in order to overcome this shortcoming of the pre-correction-FFT-based methods, a hybrid fast algorithm was developed in [31]. In this algorithm, the pre-correction-FFT-based methods is augmented with the matrix compression method (ACA) [36, 37, 38]. However, to implement such a hybridization, the procedures are not as straightforward as they seem [31]. In particular, a complicated spatial grouping strategy is inherently needed, and the relations between the required auxiliary geometric data structures (such as the Cartesian grid, the expansion box, and the tree-cube) are complex and require cautious treatments [31]. Such a situation motivates us to reconsider the overall framework of the pre-correction-FFT-based methods and wonder if there exists a more elegant way to develop a hybrid algorithm that consists of both the FFT and the matrix compression methods(e.g., ACA).

As an interesting alternative of the well-studied pre-correction-FFT-based methods, some pre-splitting-FFT-based method was developed in [32, 33]. Basically, the pre-splitting-FFT-based method bears similarities with the well-known pre-correction-FFT-based methods, since it also uses the Cartesian grid and the FFT as the essentials for the acceleration of the spatial convolutions. However, different from the widely-used pre-correction-FFT-based methods which typically utilize the Cartesian grid-based near-field corrections (based on some local expansion boxes) to compensate for the near-field contributions, the pre-splitting-FFT-based method [32, 33] instead resorts to some splitting form of the Green’s function so as to realize accurate and efficient computations. Despite of its unique feature, this pre-splitting-based framework has not attracted enough attentions in the computational electromagnetics (CEM) community, and a hybrid fast algorithm based on such framework has never been explored yet.

In this work, based on the splitting form of the Green’s function, a hybrid fast algorithm is proposed for efficiently analyzing multiscale problems. Specifically, the pre-splitting-FFT-based method originally devised for the problems with quasi-uniform discretizations is here further enhanced with the matrix compression method (ACA). Thus, a hybrid fast algorithm using both the FFT and the matrix compression method is established. In particular, the proposed hybrid fast algorithm is capable of maintaining low memory consumption in multiscale problems without compromising time cost, thus manifesting itself as a favorable multiscale extension of the pure FFT-based method. Furthermore, the required auxiliary geometric data structures herein (including the Cartesian grid and the spatial grouping octree) can be constructed independent of each other, and are not intertwined as in [31]. Consequently, the proposed hybrid algorithm can be implemented in a straightforward manner without any cumbersome geometric treatment.

In the following, after introducing the basic formulations and implementation key points in Section 2, several numerical examples discretized with multiscale meshes are then provided in Section 3 to demonstrate the computational performances of the proposed hybrid fast algorithm for multiscale problems.

II. FORMULATION

A perfectly electric conductor (PEC) object illuminated by an incident plane wave is considered. The combined field integral equation (CFIE) is employed to model this typical electromagnetic scattering problem. Throughout the paper, we use λ to denote the wavelength in free space. Moreover, we use k and η to denote the wave number and the wave impedance in free space, respectively.

A. Combined field integral equation

The CFIE is the linear combination of the electric field integral equation (EFIE) and the magnetic field integral equation (MFIE), which is conventionally expressed as

CFIE=αEFIE+(1-α)ηMFIE, (1)

where α denotes the combination factor and 0α1, the EFIE is formulated as

Einc(r)|tan=[jkηS(J(r)+k2J(r))G(r,r)ds]|tan, (2)

and the MFIE is formulated as

Hinc(r)|tan=[n^×J(r)2+P.V.SJ(r)×G(r,r)ds]|tan. (3)

In the above, Einc and Hinc are the incident electric field and the incident magnetic field, respectively. Moreover, G(r,r) denotes the Green’s function with the field point r and the source point r. Furthermore, J(r) denotes the surface current.

By using the method of moments (MoM) [5] with the RWG function [39], the CFIE can be discretized into a matrix equation

Z[N×N]I[N×1]=V[N×1], (4)

where Z is the impedance matrix, I is the unknown surface current coefficients vector, V is the righthand side vector of the incident field, and N is the number of unknowns. More specifically, the impedance matrix Z corresponding to the CFIE is described by

ZCFIE=αZEFIE+(1-α)ηZMFIE, (5)

where

ZmnEFIE=jkηSmdsfm(r)SnG(r,r)fn(r)ds-jηkSmds[fm(r)]SnG(r,r)[fn(r)]ds, (6)

and

ZmnMFIE=12Smfm(r)fn(r)ds+Smds[n^×fm(r)]SnG(r,r)×fn(r)ds. (7)

In the above, m=1,,N and n=1,,N. Moreover, fm and fn denote the mth and nth RWG functions, respectively. Furthermore, Sm and Sn are the supports of the corresponding RWG functions, respectively.

B. Pre-splitting the Green’s function

In the following, the splitting forms of the Green’s function and its gradient are introduced. First, consider the Green’s function in the EFIE, namely

G(r,r)=e-jkR4πR, (8)

where

R=|r-r|=(x-x)2+(y-y)2+(z-z)2 (9)

denotes the distance between the field point r and the source point r. Clearly, the above can also be equivalently represented as

G(r,r)=Re[G(r,r)]+jIm[G(r,r)], (10)

where

Re[G(r,r)]=coskR4πR, (11)
Im[G(r,r)]=-sinkR4πR. (12)

A basic property of G(r,r) is that, this function is singular for R=0, and its magnitude is globally nonzero for R>0. Also, note that the singular behavior of G(r,r) comes from Re[G(r,r)].

Using the pre-splitting strategy [32, 33], the Green’s function (8) can be written as two parts, namely

G(r,r)=GE(r,r)+GP(r,r), (13)

where

GE(r,r)={Re[G(r,r)]-Φ(R),R<δ0,Rδ, (14)
GP(r,r)={Φ(R)+jIm[G(r,r)],R<δG(r,r),Rδ, (15)

with δ being a splitting threshold, and Φ(R) being some auxiliary function. Particularly, the above two parts manifest the following properties.

• Function GE(r,r) is a singular and spatially localized function.

It has nonzero function value only when R<δ, and GE(r,r)=0 for Rδ.

• Function GP(r,r) is a globally oscillatory and spectrally band-limited function.

It is smooth and bounded for r and r.

To have more intuitive understanding on the properties of function GE(r,r) and GP(r,r), the corresponding function images are shown in Fig. 1.

Moreover, from a physical and spectral perspective, GE(r,r) mainly captures the evanescent waves, and GP(r,r) mainly captures the propagating waves.

images

Figure 1: The Green’s function is a priori split into a singular but spatially localized part GE and a globally oscillatory but spectrally band-limited part GP. The relevant magnitudes of the original Green’s function and of these two parts are visualized in the figure.

To achieve the above splitting and corresponding properties, the auxiliary function Φ(R) is specially designed and one simple way is to choose Φ(R) to be a polynomial of the following form [32, 33]

Φ(R)=aR3+bR2+c, (16)

where a, b, and c are unknown coefficients. Clearly, the above is a three order polynomial without the linear term, and it implies

dΦ(0)/dR=0, (17)

meaning that the first order derivative function of Φ(R) is zero at R=0 (where r=r).

Given the splitting threshold δ, the coefficients in (16) can be determined by some continuity conditions at R=δ as follows

{Φ(δ)=Re[G(δ)]dΦ(δ)/dR=dRe[G(δ)]/dRd2Φ(δ)/dR2=d2Re[G(δ)]/dR2, (18)

or, by solving the following matrix equation

[δ3δ213δ22δ06δ20][abc]=[Re[G(δ)]dRe[G(δ)]/dRd2Re[G(δ)]/dR2]. (19)

The derivative functions of Re[G(R)] used in the above are given in the appendix. Thus, after Φ(R) is obtained, the splitting representation (13) is completely determined.

The polynomial choice (16), the implicit continuity condition (17), and the explicit continuity condition (18) are based on the consideration that, after the splitting, the resultant GP(r,r) (or, more precisely, Re[GP(r,r)]) should be smooth everywhere for r and r, so that it can be interpolated accurately without much difficulty (the interpolation procedures will be discussed in the next subsection).

To make it more clear, without loss of generality, consider the following representative situation. The source point r is fixed to the coordinate origin, and the field point r is moving along the x axis. Then, the corresponding function values of G(r,r) and GP(r,r) are evaluated, as shown in Fig. 3. We can clearly see that Re[G(r,r)] is singular when r approaches r, while Re[GP(r,r)] is continuous and smooth everywhere. Here, Re[GP(r,r)] is smooth at R=0 (where r=r) as a result of (17), and smooth at the splitting point R=δ as a result of (18).

Next, consider the gradient of the Green’s function in the MFIE, namely

G(r,r)=G˙˙˙(R)R, (20)

with

G˙˙˙(R)=Re[G˙˙˙(R)]+jIm[G˙˙˙(R)], (21)

where

Re[G˙˙˙(R)]=-coskR+kRsinkR4πR2, (22)
Im[G˙˙˙(R)]=-kRcoskR-sinkR4πR2, (23)

and

R=x-xRx^+y-yRy^+z-zRz^. (24)

Similar to (8), function G(r,r) is singular for R=0, and its magnitude is globally nonzero for R>0. Also, note that the singular behavior of G(r,r) comes from Re[G˙˙˙(r,r)].

Again, using the pre-splitting strategy [32, 33], the gradient Green’s function (20) can be written as two parts, namely

G(r,r)=GE(r,r)+GP(r,r), (25)

where

GE(r,r)={{Re[G˙˙˙(R)]-Φg(R)}R,R<δ0,Rδ, (26)
GP(r,r)={{Φg(R)+jIm[G˙˙˙(R)]}R,R<δG(r,r),Rδ, (27)

with δ being the splitting threshold, and Φg(R) being some auxiliary function.

Here, the following auxiliary function is used to achieve the desired splitting, namely

Φg(R)=aR3+bR2+cR. (28)

Note that, different from (16), the above is a three order polynomial without the constant term, and it implies

Φg(0)=0, (29)

which clearly means that the function value of Φg(R) is zero at R=0 (where r=r).

Given the splitting threshold δ, the coefficients in (28) can be determined by some continuity conditions at R=δ as follows

{Φg(δ)=Re[G˙˙˙(δ)]dΦg(δ)/dR=dRe[G˙˙˙(δ)]/dRd2Φg(δ)/dR2=d2Re[G˙˙˙(δ)]/dR2, (30)

or, by solving the following matrix equation

[δ3δ2δ3δ22δ16δ20][abc]=[Re[G˙˙˙(δ)]dRe[G˙˙˙(δ)]/dRd2Re[G˙˙˙(δ)]/dR2]. (31)

The derivative functions of Re[G˙˙˙(R)] used in the above are given in the appendix. Thus, after Φg(R) is obtained, the splitting representation (25) is completely determined.

Similar as before, (28), (29), and (30) are based on the consideration that, after the splitting, the resultant GP(r,r) (or, more precisely, each Cartesian component Re[GP(r,r)]σ, σ={x,y,z}) should be smooth everywhere for r and r, so that it can be interpolated accurately without much difficulty.

To make it more clear, consider the representative situation of r and r as before. Then, the corresponding function values of [G(r,r)]x and [GP(r,r)]x are evaluated, as shown in Fig. 4. Here, Re[GP(r,r)]x is smooth at R=0 (where r=r) as a result of (29), and smooth at the splitting point R=δ as a result of (30).

Note that the auxiliary function (28) chosen for the splitting of Re[G(r,r)]σ (σ={x,y,z}) is different from the auxiliary function (16) chosen for the splitting of Re[G(r,r)]. Such difference is due to the fact that, unlike Re[G(r,r)], Re[G(r,r)]σ (σ={x,y,z}) manifest different (i.e., positive or negative) singular behaviors when r approaches r along different directions (e.g., along +x or -x direction), which can be deduced from the definition of (24), or, more intuitively, observed from Fig. 3 and Fig. 4.

images

Figure 2: Systematic diagram of the flow chart and relevant key elements of the presented hybrid fast algorithm. Particularly, the two auxiliary geometric data structures, including the Cartesian grid and the octree grouping, can be constructed independently.

images

Figure 3: The function values of G(r,r) and GP(r,r). The source point r is fixed to the coordinate origin, and the field point r is moving along the x axis.

images

Figure 4: The function values of [G(r,r)]x and [GP(r,r)]x. The source point r is fixed to the coordinate origin, and the field point r is moving along the x axis.

With the splitting form of the Green’s function (13) and its gradient (25), the impedance matrix ZCFIE can be correspondingly split into two parts, yielding

ZCFIE=ZE+ZP, (32)

where ZE is related to GE(r,r) and GE(r,r), while ZP is related to GP(r,r) and GP(r,r). For convenience, ZE and ZP are termed here as the evanescent matrix and the propagating matrix, respectively. When the commonly-used Krylov subspace iterative method [35] is employed to solve the matrix equation (4), the matrix-vector product ZCFIEI is computed. Using the splitting representation (32), this product can be equivalently implemented as

ZCFIEI=ZEI+ZPI. (33)

As will be shown in the following subsections, by fully exploiting the properties of the splitting form of the Green’s function and its gradient, both ZEI and ZPI can be evaluated in an efficient manner. For clarity, before going into further discussion, basic concepts and key elements of the presented hybrid fast algorithm are illustrated in Fig. 2.

C. Propagating matrix calculation

Due to the unique properties (smoothness and translation invariance) of GP(r,r) and GP(r,r), the matrix-vector product ZPI can be calculated accurately and efficiently, with the aid of proper interpolation and theFFT.

To this end, some auxiliary geometric data structures need to be defined first. Specifically, the object considered is enclosed by a three dimensional (3D) rectangular bounding box. Then, this bounding box is subdivided regularly along each Cartesian dimension, yielding a cluster of global Cartesian grids [9, 10, 11, 12, 13, 14]. The grid spacings along these Cartesian dimensions are here denoted by hx, hy, hz, respectively. The number of grids along these Cartesian dimensions are denoted by Nx, Nx, Nx, respectively. Thus, a total of NGrid=Nx×Ny×Nz Cartesian grids are defined.

Recall that GP(r,r) and GP(r,r) obtained from the splitting process are smooth everywhere for r and r. Thus, they can be well approximated using the following Cartesian grid-based sampling expansions

GP(r,r)=uBmvBnβu(r)GP(u,v)βv(r), (34)
GP(r,r)=uBmvBnβu(r)GP(u,v)βv(r), (35)

for r and r, where Bm and Bn denote the corresponding local expansion boxes that enclose r and r, respectively. Note that, depending on the relative position between r and r, Bm and Bn can be either well-separated or overlapping, as illustrated in Fig. 5. Moreover, u and v denote the coordinates of the Cartesian grids related to Bm and Bn, respectively. Furthermore, βu(r) (or βv(r)) denotes the 3D Lagrange interpolation basis function based on the Cartesian grids of Bm (or Bn), which is defined concretely by

βu(r)=βpx(x)βpy(y)βpz(z), (36)

where

βpσ(σ)=pσ=0,pσpσMσ(σ-σpσ)(σpσ-σpσ),σ={x,y,z}, (37)

with Mσ being the expansion order and σpσ being the coordinates of the Cartesian grids involved. Clearly, when Mx=My=Mz=M, a total of (M+1)3 Cartesian grids are involved in (36).

images

Figure 5: Computation corresponding to the propagating matrix ZP can be accelerated by the FFT using the Global Cartesian grid and the local expansion boxes. Typical examples of the expansion boxes are illustrated using different colors. Here, Box 1 and Box 2 are well-separated, while Box 2 and Box 3 are overlapping.

With the expansions (34) and (35), ZPCFIEI can be correspondingly rewritten as

ZPCFIEI=jΦkηΠ¯[GP(u,v)]Π¯TI-jαηkΠd[GP(u,v)]ΠdTI+(1-α)ηΠ¯g[GP(u,v)]×Π¯TI, (38)

where [GP] and [GP]σ are triple block Toeplitz matrices of dimension NGrid×NGrid. Moreover, Π¯, Πd, and Π¯g are sparse matrices of dimension N×NGrid, defined concretely by

Π¯=S[f1(r)f2(r)fN(r)][β1(r),,βNGrid(r)]ds, (39)
Πd=S[f1(r)f2(r)fN(r)][β1(r),,βNGrid(r)]ds, (40)
Π¯g=S[n^×f1(r)n^×f2(r)n^×fN(r)][β1(r),,βNGrid(r)]ds. (41)

Exploiting the block Toeplitz structure of [GP] and [GP]σ, the matrix-vector product ZPI can then be evaluated efficiently by the FFT as follows

ZPCFIEI=jαkηΠ¯-1{{[GP]}{Π¯TI}}-jαηkΠ¯d-1{{[GP]}{Π¯dTI}}+(1-α)ηΠ¯g-1{{[GP]}×{Π¯TI}}, (42)

where and -1 denote the forward and inverse FFT.

Notably, in the conventional pre-correction-based framework [9, 10, 11, 12, 13], the original Green’s function G(r,r) is approximated with the grid-based sampling expansion. Differently, for the pre-splitting-based framework [32, 33] adopted herein, the function GP(r,r) is instead approximated with the grid-based sampling expansion, as in (34). Since GP(r,r) is smooth everywhere without singularity, it is possible to approximate GP(r,r) with sufficiently good accuracy for arbitrary r and r by proper interpolation. Consequently, unlike [9, 10, 11, 12, 13, 31], the Cartesian grid-based near-field corrections are not necessary here.

D. Evanescent matrix calculation

Referring to (14) and (26), we know that ZE, dominated by GE(r,r) and GE(r,r), has nonzero elements only when R<δ. Therefore, with quasi-uniform meshes, ZE is a typical sparse matrix with few nonzero elements, and thus it can be stored without much effort. However, in multiscale problems, dense mesh always occurs. Thus, near-field interactions within R<δ increase substantially. In this case, the nonzero elements of ZE will increase dramatically.

Fortunately, in the pre-splitting-based framework adopted, this difficulty can be trivially addressed. In a nut shell, we first construct a spatial grouping only based on the geometric position of the basis functions, and then apply the matrix compression methods based on such a grouping to achieve memory reductions of ZE.

Here, a standard adaptive octree is utilized for accomplishing the spatial grouping, and the basic procedures for its construction are outlined as follows.

1. A root cube enclosing the whole object is constructed.

2. The nonempty cubes are subdivided into eight sub-cubes recursively until the number of basis functions within each cube is smaller than a prescribed constant.

3. An octree is built when the process finishes.

Accordingly, several concepts related to the octree are then defined as follows.

•Those cubes that do not need to be further subdivided are called leaf cubes. Here, the leaf cubes are denoted by Cs, with s being the index of some leaf cube. It should be noted that not all of the leaf cubes are located at the same octree level. Typical examples of the leaf cubes are shown in Fig. 6 and are highlighted with colors.

• Two leaf cubes (e.g., Cs and Ct) are called neighbor cubes if they share at least one vertex, such as Cube 1 and Cube 5; otherwise, they are called well-separated cubes, such as Cube 1 and Cube 3. For convenience, we use CsCt to denote that two cubes are neighbor cubes, and use CsCt= to denote that two cubes are well-separated cubes.

Note that, in [31], the conventional octree grouping cannot be directly applied, and a more involved grouping scheme is developed, in which the cubes for grouping the basis functions are induced from the Cartesian grids. In contrast, here the basis function grouping does not have to be dependent on the Cartesian grids, and can be implemented independently and conventionally. In other words, the commonly-used adaptive octree can be directly employed to achieve the required groupings, and the treatments of the required auxiliary geometric data structures herein are thus simple and straightforward.

images

Figure 6: The storage of ZE is effectively compressed by using the spatial grouping octree and the matrix compression method. Illustration of the octree grouping is shown in the left subfigure, where some leaf cubes are highlighted using colors. Here, Cube 5 is neighbor cube of Cube 1, while Cube 2, Cube 3, Cube 4 are all well-separated cubes of Cube 1. Besides, the nonzero range of GE(r,r), characterized by R<δ, is illustrated in the right subfigure. Note that the establishment of the grouping and the application of the compression can be realized independent of the Cartesian grids and the FFT.

On the basis of the spatial groupings above, ZE can then be rewritten as

ZE={ZEst}CsCt+{ZEst}CsCt=, (43)

where

{ZEst}CsCt denote those matrix subblocks corresponding to the interactions between neighbor cubes. They are directly computed and stored.

{ZEst}CsCt= denote those matrix subblocks corresponding to the interactions between well-separated cubes. These matrix subblocks can be low-rank. They can be compressed and stored in a compact form.

Here, the ACA [36, 37, 38] is employed for accomplishing the matrix compression. Detailed procedures of the ACA can be found in, for example, [37, 38]. Correspondingly, {ZEst}CsCt= can be then factorized as

{ZEst}CsCt=={UEsVEt}CsCt=. (44)

Thus, the matrix-vector product ZEI can be calculated as

ZEI={ZEst}CsCtI+{ZEst}CsCt=I={ZEstIt}CsCt+{UEs(VEtIt)}CsCt=. (45)

Some implementation details for the compression of {ZEst}CsCt= are further clarified as follows. In particular, as shown in Fig. 6, due to the localized nature of ZE decided by R<δ, for a basis function ϕ in Cube 1, all the basis functions in Cube 2 have full interactions with ϕ, while all the basis functions in Cube 4 have zero interactions with ϕ. Meanwhile, only parts of the basis functions in Cube 3 and Cube 5 have nonzero interactions with ϕ. Hence, for two well separated groups with index sets s and t, only the submatrices representing nonzero interactions is to be compressed. Thus, the matrix ZEst which represents the interactions between the two groups is in fact described more precisely by

ZEst=Pss[ZEstOOO]Qtt, (46)

with subsets ss and tt. Here, Pss and Qtt are permutation matrices, ZEst denotes the matrix subblock corresponding to nonzero interactions. Note that, depending on the relative position between the two cubes considered, there can be many zero interactions in ZEst, due to the sparsity pattern of ZE. Therefore, the matrix compression is in fact applied to ZEst rather than ZEst, and we have

ZEst=UEVE, (47)

where |s| and |t| denote the sizes of index sets s and t, respectively. Meanwhile, the dimension of the matrix UE is |s|×Γ, and the dimension of the matrix VE is Γ×|t|. Here, Γ denotes the compression rank of ZEst for a given tolerance. Ultimately, the compressed matrices UE and VE in (47) are stored, and are applied to the surface current vector I in an efficient manner. For many practical problems that involve multiscale discretization, the matrix block ZEst can be effectively compressed, and the memory consumption for UE and VE are greatly reduced compared to that for ZEst.

III. NUMERICAL RESULTS

In this section, the computational performance of the proposed hybrid algorithm is demonstrated through several numerical examples. The conjugate gradient method (CG) [35] is used as the iterative solver, and the tolerance is set to 1.0E-3. The diagonal preconditioner is used to improve the convergence. Unless otherwise stated, the grid spacings (i.e., hx, hy, hz) for the FFT acceleration are by default set to the common value h= 0.1 λ, and the expansion orders (i.e., Mx, My, Mz) are by default set to the common value M=3. Furthermore, the well-known FFTW [34] is used to perform the FFT. For clarity, in the following, the traditional pre-splitting-FFT-based algorithm which employs the uncompressed ZE will be termed as the PSG-FFT algorithm, and the proposed hybrid fast algorithm (enhanced with the ACA) will be referred to as the PSG-FFT-ACA algorithm.

A. Sphere

images

Figure 7: Mesh configurations of the PEC sphere. The radius of the sphere is 1.0 λ. Here, about a quarter of the spherical surface is discretized with a relatively dense mesh of 0.02 λ, and the rest of the surface is discretized with a regular mesh of 0.1 λ. For clarity, the enlarged view of the multiscale mesh is also visualized in the above.

A PEC sphere of radius 1.0 λ is first considered. The surface of the sphere is discretized with multiscale triangular meshes, resulting in 34,764 RWG functions. In particular, as shown in Fig. 7, about a quarter of the surface is discretized with dense meshes of edge lengths about 0.02 λ, and the rest is discretized with regular meshes of edge lengths about 0.1 λ.

For this example, the standard MoM (without using any acceleration technique), the traditional PSG-FFT algorithm, and the proposed PSG-FFT-ACA algorithm are adopted as the the solution methods. After solving the corresponding matrix equations as formulated in (4), the surface current coefficients (i.e, the solutions of equation (4)) IMoM, IPSG-FFT, and IPSG-FFT-ACA are then obtained, respectively. Here, it should be noticed that both IPSG-FFT and IPSG-FFT-ACA depend on the choice of the pre-splitting threshold δ. However, when the standard MoM is used, δ is not involved, and IMoM is therefore independent of δ. Thus, the solution IMoM can be used as a proper reference, to evaluate the accuracy of IPSG-FFT and IPSG-FFT-ACA for varying δ. The relative errors of the surface current coefficients are defined as follows

=ITest-IReference2IReference2, (48)

where ITest denotes the surface current coefficients to be tested, IReference denotes some specified surface current coefficients to be used as the reference, and 2 denotes the 2-norm of vector. Based on (48), the relative errors of both the traditional PSG-FFT algorithm and the proposed PSG-FFT-ACA algorithm for several different δ are calculated and shown in Fig. 8. We can see that both algorithms can achieve accurate results and show similar accuracy level. In particular, within the range of the δ between 0.15 λ and 0.65 λ, the errors of both algorithms decrease with increasing δ. Furthermore, the accuracies of both algorithms are not strongly sensitive with respect to the δ considered.

images

Figure 8: Relative errors of the surface currents coefficients I with respect to different splitting threshold δ. The relative errors of both the traditional PSG-FFT algorithm and the proposed PSG-FFT-ACA algorithm are illustrated. The surface currents coefficients obtained using the standard MoM are used as the reference.

In the following, the choice of δ is discussed. In principle, δ can be of any value. However, in practice, a moderate value of δ is suggested, due to the following considerations.

• If δ is chosen be relatively large, the number of nonzero elements of the evanescent matrix ZE will then become relatively large. Clearly, in this case, the corresponding computational cost will increase. In Table 1, the computational costs with respect to δ are recorded. It can be observed that the time and memory consumptions of both the traditional PSG-FFT algorithm and the proposed PSG-FFT-ACA algorithm increase as δ increase. Hence, due to efficiency considerations, δ is not suggested to be too large.

• If δ is chosen to be relatively small, the number of nonzero elements of the evanescent matrix ZE will become relatively small, and the computational cost is correspondingly reduced. However, from Fig. 8, it is observed that the accuracy of the solution is slightly reduced when δ becomes smaller. Hence, due to accuracy considerations, δ is not suggested to be too small as well.

Based on the above considerations, the δ between 0.15 λ and 0.65 λ can be proper choice. In practice, by tuning the splitting threshold δ within this range, we can trade off between accuracy and computational cost.

Table 1: Time and memory for the sphere model with different splitting threshold δ

δ(λ) PSG-FFT PSG-FFT-ACA
Memory (GB) CPU Time (m) Memory (GB) CPU Time (m)
0.15(λ) 0.9924 0.6601 0.9917 0.7087
0.25(λ) 1.9707 0.8078 1.6581 0.8167
0.35(λ) 3.1454 1.1445 2.2623 0.8679
0.45(λ) 4.6662 1.2814 2.9167 0.9177
0.55(λ) 6.3642 2.0607 3.5851 0.9812
0.65(λ) 8.3831 1.8004 4.2083 1.0390
0.75(λ) 10.514 2.1802 4.3177 1.0939
0.85(λ) 11.528 2.4548 5.1668 1.2197

Besides, the well-studied IE-FFT algorithm [17], being a typical instance of the pre-correction-based methods, is also used to calculate this example. Similar to the configurations of the pre-splitting-based methods employed, the grid spacing h for the IE-FFT is set to 0.1 λ and the expansion order M is set to 3. Then, using IMoM as the reference, the relative error of the obtained surface current coefficients IIE-FFT is calculated, which is 2.96E-3. Referring to Fig. 8, we can see that such error is close to the error of the PSG-FFT(-ACA) algorithm with δ about 0.45 λ. Moreover, for the IE-FFT algorithm, the memory requirement is 3.09718 G and the CPU time required is 1.08727 m. Referring to Table 1, we can see that such cost is slightly lower than that of the PSG-FFT algorithm with δ about 0.45 λ. From the above numerical results, it can be known that the computational performance of the IE-FFT algorithm during the solution stage is slightly better than that of the PSG-FFT algorithm. However, it should be emphasized that such slight advantage of the IE-FFT algorithm is at the price of the additional grid-based numerical correction operations during the setup (precomputation) stage.

B. Monopole antenna on top of large platform

In the following, a more realistic example is considered, i.e., a monopole antenna on top of a large platform, as shown in Fig. 9. Clearly, this is a typical multiscale structure and often encountered in many practical scenarios. The width and height of the platform are 4.0 λ and 0.3 λ, respectively. Moreover, the width and height of the monopole antenna are 0.06 λ and 0.6 λ, respectively. The object surface is discretized with the multiscale triangular meshes. Specifically, the monopole antenna is discretized with meshes of edge length about 0.01 λ, while the large platform is discretized with meshes of edge length about 0.1 λ. With such mesh, a total of 24,807 RWG functions are generated.

images

Figure 9: Mesh configurations of a small monopole antenna on top of a large platform. A multiscale mesh is used to discretize this object. The monopole antenna is discretized using dense mesh of about 0.01 λ, while the large platform is discretized with regular mesh ofabout 0.1 λ.

images

Figure 10: Relative errors of the surface currents coefficients with respect to different grid spacings h and expansion orders M. The relative errors of the proposed PSG-FFT-ACA algorithm are illustrated. The surface currents coefficients obtained using the standard MoM are used as the reference.

The standard MoM and the proposed PSG-FFT-ACA algorithm are employed to calculate this example. Here, the pre-splitting threshold δ is chosen to be 0.35 λ. Using IMoM as the reference, the relative errors of IPSG-FFT-ACA are calculated for different grid spacings h and different expansion orders M. For clarity, the relative errors are illustrated in Fig. 10. The corresponding time and memory consumptions are recorded in Table 2. It can be clearly seen that, for decreasing grid spacing h and increasing expansion order M, the relative error decreases and the corresponding computational cost increases. In other words, by tuning h and M, we can trade off between accuracy and computational cost.

Table 2: Time and memory for the monopole-platform model with different grid spacing h and expansion order M

h(λ) M=2 M=3
Memory (GB) CPU Time (m) Memory (GB) CPU Time (m)
0.05(λ) 1.0074 0.7863 1.1094 0.8256
0.10(λ) 0.8643 0.2280 0.9662 0.2792
0.15(λ) 0.8311 0.1554 0.9304 0.2106
0.20(λ) 0.8243 0.1155 0.9261 0.1801

C. Cone-shape object

To further study the effectiveness of the matrix compression enhancement, a PEC cone model is then considered, as shown in Fig. 11. The height of the cone is 3.0 λ. The radii of the top and the bottom faces are 0.03 λ and 0.6 λ, respectively. The surface is discretized with a gradually-varied meshes. In particular, three mesh configurations are considered. For the three cases, the edge length for the regular part of the mesh is fixed to 0.1 λ, while the edge lengths for the dense part of the mesh are set to 0.01 λ, 0.005 λ, and 0.0025 λ, respectively. For clarity, the mesh settings considered are illustrated in Fig. 11.

images

Figure 11: Mesh configurations of the PEC cone. A gradually-varied mesh is used to discretize the cone-shape object. The edge length for the regular part of the mesh is 0.1 λ. The edge lengths for the dense part of the mesh are 0.01 λ, 0.005 λ, and 0.0025 λ, respectively. For clarity, the enlarged views are also shown in the above.

Table 3: Time and memory for the cone model of case 1

Algorithm Memory (GB) CPU Time
ZE Total ZEI (s) Total (m)
PSG-FFT 0.3313 0.6286 12.128 0.6955
PSG-FFT-ACA 0.2418 0.4738 8.3050 0.6194

Table 4: Time and memory for the cone model of case 2

Algorithm Memory (GB) CPU Time
ZE Total ZEI (s) Total (m)
PSG-FFT 0.9611 1.6176 69.751 2.3978
PSG-FFT-ACA 0.4935 0.8778 35.127 1.8475

For each of these mesh configurations, we solve the corresponding equations first using the PSG-FFT algorithm which employs the uncompressed ZE. Then, we recalculate these problems using the PSG-FFT-ACA algorithm which compresses ZE with the ACA. Here, the splitting threshold δ is chosen to be 0.35 λ. Notice that, with such configuration of δ, all the interactions corresponding to ZE occur within the subwavelength regime. Thus, the ACA compression can be applied effectively [21, 37]. In Table  3, 4, 5, we tabulate the time and memory consumptions of both the PSG-FFT algorithm and the PSG-FFT-ACA algorithm for the three mesh cases considered. It can be seen that, for all the cases, the PSG-FFT-ACA algorithm shows improved computational performances relative to the PSG-FFT algorithm. Further inspections show that the degree of improvements becomes more prominent when the dense mesh is finer. To make this more clear, we calculate the compression ratios of the evanescent matrix ZE and use this as an indicator to measure the degree of improvements of the matrix compression enhancement. In Fig. 12, the compression ratios with respect to different mesh configurations are illustrated. We can clearly see that the compression ratio becomes higher when the dense mesh becomes finer. That is, the effectiveness of the proposed hybrid algorithm (with matrix compression enhancement) becomes more prominent when the ratio between the regular mesh size and the dense mesh size is larger.

Table 5: Time and memory for the cone model of case 3

Algorithm Memory (GB) CPU Time
ZE Total ZEI (m) Total (m)
PSG-FFT 2.6465 4.2615 181.47 4.4874
PSG-FFT-ACA 1.1579 2.0570 78.678 2.8009

images

Figure 12: Compression ratios of the cone with different mesh configurations.

D. Aircraft model

A relatively large PEC aircraft model with multiscale meshes is finally considered, as shown in Fig. 13. In this model, both the length and the wingspan of the aircraft are about 16 λ. The surface is discretized with triangular meshes, resulting in 120,072 RWG functions. Here, a part of the aircraft surface on the wings is discretized with dense meshes of edge lengths about 0.02 λ, and the rest of the aircraft surface is discretized with regular meshes of edge lengths about 0.1 λ.

The bistatic RCS curves obtained by the PSG-FFT algorithm and the PSG-FFT-ACA algorithm are illustrated in Fig. 14. The pre-splitting threshold δ is here chosen to be 0.35 λ. The bistatic RCS curve obtained by the well-tested IE-FFT algorithm [17] is also shown in the figure and used as the reference to calculate the root mean square error (RMSE). We can see that these curves agree well with each other.

images

Figure 13: Geometry model and corresponding multiscale mesh of the PEC aircraft. Here, a part of the aircraft surface on the wings are discretized with a relatively dense mesh of 0.02 λ, while the rest of the aircraft surface is discretized with a regular mesh of 0.1 λ. The enlarged view of the multiscale mesh is also highlighted in the above.

images

Figure 14: Bistatic RCS curves of the PEC aircraft model. The result obtained by the IE-FFT is used as the reference. The RMSE of the PSG-FFT and the PSG-FFT-ACA are calculated.

The time and memory consumptions of both the PSG-FFT algorithm and the PSG-FFT-ACA algorithm are tabulated in Table  6. From the table, we can see that the memory requirement of the evanescent matrix ZE in the PSG-FFT-ACA algorithm is reduced by about 34.1218%, compared with that of the PSG-FFT algorithm. Meanwhile, the time cost for performing ZEI in the whole solution process is reduced by about 37.52%. Thus, the time and memory cost related to the evanescent matrix ZE can be greatly reduced, as a result of the matrix compression enhancement.

Nevertheless, the total cost of the overall algorithm are determined not only by the cost related to the evanescent matrix ZE, but also determined by the cost related to the propagating matrix ZP. For this relatively large object, the dense mesh region only occupies a relatively small portion of the overall object surface, and the ratio between the edge length (0.1 λ) of the regular mesh and the edge length (0.02 λ) of the dense mesh is here not extremely large. Thus, the time cost for performing ZPI still dominates the time cost for performing ZI, and the reduction of the time cost for performing ZEI therefore only has a minor influence on the total time cost for performing ZI. As a result, although the total time can be reduced due to the reduced cost related to ZE, the reduction of the total time does not look very striking for this example.

Table 6: Time and memory for the aircraft model

Algorithm Memory (GB) CPU Time
ZE Total ZEI (s) Total (m)
PSG-FFT 3.5558 7.7408 76.910 23.580
PSG-FFT-ACA 2.3425 5.4033 48.053 21.431

IV. CONCLUSIONS

In this paper, a hybrid fast algorithm has been established for efficiently analyzing multiscale problems. In this algorithm, the Green’s function is a priori split into a singular but spatially localized part, and a smooth, oscillatory but bandlimited part. Then, the fundamental blocks for acceleration, i.e., the FFT and the matrix compression method (ACA), are applied to these two parts, respectively. Compared with the traditional methods which only employ the FFT for acceleration, the proposed hybrid algorithm can maintain low memory consumption in multiscale cases without compromising time cost, thus manifesting itself as a favorable multiscale extension of the traditional pure FFT-based methods. Furthermore, the required auxiliary geometric data structures herein can be constructed independent of each other, thus permitting the two kinds of acceleration methods to be connected seamlessly in a concise and elegant manner.

On the basis of the present work, several future directions may be explored. First, except for the Lagrange interpolation used in this work, other more accurate interpolation schemes (such as the Gaussian interpolation [15] or the fitting techniques [12]) can be employed to further improve the computational performance of the overall algorithm. Moreover, although only the PEC problems are considered here, extensions to the dielectric problems (especially inhomogeneous cases [16]) are clearly feasible. Furthermore, effective parallelization (especially on heterogeneous architectures [13]) of the proposed hybrid algorithm can be interesting subject of future research.

APPENDIX

The derivative functions of Re[G(R)], which are used in (18) and (19), are given below

ddRRe[G(R)]=-cos(kR)+kRsin(kR)4πR2, (49)
d2dR2Re[G(R)]=-(k2R2-2)cos(kR)-2kRsin(kR)4πR3. (50)

The derivative functions of Re[G˙˙˙(R)], which are used in (30) and (31), are given below

ddRRe[G˙˙˙(R)]=-(k2R2-2)cos(kR)-2kRsin(kR)4πR3, (51)
d2dR2Re[G˙˙˙(R)]=-3(k2R2-2)cos(kR)-kR(k2R2-6)sin(kR)4πR4. (52)

ACKNOWLEDGMENT

This work was supported in part by the Fundamental Research Funds for the Central Universities (No. 2242022k30003), in part by the National Key Research and Development Project (No. 2019YFB1803202), and in part by the National Science Foundation of China (Nos. 62293492, 62131008, 61975177, U20A20164).

REFERENCES

[1] W. C. Chew and L. J. Jiang, “Overview of large-scale computing: the past, the present, and the future,” Proceedings of the IEEE, vol. 101, no. 2, pp. 227-241, Feb. 2013.

[2] M.-K. Li and W. C. Chew, “Multiscale simulation of complex structures using equivalence principle algorithm with high-order field point sampling scheme,” IEEE Trans. Antennas Propag., vol. 56, no. 8, pp. 2389-2397, 2007.

[3] Z.-G. Qian and W. C. Chew, “Fast full-wave surface integral equation solver for multiscale structure modeling,” IEEE Trans. Antennas Propag., vol. 57, no. 11, pp. 3594-3601, Nov. 2009.

[4] J. Zhu, S. Omar, and D. Jiao, “Solution of the electric field integral equation when it breaks down,” IEEE Trans. Antennas Propag., vol. 62, no. 8, pp. 4122-4134, Aug. 2014.

[5] R. F. Harrington, Field Computation by Moment Methods. New York: The MacMillian Co., 1968.

[6] J. M. Song and W. C. Chew, “Multilevel fast-multipole algorithm for solving combined field integral equations of electromagnetic scattering,” Microw. Opt. Technol. Lett., vol. 10, no. 1, pp. 14-19, Sep. 1995.

[7] J.-M. Song, C.-C. Lu, and W. C. Chew, “Multilevel fast multipole algorithm for electromagnetic scattering by large complex objects,” IEEE Trans. Antennas Propag., vol. 45, no. 10, pp. 1488-1493, Oct. 1997.

[8] W. C. Chew, J.-M. Jin, E. Michielssen, and J. Song, Fast and Efficient Algorithms in Computational Electromagnetics. Boston, MA: Artech House, 2001.

[9] M. Bleszynski, E. Bleszynski, and T. Jaroszewicz, “AIM: Adaptive integral method for solving large-scale electromagnetic scattering and radiation problems,” Radio Sci., vol. 31, no. 5, pp. 1225-1251, 1996.

[10] J. R. Phillips and J. K. White, “A precorrected-FFT method for electrostatic analysis of complicated 3-D structures,” IEEE Trans. Computer-Aided Design, vol. 16, no. 10, pp. 1059-1072, Oct. 1997.

[11] S. M. Seo and J.-F. Lee, “A fast IE-FFT algorithm for solving PEC scattering problems,” IEEE Trans. Magn., vol. 41, no. 5, pp. 1476-1479, May2005.

[12] J.-Y. Xie, H.-X. Zhou, W. Hong, W.-D. Li, and G. Hua, “A highly accurate FGG-FG-FFT for the combined field integral equation,” IEEE Trans. Antennas Propag., vol. 61, no. 9, pp. 4641-4652, Sep. 2013.

[13] S. Peng and C.-F. Wang, “Precorrected-FFT method on graphics processing units,” IEEE Trans. Antennas Propag., vol. 61, no. 4, pp. 2099-2107, Apr. 2013.

[14] L.-W. Li, Y.-J. Wang, and E.-P. Li, “MPI-based parallelized precorrected FFT algorithm for analyzing scattering by arbitrarily shaped three-dimensional objects,” Electromagn. Waves Appl., vol. 17, pp. 1489-1491, Jan. 2003.

[15] B. Lai, X. An, H.-B. Yuan, Z.-S. Chen, C.-M. Huang, and C.-H. Liang, “A novel Gaussian interpolation formula-based IE-FFT algorithm for solving EMscattering problems,” Microw. Opt. Tech. Lett., vol. 51, no. 9, pp. 2233-2236, Sep.2009.

[16] X.-C. Nie, L.-W. Li, N. Yuan, T. S. Yeo, and Y.-B. Gan, “Precorrected-FFT solution of the volume integral equation for 3-D inhomogeneous dielectric objects,” IEEE Trans. Antennas Propag., vol. 53, no. 1, pp. 313-320, Jan. 2005.

[17] J.-Y. Xie, H.-X. Zhou, W.-D. Li, and W. Hong, “IE-FFT for the combined field integral equation applied to electrically large objects,” Microw. Opt. Tech. Lett., vol. 54, no. 4, pp. 891-896, Apr.2012.

[18] S. Peng, C.-F. Wang, and J.-F. Lee, “Analyzing PEC scattering structure using an IE-FFT algorithm,” ACES Journal, vol. 24, no. 2, Apr. 2009.

[19] L. J. Jiang and W. C. Chew, “A mixed-form fast multipole algorithm,” IEEE Trans. Antennas Propag., vol. 53, no. 12, pp. 4145-4156, Dec.2005.

[20] X.-M. Pan, J.-G. Wei, Z. Peng, and X.-Q. Sheng, “A fast algorithm for multiscale electromagnetic problems using interpolative decomposition and multilevel fast multipole algorithm,” Radio Sci., vol. 47, no. 1, Feb. 2012.

[21] L.-F. Ma, Z. Nie, J. Hu, and S.-Q. He, “Combined MLFMA-ACA algorithm application to scattering problems with complex and fine structure,” Asia Pacific Microwave Conference, pp. 802-805,2009.

[22] W.-B. Kong, H.-X. Zhou, K.-L. Zheng, and W. Hong, “Analysis of multiscale problems using the MLFMA with the assistance of the FFT-based method,” IEEE Transactions on Antennas and Propagation, vol. 63, no. 9, pp. 4184-4188,2015.

[23] J. S. Zhao and W. C. Chew, “Three dimensional multilevel fast multipole algorithm from static to electrodynamic,” Microw. Opt. Technol. Lett., vol. 26, no. 1, pp. 43-48, 2000.

[24] L. J. Jiang and W. C. Chew, “Low-frequency fast inhomogeneous plane-wave algorithm,” Microw. Opt. Tech. Lett., vol. 40, no. 2, pp. 117-122, Jan. 2004.

[25] T. Xia, L. L. Meng, Q. S. Liu, H. H. Gan, and W. C. Chew, “A low-frequency stable broadband multilevel fast multipole algorithm using plane wave multipole hybridization,” IEEE Trans. Antennas Propag., vol. 66, no. 11, pp. 6137-6145, Nov.2018.

[26] H. Wallen and J. Sarvas, “Translation procedures for broadband MLFMA,” Prog. Electromagn. Res., vol. 55, pp. 47-78, 2005.

[27] D. Wulf and R. Bunger, “An efficient implementation of the combined wideband MLFMA/LF-FIPWA,” IEEE Trans. Antennas Propag., vol. 57, no. 2, pp. 467-474, Feb. 2009.

[28] D. T. Schobert and T. F. Eibert, “Fast integral equation solution by multilevel Green’s function interpolation combined with multilevel fast multipole method,” IEEE Trans. Antennas Propag., vol. 60, no. 9, pp. 4458-4463, Sep. 2012.

[29] I. Bogaert, J. Peeters, and F. Olyslager, “A nondirective plane wave MLFMA stable at low frequencies,” IEEE Trans. Antennas Propag., vol. 56, no. 12, pp. 3752-3767, Dec. 2008.

[30] O. Ergul and B. Karaosmanoglu, “Broadband multilevel fast multipole algorithm based on an approximate diagonalization of the Green’s function,” IEEE Trans. Antennas Propag., vol. 63, no. 7, pp. 3035-3041, July 2015.

[31] W.-B. Kong, H.-X. Zhou, K.-L. Zheng, X. Mu, and W. Hong, “FFT-based method with near-matrix compression,” IEEE Transactions on Antennas and Propagation, vol. 65, no. 11, pp. 5975-5983,2017.

[32] A. Bespalov, “Cost-effective solution of the boundary integral equations for 3D Maxwell problems,” Russ. J. Numer. Anal. Math. Modelling, vol. 14, no. 5, pp. 403-428, 1999.

[33] A. Bespalov, “On the use of a regular grid for implementation of boundary integral methods for wave problems,” Russ. J. Numer. Anal. Math. Modelling, vol. 15, no. 6, pp. 469-488. 2000.

[34] M. Frigo and S. Johnson, FFTW Manual [Online]. Available: http://www.fftw.org/

[35] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. M. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. Van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. Philadelphia, PA, USA: SIAM,1994.

[36] M. Bebendorf and S. Rjasanow, “Adaptive low-rank approximation of collocation matrices,” Computing, vol. 70, no. 1, pp. 1-24, 2003.

[37] K. Zhao, M. N. Vouvakis, and J.-F. Lee, “The adaptive cross approximation algorithm for accelerated method of moments computations of EMC problems,” IEEE Trans. Electromag. Compat., vol. 47, no. 4, pp. 763-773, Nov. 2005.

[38] H.-X. Zhou, G.-Y. Zhu, W.-B. Kong, and W. Hong, “An upgraded ACA algorithm in complex field and its statistical analysis,” IEEE Trans. Antennas Propag., vol. 65, no. 5, pp. 2734-2739, May2017.

[39] S. M. Rao, D. R. Wilton, and A. W. Glisson, “Elettromagnetic scattering by surfaces of arbitrary shape,” IEEE Trans. Antennas Propag., vol. 30, no. 3, pp. 409-418, May 1982.

BIOGRAPHIES

images

Guang-Yu Zhu received the B.D. in electrical engineering from Nanjing Agricultural University, Nanjing, China, in 2012, and the Ph.D. degree in radio engineering from Southeast University, Nanjing, China, in 2020, respectively. He is currently a postdoctoral research fellow with the Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China. His research interests are in computational electromagnetics with focus on fast algorithms and domain decomposition methods.

images

Wei-Dong Li received the M.S. degree in mathematics and the Ph.D. degree in radio engineering from Southeast University, Nanjing, China, in 2003 and 2007, respectively. From January 2008 to January 2009, he was a visiting scholar with the Technische University Darmstadt, Germany. He is currently an associate professor with the State Key Laboratory of Millimeter Waves and the School of Information Science and Engineering, Southeast University. He has authored or coauthored over 40 technical articles. He serves as a reviewer for the IEEE Transactions on Antennas and Propagation and IET Microwave, Antennas and Propagation. His research interests are in computational EM with focus on integral-equation-based overlapped domain decomposition method, multilevel fast multipole algorithm, fast and accurate inter/extrapolation techniques, and basic functions.

images

Wei E.I. Sha received the B.S. and Ph.D. degrees in electronic engineering from Anhui University, Hefei, China, in 2003 and 2008, respectively. From July 2008 to July 2017, he was a post-doctoral research fellow and then a research assistant professor with the Department of Electrical and Electronic Engineering, University of Hong Kong, Hong Kong. From March 2018 to March 2019, he worked as a Marie Skodowska-Curie Individual Fellow with University College London, London, U.K. From October 2017, he joined the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China, where he is currently a tenure-tracked assistant professor. He has authored or coauthored 180 refereed journal articles, 150 conference publications (including five keynote talks and one short course), nine book chapters, and two books. His Google Scholar citation is 8193 with H-index of 45. His research interests include theoretical and computational research in electromagnetics and optics, focusing on the multiphysics and interdisciplinary research. His research involves fundamental and applied aspects in computational and applied electromagnetics, nonlinear and quantum electromagnetics, micro- and nano-optics, optoelectronic device simulation, and multiphysics modeling. Dr. Sha is a member of Optical Society of America (OSA). He served as a reviewer for 60 technical journals and technical program committee member for ten IEEE conferences. He was a recipient of the Applied Computational Electromagnetics Society (ACES) Technical Achievement Award in 2022 and PIERS Young Scientist Award in 2021. In 2015, he was awarded Second Prize of Science and Technology from Anhui Province Government, China. In 2007, he was awarded the Thousand Talents Program for Distinguished Young Scholars of China. He also received six Best Student Paper Prizes and one Young Scientist Award with his students. He also served as an associate editor for IEEE Journal on Multiscale and Multiphysics Computational Techniques, IEEE Open Journal of Antennas and Propagation, and IEEE Access.

images

Hou-Xing Zhou received the M.S. degree in mathematics from Southwest Normal University, Chongqing, China, in 1995, and the Ph.D. degree in radio engineering from Southeast University, Nanjing, China, in 2002. Since 2002, he has been with the State Key Laboratory of Millimeter Waves, Southeast University. He is currently a professor with the School of Information Science and Engineering, Southeast University. He has authored or coauthored 40 journal articles. His main research interests include numerical algorithms in computational EM, including the fast algorithm for spatial domain dyadic Green’s functions of stratified media, the multilevel fast multiple algorithm, the FFT-based fast algorithm, the IE-based domain decomposition method, the FEM-BI-based domain decomposition method, the higher order method of moments, and the parallel computation based on GPU/multicore-CPU platform. In addition, the full-wave simulation technology for the electromagnetic field distribution around the base-station is one of his current interests.

images

Wei Hong received the B.S. degree from the University of Information Engineering, Zhengzhou, China, in 1982, and the M.S. and Ph.D. degrees from Southeast University, Nanjing, China, in 1985 and 1988, respectively, all in radio engineering. Since 1988, he has been with the State Key Laboratory of Millimeter Waves and serves for the director of the lab, since 2003. In 1993, 1995, 1996, 1997, and 1998, he was a short-term visiting scholar with the University of California at Berkeley and at Santa Cruz, respectively. He is currently a professor with the School of Information Science and Engineering, Southeast University. He has been involved in numerical methods for electromagnetic problems, millimeter wave theory and technology, antennas, RF technology for wireless communications, and so on. He has authored and coauthored over 300 technical publications and two books. Dr. Hong was an elected IEEE MTT-S AdCom Member, from 2014 to 2016. He is a fellow of CIE, the vice president of the CIE Microwave Society and Antenna Society, and the chair of the IEEE MTT-S/AP-S/EMC-S Joint Nanjing Chapter. He was twice awarded the National Natural Prizes, thrice awarded the first-class Science and Technology Progress Prizes issued by the Ministry of Education of China and Jiangsu Province Government. He also received the Foundations for China Distinguished Young Investigators and for Innovation Group issued by NSF of China. He served as the associate editor for the IEEE Transactions on Microwave Theory and Techniques from 2007 to 2010, and one of the guest editors for the 5G special issue of the IEEE Transactions on Antennas and Propagation, in 2017.

ABSTRACT

I. INTRODUCTION

II. FORMULATION

A. Combined field integral equation

B. Pre-splitting the Green’s function

images

images

images

images

C. Propagating matrix calculation

images

D. Evanescent matrix calculation

images

III. NUMERICAL RESULTS

A. Sphere

images

images

B. Monopole antenna on top of large platform

images

images

C. Cone-shape object

images

images

D. Aircraft model

images

images

IV. CONCLUSIONS

APPENDIX

ACKNOWLEDGMENT

REFERENCES

BIOGRAPHIES