A Pre-splitting Green’s Function based Hybrid Fast Algorithmfor Multiscale Problems

Guang-Yu Zhu $^{1,2}$ , Wei-Dong Li $^{2}$ , Wei E. I. Sha $^{1}$ , Hou-Xing Zhou $^{2}$ , and Wei Hong $^{2}$

¹College of Information Science and Electronic Engineering
Zhejiang University, Hangzhou, 310027, China
gyzhu_china@163.com
²State Key Laboratory of Millimeter Waves
Frontiers Science Center for Mobile Information Communication and Security
Southeast University, Nanjing, 210096, China

Submitted On: February 19, 2023; Accepted On: July 26, 2023

ABSTRACT

Based on the splitting form of the Green’s function, a hybrid fast algorithm is proposed for efficient analysis of multiscale problems. In this algorithm, the Green’s function is a priori split into two parts: a spectrally band-limited part and a spatially localized part. Then, the fast Fourier transforms (FFT) utilizing the global Cartesian grid and the matrix compression method aided by an adaptive octree grouping are implemented for these two parts, respectively. Compared with the traditional methods which only employ the FFT for acceleration, the proposed hybrid fast algorithm is capable of maintaining low memory consumption in multiscale problems without compromising time cost. Moreover, the proposed algorithm does not need cumbersome geometric treatment to implement the hybridization, and can be established in a concise and straightforward manner. Several numerical examples discretized with multiscale meshes are provided to demonstrate the computational performance of proposed hybrid fast algorithm.

Index Terms: Fast algorithm, fast Fourier transform, Green’s function, matrix compression, multiscale problems.

I. INTRODUCTION

Many real-world electromagnetic (EM) problems require multiscale discretization [1, 2, 3, 4]. When the method of moments (MoM) [5], along with classical fast algorithms, is used to model the multi-scale problems, a single fast algorithm [6, 7, 8, 9, 10, 11, 12] is often insufficient to achieve satisfactory computational performance. Instead, due to capturing both the circuit physics and wave physics [1], hybrid fast algorithms [19, 20, 21, 22] generally provide a more practical path to efficiently solve EM multiscale problems.

As we know, the multilevel fast multipole algorithm (MLFMA) is widely used for efficient simulation of electrically large problems [6, 7, 8]. However, the MLFMA encounters the sub-wavelength breakdown [8] when dense mesh occurs. As a remedy, low-frequency fast algorithms [23, 24] were studied. Correspondingly, a series of hybrid algorithms were developed [25, 26, 27, 28, 29, 30], such as mixed-form FMA [19], interpolative decomposition (ID)-MLFMA [20], and adaptive cross approximation (ACA)-MLFMA [21].

Different from the MLFMA, the pre-correction-fast Fourier transforms (FFT)-based methods [9, 10, 11, 12, 13, 14, 15, 16, 17, 18] (e.g., adaptive integral method (AIM) [9], pre-corrected FFT (P-FFT) [10], and integral equation FFT (IE-FFT) [11]) are free from the sub-wavelength breakdown. Nevertheless, due to the intrinsic uniformity of the global Cartesian grids, the pre-correction-FFT-based methods do not allow for sufficient and varying spatial resolutions to handle multiscale discretizations. That is, by simply tuning the grid spacing, it is hard to achieve high computational efficiency for far interactions and maintain low storage efficiency for the near matrix at the same time [31].

Recently, in order to overcome this shortcoming of the pre-correction-FFT-based methods, a hybrid fast algorithm was developed in [31]. In this algorithm, the pre-correction-FFT-based methods is augmented with the matrix compression method (ACA) [36, 37, 38]. However, to implement such a hybridization, the procedures are not as straightforward as they seem [31]. In particular, a complicated spatial grouping strategy is inherently needed, and the relations between the required auxiliary geometric data structures (such as the Cartesian grid, the expansion box, and the tree-cube) are complex and require cautious treatments [31]. Such a situation motivates us to reconsider the overall framework of the pre-correction-FFT-based methods and wonder if there exists a more elegant way to develop a hybrid algorithm that consists of both the FFT and the matrix compression methods(e.g., ACA).

As an interesting alternative of the well-studied pre-correction-FFT-based methods, some pre-splitting-FFT-based method was developed in [32, 33]. Basically, the pre-splitting-FFT-based method bears similarities with the well-known pre-correction-FFT-based methods, since it also uses the Cartesian grid and the FFT as the essentials for the acceleration of the spatial convolutions. However, different from the widely-used pre-correction-FFT-based methods which typically utilize the Cartesian grid-based near-field corrections (based on some local expansion boxes) to compensate for the near-field contributions, the pre-splitting-FFT-based method [32, 33] instead resorts to some splitting form of the Green’s function so as to realize accurate and efficient computations. Despite of its unique feature, this pre-splitting-based framework has not attracted enough attentions in the computational electromagnetics (CEM) community, and a hybrid fast algorithm based on such framework has never been explored yet.

In this work, based on the splitting form of the Green’s function, a hybrid fast algorithm is proposed for efficiently analyzing multiscale problems. Specifically, the pre-splitting-FFT-based method originally devised for the problems with quasi-uniform discretizations is here further enhanced with the matrix compression method (ACA). Thus, a hybrid fast algorithm using both the FFT and the matrix compression method is established. In particular, the proposed hybrid fast algorithm is capable of maintaining low memory consumption in multiscale problems without compromising time cost, thus manifesting itself as a favorable multiscale extension of the pure FFT-based method. Furthermore, the required auxiliary geometric data structures herein (including the Cartesian grid and the spatial grouping octree) can be constructed independent of each other, and are not intertwined as in [31]. Consequently, the proposed hybrid algorithm can be implemented in a straightforward manner without any cumbersome geometric treatment.

In the following, after introducing the basic formulations and implementation key points in Section 2, several numerical examples discretized with multiscale meshes are then provided in Section 3 to demonstrate the computational performances of the proposed hybrid fast algorithm for multiscale problems.

II. FORMULATION

A perfectly electric conductor (PEC) object illuminated by an incident plane wave is considered. The combined field integral equation (CFIE) is employed to model this typical electromagnetic scattering problem. Throughout the paper, we use $λ$ to denote the wavelength in free space. Moreover, we use $k$ and $η$ to denote the wave number and the wave impedance in free space, respectively.

A. Combined field integral equation

The CFIE is the linear combination of the electric field integral equation (EFIE) and the magnetic field integral equation (MFIE), which is conventionally expressed as

CFIE = α EFIE + (1 - α) η MFIE,

(1)

where $α$ denotes the combination factor and $0 ⩽ α ⩽ 1$ , the EFIE is formulated as

\begin{matrix} {E^{inc} (r) |}_{\tan} = \\ {[j k η \int_{S} (J (r^{'}) + \frac{\nabla \nabla^{'} \cdot}{k^{2}} J (r^{'})) G (r, r^{'}) d s^{'}] |}_{\tan}, \end{matrix}

(2)

and the MFIE is formulated as

\begin{matrix} {H^{inc} (r) |}_{\tan} = \\ [\frac{\hat{n} \times J (r^{'})}{2} + P . V . \int_{S} J (r^{'}) \times \nabla G (r, r^{'}) d s^{'}] |_{\tan} . \end{matrix}

(3)

In the above, $E^{inc}$ and $H^{inc}$ are the incident electric field and the incident magnetic field, respectively. Moreover, $G (r, r^{'})$ denotes the Green’s function with the field point r and the source point $r^{'}$ . Furthermore, $J (r^{'})$ denotes the surface current.

By using the method of moments (MoM) [5] with the RWG function [39], the CFIE can be discretized into a matrix equation

Z^{[N \times N]} \cdot I^{[N \times 1]} = V^{[N \times 1]},

(4)

where Z is the impedance matrix, I is the unknown surface current coefficients vector, V is the righthand side vector of the incident field, and $N$ is the number of unknowns. More specifically, the impedance matrix Z corresponding to the CFIE is described by

Z^{CFIE} = α Z^{EFIE} + (1 - α) η Z^{MFIE},

(5)

where

\begin{matrix} Z_{m n}^{EFIE} = j k η \int_{S_{m}} d s f_{m} (r) \cdot \int_{S_{n}} G (r, r^{'}) f_{n} (r^{'}) d s^{'} \\ - j \frac{η}{k} \int_{S_{m}} d s [\nabla \cdot f_{m} (r)] \cdot \int_{S_{n}} G (r, r^{'}) [\nabla \cdot f_{n} (r^{'})] d s^{'}, \end{matrix}

(6)

and

\begin{matrix} Z_{m n}^{MFIE} = \frac{1}{2} \int_{S_{m}} f_{m} (r) \cdot f_{n} (r) d s \\ + \int_{S_{m}} d s [\hat{n} \times f_{m} (r)] \cdot \int_{S_{n}} \nabla G (r, r^{'}) \times f_{n} (r^{'}) d s^{'} . \end{matrix}

(7)

In the above, $m = 1, \dots, N$ and $n = 1, \dots, N$ . Moreover, $f_{m}$ and $f_{n}$ denote the $m$ th and $n$ th RWG functions, respectively. Furthermore, $S_{m}$ and $S_{n}$ are the supports of the corresponding RWG functions, respectively.

B. Pre-splitting the Green’s function

In the following, the splitting forms of the Green’s function and its gradient are introduced. First, consider the Green’s function in the EFIE, namely

G (r, r^{'}) = \frac{e^{- j k R}}{4 π R},

(8)

where

\begin{matrix} R = | r - r^{'} | \\ = \sqrt{{(x - x^{'})}^{2} + {(y - y^{'})}^{2} + {(z - z^{'})}^{2}} \end{matrix}

(9)

denotes the distance between the field point $r$ and the source point $r^{'}$ . Clearly, the above can also be equivalently represented as

G (r, r^{'}) = Re [G (r, r^{'})] + j Im [G (r, r^{'})],

(10)

where

Re [G (r, r^{'})] = \frac{\cos k R}{4 π R},

(11)

Im [G (r, r^{'})] = - \frac{\sin k R}{4 π R} .

(12)

A basic property of $G (r, r^{'})$ is that, this function is singular for $R = 0$ , and its magnitude is globally nonzero for $R > 0$ . Also, note that the singular behavior of $G (r, r^{'})$ comes from $Re [G (r, r^{'})]$ .

Using the pre-splitting strategy [32, 33], the Green’s function (8) can be written as two parts, namely

G (r, r^{'}) = G_{E} (r, r^{'}) + G_{P} (r, r^{'}),

(13)

where

G_{E} (r, r^{'}) = {\begin{matrix} Re [G (r, r^{'})] - Φ (R), R < δ \\ 0 \begin{matrix}  \end{matrix}, R \geq δ \end{matrix},

(14)

G_{P} (r, r^{'}) = {\begin{matrix} Φ (R) + j Im [G (r, r^{'})], R < δ \\ G (r, r^{'}) \begin{matrix}  \end{matrix}, R \geq δ \end{matrix},

(15)

with $δ$ being a splitting threshold, and $Φ (R)$ being some auxiliary function. Particularly, the above two parts manifest the following properties.

• Function $G_{E} (r, r^{'})$ is a singular and spatially localized function.

It has nonzero function value only when $R < δ$ , and $G_{E} (r, r^{'}) = 0$ for $R \geq δ$ .

• Function $G_{P} (r, r^{'})$ is a globally oscillatory and spectrally band-limited function.

It is smooth and bounded for $\forall r$ and $\forall r^{'}$ .

To have more intuitive understanding on the properties of function $G_{E} (r, r^{'})$ and $G_{P} (r, r^{'})$ , the corresponding function images are shown in Fig. 1.

Moreover, from a physical and spectral perspective, $G_{E} (r, r^{'})$ mainly captures the evanescent waves, and $G_{P} (r, r^{'})$ mainly captures the propagating waves.

Figure 1: The Green’s function is a priori split into a singular but spatially localized part $G_{E}$ and a globally oscillatory but spectrally band-limited part $G_{P}$ . The relevant magnitudes of the original Green’s function and of these two parts are visualized in the figure.

To achieve the above splitting and corresponding properties, the auxiliary function $Φ (R)$ is specially designed and one simple way is to choose $Φ (R)$ to be a polynomial of the following form [32, 33]

Φ (R) = a R^{3} + b R^{2} + c,

(16)

where $a$ , $b$ , and $c$ are unknown coefficients. Clearly, the above is a three order polynomial without the linear term, and it implies

d Φ (0) / d R = 0,

(17)

meaning that the first order derivative function of $Φ (R)$ is zero at $R = 0$ (where $r = r^{'}$ ).

Given the splitting threshold $δ$ , the coefficients in (16) can be determined by some continuity conditions at $R = δ$ as follows

{\begin{matrix} Φ (δ) & = & Re [G (δ)] \\ d Φ (δ) / d R & = & d Re [G (δ)] / d R \\ d^{2} Φ (δ) / d R^{2} & = & d^{2} Re [G (δ)] / d R^{2} \end{matrix},

(18)

or, by solving the following matrix equation

[\begin{matrix} δ^{3} & δ^{2} & 1 \\ 3 δ^{2} & 2 δ & 0 \\ 6 δ & 2 & 0 \end{matrix}] [\begin{matrix} a \\ b \\ c \end{matrix}] = [\begin{matrix} Re [G (δ)] \\ d Re [G (δ)] / d R \\ d^{2} Re [G (δ)] / d R^{2} \end{matrix}] .

(19)

The derivative functions of $Re [G (R)]$ used in the above are given in the appendix. Thus, after $Φ (R)$ is obtained, the splitting representation (13) is completely determined.

The polynomial choice (16), the implicit continuity condition (17), and the explicit continuity condition (18) are based on the consideration that, after the splitting, the resultant $G_{P} (r, r^{'})$ (or, more precisely, $Re [G_{P} (r, r^{'})]$ ) should be smooth everywhere for $\forall r$ and $\forall r^{'}$ , so that it can be interpolated accurately without much difficulty (the interpolation procedures will be discussed in the next subsection).

To make it more clear, without loss of generality, consider the following representative situation. The source point $r^{'}$ is fixed to the coordinate origin, and the field point r is moving along the $x$ axis. Then, the corresponding function values of $G (r, r^{'})$ and $G_{P} (r, r^{'})$ are evaluated, as shown in Fig. 3. We can clearly see that $Re [G (r, r^{'})]$ is singular when r approaches $r^{'}$ , while $Re [G_{P} (r, r^{'})]$ is continuous and smooth everywhere. Here, $Re [G_{P} (r, r^{'})]$ is smooth at $R = 0$ (where $r = r^{'}$ ) as a result of (17), and smooth at the splitting point $R = δ$ as a result of (18).

Next, consider the gradient of the Green’s function in the MFIE, namely

\nabla G (r, r^{'}) = \overset{˙˙˙}{G} (R) \nabla R,

(20)

with

\overset{˙˙˙}{G} (R) = Re [\overset{˙˙˙}{G} (R)] + j Im [\overset{˙˙˙}{G} (R)],

(21)

where

Re [\overset{˙˙˙}{G} (R)] = - \frac{\cos k R + k R \sin k R}{4 π R^{2}},

(22)

Im [\overset{˙˙˙}{G} (R)] = - \frac{k R \cos k R - \sin k R}{4 π R^{2}},

(23)

and

\nabla R = \frac{x - x^{'}}{R} \hat{x} + \frac{y - y^{'}}{R} \hat{y} + \frac{z - z^{'}}{R} \hat{z} .

(24)

Similar to (8), function $\nabla G (r, r^{'})$ is singular for $R = 0$ , and its magnitude is globally nonzero for $R > 0$ . Also, note that the singular behavior of $\nabla G (r, r^{'})$ comes from $Re [\overset{˙˙˙}{G} (r, r^{'})]$ .

Again, using the pre-splitting strategy [32, 33], the gradient Green’s function (20) can be written as two parts, namely

\nabla G (r, r^{'}) = \nabla G_{E} (r, r^{'}) + \nabla G_{P} (r, r^{'}),

(25)

where

\nabla G_{E} (r, r^{'}) = {\begin{matrix} {Re [\overset{˙˙˙}{G} (R)] - Φ_{g} (R)} \nabla R, R < δ \\ 0 \begin{matrix}  \end{matrix}, R ⩾ δ \end{matrix},

(26)

\nabla G_{P} (r, r^{'}) = {\begin{matrix} {Φ_{g} (R) + j Im [\overset{˙˙˙}{G} (R)]} \nabla R, R < δ \\ \nabla G (r, r^{'}) \begin{matrix}  \end{matrix}, R ⩾ δ \end{matrix},

(27)

with $δ$ being the splitting threshold, and $Φ_{g} (R)$ being some auxiliary function.

Here, the following auxiliary function is used to achieve the desired splitting, namely

Φ_{g} (R) = a R^{3} + b R^{2} + c R .

(28)

Note that, different from (16), the above is a three order polynomial without the constant term, and it implies

Φ_{g} (0) = 0,

(29)

which clearly means that the function value of $Φ_{g} (R)$ is zero at $R = 0$ (where $r = r^{'}$ ).

Given the splitting threshold $δ$ , the coefficients in (28) can be determined by some continuity conditions at $R = δ$ as follows

{\begin{matrix} Φ_{g} (δ) & = & Re [\overset{˙˙˙}{G} (δ)] \\ d Φ_{g} (δ) / d R & = & d Re [\overset{˙˙˙}{G} (δ)] / d R \\ d^{2} Φ_{g} (δ) / d R^{2} & = & d^{2} Re [\overset{˙˙˙}{G} (δ)] / d R^{2} \end{matrix},

(30)

or, by solving the following matrix equation

[\begin{matrix} δ^{3} & δ^{2} & δ \\ 3 δ^{2} & 2 δ & 1 \\ 6 δ & 2 & 0 \end{matrix}] [\begin{matrix} a \\ b \\ c \end{matrix}] = [\begin{matrix} Re [\overset{˙˙˙}{G} (δ)] \\ d Re [\overset{˙˙˙}{G} (δ)] / d R \\ d^{2} Re [\overset{˙˙˙}{G} (δ)] / d R^{2} \end{matrix}] .

(31)

The derivative functions of $Re [\overset{˙˙˙}{G} (R)]$ used in the above are given in the appendix. Thus, after $Φ_{g} (R)$ is obtained, the splitting representation (25) is completely determined.

Similar as before, (28), (29), and (30) are based on the consideration that, after the splitting, the resultant $\nabla G_{P} (r, r^{'})$ (or, more precisely, each Cartesian component $Re {[\nabla G_{P} (r, r^{'})]}_{σ}$ , $σ = {x, y, z}$ ) should be smooth everywhere for $\forall r$ and $\forall r^{'}$ , so that it can be interpolated accurately without much difficulty.

To make it more clear, consider the representative situation of $r^{'}$ and r as before. Then, the corresponding function values of ${[\nabla G (r, r^{'})]}_{x}$ and ${[\nabla G_{P} (r, r^{'})]}_{x}$ are evaluated, as shown in Fig. 4. Here, $Re {[\nabla G_{P} (r, r^{'})]}_{x}$ is smooth at $R = 0$ (where $r = r^{'}$ ) as a result of (29), and smooth at the splitting point $R = δ$ as a result of (30).

Note that the auxiliary function (28) chosen for the splitting of $Re {[\nabla G (r, r^{'})]}_{σ}$ ( $σ = {x, y, z}$ ) is different from the auxiliary function (16) chosen for the splitting of $Re [G (r, r^{'})]$ . Such difference is due to the fact that, unlike $Re [G (r, r^{'})]$ , $Re {[\nabla G (r, r^{'})]}_{σ}$ ( $σ = {x, y, z}$ ) manifest different (i.e., positive or negative) singular behaviors when r approaches $r^{'}$ along different directions (e.g., along $+ x$ or $- x$ direction), which can be deduced from the definition of (24), or, more intuitively, observed from Fig. 3 and Fig. 4.

Figure 2: Systematic diagram of the flow chart and relevant key elements of the presented hybrid fast algorithm. Particularly, the two auxiliary geometric data structures, including the Cartesian grid and the octree grouping, can be constructed independently.

Figure 3: The function values of $G (r, r^{'})$ and $G_{P} (r, r^{'})$ . The source point $r^{'}$ is fixed to the coordinate origin, and the field point r is moving along the $x$ axis.

Figure 4: The function values of ${[\nabla G (r, r^{'})]}_{x}$ and ${[\nabla G_{P} (r, r^{'})]}_{x}$ . The source point $r^{'}$ is fixed to the coordinate origin, and the field point r is moving along the $x$ axis.

With the splitting form of the Green’s function (13) and its gradient (25), the impedance matrix $Z^{CFIE}$ can be correspondingly split into two parts, yielding

Z^{CFIE} = Z_{E} + Z_{P},

(32)

where $Z_{E}$ is related to $G_{E} (r, r^{'})$ and $\nabla G_{E} (r, r^{'})$ , while $Z_{P}$ is related to $G_{P} (r, r^{'})$ and $\nabla G_{P} (r, r^{'})$ . For convenience, $Z_{E}$ and $Z_{P}$ are termed here as the evanescent matrix and the propagating matrix, respectively. When the commonly-used Krylov subspace iterative method [35] is employed to solve the matrix equation (4), the matrix-vector product $Z^{CFIE} \cdot I$ is computed. Using the splitting representation (32), this product can be equivalently implemented as

Z^{CFIE} \cdot I = Z_{E} \cdot I + Z_{P} \cdot I .

(33)

As will be shown in the following subsections, by fully exploiting the properties of the splitting form of the Green’s function and its gradient, both $Z_{E} \cdot I$ and $Z_{P} \cdot I$ can be evaluated in an efficient manner. For clarity, before going into further discussion, basic concepts and key elements of the presented hybrid fast algorithm are illustrated in Fig. 2.

C. Propagating matrix calculation

Due to the unique properties (smoothness and translation invariance) of $G_{P} (r, r^{'})$ and $\nabla G_{P} (r, r^{'})$ , the matrix-vector product $Z_{P} \cdot I$ can be calculated accurately and efficiently, with the aid of proper interpolation and theFFT.

To this end, some auxiliary geometric data structures need to be defined first. Specifically, the object considered is enclosed by a three dimensional (3D) rectangular bounding box. Then, this bounding box is subdivided regularly along each Cartesian dimension, yielding a cluster of global Cartesian grids [9, 10, 11, 12, 13, 14]. The grid spacings along these Cartesian dimensions are here denoted by $h_{x}$ , $h_{y}$ , $h_{z}$ , respectively. The number of grids along these Cartesian dimensions are denoted by $N_{x}$ , $N_{x}$ , $N_{x}$ , respectively. Thus, a total of $N_{Grid} = N_{x} \times N_{y} \times N_{z}$ Cartesian grids are defined.

Recall that $G_{P} (r, r^{'})$ and $\nabla G_{P} (r, r^{'})$ obtained from the splitting process are smooth everywhere for $\forall r$ and $\forall r^{'}$ . Thus, they can be well approximated using the following Cartesian grid-based sampling expansions

G_{P} (r, r^{'}) = \sum_{u \in B_{m}} \sum_{v \in B_{n}} β_{u} (r) G_{P} (u, v) β_{v} (r^{'}),

(34)

\nabla G_{P} (r, r^{'}) = \sum_{u \in B_{m}} \sum_{v \in B_{n}} β_{u} (r) \nabla G_{P} (u, v) β_{v} (r^{'}),

(35)

for $\forall r$ and $\forall r^{'}$ , where $B_{m}$ and $B_{n}$ denote the corresponding local expansion boxes that enclose r and $r^{'}$ , respectively. Note that, depending on the relative position between r and $r^{'}$ , $B_{m}$ and $B_{n}$ can be either well-separated or overlapping, as illustrated in Fig. 5. Moreover, u and v denote the coordinates of the Cartesian grids related to $B_{m}$ and $B_{n}$ , respectively. Furthermore, $β_{u} (r)$ (or $β_{v} (r^{'})$ ) denotes the 3D Lagrange interpolation basis function based on the Cartesian grids of $B_{m}$ (or $B_{n}$ ), which is defined concretely by

β_{u} (r) = β_{p_{x}} (x) β_{p_{y}} (y) β_{p_{z}} (z),

(36)

where

β_{p_{σ}} (σ) = \prod_{p_{σ} = 0, p_{σ} \neq p^{'}_{σ}}^{M_{σ}} \frac{(σ - σ_{p^{'}_{σ}})}{(σ_{p_{σ}} - σ_{p^{'}_{σ}})}, σ = {x, y, z},

(37)

with $M_{σ}$ being the expansion order and $σ_{p_{σ}}$ being the coordinates of the Cartesian grids involved. Clearly, when $M_{x} = M_{y} = M_{z} = M$ , a total of ${(M + 1)}^{3}$ Cartesian grids are involved in (36).

Figure 5: Computation corresponding to the propagating matrix $Z_{P}$ can be accelerated by the FFT using the Global Cartesian grid and the local expansion boxes. Typical examples of the expansion boxes are illustrated using different colors. Here, Box 1 and Box 2 are well-separated, while Box 2 and Box 3 are overlapping.

With the expansions (34) and (35), $Z_{P}^{CFIE} \cdot I$ can be correspondingly rewritten as

\begin{matrix} Z_{P}^{CFIE} \cdot I = j Φ k η \bar{Π} \cdot [G_{P} (u, v)] \cdot {\bar{Π}}^{T} \cdot I \\ - j α \frac{η}{k} Π_{d} \cdot [G_{P} (u, v)] \cdot Π_{d}^{T} \cdot I \\ + (1 - α) η {\bar{Π}}_{g} \cdot [\nabla G_{P} (u, v)] \times {\bar{Π}}^{T} \cdot I, \end{matrix}

(38)

where $[G_{P}]$ and ${[\nabla G_{P}]}_{σ}$ are triple block Toeplitz matrices of dimension $N_{Grid} \times N_{Grid}$ . Moreover, $\bar{Π}$ , $Π_{d}$ , and ${\bar{Π}}_{g}$ are sparse matrices of dimension $N \times N_{Grid}$ , defined concretely by

\bar{Π} = \int_{S} [\begin{matrix} f_{1} (r) \\ f_{2} (r) \\ ⋮ \\ f_{N} (r) \end{matrix}] [\begin{matrix} β_{1} (r), & \dots & , β_{N_{Grid}} (r) \end{matrix}] d s,

(39)

Π_{d} = \int_{S} [\begin{matrix} \nabla \cdot f_{1} (r) \\ \nabla \cdot f_{2} (r) \\ ⋮ \\ \nabla \cdot f_{N} (r) \end{matrix}] [\begin{matrix} β_{1} (r), & \dots & , β_{N_{Grid}} (r) \end{matrix}] d s,

(40)

{\bar{Π}}_{g} = \int_{S} [\begin{matrix} \hat{n} \times f_{1} (r) \\ \hat{n} \times f_{2} (r) \\ ⋮ \\ \hat{n} \times f_{N} (r) \end{matrix}] [\begin{matrix} β_{1} (r), & \dots & , β_{N_{Grid}} (r) \end{matrix}] d s .

(41)

Exploiting the block Toeplitz structure of $[G_{P}]$ and ${[\nabla G_{P}]}_{σ}$ , the matrix-vector product $Z_{P} \cdot I$ can then be evaluated efficiently by the FFT as follows

\begin{matrix} Z_{P}^{CFIE} \cdot I = j α k η \bar{Π} \cdot ℱ^{- 1} {ℱ {[G_{P}]} \cdot ℱ {{\bar{Π}}^{T} \cdot I}} \\ - j α \frac{η}{k} {\bar{Π}}_{d} \cdot ℱ^{- 1} {ℱ {[G_{P}]} \cdot ℱ {{\bar{Π}}_{d}^{T} \cdot I}} \\ + (1 - α) η {\bar{Π}}_{g} \cdot ℱ^{- 1} {ℱ {[\nabla G_{P}]} \times ℱ {{\bar{Π}}^{T} \cdot I}}, \end{matrix}

(42)

where $ℱ$ and $ℱ^{- 1}$ denote the forward and inverse FFT.

Notably, in the conventional pre-correction-based framework [9, 10, 11, 12, 13], the original Green’s function $G (r, r^{'})$ is approximated with the grid-based sampling expansion. Differently, for the pre-splitting-based framework [32, 33] adopted herein, the function $G_{P} (r, r^{'})$ is instead approximated with the grid-based sampling expansion, as in (34). Since $G_{P} (r, r^{'})$ is smooth everywhere without singularity, it is possible to approximate $G_{P} (r, r^{'})$ with sufficiently good accuracy for arbitrary r and $r^{'}$ by proper interpolation. Consequently, unlike [9, 10, 11, 12, 13, 31], the Cartesian grid-based near-field corrections are not necessary here.

D. Evanescent matrix calculation

Referring to (14) and (26), we know that $Z_{E}$ , dominated by $G_{E} (r, r^{'})$ and $\nabla G_{E} (r, r^{'})$ , has nonzero elements only when $R < δ$ . Therefore, with quasi-uniform meshes, $Z_{E}$ is a typical sparse matrix with few nonzero elements, and thus it can be stored without much effort. However, in multiscale problems, dense mesh always occurs. Thus, near-field interactions within $R < δ$ increase substantially. In this case, the nonzero elements of $Z_{E}$ will increase dramatically.

Fortunately, in the pre-splitting-based framework adopted, this difficulty can be trivially addressed. In a nut shell, we first construct a spatial grouping only based on the geometric position of the basis functions, and then apply the matrix compression methods based on such a grouping to achieve memory reductions of $Z_{E}$ .

Here, a standard adaptive octree is utilized for accomplishing the spatial grouping, and the basic procedures for its construction are outlined as follows.

1. A root cube enclosing the whole object is constructed.

2. The nonempty cubes are subdivided into eight sub-cubes recursively until the number of basis functions within each cube is smaller than a prescribed constant.

3. An octree is built when the process finishes.

Accordingly, several concepts related to the octree are then defined as follows.

•Those cubes that do not need to be further subdivided are called leaf cubes. Here, the leaf cubes are denoted by $C_{s}$ , with $s$ being the index of some leaf cube. It should be noted that not all of the leaf cubes are located at the same octree level. Typical examples of the leaf cubes are shown in Fig. 6 and are highlighted with colors.

• Two leaf cubes (e.g., $C_{s}$ and $C_{t}$ ) are called neighbor cubes if they share at least one vertex, such as Cube 1 and Cube 5; otherwise, they are called well-separated cubes, such as Cube 1 and Cube 3. For convenience, we use $C_{s} \cap C_{t} \neq \emptyset$ to denote that two cubes are neighbor cubes, and use $C_{s} \cap C_{t} = \emptyset$ to denote that two cubes are well-separated cubes.

Note that, in [31], the conventional octree grouping cannot be directly applied, and a more involved grouping scheme is developed, in which the cubes for grouping the basis functions are induced from the Cartesian grids. In contrast, here the basis function grouping does not have to be dependent on the Cartesian grids, and can be implemented independently and conventionally. In other words, the commonly-used adaptive octree can be directly employed to achieve the required groupings, and the treatments of the required auxiliary geometric data structures herein are thus simple and straightforward.

Figure 6: The storage of $Z_{E}$ is effectively compressed by using the spatial grouping octree and the matrix compression method. Illustration of the octree grouping is shown in the left subfigure, where some leaf cubes are highlighted using colors. Here, Cube 5 is neighbor cube of Cube 1, while Cube 2, Cube 3, Cube 4 are all well-separated cubes of Cube 1. Besides, the nonzero range of $G_{E} (r, r^{'})$ , characterized by $R < δ$ , is illustrated in the right subfigure. Note that the establishment of the grouping and the application of the compression can be realized independent of the Cartesian grids and the FFT.

On the basis of the spatial groupings above, $Z_{E}$ can then be rewritten as

Z_{E} = {Z_{E}^{st}}_{C_{s} \cap C_{t} \neq \emptyset} + {Z_{E}^{st}}_{C_{s} \cap C_{t} = \emptyset},

(43)

where

• ${Z_{E}^{st}}_{C_{s} \cap C_{t} \neq \emptyset}$ denote those matrix subblocks corresponding to the interactions between neighbor cubes. They are directly computed and stored.

• ${Z_{E}^{st}}_{C_{s} \cap C_{t} = \emptyset}$ denote those matrix subblocks corresponding to the interactions between well-separated cubes. These matrix subblocks can be low-rank. They can be compressed and stored in a compact form.

Here, the ACA [36, 37, 38] is employed for accomplishing the matrix compression. Detailed procedures of the ACA can be found in, for example, [37, 38]. Correspondingly, ${Z_{E}^{st}}_{C_{s} \cap C_{t} = \emptyset}$ can be then factorized as

{Z_{E}^{st}}_{C_{s} \cap C_{t} = \emptyset} = {U_{E}^{s} \cdot V_{E}^{t}}_{C_{s} \cap C_{t} = \emptyset} .

(44)

Thus, the matrix-vector product $Z_{E} \cdot I$ can be calculated as

\begin{matrix} Z_{E} \cdot I = {Z_{E}^{st}}_{C_{s} \cap C_{t} \neq \emptyset} \cdot I + {Z_{E}^{st}}_{C_{s} \cap C_{t} = \emptyset} \cdot I \\ = {Z_{E}^{st} \cdot I^{t}}_{C_{s} \cap C_{t} \neq \emptyset} + {U_{E}^{s} \cdot (V_{E}^{t} \cdot I^{t})}_{C_{s} \cap C_{t} = \emptyset} . \end{matrix}

(45)

Some implementation details for the compression of ${Z_{E}^{st}}_{C_{s} \cap C_{t} = \emptyset}$ are further clarified as follows. In particular, as shown in Fig. 6, due to the localized nature of $Z_{E}$ decided by $R < δ$ , for a basis function $ϕ$ in Cube 1, all the basis functions in Cube 2 have full interactions with $ϕ$ , while all the basis functions in Cube 4 have zero interactions with $ϕ$ . Meanwhile, only parts of the basis functions in Cube 3 and Cube 5 have nonzero interactions with $ϕ$ . Hence, for two well separated groups with index sets s and t, only the submatrices representing nonzero interactions is to be compressed. Thus, the matrix $Z_{E}^{st}$ which represents the interactions between the two groups is in fact described more precisely by

Z_{E}^{st} = P^{s s} [\begin{matrix} Z_{E}^{s^{'} t^{'}} & O \\ O & O \end{matrix}] Q^{t t},

(46)

with subsets $s^{'} \subset s$ and $t^{'} \subset t$ . Here, $P^{s s}$ and $Q^{t t}$ are permutation matrices, $Z_{E}^{s^{'} t^{'}}$ denotes the matrix subblock corresponding to nonzero interactions. Note that, depending on the relative position between the two cubes considered, there can be many zero interactions in $Z_{E}^{s t}$ , due to the sparsity pattern of $Z_{E}$ . Therefore, the matrix compression is in fact applied to $Z_{E}^{s^{'} t^{'}}$ rather than $Z_{E}^{s t}$ , and we have

Z_{E}^{s^{'} t^{'}} = U_{E} V_{E},

(47)

where $| s^{'} |$ and $| t^{'} |$ denote the sizes of index sets $s^{'}$ and $t^{'}$ , respectively. Meanwhile, the dimension of the matrix $U_{E}$ is $| s^{'} | \times Γ$ , and the dimension of the matrix $V_{E}$ is $Γ \times | t^{'} |$ . Here, $Γ$ denotes the compression rank of $Z_{E}^{s^{'} t^{'}}$ for a given tolerance. Ultimately, the compressed matrices $U_{E}$ and $V_{E}$ in (47) are stored, and are applied to the surface current vector I in an efficient manner. For many practical problems that involve multiscale discretization, the matrix block $Z_{E}^{s^{'} t^{'}}$ can be effectively compressed, and the memory consumption for $U_{E}$ and $V_{E}$ are greatly reduced compared to that for $Z_{E}^{s t}$ .

III. NUMERICAL RESULTS

In this section, the computational performance of the proposed hybrid algorithm is demonstrated through several numerical examples. The conjugate gradient method (CG) [35] is used as the iterative solver, and the tolerance is set to 1.0E-3. The diagonal preconditioner is used to improve the convergence. Unless otherwise stated, the grid spacings (i.e., $h_{x}$ , $h_{y}$ , $h_{z}$ ) for the FFT acceleration are by default set to the common value $h =$ 0.1 $λ$ , and the expansion orders (i.e., $M_{x}$ , $M_{y}$ , $M_{z}$ ) are by default set to the common value $M = 3$ . Furthermore, the well-known FFTW [34] is used to perform the FFT. For clarity, in the following, the traditional pre-splitting-FFT-based algorithm which employs the uncompressed $Z_{E}$ will be termed as the PSG-FFT algorithm, and the proposed hybrid fast algorithm (enhanced with the ACA) will be referred to as the PSG-FFT-ACA algorithm.

A. Sphere

Figure 7: Mesh configurations of the PEC sphere. The radius of the sphere is 1.0 $λ$ . Here, about a quarter of the spherical surface is discretized with a relatively dense mesh of 0.02 $λ$ , and the rest of the surface is discretized with a regular mesh of 0.1 $λ$ . For clarity, the enlarged view of the multiscale mesh is also visualized in the above.

A PEC sphere of radius 1.0 $λ$ is first considered. The surface of the sphere is discretized with multiscale triangular meshes, resulting in 34,764 RWG functions. In particular, as shown in Fig. 7, about a quarter of the surface is discretized with dense meshes of edge lengths about 0.02 $λ$ , and the rest is discretized with regular meshes of edge lengths about 0.1 $λ$ .

For this example, the standard MoM (without using any acceleration technique), the traditional PSG-FFT algorithm, and the proposed PSG-FFT-ACA algorithm are adopted as the the solution methods. After solving the corresponding matrix equations as formulated in (4), the surface current coefficients (i.e, the solutions of equation (4)) $I^{MoM}$ , $I^{PSG-FFT}$ , and $I^{PSG-FFT-ACA}$ are then obtained, respectively. Here, it should be noticed that both $I^{PSG-FFT}$ and $I^{PSG-FFT-ACA}$ depend on the choice of the pre-splitting threshold $δ$ . However, when the standard MoM is used, $δ$ is not involved, and $I^{MoM}$ is therefore independent of $δ$ . Thus, the solution $I^{MoM}$ can be used as a proper reference, to evaluate the accuracy of $I^{PSG-FFT}$ and $I^{PSG-FFT-ACA}$ for varying $δ$ . The relative errors of the surface current coefficients are defined as follows

ℰ = \frac{{∥ I^{Test} - I^{Reference} ∥}_{2}}{{∥ I^{Reference} ∥}_{2}},

(48)

where $I^{Test}$ denotes the surface current coefficients to be tested, $I^{Reference}$ denotes some specified surface current coefficients to be used as the reference, and $∥ \cdot ∥_{2}$ denotes the 2-norm of vector. Based on (48), the relative errors of both the traditional PSG-FFT algorithm and the proposed PSG-FFT-ACA algorithm for several different $δ$ are calculated and shown in Fig. 8. We can see that both algorithms can achieve accurate results and show similar accuracy level. In particular, within the range of the $δ$ between 0.15 $λ$ and 0.65 $λ$ , the errors of both algorithms decrease with increasing $δ$ . Furthermore, the accuracies of both algorithms are not strongly sensitive with respect to the $δ$ considered.

Figure 8: Relative errors $ℰ$ of the surface currents coefficients I with respect to different splitting threshold $δ$ . The relative errors of both the traditional PSG-FFT algorithm and the proposed PSG-FFT-ACA algorithm are illustrated. The surface currents coefficients obtained using the standard MoM are used as the reference.

In the following, the choice of $δ$ is discussed. In principle, $δ$ can be of any value. However, in practice, a moderate value of $δ$ is suggested, due to the following considerations.

• If $δ$ is chosen be relatively large, the number of nonzero elements of the evanescent matrix $Z_{E}$ will then become relatively large. Clearly, in this case, the corresponding computational cost will increase. In Table 1, the computational costs with respect to $δ$ are recorded. It can be observed that the time and memory consumptions of both the traditional PSG-FFT algorithm and the proposed PSG-FFT-ACA algorithm increase as $δ$ increase. Hence, due to efficiency considerations, $δ$ is not suggested to be too large.

• If $δ$ is chosen to be relatively small, the number of nonzero elements of the evanescent matrix $Z_{E}$ will become relatively small, and the computational cost is correspondingly reduced. However, from Fig. 8, it is observed that the accuracy of the solution is slightly reduced when $δ$ becomes smaller. Hence, due to accuracy considerations, $δ$ is not suggested to be too small as well.

Based on the above considerations, the $δ$ between 0.15 $λ$ and 0.65 $λ$ can be proper choice. In practice, by tuning the splitting threshold $δ$ within this range, we can trade off between accuracy and computational cost.

Table 1: Time and memory for the sphere model with different splitting threshold $δ$

$δ (λ)$	PSG-FFT		PSG-FFT-ACA
$δ (λ)$		Memory (GB)	CPU Time $(m)$	Memory (GB)	CPU Time $(m)$
$0.15 (λ)$	0.9924	0.6601	0.9917	0.7087
$0.25 (λ)$	1.9707	0.8078	1.6581	0.8167
$0.35 (λ)$	3.1454	1.1445	2.2623	0.8679
$0.45 (λ)$	4.6662	1.2814	2.9167	0.9177
$0.55 (λ)$	6.3642	2.0607	3.5851	0.9812
$0.65 (λ)$	8.3831	1.8004	4.2083	1.0390
$0.75 (λ)$	10.514	2.1802	4.3177	1.0939
$0.85 (λ)$	11.528	2.4548	5.1668	1.2197

Besides, the well-studied IE-FFT algorithm [17], being a typical instance of the pre-correction-based methods, is also used to calculate this example. Similar to the configurations of the pre-splitting-based methods employed, the grid spacing $h$ for the IE-FFT is set to 0.1 $λ$ and the expansion order $M$ is set to 3. Then, using $I^{MoM}$ as the reference, the relative error of the obtained surface current coefficients $I^{IE-FFT}$ is calculated, which is 2.96E-3. Referring to Fig. 8, we can see that such error is close to the error of the PSG-FFT(-ACA) algorithm with $δ$ about 0.45 $λ$ . Moreover, for the IE-FFT algorithm, the memory requirement is $3.09718$ G and the CPU time required is $1.08727$ m. Referring to Table 1, we can see that such cost is slightly lower than that of the PSG-FFT algorithm with $δ$ about 0.45 $λ$ . From the above numerical results, it can be known that the computational performance of the IE-FFT algorithm during the solution stage is slightly better than that of the PSG-FFT algorithm. However, it should be emphasized that such slight advantage of the IE-FFT algorithm is at the price of the additional grid-based numerical correction operations during the setup (precomputation) stage.

B. Monopole antenna on top of large platform

In the following, a more realistic example is considered, i.e., a monopole antenna on top of a large platform, as shown in Fig. 9. Clearly, this is a typical multiscale structure and often encountered in many practical scenarios. The width and height of the platform are 4.0 $λ$ and 0.3 $λ$ , respectively. Moreover, the width and height of the monopole antenna are 0.06 $λ$ and 0.6 $λ$ , respectively. The object surface is discretized with the multiscale triangular meshes. Specifically, the monopole antenna is discretized with meshes of edge length about 0.01 $λ$ , while the large platform is discretized with meshes of edge length about 0.1 $λ$ . With such mesh, a total of 24,807 RWG functions are generated.

Figure 9: Mesh configurations of a small monopole antenna on top of a large platform. A multiscale mesh is used to discretize this object. The monopole antenna is discretized using dense mesh of about 0.01 $λ$ , while the large platform is discretized with regular mesh ofabout 0.1 $λ$ .

Figure 10: Relative errors $ℰ$ of the surface currents coefficients with respect to different grid spacings $h$ and expansion orders $M$ . The relative errors of the proposed PSG-FFT-ACA algorithm are illustrated. The surface currents coefficients obtained using the standard MoM are used as the reference.

The standard MoM and the proposed PSG-FFT-ACA algorithm are employed to calculate this example. Here, the pre-splitting threshold $δ$ is chosen to be 0.35 $λ$ . Using $I^{MoM}$ as the reference, the relative errors of $I^{PSG-FFT-ACA}$ are calculated for different grid spacings $h$ and different expansion orders $M$ . For clarity, the relative errors are illustrated in Fig. 10. The corresponding time and memory consumptions are recorded in Table 2. It can be clearly seen that, for decreasing grid spacing $h$ and increasing expansion order $M$ , the relative error decreases and the corresponding computational cost increases. In other words, by tuning $h$ and $M$ , we can trade off between accuracy and computational cost.

Table 2: Time and memory for the monopole-platform model with different grid spacing $h$ and expansion order $M$

$h (λ)$	$M = 2$		$M = 3$
$h (λ)$		Memory (GB)	CPU Time $(m)$	Memory (GB)	CPU Time $(m)$
$0.05 (λ)$	1.0074	0.7863	1.1094	0.8256
$0.10 (λ)$	0.8643	0.2280	0.9662	0.2792
$0.15 (λ)$	0.8311	0.1554	0.9304	0.2106
$0.20 (λ)$	0.8243	0.1155	0.9261	0.1801

C. Cone-shape object

To further study the effectiveness of the matrix compression enhancement, a PEC cone model is then considered, as shown in Fig. 11. The height of the cone is 3.0 $λ$ . The radii of the top and the bottom faces are 0.03 $λ$ and 0.6 $λ$ , respectively. The surface is discretized with a gradually-varied meshes. In particular, three mesh configurations are considered. For the three cases, the edge length for the regular part of the mesh is fixed to 0.1 $λ$ , while the edge lengths for the dense part of the mesh are set to 0.01 $λ$ , 0.005 $λ$ , and 0.0025 $λ$ , respectively. For clarity, the mesh settings considered are illustrated in Fig. 11.

Figure 11: Mesh configurations of the PEC cone. A gradually-varied mesh is used to discretize the cone-shape object. The edge length for the regular part of the mesh is 0.1 $λ$ . The edge lengths for the dense part of the mesh are 0.01 $λ$ , 0.005 $λ$ , and 0.0025 $λ$ , respectively. For clarity, the enlarged views are also shown in the above.

Table 3: Time and memory for the cone model of case 1

Algorithm	Memory (GB)		CPU Time
Algorithm		$Z_{E}$	Total	$Z_{E} \cdot I$ $(s)$	Total $(m)$
PSG-FFT	0.3313	0.6286	12.128	0.6955
PSG-FFT-ACA	0.2418	0.4738	8.3050	0.6194

Table 4: Time and memory for the cone model of case 2

Algorithm	Memory (GB)		CPU Time
Algorithm		$Z_{E}$	Total	$Z_{E} \cdot I$ $(s)$	Total $(m)$
PSG-FFT	0.9611	1.6176	69.751	2.3978
PSG-FFT-ACA	0.4935	0.8778	35.127	1.8475

For each of these mesh configurations, we solve the corresponding equations first using the PSG-FFT algorithm which employs the uncompressed $Z_{E}$ . Then, we recalculate these problems using the PSG-FFT-ACA algorithm which compresses $Z_{E}$ with the ACA. Here, the splitting threshold $δ$ is chosen to be 0.35 $λ$ . Notice that, with such configuration of $δ$ , all the interactions corresponding to $Z_{E}$ occur within the subwavelength regime. Thus, the ACA compression can be applied effectively [21, 37]. In Table 3, 4, 5, we tabulate the time and memory consumptions of both the PSG-FFT algorithm and the PSG-FFT-ACA algorithm for the three mesh cases considered. It can be seen that, for all the cases, the PSG-FFT-ACA algorithm shows improved computational performances relative to the PSG-FFT algorithm. Further inspections show that the degree of improvements becomes more prominent when the dense mesh is finer. To make this more clear, we calculate the compression ratios of the evanescent matrix $Z_{E}$ and use this as an indicator to measure the degree of improvements of the matrix compression enhancement. In Fig. 12, the compression ratios with respect to different mesh configurations are illustrated. We can clearly see that the compression ratio becomes higher when the dense mesh becomes finer. That is, the effectiveness of the proposed hybrid algorithm (with matrix compression enhancement) becomes more prominent when the ratio between the regular mesh size and the dense mesh size is larger.

Table 5: Time and memory for the cone model of case 3

Algorithm	Memory (GB)		CPU Time
Algorithm		$Z_{E}$	Total	$Z_{E} \cdot I$ $(m)$	Total $(m)$
PSG-FFT	2.6465	4.2615	181.47	4.4874
PSG-FFT-ACA	1.1579	2.0570	78.678	2.8009

Figure 12: Compression ratios of the cone with different mesh configurations.

D. Aircraft model

A relatively large PEC aircraft model with multiscale meshes is finally considered, as shown in Fig. 13. In this model, both the length and the wingspan of the aircraft are about 16 $λ$ . The surface is discretized with triangular meshes, resulting in 120,072 RWG functions. Here, a part of the aircraft surface on the wings is discretized with dense meshes of edge lengths about 0.02 $λ$ , and the rest of the aircraft surface is discretized with regular meshes of edge lengths about 0.1 $λ$ .

The bistatic RCS curves obtained by the PSG-FFT algorithm and the PSG-FFT-ACA algorithm are illustrated in Fig. 14. The pre-splitting threshold $δ$ is here chosen to be 0.35 $λ$ . The bistatic RCS curve obtained by the well-tested IE-FFT algorithm [17] is also shown in the figure and used as the reference to calculate the root mean square error (RMSE). We can see that these curves agree well with each other.

Figure 13: Geometry model and corresponding multiscale mesh of the PEC aircraft. Here, a part of the aircraft surface on the wings are discretized with a relatively dense mesh of 0.02 $λ$ , while the rest of the aircraft surface is discretized with a regular mesh of 0.1 $λ$ . The enlarged view of the multiscale mesh is also highlighted in the above.

Figure 14: Bistatic RCS curves of the PEC aircraft model. The result obtained by the IE-FFT is used as the reference. The RMSE of the PSG-FFT and the PSG-FFT-ACA are calculated.

The time and memory consumptions of both the PSG-FFT algorithm and the PSG-FFT-ACA algorithm are tabulated in Table 6. From the table, we can see that the memory requirement of the evanescent matrix $Z_{E}$ in the PSG-FFT-ACA algorithm is reduced by about $34.1218 %$ , compared with that of the PSG-FFT algorithm. Meanwhile, the time cost for performing $Z_{E} \cdot I$ in the whole solution process is reduced by about $37.52 %$ . Thus, the time and memory cost related to the evanescent matrix $Z_{E}$ can be greatly reduced, as a result of the matrix compression enhancement.

Nevertheless, the total cost of the overall algorithm are determined not only by the cost related to the evanescent matrix $Z_{E}$ , but also determined by the cost related to the propagating matrix $Z_{P}$ . For this relatively large object, the dense mesh region only occupies a relatively small portion of the overall object surface, and the ratio between the edge length (0.1 $λ$ ) of the regular mesh and the edge length (0.02 $λ$ ) of the dense mesh is here not extremely large. Thus, the time cost for performing $Z_{P} \cdot I$ still dominates the time cost for performing $Z \cdot I$ , and the reduction of the time cost for performing $Z_{E} \cdot I$ therefore only has a minor influence on the total time cost for performing $Z \cdot I$ . As a result, although the total time can be reduced due to the reduced cost related to $Z_{E}$ , the reduction of the total time does not look very striking for this example.

Table 6: Time and memory for the aircraft model

Algorithm	Memory (GB)		CPU Time
Algorithm		$Z_{E}$	Total	$Z_{E} \cdot I$ $(s)$	Total $(m)$
PSG-FFT	3.5558	7.7408	76.910	23.580
PSG-FFT-ACA	2.3425	5.4033	48.053	21.431

IV. CONCLUSIONS

In this paper, a hybrid fast algorithm has been established for efficiently analyzing multiscale problems. In this algorithm, the Green’s function is a priori split into a singular but spatially localized part, and a smooth, oscillatory but bandlimited part. Then, the fundamental blocks for acceleration, i.e., the FFT and the matrix compression method (ACA), are applied to these two parts, respectively. Compared with the traditional methods which only employ the FFT for acceleration, the proposed hybrid algorithm can maintain low memory consumption in multiscale cases without compromising time cost, thus manifesting itself as a favorable multiscale extension of the traditional pure FFT-based methods. Furthermore, the required auxiliary geometric data structures herein can be constructed independent of each other, thus permitting the two kinds of acceleration methods to be connected seamlessly in a concise and elegant manner.

On the basis of the present work, several future directions may be explored. First, except for the Lagrange interpolation used in this work, other more accurate interpolation schemes (such as the Gaussian interpolation [15] or the fitting techniques [12]) can be employed to further improve the computational performance of the overall algorithm. Moreover, although only the PEC problems are considered here, extensions to the dielectric problems (especially inhomogeneous cases [16]) are clearly feasible. Furthermore, effective parallelization (especially on heterogeneous architectures [13]) of the proposed hybrid algorithm can be interesting subject of future research.

APPENDIX

The derivative functions of $Re [G (R)]$ , which are used in (18) and (19), are given below

\frac{d}{d R} Re [G (R)] = - \frac{\cos (k R) + k R \sin (k R)}{4 π R^{2}},

(49)

\begin{matrix} \frac{d^{2}}{d R^{2}} Re [G (R)] = \\ - \frac{(k^{2} R^{2} - 2) \cos (k R) - 2 k R \sin (k R)}{4 π R^{3}} . \end{matrix}

(50)

The derivative functions of $Re [\overset{˙˙˙}{G} (R)]$ , which are used in (30) and (31), are given below

\begin{matrix} \frac{d}{d R} Re [\overset{˙˙˙}{G} (R)] = \\ - \frac{(k^{2} R^{2} - 2) \cos (k R) - 2 k R \sin (k R)}{4 π R^{3}}, \end{matrix}

(51)

\begin{matrix} \frac{d^{2}}{d R^{2}} Re [\overset{˙˙˙}{G} (R)] = \\ - \frac{3 (k^{2} R^{2} - 2) \cos (k R) - k R (k^{2} R^{2} - 6) \sin (k R)}{4 π R^{4}} . \end{matrix}

(52)

ACKNOWLEDGMENT

This work was supported in part by the Fundamental Research Funds for the Central Universities (No. 2242022k30003), in part by the National Key Research and Development Project (No. 2019YFB1803202), and in part by the National Science Foundation of China (Nos. 62293492, 62131008, 61975177, U20A20164).

REFERENCES

[1] W. C. Chew and L. J. Jiang, “Overview of large-scale computing: the past, the present, and the future,” Proceedings of the IEEE, vol. 101, no. 2, pp. 227-241, Feb. 2013.

[2] M.-K. Li and W. C. Chew, “Multiscale simulation of complex structures using equivalence principle algorithm with high-order field point sampling scheme,” IEEE Trans. Antennas Propag., vol. 56, no. 8, pp. 2389-2397, 2007.

[3] Z.-G. Qian and W. C. Chew, “Fast full-wave surface integral equation solver for multiscale structure modeling,” IEEE Trans. Antennas Propag., vol. 57, no. 11, pp. 3594-3601, Nov. 2009.

[4] J. Zhu, S. Omar, and D. Jiao, “Solution of the electric field integral equation when it breaks down,” IEEE Trans. Antennas Propag., vol. 62, no. 8, pp. 4122-4134, Aug. 2014.

[5] R. F. Harrington, Field Computation by Moment Methods. New York: The MacMillian Co., 1968.

[6] J. M. Song and W. C. Chew, “Multilevel fast-multipole algorithm for solving combined field integral equations of electromagnetic scattering,” Microw. Opt. Technol. Lett., vol. 10, no. 1, pp. 14-19, Sep. 1995.

[7] J.-M. Song, C.-C. Lu, and W. C. Chew, “Multilevel fast multipole algorithm for electromagnetic scattering by large complex objects,” IEEE Trans. Antennas Propag., vol. 45, no. 10, pp. 1488-1493, Oct. 1997.

[8] W. C. Chew, J.-M. Jin, E. Michielssen, and J. Song, Fast and Efficient Algorithms in Computational Electromagnetics. Boston, MA: Artech House, 2001.

[9] M. Bleszynski, E. Bleszynski, and T. Jaroszewicz, “AIM: Adaptive integral method for solving large-scale electromagnetic scattering and radiation problems,” Radio Sci., vol. 31, no. 5, pp. 1225-1251, 1996.

[10] J. R. Phillips and J. K. White, “A precorrected-FFT method for electrostatic analysis of complicated 3-D structures,” IEEE Trans. Computer-Aided Design, vol. 16, no. 10, pp. 1059-1072, Oct. 1997.

[11] S. M. Seo and J.-F. Lee, “A fast IE-FFT algorithm for solving PEC scattering problems,” IEEE Trans. Magn., vol. 41, no. 5, pp. 1476-1479, May2005.

[12] J.-Y. Xie, H.-X. Zhou, W. Hong, W.-D. Li, and G. Hua, “A highly accurate FGG-FG-FFT for the combined field integral equation,” IEEE Trans. Antennas Propag., vol. 61, no. 9, pp. 4641-4652, Sep. 2013.

[13] S. Peng and C.-F. Wang, “Precorrected-FFT method on graphics processing units,” IEEE Trans. Antennas Propag., vol. 61, no. 4, pp. 2099-2107, Apr. 2013.

[14] L.-W. Li, Y.-J. Wang, and E.-P. Li, “MPI-based parallelized precorrected FFT algorithm for analyzing scattering by arbitrarily shaped three-dimensional objects,” Electromagn. Waves Appl., vol. 17, pp. 1489-1491, Jan. 2003.

[15] B. Lai, X. An, H.-B. Yuan, Z.-S. Chen, C.-M. Huang, and C.-H. Liang, “A novel Gaussian interpolation formula-based IE-FFT algorithm for solving EMscattering problems,” Microw. Opt. Tech. Lett., vol. 51, no. 9, pp. 2233-2236, Sep.2009.

[16] X.-C. Nie, L.-W. Li, N. Yuan, T. S. Yeo, and Y.-B. Gan, “Precorrected-FFT solution of the volume integral equation for 3-D inhomogeneous dielectric objects,” IEEE Trans. Antennas Propag., vol. 53, no. 1, pp. 313-320, Jan. 2005.

[17] J.-Y. Xie, H.-X. Zhou, W.-D. Li, and W. Hong, “IE-FFT for the combined field integral equation applied to electrically large objects,” Microw. Opt. Tech. Lett., vol. 54, no. 4, pp. 891-896, Apr.2012.

[18] S. Peng, C.-F. Wang, and J.-F. Lee, “Analyzing PEC scattering structure using an IE-FFT algorithm,” ACES Journal, vol. 24, no. 2, Apr. 2009.

[19] L. J. Jiang and W. C. Chew, “A mixed-form fast multipole algorithm,” IEEE Trans. Antennas Propag., vol. 53, no. 12, pp. 4145-4156, Dec.2005.

[20] X.-M. Pan, J.-G. Wei, Z. Peng, and X.-Q. Sheng, “A fast algorithm for multiscale electromagnetic problems using interpolative decomposition and multilevel fast multipole algorithm,” Radio Sci., vol. 47, no. 1, Feb. 2012.

[21] L.-F. Ma, Z. Nie, J. Hu, and S.-Q. He, “Combined MLFMA-ACA algorithm application to scattering problems with complex and fine structure,” Asia Pacific Microwave Conference, pp. 802-805,2009.

[22] W.-B. Kong, H.-X. Zhou, K.-L. Zheng, and W. Hong, “Analysis of multiscale problems using the MLFMA with the assistance of the FFT-based method,” IEEE Transactions on Antennas and Propagation, vol. 63, no. 9, pp. 4184-4188,2015.

[23] J. S. Zhao and W. C. Chew, “Three dimensional multilevel fast multipole algorithm from static to electrodynamic,” Microw. Opt. Technol. Lett., vol. 26, no. 1, pp. 43-48, 2000.

[24] L. J. Jiang and W. C. Chew, “Low-frequency fast inhomogeneous plane-wave algorithm,” Microw. Opt. Tech. Lett., vol. 40, no. 2, pp. 117-122, Jan. 2004.

[25] T. Xia, L. L. Meng, Q. S. Liu, H. H. Gan, and W. C. Chew, “A low-frequency stable broadband multilevel fast multipole algorithm using plane wave multipole hybridization,” IEEE Trans. Antennas Propag., vol. 66, no. 11, pp. 6137-6145, Nov.2018.

[26] H. Wallen and J. Sarvas, “Translation procedures for broadband MLFMA,” Prog. Electromagn. Res., vol. 55, pp. 47-78, 2005.

[27] D. Wulf and R. Bunger, “An efficient implementation of the combined wideband MLFMA/LF-FIPWA,” IEEE Trans. Antennas Propag., vol. 57, no. 2, pp. 467-474, Feb. 2009.

[28] D. T. Schobert and T. F. Eibert, “Fast integral equation solution by multilevel Green’s function interpolation combined with multilevel fast multipole method,” IEEE Trans. Antennas Propag., vol. 60, no. 9, pp. 4458-4463, Sep. 2012.

[29] I. Bogaert, J. Peeters, and F. Olyslager, “A nondirective plane wave MLFMA stable at low frequencies,” IEEE Trans. Antennas Propag., vol. 56, no. 12, pp. 3752-3767, Dec. 2008.

[30] O. Ergul and B. Karaosmanoglu, “Broadband multilevel fast multipole algorithm based on an approximate diagonalization of the Green’s function,” IEEE Trans. Antennas Propag., vol. 63, no. 7, pp. 3035-3041, July 2015.

[31] W.-B. Kong, H.-X. Zhou, K.-L. Zheng, X. Mu, and W. Hong, “FFT-based method with near-matrix compression,” IEEE Transactions on Antennas and Propagation, vol. 65, no. 11, pp. 5975-5983,2017.

[32] A. Bespalov, “Cost-effective solution of the boundary integral equations for 3D Maxwell problems,” Russ. J. Numer. Anal. Math. Modelling, vol. 14, no. 5, pp. 403-428, 1999.

[33] A. Bespalov, “On the use of a regular grid for implementation of boundary integral methods for wave problems,” Russ. J. Numer. Anal. Math. Modelling, vol. 15, no. 6, pp. 469-488. 2000.

[34] M. Frigo and S. Johnson, FFTW Manual [Online]. Available: http://www.fftw.org/

[35] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. M. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. Van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. Philadelphia, PA, USA: SIAM,1994.

[36] M. Bebendorf and S. Rjasanow, “Adaptive low-rank approximation of collocation matrices,” Computing, vol. 70, no. 1, pp. 1-24, 2003.

[37] K. Zhao, M. N. Vouvakis, and J.-F. Lee, “The adaptive cross approximation algorithm for accelerated method of moments computations of EMC problems,” IEEE Trans. Electromag. Compat., vol. 47, no. 4, pp. 763-773, Nov. 2005.

[38] H.-X. Zhou, G.-Y. Zhu, W.-B. Kong, and W. Hong, “An upgraded ACA algorithm in complex field and its statistical analysis,” IEEE Trans. Antennas Propag., vol. 65, no. 5, pp. 2734-2739, May2017.

[39] S. M. Rao, D. R. Wilton, and A. W. Glisson, “Elettromagnetic scattering by surfaces of arbitrary shape,” IEEE Trans. Antennas Propag., vol. 30, no. 3, pp. 409-418, May 1982.

BIOGRAPHIES

Guang-Yu Zhu received the B.D. in electrical engineering from Nanjing Agricultural University, Nanjing, China, in 2012, and the Ph.D. degree in radio engineering from Southeast University, Nanjing, China, in 2020, respectively. He is currently a postdoctoral research fellow with the Department of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China. His research interests are in computational electromagnetics with focus on fast algorithms and domain decomposition methods.

Wei-Dong Li received the M.S. degree in mathematics and the Ph.D. degree in radio engineering from Southeast University, Nanjing, China, in 2003 and 2007, respectively. From January 2008 to January 2009, he was a visiting scholar with the Technische University Darmstadt, Germany. He is currently an associate professor with the State Key Laboratory of Millimeter Waves and the School of Information Science and Engineering, Southeast University. He has authored or coauthored over 40 technical articles. He serves as a reviewer for the IEEE Transactions on Antennas and Propagation and IET Microwave, Antennas and Propagation. His research interests are in computational EM with focus on integral-equation-based overlapped domain decomposition method, multilevel fast multipole algorithm, fast and accurate inter/extrapolation techniques, and basic functions.

Wei E.I. Sha received the B.S. and Ph.D. degrees in electronic engineering from Anhui University, Hefei, China, in 2003 and 2008, respectively. From July 2008 to July 2017, he was a post-doctoral research fellow and then a research assistant professor with the Department of Electrical and Electronic Engineering, University of Hong Kong, Hong Kong. From March 2018 to March 2019, he worked as a Marie Skodowska-Curie Individual Fellow with University College London, London, U.K. From October 2017, he joined the College of Information Science and Electronic Engineering, Zhejiang University, Hangzhou, China, where he is currently a tenure-tracked assistant professor. He has authored or coauthored 180 refereed journal articles, 150 conference publications (including five keynote talks and one short course), nine book chapters, and two books. His Google Scholar citation is 8193 with H-index of 45. His research interests include theoretical and computational research in electromagnetics and optics, focusing on the multiphysics and interdisciplinary research. His research involves fundamental and applied aspects in computational and applied electromagnetics, nonlinear and quantum electromagnetics, micro- and nano-optics, optoelectronic device simulation, and multiphysics modeling. Dr. Sha is a member of Optical Society of America (OSA). He served as a reviewer for 60 technical journals and technical program committee member for ten IEEE conferences. He was a recipient of the Applied Computational Electromagnetics Society (ACES) Technical Achievement Award in 2022 and PIERS Young Scientist Award in 2021. In 2015, he was awarded Second Prize of Science and Technology from Anhui Province Government, China. In 2007, he was awarded the Thousand Talents Program for Distinguished Young Scholars of China. He also received six Best Student Paper Prizes and one Young Scientist Award with his students. He also served as an associate editor for IEEE Journal on Multiscale and Multiphysics Computational Techniques, IEEE Open Journal of Antennas and Propagation, and IEEE Access.

Hou-Xing Zhou received the M.S. degree in mathematics from Southwest Normal University, Chongqing, China, in 1995, and the Ph.D. degree in radio engineering from Southeast University, Nanjing, China, in 2002. Since 2002, he has been with the State Key Laboratory of Millimeter Waves, Southeast University. He is currently a professor with the School of Information Science and Engineering, Southeast University. He has authored or coauthored 40 journal articles. His main research interests include numerical algorithms in computational EM, including the fast algorithm for spatial domain dyadic Green’s functions of stratified media, the multilevel fast multiple algorithm, the FFT-based fast algorithm, the IE-based domain decomposition method, the FEM-BI-based domain decomposition method, the higher order method of moments, and the parallel computation based on GPU/multicore-CPU platform. In addition, the full-wave simulation technology for the electromagnetic field distribution around the base-station is one of his current interests.

Wei Hong received the B.S. degree from the University of Information Engineering, Zhengzhou, China, in 1982, and the M.S. and Ph.D. degrees from Southeast University, Nanjing, China, in 1985 and 1988, respectively, all in radio engineering. Since 1988, he has been with the State Key Laboratory of Millimeter Waves and serves for the director of the lab, since 2003. In 1993, 1995, 1996, 1997, and 1998, he was a short-term visiting scholar with the University of California at Berkeley and at Santa Cruz, respectively. He is currently a professor with the School of Information Science and Engineering, Southeast University. He has been involved in numerical methods for electromagnetic problems, millimeter wave theory and technology, antennas, RF technology for wireless communications, and so on. He has authored and coauthored over 300 technical publications and two books. Dr. Hong was an elected IEEE MTT-S AdCom Member, from 2014 to 2016. He is a fellow of CIE, the vice president of the CIE Microwave Society and Antenna Society, and the chair of the IEEE MTT-S/AP-S/EMC-S Joint Nanjing Chapter. He was twice awarded the National Natural Prizes, thrice awarded the first-class Science and Technology Progress Prizes issued by the Ministry of Education of China and Jiangsu Province Government. He also received the Foundations for China Distinguished Young Investigators and for Innovation Group issued by NSF of China. He served as the associate editor for the IEEE Transactions on Microwave Theory and Techniques from 2007 to 2010, and one of the guest editors for the 5G special issue of the IEEE Transactions on Antennas and Propagation, in 2017.