Research on Web GUI Image Recognition System Based on Improved Convolutional Neural Network

Nannan Zhao

School of Computer Science and Engineering, Guangdong Ocean University at Yangjiang, China
E-mail: znn@gdou.edu.cn; tm2h5l4f9h@21cn.com

Received 22 June 2022; Accepted 18 July 2022; Publication 27 August 2022

Abstract

In this paper, a MobileNet V2 convolutional neural network based on the L2 regularization method, an amended particle swarm model and the Dropout method is constructed to enhance the accuracy and speed of Web GUI image recognition and to establish a Web GUI image recognition system for Web GUI testing. Firstly, an improved MobileNet V2 convolutional neural network is constructed. Secondly, the basic models of the L2 regularization method, the improved particle swarm method and the Dropout method are studied, an optimization algorithm for the improved MobileNet V2 convolutional neural network is proposed, and the basic model of image processing is designed. Finally, simulation analysis of Web GUI image recognition is carried out using the amended convolutional neural network, with the MNIST data set as the research object. The simulation results illustrate that the amended convolutional neural network proposed in this study is more accurate and efficient in Web GUI image recognition.

Keywords: Improved convolutional neural network, Web GUI image recognition, L2 regularization, improved particle swarm optimization.

1 Introduction

As the main medium of transmission and interaction in the information age, web applications have gradually penetrated into various industries and fields, and have had a great impact on people’s daily production and life. Web-based applications have a wide audience, so higher requirements are put forward for their functional correctness and system reliability. A Web GUI automatic test platform based on Web GUI image recognition technology should therefore be constructed. By improving the recognition rate of web element controls, the reusability of test cases and the test efficiency can be improved. With the rapid advancement of science and technology, artificial intelligence has been increasingly widely used in practice, and related research, such as research into robotics, facial detection, Web GUI image recognition, natural language processing, intelligent search and artificial neural networks, has also received great attention. The convolutional neural network, which is widely applied in artificial intelligence, has also become one of the hot research topics [1].

Web GUI image recognition, which mainly refers to the recognition of images with computer simulation technology, classifies and recognizes images according to their feature information. In recent years, as the computer industry has been developing rapidly in China, Web GUI image recognition technology has become closely related to people’s life and has been successfully applied to fields like medical health, monitoring and tracking, and agricultural production. Since the 1980s, optical character recognition technology has been the main Web GUI image recognition technology. With the continuous improvement of computer performance, Web GUI image recognition technology has ushered in a development opportunity. Among the commonly used methods of Web GUI image recognition, the template matching algorithm is the simplest. The corresponding standard template is established in the template library, and the sliding window method is used to complete the classification according to the matching degree between the target to be recognized and the template image. However, in the recognition process, the detection window can only move in parallel. If the matching target rotates or changes in size, the template matching algorithm will fail. To overcome the limitations of template matching, scholars at home and abroad have carried out a series of studies. As one of the important algorithms in the field of Web GUI image recognition, deep learning can describe the attributes and features of objects more abstractly and deeply, and has great potential in many fields [2].

2 Related Works

In recent years, image identification has attracted the attention of many scientists, and some algorithms have been applied to it; however, research on Web GUI image identification remains relatively scarce. As different people have different habits when writing numbers, the fonts of numbers are also different, and it is difficult for computers to recognize a large number of handwritten fonts. Therefore, it is of great theoretical significance and practical value to propose an accurate and efficient Web GUI image recognition technology. The application of deep learning has received wide attention in the field of Web GUI image recognition. Deep learning mainly learns the underlying laws and representation levels of sample data, thus enabling computers to recognize images automatically. Compared with traditional machine recognition algorithms, the deep learning algorithm has the advantage of being intelligent, especially in Web GUI image recognition. At present, how to make feature extraction more efficient and applicable has become a difficult problem in this field. Deep learning can learn the features of sample data automatically, which reduces the complexity of feature extraction. When a traditional deep learning algorithm is used to extract features, Web GUI image recognition is not very accurate. A Web GUI image recognition algorithm based on deep learning can adapt well to the illumination and angle changes of the scene, and can quickly and robustly learn image features and complete accurate recognition. Therefore, a more effective deep learning algorithm should be proposed according to the features of the image, so as to improve the accuracy of Web GUI image recognition [3–7]. The BP neural network is an effective Web GUI image recognition technology. When a BP neural network is applied to recognize images, a complex image preprocessing process is required: feature data such as Hu invariant moments and the gray-level co-occurrence matrix are extracted from the image for Web GUI image recognition. Based on the analysis above, an amended Web GUI image recognition model based on a convolutional neural network is proposed in this study. The convolutional neural network is a representative algorithm of deep learning. Through weight sharing and local connection, it reduces not only the complexity of the network model but also the number of weights [8–11].

The convolutional neural network is applied to improve the effectiveness of image recognition. To make the convolutional neural network learn more image features, the depth and breadth of the network are usually increased. Increasing the number of layers leads to a significant increase in parameters. When the data set is small, the neural network fits the characteristics of all the data rather than the commonality among the data, resulting in overfitting. In general, the size of non-open-source data sets often cannot meet the requirements of convolutional neural network training. Therefore, effective data enhancement algorithms are needed to generate target images with clear details and rich features at a sufficient data scale to support the training of high-precision and highly robust Web GUI image recognition models. Data enhancement is an effective means to overcome the shortage of data and prevent overfitting. On the one hand, it increases the number of samples in the training set, improves the diversity and richness of training images, and makes the training set as close to the test set as possible, so as to improve the prediction accuracy. On the other hand, data enhancement makes the neural network learn more robust image features and improves the generalization of the model [12, 13].

The advantage of the convolutional neural network is that it takes the initial image as the input variable, uses forward and back propagation to optimize the model parameters, obtains the output prediction results, and thus realizes end-to-end learning. The convolutional neural network simplifies the complex process of image preprocessing and feature extraction, and reduces the cost of manually designing appropriate feature extractors. However, the convolutional neural network needs a large number of labeled samples for training and powerful graphics processing to accelerate learning. At the same time, adjusting the parameters of the convolutional neural network structure takes a lot of time. MobileNet V2 is a lightweight convolutional neural network and belongs to the MobileNets series [14–16]. The amended convolutional neural network reduces the computational workload while ensuring the accuracy of image recognition. Its small size makes it suitable for devices with little memory, and its low response delay makes it suitable for embedded devices. With the amended convolutional neural network, the deep learning method is used to recognize the image, which obtains better results [17, 18].

3 Improved MobileNet V2 Convolutional Neural Network

The convolutional neural network can be applied well in many fields, so improving its performance has attracted the attention of many scholars. This network effectively reduces network complexity through local sensing and parameter sharing. The design adapts well to translation, distortion, tilt and other forms of deformation in Web GUI image recognition. Because the image is input directly, separate feature extraction and image reconstruction are avoided. Therefore, the convolutional neural network performs well in speech recognition, image analysis and text classification, and has been widely used in many specific fields.

The structural features of convolutional neural network are as follows [19]:

(1) The model consists of convolution layers, pooling layers and full connection layers. Each convolution layer is followed by a pooling layer, and the two are calculated alternately. At the same time, the feature extraction and classification processes are combined, which effectively simplifies the feature extraction of traditional algorithms;

(2) Convolutional neural networks use local connections, so a single neuron is connected to only some of the neurons in the previous layer. Through weight sharing, the number of training weights is effectively reduced and overfitting is prevented to a certain extent [20];

(3) The multilayer perceptron includes three important layers: the input layer, the middle layer and the output layer. The multilayer perceptron has good robustness to image translation, scaling and distortion.

The MobileNet V2 network structure is developed and designed for mobile devices and embedded computers. Standard convolution is replaced by depthwise separable convolution. Besides, two hyper-parameters, $α$ and $β$, are used. $α$ is a magnification factor that adjusts the number of convolution kernels, and $β$ controls the size of the image input to the network; together they greatly reduce the amount of calculation and the number of parameters, which effectively reduces the size of the model. MobileNet V2 is innovative because it adds a residual connection on the basis of depthwise separable convolution to form an inverted residual block. The traditional residual structure is $1 \times 1$ convolution dimension reduction $\to 3 \times 3$ convolution $\to 1 \times 1$ convolution dimension increase, so the number of channels in the feature map is reduced first and then increased. The reverse is true in the inverted residual structure, which is $1 \times 1$ convolution dimension increase $\to 3 \times 3$ convolution $\to 1 \times 1$ convolution dimension reduction, so the number of channels in the feature map increases first and then decreases. The first two layers of the module use the ReLU6 activation function, and the last layer uses a linear activation function, which reduces the loss of features. The MobileNet V2 network is not deep, and its core consists of 17 inverted residual modules [21].
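The parameter saving of depthwise separable convolution and the channel pattern of an inverted residual block can be illustrated with a short sketch. The function names and the expansion factor of 6 used in the usage note are assumptions made for illustration; the paper does not state the expansion factor it uses, and bias terms are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a k x k depthwise convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def inverted_residual_channels(c_in, c_out, expansion):
    """Channel counts through 1x1 expand -> 3x3 depthwise -> 1x1 project.

    `expansion` is a hypothetical expansion factor, not a value from the paper.
    """
    hidden = c_in * expansion
    return [c_in, hidden, hidden, c_out]
```

For a 3 × 3 layer with 32 input and 64 output channels, `standard_conv_params(3, 32, 64)` gives 18432 weights, whereas `depthwise_separable_params(3, 32, 64)` gives 2336. With an assumed expansion factor of 6, `inverted_residual_channels(24, 24, 6)` returns `[24, 144, 144, 24]`, showing the "increase first, then decrease" channel pattern.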

The activation functions used by MobileNet V2 are based on the ReLU (rectified linear unit) function, which is defined as [22]:

$F (x) = \max (0, x)$	(1)

The ReLU function is a commonly used activation function in convolutional neural networks. In the region $x > 0$, there is no gradient saturation or vanishing gradient. The computation is simple and requires no exponential calculation, since the activation value is obtained with a simple threshold. However, when $x < 0$, the gradient is zero, so the gradients of the neuron and of subsequent neurons are also zero. The neuron no longer responds to information and its parameters are no longer updated, a problem known as neuron necrosis [23].

The Leaky ReLU function builds on the ReLU function: when $x < 0$, a very small value $γ$ is introduced as the gradient, which avoids neuron necrosis and supplements the gradient. The use of skip connections in the inverted residual convolution increases the complexity of the model and makes it easier to overfit during training. Therefore, a Dropout layer is added to the skip connection, and part of the input data is discarded randomly, which reduces the possibility of overfitting and increases the generalization and robustness of the model.
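The three activation functions discussed in this section can be compared with a minimal sketch. The default slope `gamma=0.01` is an assumed illustrative value; the paper does not report the value of $γ$ it uses.

```python
def relu(x):
    """ReLU of Equation (1): zero for negative inputs, identity otherwise."""
    return max(0.0, x)


def relu6(x):
    """ReLU6: ReLU with the output clipped at 6."""
    return min(max(0.0, x), 6.0)


def leaky_relu(x, gamma=0.01):
    """Leaky ReLU: a small slope gamma (assumed value) for negative inputs."""
    return x if x > 0 else gamma * x
```

For a negative input such as `-2.0`, `relu` returns `0.0` (zero gradient, hence the risk of neuron necrosis), while `leaky_relu` returns a small negative value so that a gradient of `gamma` still flows.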

4 Amended Convolutional Neural Network Learning Method

The established MobileNet V2 convolutional neural network can be optimized to further improve its Web GUI image recognition accuracy. The MobileNet V2 convolutional neural network has a large number of parameters and a complex structure, so overfitting occurs easily when the training data set is not very large. To avoid overfitting, reduce the impact of complex background flicker noise, and ensure that the model classifies new data well, the L2 regularization method, the improved particle swarm model and the Dropout model are applied to optimize the MobileNet V2 convolutional neural network [24].

4.1 L2 Regularization Algorithm

The idea of L2 regularization is to introduce a regularization term (penalty term) into the loss function and to decrease the complexity of the model by limiting the weights $ω$, which account for the largest number of parameters in the model, so as to prevent the model from arbitrarily fitting noise such as the complex background in the training set. It is assumed that the original loss function used in the training process of the model is $J_{0} (ω, b)$, and the expression of the mean square error loss function is [25]:

$J_{0} (ω, b) = \frac{1}{m} \sum_{i = 1}^{m} L (y'^{(i)}, y^{(i)})$	(2)

where $m$ is the number of elements in the sample data set; $L$ represents the cost function; $y'^{(i)}$ is the actual output value of the neuron; $y^{(i)}$ is the desired output value of the neuron; and $b$ represents the bias during neuron transmission.

After regularization is added, $J_{0} (ω, b)$ is no longer optimized directly; instead, $J_{0} (ω, b) + \frac{λ}{2 m} R (ω)$ is optimized. $R (ω)$ is the regularization term or penalty term, which describes the complexity of the model and is computed by the formula [26]:

$R (ω) = {∥ ω ∥}^{2} = \sum_{j = 1}^{l} ω_{j}^{2}$	(3)

where $ω_{j}$ represents the weight of the $j$th neuron, and $l$ is the number of full connection layers.

The regularized loss function is as follows [27]:

$J (ω, b) = \frac{1}{m} \sum_{i = 1}^{m} L (y'^{(i)}, y^{(i)}) + \frac{λ}{2 m} \sum_{j = 1}^{l} ω_{j}^{2}$	(4)

where $λ$ represents the regularization factor. In the experiment, the L2 regularization method is used to optimize the full connection layer and the output layer. After cross validation, $λ$ is set to 0.004.
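The regularized loss of Equation (4) can be written out as a minimal sketch. It assumes that the cost function $L$ is the squared error, which matches the mean square error loss named above; the function name and argument names are illustrative and not taken from the paper.

```python
def l2_regularized_loss(outputs, targets, weights, lam=0.004):
    """Mean squared error plus the L2 penalty (lam / 2m) * sum(w_j ** 2).

    outputs: actual neuron outputs, targets: desired outputs,
    weights: the weights being penalized, lam: regularization factor.
    """
    m = len(outputs)
    data_loss = sum((o - t) ** 2 for o, t in zip(outputs, targets)) / m
    penalty = lam / (2 * m) * sum(w * w for w in weights)
    return data_loss + penalty
```

With `outputs=[1.0, 2.0]`, `targets=[1.0, 4.0]` and `weights=[1.0, 2.0]`, the data term is 2.0 and the penalty is 0.004 / 4 × 5 = 0.005, so larger weights raise the loss even when the fit to the data is unchanged.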

4.2 Improved Particle Swarm Optimization

4.2.1 Forward propagation of MobileNet V2 convolutional neural network

First, the MobileNet V2 convolutional neural network performs forward propagation, including convolution and pooling operations. The calculation of the convolution layer is expressed by [28]:

$x_{j}^{n} = f (\sum_{i \in M_{j}} x_{i}^{n - 1} K_{i j}^{n} + b_{j}^{n})$	(5)

where $M_{j}$ represents the set of input feature maps, $x_{j}^{n}$ represents the $j$th feature map of the $n$th layer, and $K_{i j}^{n}$ represents the convolution kernel; $f ()$ represents the activation function, here the sigmoid function; and $b_{j}^{n}$ is the bias.

The convolution layer and the pooling layer are computed in turn, and a convolution layer is generally followed by a pooling layer. The pooling layer equation is [29]:

$x_{j}^{n + 1} = f (\sum_{j} x_{j}^{n - 1} ω_{j}^{n} + b_{j}^{n + 1})$	(6)

where $ω_{j}^{n}$ represents the weight coefficient of the pooling layer feature map.

Before the output value of the output layer is obtained, in the $n$-layer MobileNet V2 convolutional neural network, $f_{n}$ denotes the activation function of the $n$th layer, and the connection weights of the $n$th layer are denoted by $ω^{(n)}$. The calculation process is described by the following formula [30]:

$y = f_{n} (\dots f_{2} (f_{1} (x \cdot ω^{(1)}) \cdot ω^{(2)}) \dots \cdot ω^{(n)})$	(7)

The forward propagation output and the error are then computed. The error is given by:

$E = \frac{1}{N} \sum_{i = 1}^{N} \sum_{j = 1}^{C} {(y_{j i}^{d} - y_{j i})}^{2}$	(8)

where $N$ is the number of training samples input to the MobileNet V2 convolutional neural network; $C$ represents the number of neurons in the output layer; $y_{j i}^{d}$ represents the desired output value of the $j$th output node for the $i$th sample; and $y_{j i}$ represents the actual output value of the $j$th output node for the $i$th sample.

4.2.2 Back propagation of MobileNet V2 convolutional neural network

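The purpose of back propagation in the MobileNet V2 convolutional neural network is to take the parameters to be calculated as the particles of the particle swarm optimization algorithm once the error between the desired value and the actual value has been calculated, and to determine the local optimum of each particle and the global optimum of the swarm through error calculation. The following formulas are used to update the particles [31]:

$v_{i j}^{'} = ω \cdot v_{i j} + c_{1} \cdot \mathrm{rand} () \cdot (p_{i j} - x_{i j}) + c_{2} \cdot \mathrm{rand} () \cdot (p_{g j} - x_{i j})$	(9)
$x_{i j}^{'} = x_{i j} + v_{i j}^{'}$	(10)

where $x_{i j}$ and $v_{i j}$ are the position and velocity of particle $i$ in dimension $j$, $p_{i j}$ is the best position found so far by particle $i$, $p_{g j}$ is the best position found so far by the whole swarm, $ω$ is the inertia weight, $c_{1}$ and $c_{2}$ are the acceleration coefficients, and $\mathrm{rand} ()$ returns a random number in $[0, 1]$.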

After the iterative calculation, the updated particles are the weights of the model. The network is then propagated forward again, and this is repeated until the error falls below the minimum threshold, at which point the algorithm stops.
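The particle update of Equations (9) and (10) can be sketched as follows. This is a generic particle swarm loop under stated assumptions, not the authors' implementation: the sphere function `sphere_error` stands in for the network error of Equation (8), and the values of `omega`, `c1`, `c2`, the swarm size and the tolerance are illustrative choices that the paper does not report.

```python
import random


def sphere_error(position):
    """Stand-in for the network error E; a real use would run forward propagation."""
    return sum(x * x for x in position)


def pso_minimize(error_fn, dim, n_particles=20, n_iters=100,
                 omega=0.7, c1=1.5, c2=1.5, tol=1e-8, seed=0):
    rng = random.Random(seed)
    x = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    p_best = [xi[:] for xi in x]                 # best position of each particle
    p_err = [error_fn(xi) for xi in x]
    g = min(range(n_particles), key=lambda i: p_err[i])
    g_best, g_err = p_best[g][:], p_err[g]       # best position of the swarm
    initial_err = g_err
    for _ in range(n_iters):
        if g_err < tol:                          # error threshold reached
            break
        for i in range(n_particles):
            for j in range(dim):
                # Equation (9): velocity update
                v[i][j] = (omega * v[i][j]
                           + c1 * rng.random() * (p_best[i][j] - x[i][j])
                           + c2 * rng.random() * (g_best[j] - x[i][j]))
                # Equation (10): position update
                x[i][j] += v[i][j]
            err = error_fn(x[i])
            if err < p_err[i]:
                p_best[i], p_err[i] = x[i][:], err
                if err < g_err:
                    g_best, g_err = x[i][:], err
    return g_best, g_err, initial_err
```

Calling `pso_minimize(sphere_error, dim=2)` returns the best position found, its error, and the best error of the initial swarm. By construction the returned error never exceeds the initial one, because the global best is replaced only when a particle improves on it.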

The algorithm flow is as follows:

Input: the number of particles ($m$), acceleration coefficient 1 ($c_{1}$), acceleration coefficient 2 ($c_{2}$), inertia weight ($ω$), position of particles ($x$), velocity of particles ($v$).

Output: optimized weight ( $c ω []$ ).

Step 1: For each particle in the group, use Equation (7) for the forward propagation of the MobileNet V2 convolutional neural network and then Equation (8) to calculate the error.

Step 2: If the minimum error threshold has been reached, stop the algorithm. If there is no convergence, apply Equations (9) and (10) to update the particles.

Step 3: Transmit the updated particle information back to the MobileNet V2 convolutional neural network, update the weights to be trained, and calculate the forward propagation error again.

Step 4: If the minimum value of the error threshold is not reached, return to Step 2.

4.3 Dropout Algorithm

Dropout algorithm is aimed at MobileNet V2 convolutional neural network. In the training process, some neurons and related connections are randomly discarded according to probability, which is equivalent to training multiple sub networks at the same time. Figure 1 shows the structure of MobileNet V2 convolutional neural network optimized by Dropout algorithm [32]. In Figure 1, $C_{1}^{'}$ is first convolution layer, $C_{2}^{'}$ is second convolution layer, $S_{1}^{'}$ is first pooling layer, $S_{2}^{'}$ is second pooling layer.

Figure 1 MobileNet V2 convolutional neural network optimized based on dropout algorithm.

The Dropout algorithm reduces the MobileNet V2 convolutional neural network, i.e., it randomly discards several neurons to form a sub network of the complete network. A network containing $n$ neurons can theoretically form $2^{n}$ sub networks. In the process of training, a reduced sub network is formed each time for training. Each neuron is activated according to the probability $p$, and the calculation formula is as follows [33]:

$p = P (p_{a} = 1 \mid x) = \sum_{i, j \in B_{a}} \frac{e^{ω_{i j}^{n} x_{j}^{n - 1} + b_{i}^{n}}}{1 + e^{\sum_{i, j \in B_{a}} ω_{i j}^{n} x_{j}^{n - 1} + b_{i}^{n}}}$	(11)

where, $p_{a}$ represents the probability that sample $a$ becomes 1; $B_{a}$ represents the number of class $i$ neurons belonging to sample $a$ .

In the MobileNet V2 convolutional neural network, the dropout algorithm uses the Ising model to identify neurons with low connection energy and temporarily discards these neurons during training and inference. A remarkable property of the Ising model is that its energy decreases as the system evolves. The overall reduction of energy is consistent with the microscopic principle of copying neighborhood states. Through training (changing the interconnection weights), the optimized network model maps the pattern to be remembered to the state of minimum energy, and then spontaneously evolves to this state through the neighborhood interaction rules of the Ising model [34].

It is assumed that each neuron has two states, active and inactive, which are represented by 1 and $-1$ respectively. The connection weight from neuron $i$ to neuron $j$ is represented by $ω_{i j}$. In the beginning, the input vector is mapped to the active and inactive states of the neurons. During network training, each neuron updates its state according to the following rule [35]:

$S_{i} (t + 1) = \begin{cases} 1, & \sum_{j} ω_{i j} S_{j} (t) > θ_{i} \\ - 1, & \text{otherwise} \end{cases}$	(12)

where $θ_{i}$ represents the threshold of neuron $i$, $ω_{i j}$ is the connection weight, $t$ represents time, and $S_{j} (t)$ is the activation state of neuron $j$ at time $t$.

According to the above rule, if the neurons adjacent to neuron $i$ are activated and their connection weights are positive, neuron $i$ tends to be activated, which is equivalent to minimizing a global energy function. If the neural network operates according to this rule, the total energy is reduced as far as possible, and the strength of the interaction between neurons varies with the connection weight.
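The state update of Equation (12) can be sketched as below. The paper does not give the energy function explicitly, so `network_energy` assumes the standard Ising/Hopfield form $E = -\frac{1}{2} \sum_{i, j} ω_{i j} S_{i} S_{j} + \sum_{i} θ_{i} S_{i}$; the function names and the small weight matrix in the usage note are hypothetical. All neurons are updated at once here for brevity, and with such synchronous updates the energy is not guaranteed to decrease in general.

```python
def update_states(states, weights, thresholds):
    """Apply Equation (12) to every neuron once (synchronous update)."""
    n = len(states)
    new_states = []
    for i in range(n):
        local_field = sum(weights[i][j] * states[j] for j in range(n))
        new_states.append(1 if local_field > thresholds[i] else -1)
    return new_states


def network_energy(states, weights, thresholds):
    """Assumed Ising/Hopfield energy; not a formula stated in the paper."""
    n = len(states)
    interaction = sum(weights[i][j] * states[i] * states[j]
                      for i in range(n) for j in range(n))
    return -0.5 * interaction + sum(t * s for t, s in zip(thresholds, states))
```

With the symmetric weights `[[0, 2, -1], [2, 0, -1], [-1, -1, 0]]`, zero thresholds and initial states `[1, 1, 1]`, one update gives `[1, 1, -1]`, which is a fixed point of the rule; the assumed energy drops from 0 to -4 in this example.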

The MobileNet V2 convolutional neural network optimized with the dropout algorithm has a reduced network structure and improved generalization performance. The reduced sub networks share weights, so the order of magnitude of the parameters remains $O (n^{2})$, which makes sample training more applicable and robust.
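The idea of training a randomly reduced sub network at each step can be illustrated with a minimal sketch. Note that this is standard random ("inverted") dropout, in which each neuron is kept with a fixed probability, and not the Ising-model-based neuron selection described above; the function name and the `keep_prob` parameter are assumptions made for illustration.

```python
import random


def dropout(activations, keep_prob, rng, training=True):
    """Randomly zero activations during training and rescale the survivors.

    Dividing by keep_prob keeps the expected activation unchanged, so no
    rescaling is needed when the full network is used at test time.
    """
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    if not training:
        return list(activations)
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]
```

For example, `dropout([1.0, 2.0, 3.0, 4.0], 0.5, random.Random(0))` returns a list in which each entry is either `0.0` or twice the original value; each call with a fresh random draw corresponds to a different sub network that shares the same weights.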

5 Web GUI Image Processing

The batch normalization method is used to accelerate network convergence and to address the problems of long convergence time and large parameter memory requirements. The algorithm first calculates the mean and variance of the $n$ samples $x_{1} \sim x_{n}$ in each batch [36]:

$μ = \frac{1}{n} \sum_{i = 1}^{n} x_{i}$	(13)
$σ^{2} = \frac{1}{n} \sum_{i = 1}^{n} {(x_{i} - μ)}^{2}$	(14)

where $μ$ represents the batch mean and $σ^{2}$ represents the batch variance. The following formula is then used for normalization:

$\hat{x}_{i} = \frac{x_{i} - μ}{\sqrt{σ^{2} + ε}}$	(15)

Equation (15) yields data $\hat{x}_{i}$ with a mean of 0 and a variance of 1. The small constant $ε$ keeps the denominator nonzero when the batch variance is 0. To prevent normalization from destroying the feature distribution, a reconstruction transformation is used to restore the original feature distribution [37]:

$y_{i} = γ_{i} \hat{x}_{i} + β_{i}$	(16)
$γ_{i} = \sqrt{var [x_{i}]}$	(17)
$β_{i} = E [x_{i}]$	(18)

where $γ_{i}$ and $β_{i}$ are obtained through training, $var$ is the variance function, and $E$ is the mean value. The image is input into the model in the form of a vector. The above parameters are vectors whose dimension is consistent with the size of the input image (the input image in this paper is 256 $\times$ 256 pixels, i.e., the parameter dimension is 256 $\times$ 256).
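The normalization and reconstruction steps of Equations (13) to (16) can be sketched for a single feature over one batch. The scalar `gamma` and `beta` and the value `eps=1e-5` are illustrative assumptions; in the paper these parameters are learned vectors.

```python
import math


def batch_norm(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a batch, then apply the scale and shift."""
    n = len(xs)
    mu = sum(xs) / n                                   # Equation (13)
    var = sum((x - mu) ** 2 for x in xs) / n           # Equation (14)
    x_hat = [(x - mu) / math.sqrt(var + eps) for x in xs]   # Equation (15)
    return [gamma * xh + beta for xh in x_hat]         # Equation (16)
```

With the defaults, `batch_norm([1.0, 2.0, 3.0, 4.0])` returns values with mean 0 and variance close to 1. Setting `gamma` to the batch standard deviation and `beta` to the batch mean, as in Equations (17) and (18), approximately recovers the original inputs, which is the sense in which the reconstruction restores the feature distribution.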

6 Simulation Analysis of Web GUI Image Recognition

The improved MobileNet V2 convolutional neural network is used for Web GUI image recognition. The simulation analysis of Web GUI image recognition is carried out on a 64-bit Ubuntu 16.04 LTS system. The open source Caffe deep learning framework is used, and the analysis program is compiled with MATLAB software. Eight computers with the same configuration are selected for simulation analysis. The configuration of each computer is as follows: 32 GB memory, an Intel® Core™ i7-6700K CPU @ 4.00 GHz × 8, and an NVIDIA GTX 980 Ti graphics card for image processing.

The MNIST data set, which was constructed by the National Institute of Standards and Technology (NIST) of the United States, is used for Web GUI image recognition. The training set is composed of digits written by 250 different people. Some samples are shown in Figure 2. The MNIST data set includes 70000 images, of which 60000 form the training set and 10000 form the test set.

To verify the effectiveness of the proposed image identification model, the web pages of the 163 mailbox were selected as the test subject according to the types and characteristics of controls in web pages, and a total of 10 representative operable control elements were manually captured from different pages as benchmark images. The types of controls include Button, Textbox, Droplist and Listbox. The ten reference images were captured at a display resolution of 1280*720. The specific GUI image elements are shown in Figure 2.

Figure 2 Partial samples of 163 mailbox GUI.

The traditional convolutional neural network and the improved convolutional neural network are used to train and test respectively. The variation law of the overall loss function of the two network models with the number of iterations and the variation trend of the average recognition accuracy on the test set are compared. The error change curve of 5000 iterations of training with the training data set is shown in Figure 3, and the change curve of model recognition accuracy with the number of iterations is shown in Figure 4.

Figure 3 Curve of iterative training’s error variation.

Figure 4 Curve of iterative training recognition accuracy.

According to the results in Figures 3 and 4, both networks fit quickly within about the first 2000 iterations, and the accuracy increases rapidly. Between 2000 and 4000 iterations, the loss declines slowly and the recognition accuracy rises slowly. When the number of iterations reaches 4000, the loss tends to be stable and the recognition accuracy also remains stable, at which point the model has converged. The performance of the amended network model is better than that of the traditional network model. The simulation results illustrate that the maximum recognition accuracy of the traditional convolutional neural network is 87.43%, while the mean recognition accuracy of the amended network model is 98.3%, a significant improvement over the conventional network model.

The 163 mailbox GUI data set was run 10 times. The recognition results of each model under different data scales are shown in Figure 5.

Figure 5 Identification results of 163 mailbox GUI data sets by different models.

It can be seen from Figure 5 that the recognition rate of the amended network method is higher than that of the conventional network method at every data scale. As the size of the data increases, the data scale has a large impact on the Web GUI image recognition rate of the amended network method, and the fluctuation range of recognition performance increases. Therefore, the amended network method has good robustness.

Each test data set is run 8 times, and the models are compared in terms of the minimum recognition rate $R_{M}$, the maximum identification rate $R_{l}$, the average identification rate $R$ and the running time $t$. The model performance results are given in Table 1.

Table 1 Method performance statistical results

Index	Traditional Network	Amended Network
$R_{M}$	95.4	98.5
$R_{l}$	96.9	99.4
$R$	95.8	99.3
$t$	188.5	119.3

According to the analysis results in Table 1, on the 163 mailbox GUI data set, the amended network model has a higher minimum identification rate $R_{M}$, maximum identification rate $R_{l}$ and average identification rate $R$ than the conventional network method, indicating that the amended network method has good feasibility and correctness. The running time of the amended network is also less than that of the conventional network. The research results illustrate that the amended network has higher convergence accuracy.

Using the enhanced data for training, the amended network can learn more typical image features, and enhance the anti-interference ability and generalization ability of the algorithm. Aiming at the problem of low accuracy in scene recognition, an algorithm based on saliency image detection is designed to eliminate the background information in the image, highlight the effective features of the image, and reduce the interference caused by the change of illumination and angle of view in the recognition process.

The registration, login and logout functions of four versions of the project management system are tested, and the pass ratio of the Web GUI test is shown in Figure 6. As seen from Figure 6, the pass ratio of the Web GUI test based on the amended network is higher than that based on the traditional network. The results show that the proposed algorithm achieves a better Web GUI test effect.

Figure 6 Pass ratio of Web GUI test.

7 Conclusion

In view of the slow convergence and overfitting of the conventional network, a MobileNet V2 network based on the L2 regularization method, an improved intelligent algorithm and the dropout method is presented. The amended network greatly improves the convergence efficiency and robustness. The traditional convolutional neural network and the improved convolutional neural network are both used to identify the 163 mailbox GUI data set. The simulation results show that, compared with the traditional network method, the amended network model is substantially improved in terms of Web GUI image recognition rate and convergence efficiency. The improved Web GUI image recognition algorithm has the advantages of fast convergence, high recognition accuracy, strong generalization ability, a small model size and low time consumption for a single image, which can meet the practical application needs of Web GUI image recognition. The proposed network model also has a shortcoming: the data sets used for simulation and analysis are too simple. Next, we will continue to optimize the performance of the model for images from different occasions in complex environments, and apply it to more complex Web GUI image recognition applications.

The practical application environment of Web GUI image recognition is very complex. Therefore, the network model still needs more real data for self-learning and verification testing to ensure its practicality and reliability. The algorithm proposed in this paper has only been evaluated in simulation experiments, and has not been applied to an actual error correction system. Therefore, how to integrate the high-precision Web GUI image recognition algorithm into an actual Web GUI image recognition system to complete error correction is the main content of the next stage of research.

References

[1] Sehyung Lee, Hideaki Kume, Hidetoshi Urakubo, Haruo Kasai, Shin Ishii, Tri-view two-photon microscopic image registration and deblurring with convolutional neural networks, Neural Networks, 152, 2022, 57–69.

[2] Yan Chen, Zimei Cao, Jiejian Zhang, Yuanqing Liu, Duli Yu, Xiaoliang Guo, Wearable ultraviolet sensor based on convolutional neural network image processing method, Sensors and Actuators A: Physical, 338, 2022, 113402.

[3] Tahereh Hassanzadeh, Daryl Essam, Ruhul Sarker, EvoDCNN: An evolutionary deep convolutional neural network for image classification, Neurocomputing, 488, 2022, 271–283.

[4] Hao Tang, Chao Xu, Xu Han, Electrical resistance tomography image reconstruction based on one-dimensional multi-branch convolutional neural network combined with attention mechanism, Flow Measurement and Instrumentation, 84, 2022, 102140.

[5] S. Niyas, S.J. Pawan, M. Anand Kumar, Jeny Rajan, Medical image segmentation with 3D convolutional neural networks: A survey, Neurocomputing, 493, 2022, 397–413.

[6] Feng Hu, Mengran Zhou, Pengcheng Yan, Zhe Liang, Mei Li, A Bayesian optimal convolutional neural network approach for classification of coal and gangue with multispectral imaging, Optics and Lasers in Engineering, 156, 2022, 107081.

[7] Wei-Chih Huang, Mads Svanborg Peters, Mads Juul Ahlebæk, Mads Toudal Frandsen, René Lynge Eriksen, Bjarke Jørgensen, The application of convolutional neural networks for tomographic reconstruction of hyperspectral images, Displays, 74, 2022, 102218.

[8] Ivan Castillo Camacho, Kai Wang, Convolutional neural network initialization approaches for image manipulation detection, Digital Signal Processing, 122, 2022, 103376.

[9] R. Suresh Kumar, B. Nagaraj, P. Manimegalai, P. Ajay, Dual feature extraction based convolutional neural network classifier for magnetic resonance imaging tumor detection using U-Net and three-dimensional convolutional neural network, Computers and Electrical Engineering, 101, 2022, 108010.

[10] Mohammed Ahmed Jaddoa, Luciano Gonzalez, Holly Cuthbertson, Adel Al-Jumailyd, Maryam Imani, Low frequency and radar’s physical based features for improvement of convolutional neural networks for PolSAR image classification, The Egyptian Journal of Remote Sensing and Space Science, 25(1), 2022, 55–62.

[11] Yoji Kawano, Ko Shimamoto, Early signaling network in rice PRR-mediated and R-mediated immunity, Current Opinion in Plant Biology, 16(4), 2013, 496–504.

[12] Nobukuni Hamamotoa, Shigetoshi Yokoyama, Atsuko Takefus, Kento Aid, Implemention of Secured Log Analysis Environment for Moodle using Virtual Cloud Provider Service, Procedia Computer Science, 192, 2021, 3154–3164.

[13] Umberto Di Giacomo, Rosangela Casolare, Oliver Eigner, Fabio Martinelli, Francesco Mercaldo, Torsten Priebe, Antonella Santone, Exploiting Supervised Machine Learning for Driver Detection in a Real-World Environment, Procedia Computer Science, 192, 2021, 2440–2449.

[14] Yao-juan Chu, Meng-li Wang, Xiao-bao Wang, Xiang-yu Zhang, Li-wei Liu, Ying-ying Shi, Li-hua Zuo, Shu-zhang Du, Jian Kang, Bing Li, Wen-bo Cheng, Zhi Sun, Xiao-jian Zhang, Identifying quality markers of Mailuoshutong pill against thromboangiitis obliterans based on chinmedomics strategy, Phytomedicine, 2022, 154313, In Press.

[15] Shiyu Song, Mengyan Ge, Wei Wang, Chenrui Gu, Kun Chen, Qingzhu Zhang, Qibin Yu, Guifeng Liu, Jing Jiang, BpEIN3.1 represses leaf senescence by inhibiting synthesis of ethylene and abscisic acid in Betula platyphylla, Plant Science, 321, 2022, 111330.

[16] Maryam Barati Moghaddam, Mehdi Mazaheri, Jamal Mohammad Vali Samani, Inverse modeling of contaminant transport for pollution source identification in surface and groundwaters: a review, Groundwater for Sustainable Development, 15, 2021, 100651.

[17] Bin Zhao, Hao Chen, Diankui Gao, Lizhi Xu, Risk assessment of refinery unit maintenance based on fuzzy second generation curvelet neural network, Alexandria Engineering Journal, 59(3), 2020, 1823–1831.

[18] Shivani Gaba, Ishan Budhiraja, Vimal Kumar, Sahil Garg, Georges Kaddoum, Mohammad Mehedi Hassan, A federated calibration scheme for convolutional neural networks: Models, applications and challenges, Computer Communications, 192, 2022, 144–162.

[19] Fei Lu, Yongchao Liang, Xingying Wang, Tinghong Gao, Qian Chen, Yunchun Liu, Yu Zhou, Yongkai Yuan, Yutao Liu, Prediction of amorphous forming ability based on artificial neural network and convolutional neural network, Computational Materials Science, 210, 2022, 111464.

[20] Yang Liu, Yaolun Song, Yan Zhang, Zhifang Liao, WT-2DCNN: A convolutional neural network traffic flow prediction model based on wavelet reconstruction, Physica A: Statistical Mechanics and its Applications, 2022, 127817, In Press.

[21] Bin Zhao, Yi Ren, Diankui Gao, Lizhi Xu, Performance ratio prediction of photovoltaic pumping system based on grey clustering and second curvelet neural network, Energy, 171, 2019, 360–371.

[22] Huanhuan Lv, Zhuolu Wang, Hui Zhang, Edge protection filtering and convolutional neural network for hyperspectral remote sensing image classification, Infrared Physics & Technology, 122, 2022, 104039.

[23] Chaoqing Wang, Weijun Gong, Junlong Cheng, Yurong Qian, DBLCNN: Dependency-based lightweight convolutional neural network for multi-classification of breast histopathology images, Biomedical Signal Processing and Control, 73, 2022, 103451.

[24] Bin Zhao, Yi Ren, Diankui Gao, Lizhi Xu, Yuanyuan Zhang, Energy utilization efficiency evaluation model of refining unit based on Contourlet neural network optimized by improved grey optimization algorithm, Energy, 185, 2019, 1032–1044.

[25] Shijiao Gao, Haiou Guan, Xiaodan Ma, A recognition method of multispectral images of soybean canopies based on neural network, Ecological Informatics, 68, 2022, 101538.

[26] Ming-Chuan Chiu, Yen-Ling Tu, Meng-Chun Kao, Applying deep learning image recognition technology to promote environmentally sustainable behavior, Sustainable Production and Consumption, 31, 2022, 736–749.

[27] Vishesh Kumar Tanwar, Balasubramanian Raman, Amitesh Singh Rajput, Rama Bhargav, SecureDL: A privacy preserving deep learning model for image recognition over cloud, Journal of Visual Communication and Image Representation, 86, 2022, 103503.

[28] Bin Zhao, Shasha Li, Diankui Gao, Lizhi Xu, Yuanyuan Zhang, Research on intelligent prediction of hydrogen pipeline leakage fire based on Finite Ridgelet neural network, International Journal of Hydrogen Energy, 47(55), 2022, 23316–23323.

[29] Lili Yang, Zhichao Li, Shilan Ma, Xinghua Yang, Artificial intelligence image recognition based on 5G deep learning edge algorithm of Digestive endoscopy on medical construction, Alexandria Engineering Journal, 61(3), 2022, 1852–1863.

[30] Weigang Wang, Jie Qin, Yunwei Zhang, Dashan Deng, Shujuan Yu, Yun Zhang, Yuanjian Liu, TNNL: A novel image dimensionality reduction method for face image recognition, Digital Signal Processing, 115, 2021, 103082.

[31] Jie Hu, Zhenyu Ren, Jiacheng He, Yang Wang, Yanping Wu, Peixiang He, Design of an intelligent vibration screening system for armyworm pupae based on image recognition, Computers and Electronics in Agriculture, 187, 2021, 106189.

[32] Charles Rajesh Kumar, J. Ahmed Almasarani, M.A. Majid, 5G-Wireless Sensor Networks for Smart Grid-Accelerating technology’s progress and innovation in the Kingdom of Saudi Arabia, Procedia Computer Science, 182, 2021, 46–55.

[33] Akshay Pandey, Kamal Jain, An intelligent system for crop identification and classification from UAV images using conjugated dense convolutional neural network, Computers and Electronics in Agriculture, 192, 2022, 106543.

[34] Zheng-fang Wang, Yan-fei Yu, Jing Wang, Jian-qing Zhang, Hong-liang Zhu, Peng Li, Lei Xu, Hao-nan Jiang, Qing-mei Sui, Lei Jia, Jiang-ping Chen, Convolutional neural-network-based automatic dam-surface seepage defect identification from thermograms collected from UAV-mounted thermal imaging camera, Construction and Building Materials, 323, 2022, 126416.

[35] Hamid Sadeghi, Abolghasem-A. Raie, HistNet: Histogram-based convolutional neural network with Chi-squared deep metric learning for facial expression recognition, Information Sciences, 608, 2022, 472–488.

[36] Orlando Grabiel Toledano-López, Julio Mader, Hector González, Alfredo Simón-Cuevas, A hybrid method based on estimation of distribution algorithms to train convolutional neural networks for text categorization, Pattern Recognition Letters, 160, 2022, 105–111.

[37] Sergio Barrachina, Manuel F. Dolz, Pablo San Juan, Enrique S. Quintana-Ortí, Efficient and portable GEMM-based convolution operators for deep neural network training on multicore processors, Journal of Parallel and Distributed Computing, 167, 2022, 240–254.

Biography

Nannan Zhao received her B.Sc. and M.E. degrees in Computer Application Technology from Liaoning Technical University, China. She was awarded the title of Outstanding Teacher of Private Education in Guangdong Province. She won the Award of Guangdong Excellent Online Teaching Case, the special prize of the university-level Teaching Achievement Award, the Excellence Award of the 4th Guangdong Provincial College (undergraduate) Young Teachers’ Teaching Competition, the third prize for micro classes in the Higher Education Group of the Guangdong Provincial Computer Education Software Evaluation Activity, and the third prize for a scientific research project of the Guangdong Provincial Finance Department. She has been an excellent member of the Jiusan Society in Zhanjiang City since 2019.