An Improved Optimal Channel Sensing Algorithm in Cognitive Radio Networks Used for Video Surveillance

Ranjita Joon^* and Parul Tomar

JC Bose University of Science & Technology, YMCA, Faridabad, India
E-mail: joon.ranjita@gmail.com; ptomar_p@hotmail.com
*Corresponding Author

Received 18 September 2021; Accepted 08 December 2021; Publication 15 November 2022

Abstract

With the rapid rise in the number of wireless devices and gadgets, a shortage of spectrum bands for wireless communication has been observed. To overcome this shortage, a technology called Cognitive Radio Networks (CRNs) was adopted. CRNs utilize spectrum bands that are currently underutilized by opportunistically and intelligently switching to these white spaces, allowing different users to operate in available frequency bands without interference. In this paper, Double Q Learning (Double DQN) with prioritized experience replay is used to study the throughput of the network under different parameters and to relate throughput to the Probability of Undetectable Primary User Transmission. Double DQN with prioritized experience replay is a reinforcement learning based method that adds backward exploration to the forward exploration of the Q Learning method; both forward and backward exploration are used to update the Q values. Since the sensor nodes in the cognitive environment have limited energy and sensing the spectrum band consumes energy, the sensing technique should be energy efficient so that the sensor nodes can be used effectively for the various operations of video surveillance.

Keywords: Cognitive radio networks, Q learning, double DQN, double Q learning, energy detection, signal to noise ratio, CRN, opportunistic spectrum access.

1 Introduction

Radio spectrum is a necessity for wireless communication, and it is a limited natural resource. Under spectrum regulations around the world, a spectrum band is licensed to a user, who utilizes it as required. For many years this worked very well, but spectrum scarcity has now become an issue because of the rapid growth in IoT devices and wireless sensor networks. Figure 1(a) demonstrates how cognitive radio networks opportunistically make use of the underutilized spectrum of licensed users. There has been a significant rise in the applications of wireless sensor networks (WSNs) in the past few years. WSNs operate in unlicensed spectrum bands, but with increasing usage these bands are getting overcrowded, which causes interference in the services of the network. Cognitive radio technology, regarded as a fifth generation technology, offers a solution to this problem. If we combine cognitive radio technology with wireless sensor networks, multiple networks can coexist in the ISM band with minimum interference.

Figure 1(a): Opportunistic spectrum access.

The number of connected IoT devices is expected to grow from 22 billion in 2018 to 50 billion by 2030 [1]. According to industry insiders, 75% of cars will have IoT devices connected to them in the coming years. This will create challenges for researchers and industries. IoT is now used in every field, e.g. smart cities, smart buildings, smart hospitals, smart offices and smart houses, because it provides savings and revenues in many areas, but it also increases the challenges. The rapid growth of IoT has led to degraded performance and to spectrum scarcity, which in turn leads to spectrum sharing and interference between devices. To overcome these problems a novel solution was required that increases the efficiency of the spectrum through better utilization and supports sustainable development in the communication of wireless devices. This approach, known as Cognitive Radio Networks (CRNs) in IoT, has shown similar or even better results than Bluetooth, Wi-Fi, WiMAX etc. In a CRN we analyze the different vacant parts of the radio spectrum in order to control and manage them for optimal utilization. Put simply, when spectrum licensed to a primary user is not fully used, a secondary user utilizes the vacant spectrum. This reduces the demand for new spectrum and helps utilize the existing spectrum fully, but the secondary user must also protect the primary user from interference and congestion. Some benefits of CRNs are as follows:

a. Reduce the spectrum scarcity by using unused spectrum,

b. Avoid the interference with primary users,

c. Avoid radio jamming and interference based on the selected spectrum sensing (SS) approach,

d. Support power-saving protocols and improve QoS, because reliability, availability and suitability are enhanced.

A secondary user can use a primary user's vacant spectrum, but when the primary user wants to use that spectrum again, the secondary user must vacate it, which creates challenges for secondary users. There are three schemes for sharing the primary user spectrum: underlay, overlay and interweave. In the overlay and underlay schemes, a secondary user can coexist with the primary user without interference, but in the underlay scheme the transmission power of secondary users is limited to protect the primary user from interference, whereas the overlay scheme has no such constraint. In the interweave scheme, a secondary user can utilize the primary user spectrum only while it is not being used by the primary user; when the primary user wants to utilize the spectrum, the secondary user must vacate it or switch to another band. Cognitive Radio works through a cognition cycle which contains four functional phases: sensing, decision, sharing and mobility. Protection of the PU's signal has high priority in CRNs, hence a parameter called sensing accuracy, expressed in terms of P $_{d}$ and P $_{f}$ , was introduced. The probability of detection (P $_{d}$ ) is the probability of correctly detecting the presence of a primary user. The probability of false alarm (P $_{f}$ ) is the probability of declaring a PU present when it is not. For efficient spectrum sensing, utilization of white spaces and protection of the PU, we have to maximise P $_{d}$ and minimise P $_{f}$ . The sensing process takes place at regular intervals, and this interval is known as the sensing interval (T). If the PU starts transmitting between two sensing instants, i.e. after the previous sensing found the PU inactive and before the next sensing is due, its transmission goes undetected. This is called Undetectable Primary User Transmission (UPT), and its probability is termed P $_{UPT}$ .
This is a very effective approach, but it has its own challenges, which create a gap in the field of CRN and radio spectrum control and management.

Cognitive capability is the ability of the radio technology to sense information from its radio environment. With this capability, we can identify the unused spectrum at a specified time and location. Figure 1(b) shows the cognition cycle. The cognition cycle [22] is used in the cognition engine for controlling the parameters. In the cognition cycle the nodes are enabled with intelligence: they are aware of their surrounding environment and can sense the empty spaces and utilize them efficiently. The nodes in cognitive networks continuously sense the environment to search for white spaces and utilize them. The steps of the cognition cycle are described below:

• Spectrum Sensing: To improve the spectrum utilization efficiency, a cognitive radio should regularly monitor the spectrum bands. It detects the available spectrum bands, gathers their information and, based on this information, detects the spectrum holes, i.e. the unused spectrum.

• Spectrum Analysis: After spectrum sensing, the characteristics of the detected spectrum holes are estimated. Channel information, capacity, delay and reliability need to be known and are conveyed to the spectrum decision step.

• Spectrum Decision: After the spectrum holes are sensed, detected and analyzed, a decision is made so as to select a suitable spectrum band according to spectrum characteristics and user requirements. Once the spectrum band is decided, the communication can begin over this band.

Figure 1(b): Representation of cognition cycle.

Cognitive Radio is a new standard in wireless technology which provides many advantages and overcomes some of the issues related to spectrum band utilization. Some of the advantages are as under:

1. The available spectrum for communication cannot be increased, but we can use it efficiently to meet our growing needs. The available radio bands are licensed, except for the ISM band. Therefore, instead of buying costly licenses, researchers and industrialists exploit the free ISM bands. Hence all the wireless technologies coexist in the ISM band, which is getting overcrowded while the licensed bands are not utilized to their full extent. Cognitive radio technology can make use of underutilized licensed bands without disturbing the licensed user and thereby make efficient use of the spectrum.

2. Traditional wireless networks transmit their data over a single channel, which increases the chances of collision and packet loss. In cognitive radio technology multiple channels are used for communication, so collisions are reduced.

3. The large number of collisions caused by single-channel usage in wireless networks leads to large packet loss, and extra energy is required for packet retransmission. In cognitive radio networks collisions are reduced through intelligent usage of multiple channels, which saves energy.

4. Traditional wireless networks work on their particular frequency bands. Since different countries have different spectrum rules and frequency bands, a wireless application working on a particular frequency band in one country may not work in another country, so global operation cannot be ensured. Cognitive radio technology can change its operating parameters to work on multiple frequency bands and thus has global operability.

Cognitive Radio is a promising technology which offers many advantages. There are a few limitations associated with it. They are listed below:

1. One of the biggest constraints of CWSN is the energy of the nodes in the network. Since the nodes are battery operated, they have a power constraint like WSN nodes. Apart from sensing the physical parameters of the environment, CWSN nodes also need to sense the spectrum continuously to detect white spaces, so their energy is used up more rapidly than that of WSN nodes.

2. The sensor nodes are the secondary users in the cognitive radio environment. They continuously sense the licensed band and, if an empty band exists, start communicating on it. They should be able to detect the presence of primary users, relinquish the licensed band to them as soon as they arrive, since they are the licensed users of that band, and move to some other band for communication. Hence the secondary users should work to avoid any interference with the primary user.

3. In CWSN the sensor nodes can turn their transceivers on and off and can dynamically adjust their transmission parameters, hence the topology of these nodes changes dynamically. Due to this dynamic environment, data transfer does not always happen, because the spectrum is not available at all times. Hence routing becomes a challenging task.

4. Since multiple sensor nodes can use the same frequency band for communication, there must be a centralized control mechanism to handle co-channel interference. This imposes a great challenge for CWSN.

5. The duration of spectrum sensing is also a big challenge. During spectrum sensing the nodes keep sensing the medium for free channels, and no actual data transfer takes place. Hence this period should be set judiciously.

The rest of the paper is organized as follows. Section 1 contains the introduction. The literature is surveyed in Section 2. Section 3 presents the proposed scheme and Section 4 discusses the results. Section 5 concludes the paper, followed by the references.

2 Literature Survey

In [1] the authors reviewed and classified different designs of MAC (Medium Access Control) protocols used in WSNs to achieve efficient energy consumption. They introduced cross-layer protocols that increase the lifetime of the network and provide energy efficiency. In [2] the authors explained several challenges in WSNs. Clusters of nodes create heavy traffic at the sink, which leads to energy holes around it. They proposed a multi-layer network model that reduces these holes by using one-to-one connections at the gateways to decrease and balance the network load. In [3] the authors proposed a method to avoid energy holes at the sink based on synchronization of nodes in adjacent annuluses (SNAA). They explained that a node can find an optimal parent by assessing the residual energy along with the distance between the two nodes in neighboring annuluses during the sending and receiving phases.

In [4] the authors discussed the challenges of WSNs, which include the routing algorithm, energy consumption, the placement of sensors in the premises, efficiency and robustness. They provided a survey of formation techniques and mechanisms used in WSNs. In [5] the authors introduced heterogeneous wireless mesh technologies, which offer an opportunity for higher quality of service (QoS), higher network capacity and wider coverage. They proposed a new routing algorithm based on reinforcement learning and a new heterogeneous routing protocol that selects the suitable transmission technology depending on the parameters of each network. In simulation they achieved a rise in efficiency of up to 200% compared to other networks such as Wi-Fi and LTE.

In [6] the authors presented an adaptive interference mitigation scheme that considers the social interaction of multiple Wireless Body Area Networks (WBANs) in the vicinity and their movement relative to each other. They proposed optimized transmission times with parallel and synchronous transmission and achieved energy-efficient channel prediction. In [7] the authors investigated the problem of energy holes and the early death of nodes caused by unbalanced load distribution. They proposed a hole-alleviating algorithm that balances the load distribution among the nodes in addition to consuming energy efficiently. According to the proposed algorithm, a node finds an optimal distance strategy with minimum expected error rate. In [8] the authors investigated the Energy Depletion Attack (EDA) and provided a systematic review of this vulnerability, which is a major flaw in Low Power Wireless (LPW) networks. The attack triggers energy-hungry tasks that drain the batteries and leave the victims disabled. In [9] the authors proposed optimal routing, topology design and a gateway placement selection algorithm for a Heterogeneous Cloud Radio Access Network (C-RAN) that exploits Free Space Optical (FSO) communication. They also proposed a disaster recovery algorithm in which a node maintains its network connectivity in the congested network and which reduces the size of the node by 33% in dynamic traffic operation.

In [10] the authors proposed a design for an antenna with an AODV routing protocol based on a hybrid control channel. They discover the route of channels from the LLN border router (LBR) to the terminal within the CRN. It performed very well compared to other available antennas. In [11] the authors proposed an IoT system based on a Cognitive Radio Network (CRN). They provided a survey of spectrum sensing (SS) and its design factors, and discussed various merits and demerits of spectrum sensing in CRNs. In [12] the authors proposed a new management scheme and networking protocol that overcomes the problem of energy holes through a balanced routing strategy. In this scheme every node has a load weight according to the next hop, which provides equitable energy consumption among all sensors.

In [13] the authors thoroughly reviewed the Medium Access Control (MAC), physical (PHY) and network layers in a Cognitive Radio (CR) design and the relation between them. They addressed the different signal processing techniques used for spectrum sensing and different transceiver designs. For the MAC and network layers they reviewed the schemes, designs and protocols in use and identified current challenges and research gaps. In [14] the authors investigated distributed clustering and proposed a cluster-based routing protocol for Delay Tolerant Mobile Networks (DTMNs). In this scheme mobile nodes that possess the same mobility pattern are grouped together and then share their resources or buffer space for load balancing and overhead reduction. In [15] the authors discussed cognitive radio networks (CRNs) as supply chain networks and their two classes, which are based on two mechanisms. In the first mechanism the spectrum is open-access, whereas the other is a market-driven mechanism. They developed a model for analyzing the two classes and their transient and equilibrium behaviour. In [16] the authors discussed a technique for avoiding congestion in packet-switched networks using Random Early Detection (RED) gateways. In this technique a gateway detects incipient congestion by computing the average queue size and notifies the connections. When the average queue size exceeds a threshold, the gateway marks arriving packets with a certain probability. The notified connections then reduce their window size to relieve the congestion.

3 Proposed Scheme

One of the most popular methods for spectrum sensing is energy detection of unknown deterministic signals [20]. The signal is assumed to be deterministic but of unknown form, and the noise is modeled as zero-mean additive white Gaussian noise. This procedure is simple and can be executed without any prior information about the PU. The block diagram in Figure 2 illustrates the process. The input is first passed through the analog-to-digital converter, then fed to the squaring device and integrated over a certain period of time. The output, which is the measured energy, is then compared to a predetermined threshold to decide whether the PU is present or not.

Figure 2 Block Diagram for energy detection technique.

Threshold value may be fixed depending on channel conditions. Two hypotheses namely H $_{1}$ and H $_{0}$ have been designed. In H $_{1}$ the PU is considered to be present while in H $_{0}$ PU is absent. Equations (5) and (6) give the expressions for H $_{1}$ and H $_{0}$ respectively. Equation (8) tells whether the PU signal is present or not.
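The square, integrate and compare stages described above can be sketched in a few lines of Python. This is a minimal illustration rather than the simulation used in this paper: the helper names, the real-valued Gaussian model for both noise and PU signal, the 0 dB SNR and the threshold of 1.5 times the number of samples are all assumptions chosen only to make the two hypotheses easy to separate.

```python
import random

def energy_statistic(samples):
    """Squaring and integrating stages: sum of squared sample magnitudes."""
    return sum(abs(y) ** 2 for y in samples)

def pu_present(samples, threshold):
    """Decide H1 (PU present) when the measured energy exceeds the threshold."""
    return energy_statistic(samples) > threshold

def simulate(n_samples, snr_linear, pu_active, rng):
    """Hypothetical channel: unit-variance Gaussian noise u(n), plus a
    zero-mean Gaussian PU signal s(n) of variance snr_linear under H1."""
    signal_std = snr_linear ** 0.5
    received = []
    for _ in range(n_samples):
        u = rng.gauss(0.0, 1.0)
        s = rng.gauss(0.0, signal_std) if pu_active else 0.0
        received.append(s + u)
    return received

rng = random.Random(0)
n = 2000
# Assumed threshold: halfway between the expected noise-only energy (about n)
# and the expected energy at 0 dB SNR (about 2n).
threshold = 1.5 * n
idle = simulate(n, 1.0, False, rng)
busy = simulate(n, 1.0, True, rng)
print(pu_present(idle, threshold), pu_present(busy, threshold))
```

In practice the threshold would be derived from a target probability of false alarm or detection rather than fixed by hand, which is what the expressions later in this section provide.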

The channel sensing scheme proposed in this paper is based on the Double DQN approach with Prioritized Experience Replay. The algorithm is given below and illustrated in Figure 3.

INPUT: Initial Q-value of nodes C$_{i}$, weight W and state S of each neighbor for all C$_{i}$, step size, replay period K and size N, constants a and b, budget T. Initialize replay memory H = Ø, Δ = 0, p$_{1}$ = 1; observe S$_{0}$ and choose A$_{0}$(S$_{0}$):

Figure 3 Proposed algorithm.

The idea is not only to pick the channel with the highest probability; the scheme also allows the SU (agent) to find the right balance between exploring new states and exploiting what it has already learned. The optimal Quality-value (Q-value) of a state-action pair (s, a), denoted Q*(s,a), is the sum of discounted future rewards the agent can expect on average after it reaches state s and chooses action a [17].

In the proposed algorithm an agent takes actions in an environment and memorizes each experience as a tuple of state, next state, action, reward and a Boolean value indicating termination of the episode. In the Experience Replay step, a batch of a fixed size is chosen from the memory and the neural network is trained on that batch. If the Q-value of the next state differs greatly from the Q-value of the current state, the experience is considered important, regardless of whether the Q-value increases or decreases. This difference is called the Temporal Difference error (TD error) [18] and is given in Equation (1) below.

$TD\ error = | Q (s, a) - Q (s + 1, a) |$	(1)

where Q(s,a) is the Q-value (Quality value) of the state-action pair (s,a), i.e. the future reward the agent expects after it reaches state s and takes action a.
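The text does not spell out how the next-state value is formed in the Double DQN setting. The sketch below assumes the standard target from [18], in which the online network selects the next action and a separate target network evaluates it; the reward and discount terms, the function names and the example Q-values are assumptions for illustration and do not appear in Equation (1).

```python
def double_dqn_target(reward, next_q_online, next_q_target, done, discount=0.99):
    """Standard Double DQN target: the online network picks the action,
    the target network supplies the value of that action."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + discount * next_q_target[best_action]

def td_error(q_current, target):
    """Absolute gap between the current Q-value estimate and its target."""
    return abs(q_current - target)

# Hypothetical Q-values over three channels in the next state.
next_q_online = [0.2, 0.9, 0.5]   # online network: channel 1 looks best
next_q_target = [0.3, 0.6, 0.8]   # target network evaluates channel 1 at 0.6
y = double_dqn_target(1.0, next_q_online, next_q_target, done=False, discount=0.9)
print(y, td_error(1.2, y))
```

Decoupling action selection from action evaluation is what distinguishes this target from plain Q-learning, where a single estimate is used for both and tends to overestimate values.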

The probability of an experienced memory being chosen is given in Equation (2) below:

$P_{i} = \frac{{(TD_{i} + ε)}^{a}}{\sum_{k = 1}^{memory\ size} {(TD_{k} + ε)}^{a}}$	(2)
$importance = {(\frac{1}{P_{i}} \cdot \frac{1}{memory\ size})}^{b}$	(3)

where $ε$ is a small value that prevents division by zero, $a$ is a value between zero and one, and $b$ is a value that starts at zero and gradually reaches 1.
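Equations (2) and (3) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names, the example TD errors and the values of $a$, $b$ and $ε$ are assumptions, and $ε$ is added in both the numerator and the denominator so that an experience with zero TD error still receives a small non-zero probability.

```python
import random

def sampling_probabilities(td_errors, a, eps=1e-6):
    """Equation (2): priority (TD + eps)^a normalised over the memory."""
    scaled = [(abs(td) + eps) ** a for td in td_errors]
    total = sum(scaled)
    return [p / total for p in scaled]

def importance_weights(probabilities, b):
    """Equation (3): (1/P_i * 1/memory_size)^b for each stored experience."""
    memory_size = len(probabilities)
    return [((1.0 / p) * (1.0 / memory_size)) ** b for p in probabilities]

# Hypothetical TD errors of four stored experiences.
td_errors = [0.1, 0.5, 2.0, 0.05]
probs = sampling_probabilities(td_errors, a=0.6)
weights = importance_weights(probs, b=0.4)

# Draw a batch of two experience indices according to the priorities.
rng = random.Random(1)
batch = rng.choices(range(len(probs)), weights=probs, k=2)
print(probs, weights, batch)
```

Setting `a=0.0` makes every probability equal, and setting `b=0.0` makes every importance weight equal to one, which matches the limiting behaviour of the two exponents.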

Equation (3) expresses the importance in terms of P $_{i}$ and the memory size. If $a$ is one, the most important memories are chosen, and if it is zero the batch is filled uniformly at random. P $_{i}$ is thus the probability with which an experience is selected, and the batch is filled according to these probabilities. Training the network happens stochastically, i.e. it is driven by the sampled experiences, so an experience with a high probability would be chosen again and again and the neural network would overfit to it. To overcome this problem, the training loss of each experience is multiplied by its importance from Equation (3). Because the importance is inversely related to P $_{i}$ , an experience that is sampled often contributes less per update. The training loss J [23] is therefore calculated as given in Equation (4):

$J = \frac{1}{m} \sum {(y - \hat{y})}^{2} \cdot importance$	(4)
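A minimal sketch of Equation (4) is given below, assuming that m is the batch size, y is the training target, $\hat{y}$ is the network's prediction and the importance is applied per experience inside the sum. The function name and the numbers are hypothetical.

```python
def weighted_mse(targets, predictions, importance):
    """Equation (4): mean of squared errors, each scaled by its importance."""
    m = len(targets)
    total = sum(w * (y - y_hat) ** 2
                for y, y_hat, w in zip(targets, predictions, importance))
    return total / m

# Two hypothetical experiences with equal squared error (0.25 each);
# the second is sampled more often, so it carries a smaller importance.
loss = weighted_mse([1.0, 2.0], [0.5, 2.5], [1.0, 0.5])
print(loss)
```

The second experience contributes half as much to the loss as the first, which is how frequently sampled experiences are kept from dominating the update.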

Two hypotheses namely H $_{1}$ and H $_{0}$ have been coined. In H $_{1}$ the PU is considered to be present while in H $_{0}$ PU is absent. Equations (5) and (6) give the expressions for H $_{1}$ and H $_{0}$ respectively.

$H_{1} : y (n) = s (n) + u (n)$	(5)
$H_{0} : y (n) = u (n)$	(6)

Here, u(n) is additive white Gaussian noise with zero mean and s(n) is the PU random signal with zero mean.

The number of samples (N) is given by the product of the sensing duration $(τ)$ and the sampling frequency (f $_{s}$ ), as given in Equation (7):

$N = τ f_{s}$	(7)
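As a quick check of Equation (7), using the sensing duration and sampling frequency reported for the energy detection simulation in Section 4 ( $τ = 2.55$ ms, f ${}_{s}= 6$ MHz):

$N = τ f_{s} = (2.55 \times 10^{- 3}\ \text{s}) \times (6 \times 10^{6}\ \text{Hz}) = 15300$

so each sensing period collects 15300 samples under those settings.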

The value of the test statistic T(y), given in Equation (8), tells us whether the PU is present or absent:

$T (y) = \sum_{n = 1}^{N} {| y (n) |}^{2}$	(8)

A target P ${}_{d}^{'}$ must be achieved to prevent interference between the transmissions of the PU and the SU. For this P ${}_{d}^{'}$ , the value of P $_{f}$ is given in Equation (9):

$P_{f} = Q (\sqrt{2 γ + 1} Q^{- 1} (P_{d}^{'}) + \sqrt{τ f_{s}} γ)$	(9)

Here $γ$ is the Signal to Noise Ratio of the PU transmitter at the SU receiver.

Similarly, if the target probability of false alarm is P ${}_{f}^{'}$ , the probability of detection is given in Equation (10):

$P_{d} = Q (\frac{1}{\sqrt{2 γ + 1}} (Q^{- 1} (P_{f}^{'}) - \sqrt{τ f_{s}} γ))$	(10)

For given target probabilities P ${}_{f}^{'}$ and P ${}_{d}^{'}$ , the minimum number of samples (N $_{\min}$ ) is given in Equation (11):

$N_{\min} = \frac{1}{γ^{2}} {[Q^{- 1} (P_{f}^{'}) - Q^{- 1} (P_{d}^{'}) \sqrt{2 γ + 1}]}^{2}$	(11)

In all these equations Q(x) is the complementary distribution function of the standard Gaussian, given in Equation (12):

$Q (x) = \frac{1}{\sqrt{2 π}} \int_{x}^{\infty} \exp (\frac{- t^{2}}{2}) d t$	(12)
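The relations between P $_{f}$ , P $_{d}$ and the number of samples can be explored numerically. The sketch below is an assumption-laden illustration: it takes Q(x) to be the tail probability of the standard normal distribution, writes the false alarm and detection expressions in the form that agrees with the minimum-sample expression in Equation (11), and uses hypothetical targets of 0.1 and 0.9 together with the $-15$ dB SNR and 6 MHz sampling frequency quoted in Section 4. The function names are not from the paper.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # zero mean, unit variance

def q_func(x):
    """Q(x): tail probability of the standard normal distribution."""
    return 1.0 - _STD_NORMAL.cdf(x)

def q_inv(p):
    """Inverse of Q: the x for which Q(x) equals p."""
    return _STD_NORMAL.inv_cdf(1.0 - p)

def false_alarm_probability(target_pd, snr, tau, fs):
    """P_f for a target detection probability (snr is linear, not dB)."""
    return q_func(sqrt(2 * snr + 1) * q_inv(target_pd) + sqrt(tau * fs) * snr)

def detection_probability(target_pf, snr, tau, fs):
    """P_d for a target false alarm probability (snr is linear, not dB)."""
    return q_func((q_inv(target_pf) - sqrt(tau * fs) * snr) / sqrt(2 * snr + 1))

def min_samples(target_pf, target_pd, snr):
    """N_min of Equation (11) for the two target probabilities."""
    return (q_inv(target_pf) - q_inv(target_pd) * sqrt(2 * snr + 1)) ** 2 / snr ** 2

snr = 10 ** (-15 / 10)   # -15 dB converted to a linear ratio
fs = 6e6                 # sampling frequency in Hz
n_min = min_samples(0.1, 0.9, snr)   # hypothetical targets
tau_min = n_min / fs                 # sensing duration that yields n_min samples
print(n_min, tau_min)
print(false_alarm_probability(0.9, snr, tau_min, fs))
print(detection_probability(0.1, snr, tau_min, fs))
```

Evaluating the two probability functions at the sensing duration that corresponds to `n_min` returns the two targets, which is a convenient consistency check between the three expressions.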

SNR $_{s}$ represents the signal to noise ratio of the secondary point-to-point link and SNR $_{p}$ represents the signal to noise ratio of the PU at the secondary receiver. The achievable throughput ( $φ$ ) of a secondary network is given in Equation (13) in terms of the sensing interval (T), the sensing duration ( $τ$ ), the probability of false alarm (P $_{f}$ ), the probability of detection (P $_{d}$ ), the probability that the PU is active (P(H $_{1}$ )), the probability that the PU is inactive (P(H $_{0}$ )), the capacity of the secondary network in the absence of the PU, $C_{0} = \log_{2} (1 + SNR_{s})$ , and the capacity of the secondary network in the presence of the PU, $C_{1} = \log_{2} (1 + \frac{SNR_{s}}{1 + SNR_{p}})$ :

$φ = \frac{T - τ}{T} [C_{0} (1 - P_{f}) P (H_{0}) + C_{1} (1 - P_{d}) P (H_{1})]$	(13)
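Equation (13) is straightforward to evaluate once P $_{f}$ and P $_{d}$ are known. In the sketch below the SNR values, P(H $_{0}$ ) and the sensing duration are the ones quoted in Section 4, while the sensing interval of 100 ms, the operating point (P $_{f}$ = 0.1, P $_{d}$ = 0.9) and the function names are assumptions made only for illustration.

```python
from math import log2

def capacities(snr_s, snr_p):
    """C0 (PU absent) and C1 (PU present); both SNRs are linear ratios."""
    c0 = log2(1 + snr_s)
    c1 = log2(1 + snr_s / (1 + snr_p))
    return c0, c1

def achievable_throughput(T, tau, pf, pd, p_h0, snr_s, snr_p):
    """Equation (13): throughput of the secondary network."""
    c0, c1 = capacities(snr_s, snr_p)
    p_h1 = 1 - p_h0
    return (T - tau) / T * (c0 * (1 - pf) * p_h0 + c1 * (1 - pd) * p_h1)

snr_s = 10 ** (20 / 10)    # 20 dB secondary link
snr_p = 10 ** (-15 / 10)   # -15 dB PU signal at the secondary receiver
phi = achievable_throughput(T=100e-3, tau=2.55e-3, pf=0.1, pd=0.9,
                            p_h0=0.7, snr_s=snr_s, snr_p=snr_p)
print(phi)
```

The factor (T - τ)/T makes the cost of sensing explicit: the throughput falls to zero when the whole interval is spent sensing.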

The formula for probability of occurrence of undetectable primary user transmission (P $_{UPT}$ ) is given in Equation (14):

$P_{UPT} (T) = \frac{P_{I} \{α (T - R_{I} (T))\}}{P_{I} \{α (T - R_{I} (T)) + P_{I} B (T)\} + P_{B} β R_{B} (T)}$	(14)

Here the parameters P $_{I}$ and P $_{B}$ denote the probabilities of the channel being idle and busy respectively, and P $_{I}$ , P $_{B}$ , R $_{B}$ and R $_{I}$ all depend on $α$ and $β$ , which represent the frequencies of channel state transitions. From Equations (13) and (14) we can compute the values of P $_{UPT}$ and $φ$ and draw the curves for a better analysis.

4 Results

Using the two approaches we obtained two sets of plots for a comparative analysis. The simulations were performed in Matlab. For the energy detection method, the system was simulated with the values f ${}_{s}= 6$ MHz, SNR ${}_{p}= - 15$ dB, SNR ${}_{S}= 20$ dB, P(H ${}_{0}) = 0.7$ and P(H ${}_{1}) = 1 -$ P(H ${}_{0}) = 0.3$ . For $τ = 2.55$ ms we obtained Figure 4, which illustrates the ROC (Receiver Operating Characteristics).

With our Double Deep Q-Learning approach we obtained the plots in Figure 5. The learning curve shows the sum of rewards, also called the Q-value, per episode: within a few hundred episodes the network is capable of collecting large rewards, which shows that the agent learns to select the optimal channel in that environment. The first plot of Figure 5 is a plot of P $_{d}$ against P $_{f}$ for comparing the quality of spectrum sensing with the energy detection approach. The parameters used for the double deep Q learning plots are: Protocol IEEE 802.22-2011, MIN_freq 54 MHz, MAX_freq 862 MHz, CB (Channel Bandwidth) 8 MHz, Modulation QPSK, Coding Rate 1/2, Transmitter Power 1 to 4000 mW.

Figure 4 ROC (receiver operating characteristics) at different frequency.

Figure 5 P $_{d}$ VS P $_{f}$ and sum of rewards VS episode plots.

The next results according to the energy detection model that we have obtained simply convey that both P $_{UPT}$ and Throughput depend on sensing interval and both increase as T is increased. From Figure 6 it is clear that for the same throughput, P $_{UPT}$ is more if the channel changes its the same SNR value as shown in Figure 7.

Figure 6 P $_{UPT}$ (undetectable primary user transmission) vs throughput.

Figure 7 Number of samples VS SNR.

Figure 8 Probability of detection VS number of samples.

Figure 9 Throughput VS time.

Figure 10 Throughput VS sensing interval.

Figure 11 Congestion VS time.

Figure 8 shows that if the target probability of detection is increased, more samples (N) are required, i.e. a longer sensing duration and hence less throughput.

Figure 9 shows the curve of throughput against time generated by the double deep Q learning approach. With an increase in the sensing interval, throughput increases up to a certain level and then saturates, as is visible from Figure 10.

Figure 11 shows the accumulation of data as time passes. In a network, data spends some time in the queue before transmission; this plot illustrates the queued data in kB and the time for which it is congested. Figures 12, 13 and 14 below provide an overall perspective on how the double Q learning process is better than energy detection.

Figure 12 P $_{d}$ VS number of samples.

Figure 13 Number of samples VS SNR.

Figure 14 P $_{UPT}$ VS throughput.

The figures above allow a comparison between the Double Q learning approach and the energy detection approach. With double deep Q learning, as illustrated in Figure 12, the probability of detection is higher for a given number of samples than with the energy detection method shown in Figure 8. For 4000 samples, the probability of detection is very close to 0.9 in Figure 12, whereas in Figure 8 it is around 0.87, which indicates that the Double Deep Q learning method increases the probability of detection. Comparing Figures 7 and 13 at SNR $=$ $-$ 18 dB, the number of samples is approximately 25000 in Figure 13 and approximately 27000 in Figure 7, which shows that for a given signal to noise ratio fewer samples are needed with the Double Deep Q learning method. Similarly, Figures 6 and 14 show that for a given throughput the probability of undetectable primary user transmission is much lower with the double deep Q learning approach.

5 Conclusion

In this paper we studied the impact of various systems, methods and parameters on spectrum sensing for cognitive radio networks. We conclude that the double deep Q learning approach is more efficient and performs better than the energy detection approach. This method can be used for various surveillance operations. Comparing Figures 4 and 5, it is evident that with the double deep Q learning method P $_{d}$ is maximised and P $_{f}$ is minimised. In Figure 4, at frequency $=$ 8 MHz, requiring P $_{f}$ to be zero yields a P $_{d}$ of only 0.2, whereas in Figure 5 the same requirement yields a P $_{d}$ of 0.7. Figure 9 shows that throughput is high even at a low sensing interval, which is not observed in Figure 10. Keeping in mind the results of both approaches, we can finally conclude that as throughput increases, the probability of undetectable PU transmission also increases. To reduce P $_{UPT}$ we can reduce the frame size, but reducing it beyond a particular point will lead to a decrease in throughput.

References

[1] Ahlam Saud Althobaiti et al. “Medium Access Control Protocol for Wireless Sensor Networks Classification and Cross-Layering”, ICCMIT 0125.

[2] Amruta Lipare et al. “Energy efficient Routing Structure to avoid Energy Hole Problem in Multi-Layer Network Model”, Wireless Personal Communications, 23rd January 2020.

[3] C. Sha, H. Chen, C. Yao, Y. Liu, R. Wang, “A Type of Energy Hole Avoiding Method Based on Synchronization of Nodes in Adjacent Annuluses for Sensor Network” Hindawi Publishing Corporation International Journal of Distributed Sensor Networks, Volume 2016, Article ID 5828956.

[4] M. Carlos-Mancilla, E. Lopez-Mallado, M. Siler, “Wireless Sensor Network Formation approaches and Techniques”, Hindawi Publishing Corporation Journal of Sensors, Volume 2016, 1st February 2016.

[5] A. Al-Saadi, R. Setchi, Y. Hicks, S.M. Allen, “Routing Protocol for Heterogeneous Wireless Mesh Networks”, IEEE Transactions on Vehicular Technology, Vol. 65, No. 12, December 2016.

[6] S. Movassaghi, A. Majidi, A. Jamalipour, D. Saremith, M. Abolhasan, “Enabling Interference-Aware and Energy-Efficient Coexistence of Multiple Wireless Body Area Network with Unknown Dynamics”, IEEE Access, Special section on Body Area Network for Interdisciplinary research, 7 July 2016.

[7] N. Jan, N. Javaid, Q. Javaid, N. Alrajeh, et al., “A Balanced Energy-Consuming and Hole-Alleviating Algorithm for Wireless Sensor Networks”, IEEE Access, 17 May 2017.

[8] V. Nguyen, P. Lin, R. Hwang, “Energy Depletion Attack in Low Power Wireless Networks”, IEEE Access, 29 April 2019.

[9] H.H.M. Mahmoud, T. Ismail, M.S. Darweesh, “Dynamic Traffic Model with Optimal Gateways Placement in IP Cloud Heterogenous CRAN”, IEEE Access, 8 July 2020.

[10] S. Anamalamudi, A.R. Sangi, M. Alkatheiri, A.M. Ahmed, “AODV routing protocol for Cognitive Radio access-based Internet of Things (IoT)”, Future Generation Computer Systems, 29 December 2017.

[11] F.A. Awin, Y.M. Alginahi, E.A. Raheem and K. Tepe, “Technical Issues on Cognitive Radio-Based Internet of Things System: A Survey”, IEEE Access.

[12] F. Bouabdallah, C. Zidi, R. Boutaba, “Joint Routing and Energy Management in Underwater Acoustic Sensor Networks”, IEEE Transaction on Network and Service Management, 2017.

[13] Y.C. Liang, K.C. Chen, G.Y. Li, P. Mahonen, “Cognitive Radio Networking and Communication: An Overview”, IEEE transaction on vehicular technology, Vol. 60, No. 7, September 2011.

[14] H. Dang, H. Wu, “Clustering and Cluster-Based Routing Protocol for Delay-Tolerant Mobile Networks”, IEEE Transaction on Wireless Communication, Vol. 9, No. 6, 6 June, 2010.

[15] S. Haykin, P. Setoodeh, “Cognitive Radio Network the Supply Chain Paradigm”, IEEE Transaction on Cognitive Communication and Networking, 2015.

[16] S. Floyd, V. Jacobson, “Random Early Detection Gateways for Congestion Avoidance”, IEEE/ACM Transaction on Networking Vol. 1, No. 4, 4 August 1993.

[17] T. Schaul, J. Quan, I. Antonoglou, and D. Silver, “Prioritized experience replay,” arXiv [cs.LG], 2015.

[18] H. van Hasselt, A. Guez, and D. Silver, “Deep reinforcement learning with Double Q-learning,” arXiv [cs.LG], 2015.

[19] Core.ac.uk. [Online]. Available: https://core.ac.uk/download/pdf/297012544.pdf. [Accessed: 26-Apr-2021]

[20] H. Urkowitz, “Energy detection of unknown deterministic signals,” IEEE Proceedings, vol. 55, no. 4, pp. 523–531, Apr. 1967.

[21] P. Verma and B. Singh, “Throughput analysis in cognitive radio networks,” 2014 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2014, pp. 1199–1203, doi: 10.1109/ICACCI.2014.6968301.

[22] J. Mitola III and G. Q. Maguire, “Cognitive radio: making software radios more personal”, Personal Communications, IEEE, 1999.

[23] https://medium.com/@parsa_h_m/deep-reinforcement-learning-dqn-double-dqn-dueling-dqn-noisy-dqn-and-dqn-with-prioritized-551f621a9823

Biographies

Ranjita Joon is an Assistant Professor in the computer science department at Pt. JLN Government College, Faridabad. Her areas of interest include cognitive radio networks, wireless sensor networks, and data mining. She is currently pursuing her PhD degree in the area of cognitive radio networks at J.C. Bose University of Science & Technology, Faridabad.

Parul Tomar is an Associate Professor in the department of computer engineering at JC Bose University of Science & Technology, YMCA, Faridabad. Her areas of interest are databases, IoT, ad hoc networks and security. She has more than 18 years of experience and has authored more than 50 papers in various journals of repute.