Variance Estimation Procedure Using Scrambled Responses and Multi-Auxiliary Variables In Multi-Phase Sampling

Nadia Mushtaq

Department of Statistics, Forman Christian College University, Lahore, Pakistan

E-mail: nadiamushtaq@fccollege.edu.pk

Received 08 October 2020; Accepted 01 March 2021; Publication 11 June 2021

Abstract

Variance estimation quantifies the variability present in a population. In this study, we consider a variance estimation procedure for a sensitive variable that uses scrambled randomized responses and multi-auxiliary variables in multi-phase sampling. Under the Noor-ul-Amin et al. (2018) RRT model, generalized exponential regression type estimators are derived for case-1 and case-2. A simulation study is presented to illustrate the application and computational details. In the simulations, the proposed estimators are more efficient than the usual variance estimator in both cases.

Keywords: Variance estimation, multi-auxiliary variables, scrambled randomized response.

1 Introduction

Surveys in the behavioral and social sciences often meet refusal or bias on sensitive questions, such as those related to drug addiction, HIV/AIDS, gambling, or sexual behavior. In such situations, participants may refuse to respond or may give false answers, and both refusals and misleading answers contribute to bias and non-sampling error. Warner (1965) introduced the Randomized Response Technique (RRT) to reduce response error for sensitive questions. Eichhorn and Hayre (1983) proposed a multiplicative RRT for quantitative data. Further scrambled response models have been suggested by different authors, such as the additive scrambling model (Himmelfarb and Edgell, 1980), the subtractive scrambling model (Hussain, 2012), and a mixture of additive and multiplicative scrambling models (Huang, 2010).

Variation arises in many practical settings, such as environmental, genetic, and economic studies. An estimate of the population variance describes the variability in the population and can be used for future surveys, predictions, or sample size determination. Variance estimation using auxiliary information has been considered by Das and Tripathi (1978), Isaki (1983), Singh and Joarder (1998), Shabbir and Gupta (2010), Asghar et al. (2014), Yadav et al. (2015), and Yasmeen et al. (2018). Neyman (1938) introduced the concept of two-phase sampling, in which the sample is selected in two phases: in the first phase, information on the auxiliary variable is collected from a large sample, and in the second phase, the study variable is observed on a relatively small sample.

Multipurpose surveys require estimates of population parameters for several variables; in socio-economic surveys, for example, the variables may include size of houses, monthly income, and number of unemployed persons. The objective of this paper is to present a variance estimation procedure for a sensitive study variable using scrambled randomized responses and multi-auxiliary variables in multi-phase sampling. In the available literature, we found no contribution on variance estimation for a sensitive variable using multi-auxiliary variables in multi-phase sampling. In the computational procedure we discuss some special cases of the estimators.

2 Variance Estimation Procedure in RRT

Let $Y$ be the sensitive study variable and let $X_i$ $(i=1,2,\ldots,n)$ be $n$ non-sensitive auxiliary variables that are correlated with $Y$ in the population. Let $S$ be the scrambling variable, independent of $Y$ and of $X_i$ $(i=1,2,\ldots,n)$. We are interested in estimating the population variance of the sensitive variable using multi-auxiliary variables under the following cases.

The sampling strategy for each case is as follows:

Case 1: Information on all auxiliary variables is known (single-phase sampling is used).

Case 2: Information on the auxiliary variables is unknown. In the first phase, a large sample of size $n_1$ $(n_1\subset U)$ is drawn to observe only the auxiliary variables. In the second phase, a sample of size $n_2$ $(n_2\subset n_1)$ is drawn to observe the sensitive variable of interest $Y$ and all non-sensitive auxiliary variables $(n_2<n_1)$.

Let $S_{x_i}^2$ be the population variance and $s_{x_i(l)}^2$ the sample variance of the auxiliary variable $x_i$ $(i=1,2,\ldots,n)$ at the $l$-th phase $(l=1,2,\ldots,q)$. The reported optional scrambled response for $Y$ is $Z=Y+kS$, following Noor-ul-Amin et al. (2018).
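Because $S$ is independent of $Y$, the variance of the reported response satisfies $S_z^2=S_y^2+k^2S_s^2$, so the scrambling inflates the variance by a known multiple of the scrambling variance. The following is a minimal Python sketch of the additive scrambling step; the distributions, sample size, seed, and the helper name `scramble` are hypothetical choices made only for illustration.

```python
import random
import statistics


def scramble(y, s, k):
    """Return reported responses z_j = y_j + k * s_j."""
    return [yj + k * sj for yj, sj in zip(y, s)]


# Hypothetical population values, chosen only for illustration.
rng = random.Random(1)
y = [rng.gauss(2.0, 3.0) for _ in range(20000)]  # sensitive variable Y
s = [rng.gauss(0.0, 0.5) for _ in range(20000)]  # scrambling variable S
k = 0.5

z = scramble(y, s, k)
var_z = statistics.pvariance(z)
var_sum = statistics.pvariance(y) + k ** 2 * statistics.pvariance(s)

# The two numbers agree up to the (small) empirical covariance of Y and S.
print(round(var_z, 3), round(var_sum, 3))
```

The respondent reports only `z`, so the interviewer never sees `y` directly, while the variance of `y` remains recoverable from the variance of `z` when $k$ and the distribution of $S$ are known.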

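Let

$$s_{z(l)}^2=\frac{1}{n_{(l)}-1}\sum_{j=1}^{n_{(l)}}\left(z_j-\bar z_{(l)}\right)^2,\qquad s_{x_i(l)}^2=\frac{1}{n_{(l)}-1}\sum_{j=1}^{n_{(l)}}\left(x_{ij}-\bar x_{i(l)}\right)^2$$

be the sample variances of the reported response and of the auxiliary variables $x_i$ $(i=1,2,\ldots,n)$, respectively, at the $l$-th phase $(l=1,2,\ldots,q)$, with sample means $\bar z_{(l)}=\frac{1}{n_{(l)}}\sum_{j=1}^{n_{(l)}}z_j$ and $\bar x_{i(l)}=\frac{1}{n_{(l)}}\sum_{j=1}^{n_{(l)}}x_{ij}$.

For the $l$-th phase $(l=1,2,\ldots,q)$, we define the following notation:

$$s_{z(l)}^2=S_z^2\left(1+e_{z(l)}\right),\qquad s_{x_i(l)}^2=S_{x_i}^2\left(1+e_{x_i(l)}\right),$$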

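such that

$$\theta_{(l)}=\frac{1}{n_{(l)}},\qquad l=1,2,\ldots,q,$$

$$\Lambda_{z(1\times1)}=\left[e_{z(l)}\right],\qquad \Lambda_{x(1\times n)}=\left[e_{x_1(l)},e_{x_2(l)},\ldots,e_{x_n(l)}\right],$$

$$E\left(\Lambda_z'\Lambda_z\right)=\theta_l\,\Sigma_{z(1\times1)},\qquad E\left(\Lambda_x'\Lambda_x\right)=\theta_l\,\Sigma_{x(n\times n)},\qquad E\left(\Lambda_x'\Lambda_z\right)=\theta_l\,\Sigma_{xz(n\times1)},$$

where, for example, with two auxiliary variables

$$\Sigma_{x(2\times2)}=\begin{bmatrix}\mathrm{var}(x_1)&\mathrm{cov}(x_1,x_2)\\ \mathrm{cov}(x_2,x_1)&\mathrm{var}(x_2)\end{bmatrix}.$$

The usual unbiased estimator of the population variance $S_z^2$ and its variance are

$$t_0=s_z^2\quad\text{and}\quad \mathrm{var}(t_0)=\theta_l\,S'S\,\Sigma_{z(1\times1)},\tag{1}$$

where $S_{(1\times1)}=\left[S_z^2\right]$.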
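The notation can be made concrete with a small numerical sketch. It draws one simple random sample from a hypothetical finite population of reported responses, takes $t_0$ to be the sample variance $s_{z(l)}^2$, and computes the relative error $e_{z(l)}$ and $\theta_{(l)}$. The population, the seed, and the use of the divisor $N-1$ for $S_z^2$ are assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical finite population of reported responses Z (illustration only).
population_z = [rng.gauss(2.0, 3.0) for _ in range(1000)]
S2_z = statistics.variance(population_z)   # S_z^2, assumed divisor N - 1

n_l = 100                                  # sample size at phase l
sample_z = rng.sample(population_z, n_l)   # simple random sample without replacement

t_0 = statistics.variance(sample_z)        # s_z(l)^2, divisor n_(l) - 1
e_z = t_0 / S2_z - 1.0                     # relative error e_z(l)
theta_l = 1.0 / n_l                        # theta_(l) = 1 / n_(l)

print(t_0, e_z, theta_l)
```

By construction `S2_z * (1 + e_z)` reproduces `t_0`, which is the identity $s_{z(l)}^2=S_z^2(1+e_{z(l)})$ used throughout the derivations.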

2.1 Proposed Generalized Exponential Regression Type Estimator Using Randomized Response Technique (RRT) in Case 1

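The proposed estimator of the population variance of the sensitive variable using multi-auxiliary non-sensitive variables under case 1 is

$$t_z=\left[t_z\right]_{(1\times1)},\tag{2}$$

where

$$t_z=\left[s_{z(l)}^2+\sum_{i=1}^{n}\beta_i\left(S_{x_i}^2-s_{x_i(l)}^2\right)\right]\exp\left\{\sum_{i=1}^{n}\left(\frac{S_{x_i}^2-s_{x_i(l)}^2}{S_{x_i}^2+s_{x_i(l)}^2}\right)\right\}.\tag{3}$$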
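As an illustration, the estimator in (3) can be written as a short Python function. This is a sketch under one stated assumption: the exponential factor multiplies the whole bracket containing $s_{z(l)}^2$ and the regression term, which is how the expansion that follows groups the terms. The function name `t_z`, its argument names, and the numerical inputs are hypothetical.

```python
import math


def t_z(s2_z, s2_x, S2_x, beta):
    """Exponential regression type estimator of S_z^2 (case 1).

    s2_z : sample variance of the reported response at phase l
    s2_x : list of sample variances of the auxiliary variables at phase l
    S2_x : list of known population variances of the auxiliary variables
    beta : list of constants beta_i
    """
    regression = sum(b * (S - s) for b, S, s in zip(beta, S2_x, s2_x))
    exponent = sum((S - s) / (S + s) for S, s in zip(S2_x, s2_x))
    # Assumption: exp(.) multiplies the whole bracket [s_z^2 + regression term].
    return (s2_z + regression) * math.exp(exponent)


# Hypothetical inputs with two auxiliary variables.
estimate = t_z(s2_z=5.2, s2_x=[1.9, 3.3], S2_x=[2.0, 3.0], beta=[0.5, 0.2])
print(estimate)
```

When every sample variance of the auxiliary variables equals its population value, both the regression term and the exponent vanish and the function returns the plain sample variance `s2_z`.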

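Expanding (3) in terms of the $e$'s and retaining terms up to the first order, we have

$$\begin{aligned}
t_z&=\left[S_z^2\left(1+e_{z(l)}\right)+\sum_{i=1}^{n}\beta_i\left(S_{x_i}^2-S_{x_i}^2\left(1+e_{x_i(l)}\right)\right)\right]\exp\left\{\sum_{i=1}^{n}\left(\frac{S_{x_i}^2-S_{x_i}^2\left(1+e_{x_i(l)}\right)}{S_{x_i}^2+S_{x_i}^2\left(1+e_{x_i(l)}\right)}\right)\right\}\\
&=\left[S_z^2+S_z^2e_{z(l)}-\sum_{i=1}^{n}\beta_iS_{x_i}^2e_{x_i(l)}\right]\exp\left\{\sum_{i=1}^{n}\left(-\frac{e_{x_i(l)}}{2}\right)\left(1+\frac{e_{x_i(l)}}{2}\right)^{-1}\right\}\\
&=\left[S_z^2+S_z^2e_{z(l)}-\sum_{i=1}^{n}\beta_iS_{x_i}^2e_{x_i(l)}\right]\left[1+\sum_{i=1}^{n}\left\{-\frac{e_{x_i(l)}}{2}\left(1-\frac{e_{x_i(l)}}{2}+\frac{e_{x_i(l)}^2}{4}-\cdots\right)\right\}+\cdots\right]\\
&\approx\left(S_z^2+S_z^2e_{z(l)}-\sum_{i=1}^{n}\beta_iS_{x_i}^2e_{x_i(l)}\right)\left(1-\sum_{i=1}^{n}\frac{e_{x_i(l)}}{2}\right)\\
&\approx S_z^2+S_z^2e_{z(l)}-\sum_{i=1}^{n}\beta_iS_{x_i}^2e_{x_i(l)}-S_z^2\sum_{i=1}^{n}\frac{e_{x_i(l)}}{2}\\
&=S_z^2+S_z^2e_{z(l)}-\frac{S_z^2}{2}\sum_{i=1}^{n}\left(\frac{2\beta_iS_{x_i}^2}{S_z^2}+1\right)e_{x_i(l)},
\end{aligned}$$

so that

$$t_z-S_z^2=S_z^2\left(e_{z(l)}-\frac{1}{2}\sum_{i=1}^{n}\left(2\beta_i\Psi_i+1\right)e_{x_i(l)}\right),$$

where $\Psi_i=S_{x_i}^2/S_z^2$. In matrix form, with $S_{(1\times1)}=\left[S_z^2\right]$,

$$t_z-S=\left[t_z-S_z^2\right]_{(1\times1)}=\left[S_z^2\left(e_{z(l)}-\frac{1}{2}\sum_{i=1}^{n}\left(2\beta_i\Psi_i+1\right)e_{x_i(l)}\right)\right]_{(1\times1)}.\tag{4}$$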
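A quick numerical check shows how close the first-order expression in (4) is to the exact estimation error when the relative errors are small. The sketch below uses a single auxiliary variable; all numerical values are hypothetical, and the exact estimator is evaluated with the exponential multiplying the whole bracket, which is an assumption about the intended grouping.

```python
import math

# Hypothetical values for one auxiliary variable (n = 1).
S2_z, S2_x, beta = 4.0, 2.0, 0.5
e_z, e_x = 0.01, 0.02                 # small relative errors

s2_z = S2_z * (1.0 + e_z)
s2_x = S2_x * (1.0 + e_x)

# Exact estimator, with exp(.) multiplying the whole bracket (assumed form).
exact = (s2_z + beta * (S2_x - s2_x)) * math.exp((S2_x - s2_x) / (S2_x + s2_x))

# First-order approximation: S_z^2 * (e_z - (1/2) * (2*beta*Psi + 1) * e_x).
psi = S2_x / S2_z
approx_error = S2_z * (e_z - 0.5 * (2.0 * beta * psi + 1.0) * e_x)

print(exact - S2_z, approx_error)
```

The two printed numbers differ only by terms of second order in the relative errors, which is the part dropped in the expansion.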

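The mean squared error (MSE) is obtained by taking expectations:

$$\begin{aligned}
\mathrm{MSE}(t_z)_{(1\times1)}&=E\left(t_{z(1\times1)}-S_{(1\times1)}\right)'\left(t_{z(1\times1)}-S_{(1\times1)}\right)\\
&=S_z^4\,E\left(e_{z(l)(1\times1)}-\frac{1}{2}e_{x(l)(1\times n)}\Phi_{(n\times1)}\right)'\left(e_{z(l)(1\times1)}-\frac{1}{2}e_{x(l)(1\times n)}\Phi_{(n\times1)}\right),
\end{aligned}\tag{5}$$

where

$$\Phi_{(n\times1)}=\left(2\beta_i\Psi_i+1\right)_{(n\times1)}.$$

Applying expectations, we have

$$\mathrm{MSE}(t_z)_{(1\times1)}=S'S\,\theta_l\left(\Sigma_{z(1\times1)}-\frac{1}{2}\Sigma_{zx(1\times n)}\Phi_{(n\times1)}-\frac{1}{2}\Phi'_{(1\times n)}\Sigma_{xz(n\times1)}+\frac{1}{4}\Phi'_{(1\times n)}\Sigma_{x(n\times n)}\Phi_{(n\times1)}\right).\tag{6}$$

Differentiating (6) with respect to $\Phi$ gives the optimum value of $\Phi$ as

$$\Phi_{opt(n\times1)}=2\,\Sigma_{x(n\times n)}^{-1}\Sigma_{xz(n\times1)}.\tag{7}$$

Substituting (7) in (6), we get the minimum MSE of $t_z$ as

$$\min\mathrm{MSE}(t_z)_{(1\times1)}=S'S\,\theta_l\left(\Sigma_{z(1\times1)}-\Sigma_{zx(1\times n)}\Sigma_{x(n\times n)}^{-1}\Sigma_{xz(n\times1)}\right).\tag{8}$$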
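The optimum in (7) and the minimum MSE in (8) require only the solution of one linear system. The sketch below reads (7) as $\Phi_{opt}=2\Sigma_x^{-1}\Sigma_{xz}$ and takes $S'S=S_z^4$; both readings are assumptions, as are the helper names `solve`, `phi_opt`, `min_mse_case1` and the moment matrices used as input. Only the Python standard library is used.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def phi_opt(sigma_x, sigma_xz):
    """Phi_opt = 2 * Sigma_x^{-1} Sigma_xz (assumed reading of (7))."""
    return [2.0 * v for v in solve(sigma_x, sigma_xz)]


def min_mse_case1(S2_z, theta_l, sigma_z, sigma_x, sigma_xz):
    """S_z^4 * theta_l * (Sigma_z - Sigma_zx Sigma_x^{-1} Sigma_xz), as in (8)."""
    w = solve(sigma_x, sigma_xz)
    quad = sum(a * b for a, b in zip(sigma_xz, w))
    return S2_z ** 2 * theta_l * (sigma_z - quad)


# Hypothetical moment matrices for n = 2 auxiliary variables.
sigma_x = [[2.0, 1.0], [1.0, 2.0]]
sigma_xz = [1.0, 1.0]
print(phi_opt(sigma_x, sigma_xz))
print(min_mse_case1(2.0, 0.01, 3.0, sigma_x, sigma_xz))
```

Given $\Phi_{opt}$, the constants follow from $\Phi_i=2\beta_i\Psi_i+1$ as $\beta_i=(\Phi_i-1)/(2\Psi_i)$. Solving the system directly avoids forming the inverse of $\Sigma_x$ explicitly.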

Remarks 1:

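1. Setting $l=1$ in (3) gives the single-phase sampling estimator using $n$ auxiliary variables:

$$t_{z1}=\left[s_z^2+\sum_{i=1}^{n}\beta_i\left(S_{x_i}^2-s_{x_i}^2\right)\right]\exp\left\{\sum_{i=1}^{n}\left(\frac{S_{x_i}^2-s_{x_i}^2}{S_{x_i}^2+s_{x_i}^2}\right)\right\}\tag{9}$$

2. For a single auxiliary variable, setting $n=1$ and $l=1$ in (3) gives the single-phase sampling estimator:

$$t_{z2}=\left[s_z^2+\beta\left(S_x^2-s_x^2\right)\right]\exp\left(\frac{S_x^2-s_x^2}{S_x^2+s_x^2}\right)\tag{10}$$

3. Setting $l=1$ and $n=2$ in (3) gives the single-phase sampling estimator for two auxiliary variables:

$$t_{z3}=\left[s_z^2+\sum_{i=1}^{2}\beta_i\left(S_{x_i}^2-s_{x_i}^2\right)\right]\exp\left\{\sum_{i=1}^{2}\left(\frac{S_{x_i}^2-s_{x_i}^2}{S_{x_i}^2+s_{x_i}^2}\right)\right\}\tag{11}$$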

2.2 Generalized Exponential Regression Type Estimator Using RRT in Case 2

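Let $s_{z(l+1)}^2$ be the sample variance of the reported response for the sensitive study variable at the $(l+1)$-th phase. Also let $s_{x_i(l)}^2$ and $s_{x_i(l+1)}^2$ be the sample variances of the auxiliary variable $x_i$ $(i=1,2,\ldots,n)$ at the $l$-th and $(l+1)$-th phases, of sizes $n_{(l)}$ and $n_{(l+1)}$ respectively. The population variances $S_{x_i}^2$ of the auxiliary variables are unknown. The proposed estimator under case 2 is

$$t_s=\left[t_s\right]_{(1\times1)},\tag{12}$$

where

$$t_s=\left[s_{z(l+1)}^2+\sum_{i=1}^{n}\beta_i\left(s_{x_i(l)}^2-s_{x_i(l+1)}^2\right)\right]\exp\left\{\sum_{i=1}^{n}\left(\frac{s_{x_i(l)}^2-s_{x_i(l+1)}^2}{s_{x_i(l)}^2+s_{x_i(l+1)}^2}\right)\right\}.\tag{13}$$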
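The two-phase version can be sketched in the same way, with the first-phase sample variances of the auxiliary variables standing in for the unknown population variances. The sketch below draws a first-phase sample, subsamples it for the second phase, and evaluates (13). The population, the seed, the sample sizes, the value of `beta`, and the function name `t_s` are hypothetical, and the exponential is again assumed to multiply the whole bracket.

```python
import math
import random
import statistics


def t_s(s2_z_2, s2_x_1, s2_x_2, beta):
    """Two-phase estimator: phase-1 auxiliary variances replace the unknown S_x^2."""
    regression = sum(b * (v1 - v2) for b, v1, v2 in zip(beta, s2_x_1, s2_x_2))
    exponent = sum((v1 - v2) / (v1 + v2) for v1, v2 in zip(s2_x_1, s2_x_2))
    # Assumption: exp(.) multiplies the whole bracket.
    return (s2_z_2 + regression) * math.exp(exponent)


rng = random.Random(11)

# Hypothetical population: one auxiliary variable x and reported response z.
N = 1000
x = [rng.gauss(2.0, 1.5) for _ in range(N)]
z = [1.5 * xi + rng.gauss(0.0, 1.0) for xi in x]

n1, n2 = 250, 100
phase1 = rng.sample(range(N), n1)     # first phase: only x is observed
phase2 = rng.sample(phase1, n2)       # second phase: subsample of phase 1

s2_x_1 = statistics.variance([x[j] for j in phase1])
s2_x_2 = statistics.variance([x[j] for j in phase2])
s2_z_2 = statistics.variance([z[j] for j in phase2])

estimate = t_s(s2_z_2, [s2_x_1], [s2_x_2], beta=[0.5])  # beta is arbitrary here
print(estimate)
```

Only the cheaper auxiliary variable is measured on all `n1` units; the reported response is needed on the `n2` second-phase units alone.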

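Expanding (13) in terms of the $e$'s and retaining terms up to the first order, we have

$$\begin{aligned}
t_s&=\left[S_z^2\left(1+e_{z(l+1)}\right)+\sum_{i=1}^{n}\beta_iS_{x_i}^2\left(e_{x_i(l)}-e_{x_i(l+1)}\right)\right]\exp\left\{\sum_{i=1}^{n}\left(\frac{S_{x_i}^2\left(e_{x_i(l)}-e_{x_i(l+1)}\right)}{2S_{x_i}^2+S_{x_i}^2\left(e_{x_i(l)}+e_{x_i(l+1)}\right)}\right)\right\}\\
&=\left[S_z^2+S_z^2e_{z(l+1)}-\sum_{i=1}^{n}\beta_iS_{x_i}^2\left(e_{x_i(l+1)}-e_{x_i(l)}\right)\right]\exp\left\{\sum_{i=1}^{n}\left(-\frac{e_{x_i(l+1)}-e_{x_i(l)}}{2}\right)\left(1+\frac{e_{x_i(l)}+e_{x_i(l+1)}}{2}\right)^{-1}\right\}\\
&=\left[S_z^2+S_z^2e_{z(l+1)}-\sum_{i=1}^{n}\beta_iS_{x_i}^2\left(e_{x_i(l+1)}-e_{x_i(l)}\right)\right]\left[1-\sum_{i=1}^{n}\left\{\frac{e_{x_i(l+1)}-e_{x_i(l)}}{2}\left(1-\frac{e_{x_i(l)}+e_{x_i(l+1)}}{2}+\cdots\right)\right\}+\cdots\right]\\
&\approx\left(S_z^2+S_z^2e_{z(l+1)}-\sum_{i=1}^{n}\beta_iS_{x_i}^2\left(e_{x_i(l+1)}-e_{x_i(l)}\right)\right)\left(1-\sum_{i=1}^{n}\frac{e_{x_i(l+1)}-e_{x_i(l)}}{2}\right)\\
&\approx S_z^2+S_z^2e_{z(l+1)}-\sum_{i=1}^{n}\beta_iS_{x_i}^2\left(e_{x_i(l+1)}-e_{x_i(l)}\right)-S_z^2\sum_{i=1}^{n}\frac{e_{x_i(l+1)}-e_{x_i(l)}}{2}\\
&=S_z^2+S_z^2e_{z(l+1)}-\frac{S_z^2}{2}\sum_{i=1}^{n}\left(\frac{2\beta_iS_{x_i}^2}{S_z^2}+1\right)\left(e_{x_i(l+1)}-e_{x_i(l)}\right),
\end{aligned}$$

so that

$$t_s-S_z^2=S_z^2\left(e_{z(l+1)}-\frac{1}{2}\sum_{i=1}^{n}\left(2\beta_i\Psi_i+1\right)\left(e_{x_i(l+1)}-e_{x_i(l)}\right)\right),$$

where $\Psi_i=S_{x_i}^2/S_z^2$. In matrix form, with $S_{(1\times1)}=\left[S_z^2\right]$,

$$t_s-S=\left[t_s-S_z^2\right]_{(1\times1)}=\left[S_z^2\left(e_{z(l+1)}-\frac{1}{2}\sum_{i=1}^{n}\left(2\beta_i\Psi_i+1\right)\left(e_{x_i(l+1)}-e_{x_i(l)}\right)\right)\right]_{(1\times1)}.\tag{14}$$

The MSE is obtained by taking expectations:

$$\begin{aligned}
\mathrm{MSE}(t_s)_{(1\times1)}&=E\left(t_{s(1\times1)}-S_{(1\times1)}\right)'\left(t_{s(1\times1)}-S_{(1\times1)}\right)\\
&=S_z^4\,E\left(e_{z(l+1)(1\times1)}-\frac{1}{2}\left(e_{x(l+1)(1\times n)}-e_{x(l)(1\times n)}\right)\Phi_{(n\times1)}\right)'\\
&\qquad\times\left(e_{z(l+1)(1\times1)}-\frac{1}{2}\left(e_{x(l+1)(1\times n)}-e_{x(l)(1\times n)}\right)\Phi_{(n\times1)}\right),
\end{aligned}\tag{15}$$

where

$$\Phi_{(n\times1)}=\left(2\beta_i\Psi_i+1\right)_{(n\times1)}.$$

Because the $(l+1)$-th phase sample is drawn from the $l$-th phase sample, $E\left(e_{x(l)}'e_{z(l+1)}\right)=\theta_{(l)}\Sigma_{xz}$, $E\left(e_{x(l+1)}'e_{z(l+1)}\right)=\theta_{(l+1)}\Sigma_{xz}$, and $E\left(e_{x(l+1)}-e_{x(l)}\right)'\left(e_{x(l+1)}-e_{x(l)}\right)=\left(\theta_{(l+1)}-\theta_{(l)}\right)\Sigma_x$. Applying these expectations to (15), we have

$$\begin{aligned}
\mathrm{MSE}(t_s)_{(1\times1)}=S'S\Big(&\theta_{(l+1)}\Sigma_{z(1\times1)}-\left(\theta_{(l+1)}-\theta_{(l)}\right)\frac{1}{2}\Sigma_{zx(1\times n)}\Phi_{(n\times1)}-\left(\theta_{(l+1)}-\theta_{(l)}\right)\frac{1}{2}\Phi'_{(1\times n)}\Sigma_{xz(n\times1)}\\
&+\left(\theta_{(l+1)}-\theta_{(l)}\right)\frac{1}{4}\Phi'_{(1\times n)}\Sigma_{x(n\times n)}\Phi_{(n\times1)}\Big).
\end{aligned}\tag{16}$$

Differentiating (16) with respect to $\Phi$ gives the optimum value of $\Phi$ as

$$\Phi_{opt(n\times1)}=2\,\Sigma_{x(n\times n)}^{-1}\Sigma_{xz(n\times1)}.\tag{17}$$

Substituting (17) in (16), we get the minimum MSE of $t_s$ as

$$\min\mathrm{MSE}(t_s)_{(1\times1)}=S'S\left(\theta_{(l+1)}\Sigma_{z(1\times1)}-\left(\theta_{(l+1)}-\theta_{(l)}\right)\Sigma_{zx(1\times n)}\Sigma_{x(n\times n)}^{-1}\Sigma_{xz(n\times1)}\right).\tag{18}$$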

Remarks 2:

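1. Setting $l=1$ in (13) gives the two-phase sampling estimator using $n$ auxiliary variables:

$$t_{s1}=\left[s_{z(2)}^2+\sum_{i=1}^{n}\beta_i\left(s_{x_i(1)}^2-s_{x_i(2)}^2\right)\right]\exp\left\{\sum_{i=1}^{n}\left(\frac{s_{x_i(1)}^2-s_{x_i(2)}^2}{s_{x_i(1)}^2+s_{x_i(2)}^2}\right)\right\}\tag{19}$$

2. Setting $n=1$ and $l=1$ in (13) gives the two-phase sampling estimator for a single auxiliary variable:

$$t_{s2}=\left[s_{z(2)}^2+\beta\left(s_{x(1)}^2-s_{x(2)}^2\right)\right]\exp\left(\frac{s_{x(1)}^2-s_{x(2)}^2}{s_{x(1)}^2+s_{x(2)}^2}\right)\tag{20}$$

3. Setting $l=1$ and $n=2$ in (13) gives the two-phase sampling estimator for two auxiliary variables:

$$t_{s3}=\left[s_{z(2)}^2+\sum_{i=1}^{2}\beta_i\left(s_{x_i(1)}^2-s_{x_i(2)}^2\right)\right]\exp\left\{\sum_{i=1}^{2}\left(\frac{s_{x_i(1)}^2-s_{x_i(2)}^2}{s_{x_i(1)}^2+s_{x_i(2)}^2}\right)\right\}\tag{21}$$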

3 Computational Procedure

We illustrate the computation and application of the proposed estimators using two types of simulation study.

3.1 We use simulation studies to compare efficiency empirically and theoretically. Three populations of size 1000 each are generated from bivariate normal distributions for $(Y,X)$ with different covariance matrices. The scrambling variable is $S\sim N(0,0.1\sigma_x)$ and $Z=Y+kS$, $k=-1,-0.5,0.5,1$.

The mean of $[Y,X]$ is $\mu=[2,2]$.

Population 1:

$$\Sigma=\begin{bmatrix}9&1.9\\1.9&4\end{bmatrix},\qquad\rho_{XY}=0.3209$$

Population 2:

$$\Sigma=\begin{bmatrix}10&3\\3&2\end{bmatrix},\qquad\rho_{XY}=0.6746$$

Population 3:

$$\Sigma=\begin{bmatrix}6&3\\3&2\end{bmatrix},\qquad\rho_{XY}=0.8684$$

For each population we considered two sample sizes: $n=250,500$ for the first phase and $n_1=100,200$ for the second phase, respectively.
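The populations of Section 3.1 can be generated with a short sketch such as the one below. It reads the covariance matrices as $\begin{bmatrix}9&1.9\\1.9&4\end{bmatrix}$, $\begin{bmatrix}10&3\\3&2\end{bmatrix}$ and $\begin{bmatrix}6&3\\3&2\end{bmatrix}$, which is an assumption about the intended layout; the seed and the helper names are hypothetical. The correlations implied by these matrices are about 0.3167, 0.6708 and 0.8660, close to but not equal to the reported values, which would be consistent with the reported $\rho_{XY}$ having been computed from the generated finite populations.

```python
import math
import random


def implied_rho(cov):
    """Correlation implied by a 2x2 covariance matrix [[v_y, c], [c, v_x]]."""
    return cov[0][1] / math.sqrt(cov[0][0] * cov[1][1])


def bivariate_normal(mean, cov, size, rng):
    """Draw (y, x) pairs using the Cholesky factor of a 2x2 covariance matrix."""
    l11 = math.sqrt(cov[0][0])
    l21 = cov[0][1] / l11
    l22 = math.sqrt(cov[1][1] - l21 ** 2)
    pairs = []
    for _ in range(size):
        u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        pairs.append((mean[0] + l11 * u, mean[1] + l21 * u + l22 * v))
    return pairs


# Assumed layout of the covariance matrices for (Y, X).
covariances = {
    "Population 1": [[9.0, 1.9], [1.9, 4.0]],
    "Population 2": [[10.0, 3.0], [3.0, 2.0]],
    "Population 3": [[6.0, 3.0], [3.0, 2.0]],
}

rng = random.Random(2021)  # arbitrary seed, not from the paper
for name, cov in covariances.items():
    population = bivariate_normal([2.0, 2.0], cov, 1000, rng)
    print(name, round(implied_rho(cov), 4), len(population))
```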

3.2 In this computational procedure, we consider single and two auxiliary variables for both cases, with the population generated as follows:

$Y_i=RX_i+e_i$, where $e_i\sim N(0,1)$ and $R=1.5$ (sensitive study variable);
$X_i\sim G(a,b)$, where $a=2$ and $b=3$ (auxiliary variable);
$w_i\sim G(a,b)$, where $a=2$ and $b=10$ (another auxiliary variable);
$S\sim B(\alpha,\beta)$, where $\alpha=6.5$ and $\beta=0.5$ (scrambling variable);
$Z_i=Y_i+kS_i$, $i=1,2,3,\ldots,n$, $k=-1,-0.5,0.5,1$ (randomized reported response);
$N=3000$.
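A population of this kind can be generated as in the following sketch. It assumes that $G(a,b)$ denotes a gamma distribution with shape $a$ and scale $b$ and that $B(\alpha,\beta)$ denotes a beta distribution; the text does not state the parameterisation, so both are labeled assumptions, and the seed and the single value of $k$ are arbitrary.

```python
import random

rng = random.Random(3000)  # arbitrary seed, not from the paper
N, R, k = 3000, 1.5, 0.5

# Assumption: G(a, b) is a gamma distribution with shape a and scale b.
x = [rng.gammavariate(2.0, 3.0) for _ in range(N)]    # auxiliary variable X
w = [rng.gammavariate(2.0, 10.0) for _ in range(N)]   # second auxiliary variable w
e = [rng.gauss(0.0, 1.0) for _ in range(N)]           # error term e ~ N(0, 1)

# Assumption: B(alpha, beta) is a beta distribution on [0, 1].
s = [rng.betavariate(6.5, 0.5) for _ in range(N)]     # scrambling variable S

y = [R * xi + ei for xi, ei in zip(x, e)]             # sensitive study variable
z = [yi + k * si for yi, si in zip(y, s)]             # reported response for one k

print(sum(x) / N, sum(w) / N, sum(s) / N)
```

If $b$ were instead a rate parameter, the second argument of `gammavariate` would be `1 / b`.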

We computed the percent relative efficiencies (PRE) of the proposed variance estimators with respect to $t_0$ for case-1 and case-2 as

$$\mathrm{PRE}=\frac{\sum_{t=1}^{5000}\left(t_0-S_z^2\right)^2}{\sum_{t=1}^{5000}\left(t_m-S_z^2\right)^2}\times100,\qquad m=z2,z3,s2,s3.$$
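The PRE computation can be sketched as a small Monte Carlo loop. The example below compares $t_0$ with the single-auxiliary, single-phase estimator on a hypothetical population; the population, the seed, the value of `beta`, the sample size, and the use of 200 replications instead of 5000 are all choices made for illustration, so the printed number is not comparable with the tables.

```python
import math
import random
import statistics


def pre(t0_values, tm_values, S2_z):
    """Percent relative efficiency of t_m with respect to t_0 over replications."""
    num = sum((t0 - S2_z) ** 2 for t0 in t0_values)
    den = sum((tm - S2_z) ** 2 for tm in tm_values)
    return num / den * 100.0


rng = random.Random(5)  # arbitrary seed

# Hypothetical population with one auxiliary variable.
N, n, beta = 1000, 100, 0.5
x = [rng.gammavariate(2.0, 3.0) for _ in range(N)]
z = [1.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
S2_z, S2_x = statistics.variance(z), statistics.variance(x)

t0_values, tz2_values = [], []
for _ in range(200):  # the paper uses 5000 replications
    idx = rng.sample(range(N), n)
    s2_z = statistics.variance([z[j] for j in idx])
    s2_x = statistics.variance([x[j] for j in idx])
    t0_values.append(s2_z)
    # Single-auxiliary estimator, exp(.) multiplying the whole bracket (assumed).
    tz2_values.append((s2_z + beta * (S2_x - s2_x))
                      * math.exp((S2_x - s2_x) / (S2_x + s2_x)))

print(pre(t0_values, tz2_values, S2_z))
```

A PRE above 100 means the competing estimator has a smaller empirical MSE than $t_0$ over the replications.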

The results of the simulation study are given in Tables 1 and 2.

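Table 1 Percent relative efficiencies of the proposed estimators for Section 3.1 ($N=1000$)

| Population | $k$ | Case-1: $n$ | Case-1: $t_0$ | Case-1: $t_{z2}$ | Case-2: $n_1$ | Case-2: $t_0$ | Case-2: $t_{s2}$ |
|---|---|---|---|---|---|---|---|
| Pop. 1, $\rho_{XY}=0.3209$ | -1 | 500 | 100 | 100.88 | 200 | 100 | 100.35 |
| | | 250 | 100 | 101.35 | 100 | 100 | 100.96 |
| | -0.5 | 500 | 100 | 100.64 | 200 | 100 | 100.67 |
| | | 250 | 100 | 101.94 | 100 | 100 | 100.49 |
| | 0.5 | 500 | 100 | 100.98 | 200 | 100 | 100.32 |
| | | 250 | 100 | 100.25 | 100 | 100 | 100.29 |
| | 1 | 500 | 100 | 100.51 | 200 | 100 | 100.11 |
| | | 250 | 100 | 100.58 | 100 | 100 | 100.01 |
| Pop. 2, $\rho_{XY}=0.6746$ | -1 | 500 | 100 | 118.59 | 200 | 100 | 111.86 |
| | | 250 | 100 | 119.72 | 100 | 100 | 109.94 |
| | -0.5 | 500 | 100 | 121.78 | 200 | 100 | 110.25 |
| | | 250 | 100 | 128.28 | 100 | 100 | 115.20 |
| | 0.5 | 500 | 100 | 122.51 | 200 | 100 | 126.78 |
| | | 250 | 100 | 124.03 | 100 | 100 | 116.63 |
| | 1 | 500 | 100 | 127.95 | 200 | 100 | 112.84 |
| | | 250 | 100 | 123.67 | 100 | 100 | 128.88 |
| Pop. 3, $\rho_{XY}=0.8684$ | -1 | 500 | 100 | 228.51 | 200 | 100 | 150.35 |
| | | 250 | 100 | 211.63 | 100 | 100 | 145.95 |
| | -0.5 | 500 | 100 | 212.78 | 200 | 100 | 147.56 |
| | | 250 | 100 | 214.83 | 100 | 100 | 148.37 |
| | 0.5 | 500 | 100 | 236.52 | 200 | 100 | 151.06 |
| | | 250 | 100 | 216.48 | 100 | 100 | 144.06 |
| | 1 | 500 | 100 | 213.12 | 200 | 100 | 146.64 |
| | | 250 | 100 | 202.28 | 100 | 100 | 144.47 |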

4 Main Findings

The PREs of the estimators for case-1 and case-2 are presented in Tables 1 and 2. The major findings are:

i. The PREs of the proposed estimators in Tables 1 and 2 are higher than 100, which shows that the proposed estimators are more efficient than the usual variance estimator.

ii. From Table 1, the proposed estimators have the highest efficiency when the correlation coefficient is highest.

iii. As the value of k varies from -1 to +1, the PRE of the proposed estimators increases in both cases.

iv. Table 1 covers single- and two-phase sampling using a single auxiliary variable. We also noticed that as the sample size increases, the efficiency of the estimators increases.

v. Table 2 covers Section 3.2, with results for single and two auxiliary variables under single- and two-phase sampling. Here too, increasing the sample size increases the efficiency of the estimators.

vi. From Remarks 1, $t_{z2}$ and $t_{z3}$ are the special cases of $t_z$ for single and two auxiliary variables, respectively. In Table 2, case-1, the efficiency of the proposed two-auxiliary-variable estimator is increased.

vii. From Table 2, we noticed that as the value of k increases, the efficiency of the proposed estimators increases.

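Table 2 PRE of the proposed estimators in case-1 and case-2 with respect to $t_0$ for Section 3.2

| $k$ | Case-1: $n$ | Case-1: $t_0$ | Case-1: $t_{z2}$ | Case-1: $t_{z3}$ | Case-2: $n_1$ | Case-2: $t_0$ | Case-2: $t_{s2}$ | Case-2: $t_{s3}$ |
|---|---|---|---|---|---|---|---|---|
| -1 | 250 | 100 | 127.43 | 114.77 | 100 | 100 | 115.59 | 120.09 |
| | 500 | 100 | 124.51 | 152.49 | 200 | 100 | 115.20 | 103.37 |
| -0.5 | 250 | 100 | 128.69 | 117.27 | 100 | 100 | 114.97 | 111.31 |
| | 500 | 100 | 128.01 | 117.56 | 200 | 100 | 116.19 | 110.84 |
| 0.5 | 250 | 100 | 139.96 | 146.93 | 100 | 100 | 121.81 | 135.71 |
| | 500 | 100 | 123.28 | 133.87 | 200 | 100 | 112.52 | 104.30 |
| 1 | 250 | 100 | 113.89 | 131.99 | 100 | 100 | 113.66 | 115.00 |
| | 500 | 100 | 137.66 | 148.05 | 200 | 100 | 119.49 | 116.98 |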

From the above discussion, we conclude that the proposed estimators perform better than the usual estimator of the population variance. The computational procedure therefore supports the theoretical findings in both cases under the randomized response technique.

5 Concluding Remarks

The objective of survey sampling techniques is to estimate population characteristics with precision, which can be increased by using a proper methodology. In this study, we have suggested a variance estimation procedure using scrambled randomized responses and multi-auxiliary variables for multi-phase sampling. The proposed estimators are more efficient than the usual variance estimator, as shown in Tables 1 and 2 for case-1 and case-2. We used different sample sizes and values of k between -1 and 1, giving different scrambled responses, to estimate the MSEs of the proposed estimators. The computational procedure shows that the proposed estimators are more efficient and are helpful in estimating the variance of a sensitive variable using RRT.

References

[1] Asghar, A., Sanaullah, A. and Hanif, M. (2014). Generalized exponential type estimator for population variance in survey sampling. Revista Colombiana de Estadística, 37(1), 213–224.

[2] Das, A.K. and Tripathi, T.P. (1978). Use of auxiliary information in estimating the finite population variance. Sankhya, 40, C, 139–148.

[3] Eichhorn, B.H. and Hayre, L.S. (1983). Scrambled randomized response methods for obtaining sensitive quantitative data. Journal of Statistical Planning and Inference, 7(4), 307–316.

[4] Himmelfarb, S. and Edgell, S.E. (1980). Additive constants model: A randomized response technique for eliminating evasiveness to quantitative response questions. Psychological Bulletin, 87(3), 525–530.

[5] Huang, K.C. (2010). Unbiased estimators of mean, variance and sensitivity level for quantitative characteristics in finite population sampling. Metrika, 71(3), 341–352.

[6] Hussain, Z. (2012). Improvement of the Gupta and Thornton scrambling model through double use of randomization device. International Journal of Academic Research in Business and Social Sciences, 2(6), 91–97.

[7] Isaki, C.T. (1983). Variance estimation using auxiliary information. Journal of the American Statistical Association, 78(381), 117–123.

[8] Shabbir, J. and Gupta, S. (2010). Some estimators of finite population variance of stratified sample mean. Communications in Statistics – Theory and Methods, 39, 3001–3008.

[9] Singh, S. and Joarder, A.H. (1998). Estimation of finite population variance using random nonresponse in survey sampling. Metrika, 47, 241–249.

[10] Neyman, J. (1938). Contributions to the theory of sampling human populations. Journal of the American Statistical Association, 33, 101–116.

[11] Noor-ul-Amin, M., Mushtaq, N., and Hanif, M. (2018). Estimation of mean using generalized optional scrambled responses in the presence of nonsensitive auxiliary variable, Journal of Statistics and Management Systems. 21(2), 287–304.

[12] Warner, S.L. (1965). Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309), 63–69.

[13] Yadav, S.K., Kadilar, C., Shabbir, J. and Gupta, S. (2015). Improved family of estimators of population variance in simple random sampling. Journal of Statistical Theory and Practice, 9(2), 9219–9226.

[14] Yasmeen, U., Noor-ul-Amin, M. and Hanif, M. (2018). Exponential estimators of finite population variance using transformed auxiliary variables. Proceedings of the National Academy of Sciences, India Section A: Physical. https://doi.org/10.1007/s4001

Biographies

Nadia Mushtaq received her MSc and MPhil degrees in Statistics from Quaid-i-Azam University, Islamabad, Pakistan, and her PhD degree in Statistics from the National College of Business Administration & Economics, Lahore, Pakistan. Dr. Mushtaq is currently working as an Assistant Professor at Forman Christian College, Lahore, Pakistan. She has more than fifteen years of university teaching and research experience. Her research interests include sampling techniques, time series analysis, and statistical data analysis using statistical software such as SPSS, SAS, Minitab, and R. She has published ten research papers in national and international journals.
