Variance Estimation Procedure Using Scrambled Responses and Multi-Auxiliary Variables In Multi-Phase Sampling
Nadia Mushtaq
Department of Statistics, Forman Christian College University, Lahore, Pakistan
E-mail: nadiamushtaq@fccollege.edu.pk
Received 08 October 2020; Accepted 01 March 2021; Publication 11 June 2021
Variance estimation quantifies the variability in a population. In this study, we consider a variance estimation procedure for a sensitive variable that uses scrambled randomized responses and multi-auxiliary variables in multi-phase sampling. Under the Noor-ul-Amin et al. (2018) RRT model, generalized exponential regression-type estimators are derived for case-1 and case-2. A simulation study is presented to illustrate the application and computational details. The proposed estimators are observed to perform better than the usual variance estimator in both cases.
Keywords: Variance estimation, multi-auxiliary variables, scrambled randomized response.
Surveys in the behavioral and social sciences often meet refusal or bias when they ask sensitive questions, for example about drug addiction, HIV/AIDS, gambling, or sexual behavior. In such situations, participants may refuse to respond or may give false answers, and these refusals and misleading answers introduce bias and non-sampling error. Warner (1965) introduced the randomized response technique (RRT) to reduce response error for sensitive questions. Eichhorn and Hayre (1983) proposed a multiplicative RRT for quantitative data. Several scrambled response models have since been suggested, including the additive scrambling model (Himmelfarb and Edgell, 1980), the subtractive scrambling model (Hussain, 2012), and a mixture of additive and multiplicative scrambling models (Huang, 2010).
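The additive and multiplicative scrambling ideas mentioned above can be illustrated with a minimal sketch. It assumes the textbook forms Z = Y + S and Z = Y · S with a scrambling variable S that is independent of Y and has known mean and variance; the distributions, parameter values, and function names are hypothetical and are not taken from any of the cited papers.

```python
import numpy as np

def scramble_additive(y, s):
    """Additive scrambling: the respondent reports Z = Y + S."""
    return y + s

def scramble_multiplicative(y, s):
    """Multiplicative scrambling: the respondent reports Z = Y * S."""
    return y * s

rng = np.random.default_rng(0)

# Hypothetical sensitive variable (true variance 25) and scrambling variables.
y = rng.normal(50.0, 5.0, size=100_000)
s_add = rng.normal(0.0, 2.0, size=y.size)   # known mean 0, known variance 4
s_mul = rng.normal(1.0, 0.1, size=y.size)   # known mean 1

z_add = scramble_additive(y, s_add)
z_mul = scramble_multiplicative(y, s_mul)

# With Y and S independent, Var(Z) = Var(Y) + Var(S) under additive scrambling,
# so Var(Y) is recovered by subtracting the known Var(S).
var_y_hat = z_add.var(ddof=1) - 2.0**2

# With Y and S independent, E(Z) = E(Y) E(S) under multiplicative scrambling,
# so E(Y) is recovered by dividing by the known E(S) = 1.
mean_y_hat = z_mul.mean() / 1.0

print(f"estimated Var(Y): {var_y_hat:.2f}, estimated E(Y): {mean_y_hat:.2f}")
```

The interviewer never sees an individual's true value, only the scrambled report, yet population-level moments remain estimable because the distribution of S is known.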
Variation arises in many practical settings, such as environmental, genetic, and economic studies. Estimating the population variance describes the variability present in the population, and the estimate can be used in future surveys, in prediction, or in sample size determination. Variance estimation using auxiliary information has been considered by Das and Tripathi (1978), Isaki (1983), Singh and Joarder (1998), Shabbir and Gupta (2010), Asghar et al. (2014), Yadav et al. (2015), and Yasmeen et al. (2018). Neyman (1938) introduced the concept of two-phase sampling, in which the sample is selected in two phases: in the first phase, information on the auxiliary variable is collected from a large sample, and in the second phase, the study variable is observed on a relatively small sample.
Multipurpose surveys require estimates of population parameters for several variables; in socio-economic surveys, for example, the variables may include house size, monthly income, and number of unemployed persons. The objective of this paper is to present a variance estimation procedure for a sensitive study variable that uses scrambled randomized responses and multi-auxiliary variables in multi-phase sampling. To the best of our knowledge, the available literature contains no contribution on variance estimation for a sensitive variable using multi-auxiliary variables in multi-phase sampling. In the computational procedure we discuss some special cases of the estimators.
Let be the sensitive study variable be non-sensitive auxiliary variables which are correlated with in population. Let be the scrambling variables independent of and . We are interested to estimate population variance of sensitive variables using multi-auxiliary variables under following cases.
The sampling strategy for each case is as follows:
Case 1: Information on all auxiliary variables is known, so single-phase sampling is used.
Case 2: Information on all auxiliary variables is unknown. A large first-phase sample is drawn to observe only the auxiliary variables. The second-phase sample is then drawn to observe the sensitive variable of interest and all non-sensitive auxiliary variables.
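The two-phase design of Case 2 can be sketched as follows. This is a minimal illustration that assumes simple random sampling without replacement at each phase, with the second-phase sample nested in the first; the function name `two_phase_sample` is hypothetical, and the sizes are chosen only to match the orders of magnitude used later in the simulation study.

```python
import numpy as np

def two_phase_sample(N, n1, n2, rng):
    """Return unit indices for a nested two-phase sample.

    Phase 1: simple random sample without replacement of size n1 from N units
             (only the auxiliary variables would be observed here).
    Phase 2: simple random subsample of size n2 from the phase-1 units
             (the scrambled response and the auxiliary variables are observed).
    """
    if not (0 < n2 <= n1 <= N):
        raise ValueError("sizes must satisfy 0 < n2 <= n1 <= N")
    first = rng.choice(N, size=n1, replace=False)
    second = rng.choice(first, size=n2, replace=False)
    return first, second

rng = np.random.default_rng(42)
first, second = two_phase_sample(N=1000, n1=500, n2=200, rng=rng)
print(len(first), len(second))
```

In Case 1 the auxiliary population variances are known, so only a single draw of this kind is needed.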
Let be the variance of the population and be the sample variances of the auxiliary variable , at the -th phases . The reported optional scrambled response for is given by , following Noor-ul-Amin et al. (2018).
Let , are the sample variances of the reported response and the auxiliary variables respectively for the -th phases having sample means and .
For -th phases , we define the following notations:
Let ,
Such that
Where
Let be the population variance and its unbiased variance given as:
(1) |
The estimator of the population variance of the sensitive variable using multi-auxiliary non-sensitive variables under case 1 is given as:
(2) |
where
(3) |
By solving (2), we have following expressions given below:
Where
(4) |
For expectations, we proceed as,
(5) |
Where,
By applying expectations, we have the following results
(6) |
We differentiate the (5) w.r.t and get the optimum value of as,
(7) |
Using (6) in (5), we get the minimum value of MSE of as:
(8) |
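The optimization step used here follows a standard pattern that can be written generically. The block below is only a sketch of that pattern: it assumes the first-order MSE is a quadratic form in a vector of weights, and the symbols A, b, C and the weight vector are placeholders introduced for illustration, not the paper's notation.

```latex
% Generic first-order MSE, quadratic in the weight vector \boldsymbol{\alpha}
% (A is a scalar, \mathbf{b} a vector, \mathbf{C} a symmetric positive definite matrix):
\mathrm{MSE}(\boldsymbol{\alpha}) \approx A - 2\,\boldsymbol{\alpha}^{\top}\mathbf{b}
  + \boldsymbol{\alpha}^{\top}\mathbf{C}\,\boldsymbol{\alpha}

% Setting the gradient to zero gives the optimum weights:
\frac{\partial\,\mathrm{MSE}}{\partial \boldsymbol{\alpha}}
  = -2\,\mathbf{b} + 2\,\mathbf{C}\,\boldsymbol{\alpha} = \mathbf{0}
  \quad\Longrightarrow\quad
  \boldsymbol{\alpha}_{\mathrm{opt}} = \mathbf{C}^{-1}\mathbf{b}

% Substituting back gives the minimum MSE:
\mathrm{MSE}_{\min} \approx A - \mathbf{b}^{\top}\mathbf{C}^{-1}\mathbf{b}
```

Because the subtracted term is non-negative when C is positive definite, the minimum MSE can never exceed A, the MSE obtained with all weights set to zero.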
Remarks 1:
1. If we replace in (2), so it is single phase sampling using q auxiliary variables given as:
(9) |
2. For single auxiliary variables we replace and in (2), we have single phase sampling as:
(10) |
3. If we replace in (2), and for two auxiliary variables in single phase sampling given following expressions:
(11) |
Let be the sample variance of the sensitive study variables is selected at the th phases. Also and be the sample variances of the auxiliary variable , at the -th and -th phases of size and respectively. The population variance of all multi-auxiliary variables is unknown.
(12) |
Where
(13) |
By solving (12), we have the following expressions:
Where
(14) |
For expectations, we proceed as:
(15) |
Where,
By applying expectations on (14), we have the following results
(16) |
We differentiate (15) w.r.t and get the optimum value of as,
(17) |
Using (16) in (15), we get the minimum value of the MSE of the estimator as:
(18) |
Remarks 2:
1. If we replace in (12), so it is two-phase sampling using multi-auxiliary variables:
(19) |
2. We replace and in (12) so it is two-phase sampling for single auxiliary variable:
(20) |
3. If we replace in (12), it is two auxiliary variables two-phase sampling given following expressions:
(21) |
We examine the computation and application of the proposed estimators through two types of simulation study, described below.
3.1 We use simulation studies to compare efficiency empirically and theoretically. Three populations of size 1000 each are generated from bivariate normal distributions for , with different covariance matrices. The scrambling variable and ,
Mean of given as |
Population 1:
Population 2:
Population 3:
For each population we considered first-phase sample sizes of 250 and 500, with second-phase sample sizes of 100 and 200, respectively.
3.2 In this computational procedure, we consider single and two auxiliary variables for both cases given as:
, where and . | (Sensitive Study Variable) |
, where and . | (Auxiliary Variable) |
, where and . | (Another auxiliary variable) |
, where and . | (Scrambling Variable) |
, , | (Randomized reported response) |
, |
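We computed the percent relative efficiency (PRE) of each proposed variance estimator with respect to the usual variance estimator for case-1 and case-2, that is, the ratio of the MSE of the usual estimator to the MSE of the proposed estimator, multiplied by 100.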
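A PRE computation of this kind can be sketched by Monte Carlo. The example below is not the paper's estimator: it swaps in a simple exponential ratio-type variance estimator with one auxiliary variable whose population variance is known (a case-1 style setting), uses plain additive scrambling with a known scrambling variance, and takes hypothetical means, variances, and correlation for the bivariate normal population.

```python
import numpy as np

def simulate_pre(rho=0.9, N=1000, n=250, sigma_s=1.0, reps=2000, seed=1):
    """Monte Carlo PRE of an exponential ratio-type variance estimator
    relative to the usual estimator, under additive scrambling Z = Y + S."""
    rng = np.random.default_rng(seed)

    # Hypothetical finite population: Y sensitive, X auxiliary.
    cov = [[25.0, rho * 5.0 * 3.0], [rho * 5.0 * 3.0, 9.0]]
    pop = rng.multivariate_normal([50.0, 30.0], cov, size=N)
    y, x = pop[:, 0], pop[:, 1]
    S2_y = y.var(ddof=1)   # target: population variance of Y
    S2_x = x.var(ddof=1)   # assumed known (case-1 style)

    t0 = np.empty(reps)    # usual estimator
    t1 = np.empty(reps)    # exponential ratio-type estimator (illustrative)
    for r in range(reps):
        idx = rng.choice(N, size=n, replace=False)
        z = y[idx] + rng.normal(0.0, sigma_s, size=n)   # scrambled reports
        s2_x = x[idx].var(ddof=1)
        # Remove the known scrambling variance from the reported-response variance.
        t0[r] = z.var(ddof=1) - sigma_s**2
        t1[r] = t0[r] * np.exp((S2_x - s2_x) / (S2_x + s2_x))

    mse0 = np.mean((t0 - S2_y) ** 2)
    mse1 = np.mean((t1 - S2_y) ** 2)
    return 100.0 * mse0 / mse1

pre = simulate_pre()
print(f"PRE of the illustrative estimator: {pre:.1f}")
```

By construction the usual estimator has a PRE of 100, so values above 100 indicate a gain from using the auxiliary information. In this sketch the gain depends strongly on the assumed correlation `rho`.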
The results of the simulation study are given in Tables 1 and 2.
Table 1 Percent relative efficiencies (PRE) of the proposed estimators for Section 3.1 (N = 1000)
| Population | k | First-phase sample size | Case-1 PRE, usual | Case-1 PRE, proposed | Second-phase sample size | Case-2 PRE, usual | Case-2 PRE, proposed |
|---|---|---|---|---|---|---|---|
| Pop. 1 | -1 | 500 | 100 | 100.88 | 200 | 100 | 100.35 |
| Pop. 1 | -1 | 250 | 100 | 101.35 | 100 | 100 | 100.96 |
| Pop. 1 | -0.5 | 500 | 100 | 100.64 | 200 | 100 | 100.67 |
| Pop. 1 | -0.5 | 250 | 100 | 101.94 | 100 | 100 | 100.49 |
| Pop. 1 | 0.5 | 500 | 100 | 100.98 | 200 | 100 | 100.32 |
| Pop. 1 | 0.5 | 250 | 100 | 100.25 | 100 | 100 | 100.29 |
| Pop. 1 | 1 | 500 | 100 | 100.51 | 200 | 100 | 100.11 |
| Pop. 1 | 1 | 250 | 100 | 100.58 | 100 | 100 | 100.01 |
| Pop. 2 | -1 | 500 | 100 | 118.59 | 200 | 100 | 111.86 |
| Pop. 2 | -1 | 250 | 100 | 119.72 | 100 | 100 | 109.94 |
| Pop. 2 | -0.5 | 500 | 100 | 121.78 | 200 | 100 | 110.25 |
| Pop. 2 | -0.5 | 250 | 100 | 128.28 | 100 | 100 | 115.20 |
| Pop. 2 | 0.5 | 500 | 100 | 122.51 | 200 | 100 | 126.78 |
| Pop. 2 | 0.5 | 250 | 100 | 124.03 | 100 | 100 | 116.63 |
| Pop. 2 | 1 | 500 | 100 | 127.95 | 200 | 100 | 112.84 |
| Pop. 2 | 1 | 250 | 100 | 123.67 | 100 | 100 | 128.88 |
| Pop. 3 | -1 | 500 | 100 | 228.51 | 200 | 100 | 150.35 |
| Pop. 3 | -1 | 250 | 100 | 211.63 | 100 | 100 | 145.95 |
| Pop. 3 | -0.5 | 500 | 100 | 212.78 | 200 | 100 | 147.56 |
| Pop. 3 | -0.5 | 250 | 100 | 214.83 | 100 | 100 | 148.37 |
| Pop. 3 | 0.5 | 500 | 100 | 236.52 | 200 | 100 | 151.06 |
| Pop. 3 | 0.5 | 250 | 100 | 216.48 | 100 | 100 | 144.06 |
| Pop. 3 | 1 | 500 | 100 | 213.12 | 200 | 100 | 146.64 |
| Pop. 3 | 1 | 250 | 100 | 202.28 | 100 | 100 | 144.47 |
The PREs of the estimators for case-1 and case-2 are presented in Tables 1 and 2. The major findings are:
i. The PREs of the proposed estimators exceed that of the usual variance estimator in both Tables 1 and 2, which shows the efficiency of the proposed estimators.
ii. From Table 1, the proposed estimators have the highest efficiency when the correlation coefficient is highest.
iii. As the value of k varies from -1 to 1, the PRE of the proposed estimators increases in both cases.
iv. Table 1 covers single- and two-phase sampling with a single auxiliary variable. We also noticed that as the sample size increases, the efficiency of the estimators also increases.
v. Table 2 covers Section 3.2, with results for single and two auxiliary variables under single- and two-phase sampling. Here too, the efficiency of the estimators increases with the sample size.
vi. From Remarks 1, the estimators for single and two auxiliary variables are special cases of the general case-1 estimator. In Table 2, the efficiency under case-1 increases for the proposed two-auxiliary-variable estimator.
vii. From Table 2, we noticed that the efficiency of the proposed estimators increases as the value of k increases.
Table 2 PRE of the proposed estimators in case-1 and case-2 with respect to the usual variance estimator for Section 3.2
| k | First-phase sample size | Case-1 PRE, usual | Case-1 PRE, single auxiliary | Case-1 PRE, two auxiliary | Second-phase sample size | Case-2 PRE, usual | Case-2 PRE, single auxiliary | Case-2 PRE, two auxiliary |
|---|---|---|---|---|---|---|---|---|
| -1 | 250 | 100 | 127.43 | 114.77 | 100 | 100 | 115.59 | 120.09 |
| -1 | 500 | 100 | 124.51 | 152.49 | 200 | 100 | 115.20 | 103.37 |
| -0.5 | 250 | 100 | 128.69 | 117.27 | 100 | 100 | 114.97 | 111.31 |
| -0.5 | 500 | 100 | 128.01 | 117.56 | 200 | 100 | 116.19 | 110.84 |
| 0.5 | 250 | 100 | 139.96 | 146.93 | 100 | 100 | 121.81 | 135.71 |
| 0.5 | 500 | 100 | 123.28 | 133.87 | 200 | 100 | 112.52 | 104.30 |
| 1 | 250 | 100 | 113.89 | 131.99 | 100 | 100 | 113.66 | 115.00 |
| 1 | 500 | 100 | 137.66 | 148.05 | 200 | 100 | 119.49 | 116.98 |
From the above discussion, we conclude that the proposed estimators perform better than the usual variance estimator. The computational procedure therefore supports the theoretical findings for both cases under the randomized response technique.
The objective of survey sampling techniques is to estimate population characteristics with precision, and precision can be increased by using a proper methodology. In this study, we have suggested a variance estimation procedure using scrambled randomized responses and multi-auxiliary variables for multi-phase sampling. The proposed estimators are more efficient than the usual variance estimator, as shown in Tables 1 and 2 for case-1 and case-2. We used different sample sizes and values of k between -1 and 1 for different scrambled responses when estimating the MSEs of the proposed estimators. The computational procedure shows that the proposed estimators are more efficient and are helpful in estimating the variance of a sensitive variable using RRT.
[1] Asghar, A., Sanaullah, A. and Hanif, M. (2014). Generalized exponential type estimator for population variance in survey sampling. Revista Colombiana de Estadística, 37(1), 213–224.
[2] Das, A.K. and Tripathi, T.P. (1978). Use of auxiliary information in estimating the finite population variance. Sankhya, 40, C, 139–148.
[3] Eichhorn, B.H. and Hayre, L.S. (1983). Scrambled randomized response methods for obtaining sensitive quantitative data. Journal of Statistical Planning and Inference, 7(4), 307–316.
[4] Himmelfarb, S. and Edgell, S.E. (1980). Additive constants model: A randomized response technique for eliminating evasiveness to quantitative response questions. Psychological Bulletin, 87(3), 525–530.
[5] Huang, K.C. (2010). Unbiased estimators of mean, variance and sensitivity level for quantitative characteristics in finite population sampling. Metrika, 71(3), 341–352.
[6] Hussain, Z. (2012). Improvement of the Gupta and Thornton scrambling model through double use of randomization device. International Journal of Academic Research in Business and Social Sciences, 2(6), 91–97.
[7] Isaki, C.T. (1983). Variance estimation using auxiliary information. Journal of the American Statistical Association, 78(381), 117–123.
[8] Shabbir, J. and Gupta, S. (2010). Some estimators of finite population variance of stratified sample mean. Communications in Statistics – Theory and Methods, 39, 3001–3008.
[9] Singh, S. and Joarder, A.H. (1998). Estimation of finite population variance using random nonresponse in survey sampling. Metrika, 47, 241–249.
[10] Neyman, J. (1938). Contributions to the theory of sampling human populations. Journal of the American Statistical Association, 33, 101–116.
[11] Noor-ul-Amin, M., Mushtaq, N. and Hanif, M. (2018). Estimation of mean using generalized optional scrambled responses in the presence of nonsensitive auxiliary variable. Journal of Statistics and Management Systems, 21(2), 287–304.
[12] Warner, S.L. (1965). Randomized response: A survey technique for eliminating evasive answer bias. Journal of the American Statistical Association, 60(309), 63–69.
[13] Yadav, S.K., Kadilar, C., Shabbir, J. and Gupta, S. (2015). Improved family of estimators of population variance in simple random sampling. Journal of Statistical Theory and Practice, 9(2), 9219–9226.
[14] Yasmeen, U., Noor-ul-Amin, M. and Hanif, M. (2018). Exponential estimators of finite population variance using transformed auxiliary variables. Proceedings of the National Academy of Sciences, India Section A: Physical. https://doi.org/10.1007/s4001
Nadia Mushtaq received her MSc and MPhil degrees in Statistics from Quaid-i-Azam University, Islamabad, Pakistan, and her PhD degree in Statistics from the National College of Business Administration & Economics, Lahore, Pakistan. Dr. Mushtaq is currently working as an Assistant Professor at Forman Christian College, Lahore, Pakistan. She has more than fifteen years of university teaching and research experience. Her research interests include sampling techniques, time series analysis, and statistical data analysis using statistical software such as SPSS, SAS, Minitab, and R. She has published ten research papers in national and international journals.
Journal of Reliability and Statistical Studies, Vol. 14, Issue 1 (2021), 209–222.
doi: 10.13052/jrss0974-8024.14110
© 2021 River Publishers