An Improvement in Estimation of Population Mean using Two Auxiliary Variables and Two-Phase Sampling Scheme under Non-Response
Manoj K. Chaudhary1 and Amit Kumar2,*
1Department of Statistics, Institute of Science, Banaras Hindu University, Varanasi, India
2Department of Statistics, University of Allahabad, Prayagraj, India
E-mail: ritamanoj15@gmail.com; amit.stat40@gmail.com
*Corresponding Author
Received 02 July 2020; Accepted 16 November 2020; Publication 04 January 2021
In the present paper, we propose improved ratio and regression-type estimators of the finite population mean that use information on two auxiliary variables in the presence of non-response. A two-phase sampling scheme is used to estimate the desired parameter. Expressions for the bias and mean square error (MSE) of the proposed estimators are derived up to the first order of approximation. The proposed estimators are also compared with some existing estimators using a real data set.
Keywords: Ratio and regression-type estimators, population mean, two-phase sampling scheme, auxiliary information, non-response.
There are several practical situations in which the survey (study) variable is highly correlated with more than one auxiliary variable. In such situations, the information on the auxiliary variables can be used to improve the estimates of the population parameters. Olkin (1958) proposed a multivariate ratio estimator that uses information on a number of auxiliary variables, and Srivastava (1966) extended it further. Chand (1975) and Kiregyera (1980) suggested ratio-type estimators of the population mean based on two or more auxiliary variables.
Non-response is very common in sample surveys. It arises when the surveyor fails to obtain information from some of the selected respondents, and it needs to be handled with suitable techniques. Hansen and Hurwitz (1946) addressed the problem by pioneering the technique of sub-sampling the non-respondents. Several other authors have also contributed to the treatment of non-response. In particular, Khare and Srivastava (1995), Singh and Kumar (2009), Singh et al. (2010), Shabbir and Nasir (2013), Verma et al. (2014) and Chaudhary and Kumar (2016) have tackled non-response under the two-phase sampling scheme.
In the subsequent sections, we suggest improved ratio and regression-type estimators of the finite population mean of the survey variable under non-response. We use the information on two auxiliary variables with unknown means in the situation where the first auxiliary variable is subject to non-response, whereas the second (additional) auxiliary variable is free from non-response. The basic properties of the proposed estimators are discussed in detail. An empirical study is also carried out to compare the efficiency of the suggested estimators with that of some existing estimators.
Let be a population of units. Let be the study variable with population mean . Let and be the auxiliary variables with respective population means and . Let , and be the observations on the th unit for the variables , and respectively. The aim of the present work is to estimate the population mean using the information on two auxiliary variables and with unknown means and under the assumption that the non-response occurs on study variable and auxiliary variable whereas the auxiliary variable is free from the non-response. Adopting the concrete idea of two-phase (or double) sampling scheme, we select a first-phase sample of units using simple random sampling without replacement (SRSWOR) scheme and collect the information on the auxiliary variables and . Now, a second-phase sample of units is selected from the first-phase sample of units by the method of SRSWOR and the information is collected on . Out of units at the first-phase, it is noted that there are responding units and non-responding units for the auxiliary variable . Further, we select a sub-sample of units from non-responding units and collect the information on all units. Following Hansen and Hurwitz (1946), the unbiased estimator of at the first-phase is given by
(1) |
The unbiased estimator of at the first-phase is given as
(2) |
where and are the means based on responding units and non-responding units respectively for the auxiliary variable .
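As a minimal sketch of the Hansen and Hurwitz (1946) sub-sampling idea used above, the following Python function combines the mean of the responding units with the mean of the sub-sample drawn from the non-responding units, weighting each by the number of sampled units it represents. The function name, the argument names and the toy numbers are illustrative assumptions and are not notation or data from the paper.

```python
from statistics import fmean


def hansen_hurwitz_mean(respondent_values, subsample_values, n_nonrespondents):
    """Hansen-Hurwitz estimate of a population mean under non-response.

    respondent_values : observations on the responding sample units
    subsample_values  : observations on the sub-sample taken from the
                        non-responding units
    n_nonrespondents  : number of non-responding units in the sample
                        (not the size of the sub-sample)
    """
    n_respondents = len(respondent_values)
    if n_respondents == 0 or len(subsample_values) == 0:
        raise ValueError("both groups need at least one observation")
    if len(subsample_values) > n_nonrespondents:
        raise ValueError("sub-sample cannot exceed the non-respondent count")

    n_total = n_respondents + n_nonrespondents
    mean_respondents = fmean(respondent_values)
    mean_subsample = fmean(subsample_values)

    # The sub-sample mean stands in for all non-respondents in the sample.
    return (n_respondents * mean_respondents
            + n_nonrespondents * mean_subsample) / n_total


# Toy illustration with made-up numbers: 2 respondents, 2 non-respondents,
# of whom 1 is followed up in the sub-sample.
estimate = hansen_hurwitz_mean([2.0, 4.0], [6.0], n_nonrespondents=2)
```

The same function applies at either phase, since only the sample size and the split into responding and non-responding units change.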
The expressions for the variance of the estimators and are respectively given by
(3) | ||
(4) |
where and are the population mean squares of the entire group for the auxiliary variables and respectively. is the population mean square of the non-response group for the auxiliary variable . is the non-response rate in the population.
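The variance of a Hansen and Hurwitz type estimator is commonly written as a simple random sampling term plus a term that accounts for sub-sampling the non-respondents. The sketch below assumes that standard form; the argument names (`N`, `n`, `S2`, `W2`, `k`, `S2_nonresp`) and the numbers are hypothetical labels chosen for illustration, with `k` taken to be the ratio of the number of non-responding units to the sub-sample size.

```python
def hansen_hurwitz_variance(N, n, S2, W2, k, S2_nonresp):
    """Variance of a Hansen-Hurwitz mean under SRSWOR (standard form).

    N          : population size
    n          : sample size at the phase considered
    S2         : population mean square of the entire group
    W2         : non-response rate in the population
    k          : inverse sampling rate among non-respondents (k >= 1)
    S2_nonresp : population mean square of the non-response group
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    srs_term = (1.0 / n - 1.0 / N) * S2
    nonresponse_term = W2 * (k - 1.0) / n * S2_nonresp
    return srs_term + nonresponse_term


# Made-up inputs: with k = 1 every non-respondent is followed up, so the
# extra term vanishes and only the SRSWOR variance remains.
var_full_followup = hansen_hurwitz_variance(100, 20, 10.0, 0.25, 1.0, 8.0)
var_half_followup = hansen_hurwitz_variance(100, 20, 10.0, 0.25, 2.0, 8.0)
```

For a variable that is free from non-response, only the first term applies, which matches the description of the two variance expressions above.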
At the second-phase, units do respond and units do not respond on the study variable and auxiliary variable such that . A sub-sample of units is now selected from the non-responding units and the information is collected from all units on the variables and . Thus, the unbiased estimators of , and at the second-phase are respectively given as [See Hansen and Hurwitz (1946)]
(5) | ||
(6) | ||
(7) |
where , and are the means based on responding units for the variables , and respectively. , and are respectively the means based on the sub-sample of non-responding units for the variables , and .
Note: The functional form of the estimator is similar to that of the estimators and . The concept behind it is only to get the similar estimator, however the non-response does not occur on the auxiliary variable .
The expressions for the variance of the estimators , and are respectively given by
(8) | ||
(9) | ||
(10) |
where and are respectively the population mean squares of the entire group and non-response group for the study variable . is the population mean square of the non-response group for the auxiliary variable .
Chaudhary and Kumar (2016) have suggested some ratio and regression-type estimators of the finite population mean utilizing the information on the auxiliary variable with unknown mean under the situation in which both the variables and are suffering from non-response as follows:
(11) | ||
(12) |
where . and are the unbiased estimators based on units for and respectively. . is the population correlation coefficient between and .
The expressions for the mean square error (MSE) of the estimators and up to the first order of approximation are respectively given as
(13) | ||
(14) |
where is the population regression coefficient of on and is the population correlation coefficient between and of the non-response group. .
We now propose some improved ratio and regression-type estimators of the finite population mean utilizing the information on two auxiliary variables and with unknown means under the given situation of non-response as
(15) | ||
(16) |
where , and . and are respectively the unbiased estimators based on units for and . . is the population correlation coefficient between and .
To obtain the biases and mean square errors of the proposed estimators and , we use the concept of large sample approximations. Let
such that ,
where is the population correlation coefficient between and . and are the population correlation coefficients between , and , respectively for the non-response group. and are respectively the numbers of responding and non-responding units in the population such that .
Expressing the equation (15) in terms of , , , and then removing the terms having powers of , , , greater than two, we get
(17) |
Now, taking expectation on both sides of the equation (17), we find the expression for the bias of up to the first order of approximation as
(18) |
where , . .
Squaring both sides of Equation (17), taking the expectation, and neglecting the terms having powers of , , , higher than two, we get
Thus, the expression for the MSE of up to the first order of approximation is given as
(19) |
Now, expressing the Equation (16) in terms of , , , , , , , and then neglecting the terms involving powers of , , , , , , greater than two, we get
(20) |
where is the population regression coefficient of on .
The expression for the bias of up to the first order of approximation can be obtained by taking the expectation of both sides of Equation (20). Thus, we have
(21) |
Squaring both sides of Equation (20), taking the expectation, and neglecting the terms having powers of , , , , , , greater than two, we get
Therefore, the expression for the MSE of up to the first order of approximation is given by
(22) |
Here, the data considered by Khare and Sinha (2007) have been used to demonstrate the theoretical results. The data relate to the physical growth of upper socioeconomic group of 95 school children of Varanasi district under an ICMR study, Department of Pediatrics, B.H.U., during 1983–84. In this data set, we have Weight (in kg.) of the children, Skull circumference (in cm.) of the children and Chest circumference (in cm.) of the children. The first 25% units have been considered as the non-responding units for all the variables , and . The details are given below:
Table 1 shows the variance/MSE of the estimators, , , and along with their percentage relative efficiency (PRE). The PRE is computed with respect to .
Table 1 Variance/MSE and PRE of , , , ,
Variance/MSE | PRE | |||||||||
1.5 | 0.187 | 0.171 | 0.150 | 0.146 | 0.106 | 100.00 | 109.57 | 124.67 | 127.65 | 176.36 |
2.0 | 0.207 | 0.188 | 0.165 | 0.165 | 0.116 | 100.00 | 109.78 | 125.27 | 125.51 | 177.66 |
2.5 | 0.226 | 0.206 | 0.180 | 0.183 | 0.127 | 100.00 | 109.96 | 125.77 | 123.79 | 178.74 |
3.0 | 0.246 | 0.224 | 0.195 | 0.201 | 0.137 | 100.00 | 110.10 | 126.19 | 122.39 | 179.65 |
Table 1 shows that the MSE of the proposed ratio-type estimator is smaller than that of the existing ratio-type estimator in every row, and hence the PRE of is higher than that of . The same pattern is seen for the proposed regression-type estimator and the existing regression-type estimator . The proposed estimators and also perform considerably better than the usual mean estimator .
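The PRE values can be cross-checked from the tabulated variances. The sketch below assumes the usual definition, PRE = 100 × (variance of the reference estimator) / (MSE of the estimator compared), with the first variance column as the reference because its PRE is 100 in every row. The estimator labels and the meaning of the first column are not recoverable from the table as printed, so the code uses positional columns and the names `table1` and `pre` are illustrative.

```python
# Rows of Table 1: first-column value -> (variance/MSE columns, reported PRE columns)
table1 = {
    1.5: ([0.187, 0.171, 0.150, 0.146, 0.106],
          [100.00, 109.57, 124.67, 127.65, 176.36]),
    2.0: ([0.207, 0.188, 0.165, 0.165, 0.116],
          [100.00, 109.78, 125.27, 125.51, 177.66]),
    2.5: ([0.226, 0.206, 0.180, 0.183, 0.127],
          [100.00, 109.96, 125.77, 123.79, 178.74]),
    3.0: ([0.246, 0.224, 0.195, 0.201, 0.137],
          [100.00, 110.10, 126.19, 122.39, 179.65]),
}


def pre(reference_mse, mse):
    """Percentage relative efficiency with respect to the reference estimator."""
    return 100.0 * (reference_mse / mse)


# Recompute every PRE from the rounded variances and record the largest
# absolute difference from the reported value.
max_gap = 0.0
recomputed = {}
for row_label, (mses, reported) in table1.items():
    values = [pre(mses[0], m) for m in mses]
    recomputed[row_label] = values
    for value, published in zip(values, reported):
        max_gap = max(max_gap, abs(value - published))
```

Because the variances are printed to three decimals, the recomputed values agree with the reported PREs only approximately; the differences stay below one percentage point.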
We have proposed improved ratio and regression-type estimators of the finite population mean that use information on two auxiliary variables in the presence of non-response. We have considered the situation in which the population means of both auxiliary variables are unknown, and have therefore adopted the two-phase sampling scheme to obtain the required information. The basic properties of the suggested estimators have been derived theoretically. To support the theoretical results, we have carried out an empirical study using a real data set. Table 1 compares the proposed estimators with the usual mean estimator and some other existing estimators. It shows that the proposed ratio and regression-type estimators and provide better estimates than the usual mean estimator and the existing ratio and regression-type estimators and . Thus, the proposed estimators may be preferred over the usual mean estimator and the existing estimators.
[1] Chand, L. (1975). Some ratio-type estimators based on two or more auxiliary variables, Unpublished Ph. D. Thesis, Iowa State University, Ames, Iowa, U.S.A.
[2] Chaudhary, M. K. and Kumar, A. (2016). Estimation of mean of finite population using double sampling scheme under non-response, Journal of Statistics Application and Probability (JSAP), Vol. 5, No. 2, 287–297.
[3] Hansen, M. H. and Hurwitz, W. N. (1946). The problem of non-response in sample surveys, Journal of the American Statistical Association, 41, 517–529.
[4] Khare, B. B. and Sinha, R. R. (2007). Estimation of the ratio of the two population means using multi auxiliary characteristics in the presence of non-response, In Statistical Techniques in Life Testing, Reliability, Sampling Theory and Quality Control, 163–171.
[5] Khare, B. B. and Srivastava, S. (1995). Study of conventional and alternative two-phase sampling ratio, product and regression estimators in presence of non-response, Nat. Acad. Sci. Lett. India, 65, 195–203.
[6] Kiregyera, B. (1980). A chain ratio-type estimator in finite population double sampling using two auxiliary variables, Metrika, 31, 215–226.
[7] Olkin, I. (1958). Multivariate ratio estimation for finite populations, Biometrika, 45, 154–165.
[8] Shabbir, J. and Nasir, S. (2013). On estimating the finite population mean using two auxiliary variables in two phase sampling in the presence of non response, Communications in Statistics - Theory and Methods, 42, 4127–4145.
[9] Singh, H. P. and Kumar, S. (2009). A general procedure of estimating the population mean in the presence of non-response under double sampling using auxiliary information, SORT, 33(2.1), 71–84.
[10] Singh, H. P., Kumar, S. and Kozak, M. (2010). Improved estimation of finite population mean using sub-sampling to deal with non response in two phase sampling scheme, Communications in Statistics - Theory and Methods, 39(5), 791–802.
[11] Srivastava, S. K. (1966). A note on Olkin’s multivariate ratio estimator, Journal of the Indian Statistical Association, 4, 202–208.
[12] Verma, H. K., Sharma, P. and Singh, R. (2014). Some ratio cum product type estimators for population mean under double sampling in the presence of non-response, J. Stat. Appl. Pro. 3, No. 3, 379–385.
Manoj K. Chaudhary is a Professor (Associate) in the Department of Statistics at Banaras Hindu University, Varanasi, India. He has made significant contributions to the survey methodology and methods of estimation. He has published more than 45 research papers in internationally and nationally renowned journals.
Amit Kumar is a Guest Faculty in the Department of Statistics at University of Allahabad, Prayagraj, India. He has made significant contributions to the survey methodology and methods of estimation. He has published more than 8 research papers in internationally and nationally renowned journals.
Journal of Reliability and Statistical Studies, Vol. 13_2-4, 349–362.
doi: 10.13052/jrss0974-8024.13247
© 2020 River Publishers