Efficient Method of Estimating the Finite Population Mean Based on Two Auxiliary Variables in the Presence of Non-Response Under Stratified Sampling
Housila P. Singh¹ and Pragati Nigam²,*
¹School of Studies in Statistics, Vikram University, Ujjain, Madhya Pradesh, India
²Mandsaur University, Mandsaur, Madhya Pradesh, India
E-mail: pragatinigam1@gmail.com
*Corresponding Author
Received 16 October 2020; Accepted 06 March 2021; Publication 18 June 2021
This article addresses the problem of estimating the population mean using information on two auxiliary variables in the presence of non-response on the study variable only, under stratified random sampling. A class of estimators is defined, and its bias and mean squared error are derived up to the first order of approximation. Optimum conditions are obtained under which the suggested class of estimators has minimum mean squared error. In addition to the Chaudhary et al. (2009) estimator, many estimators can be identified as members of the suggested class. It is shown that the suggested class of estimators is better than the Chaudhary et al. (2009) estimator and other estimators. The results are supported through a numerical illustration.
Keywords: Study variable, auxiliary variable, finite population, non-response, bias, mean squared error.
In many surveys, auxiliary information is used to improve the precision of the estimator of the population mean, under the supposition that all the observations in the sample are available. However, in many surveys of human populations, information is in most cases not obtained from all the units in the survey, even after call-backs. For example, the selected families may not be at home at the first attempt, and some may refuse to cooperate with the interviewer even when contacted. This is also true of mail surveys, in which questionnaires are mailed to the sampled respondents, who are requested to send back their returns by some deadline. As many respondents do not reply, the available sample of returns is incomplete. The resulting incompleteness is called non-response [Sukhatme et al. (1984, pp. 484–485)]. Incompleteness or non-response in the form of absence, censoring or grouping is a troubling issue in many data sets. Statisticians have long recognized that failure to account for the stochastic nature of incompleteness or non-response can distort the information carried by the data. An estimate derived from incomplete data may be misleading, especially when the respondents differ from the non-respondents, because the estimate can then be biased. Hansen and Hurwitz (1946) suggested a method of adjusting for non-response to address this bias problem. Their idea is to select a sub-sample from the non-respondents in order to obtain an estimate for the sub-population represented by the non-respondents [Okafor and Lee (2000, p. 183)].
When the population mean of the auxiliary variable is known, Cochran (1977), using the Hansen and Hurwitz (1946) technique, envisaged ratio and regression estimators of the population mean of the study variable in which information on the auxiliary variable is obtained from all the sample units, while some sample units fail to supply information on the study variable. Later, various authors, including Rao (1986), Khare and Srivastava (1993, 1997), Tripathi and Khare (1997), Okafor and Lee (2000), Tabasum and Khan (2004, 2006), Singh and Kumar (2008, 2009), Singh et al. (2010) and Khare et al. (2013), have considered the estimation of the population mean of the study variable using information on an auxiliary variable in the presence of non-response. Singh and Khalid (2015) suggested exponential chain dual to ratio and regression type estimators of the population mean in two-phase sampling. Further, Chaudhary et al. (2011), Haq and Shabbir (2013), Sanaullah et al. (2015) and Saleem et al. (2018) envisaged improved estimators of the population mean of the study variable using auxiliary information in stratified random sampling under non-response.
Let a finite population of size $N$ be stratified into $L$ homogeneous strata, and let $N_h$ be the size of the $h$th stratum $(h = 1, 2, \ldots, L)$, so that $\sum_{h=1}^{L} N_h = N$. Let $(y_{hi}, x_{hi}, z_{hi})$ be the values of the study variable $y$ and the auxiliary variables $(x, z)$, respectively, on the $i$th unit of the $h$th stratum. Corresponding to the population means $(\bar{Y}_h, \bar{X}_h, \bar{Z}_h)$, let $(\bar{y}_h, \bar{x}_h, \bar{z}_h)$ be the respective sample means of the $h$th stratum, based on a simple random sample of size $n_h$ drawn without replacement. In practice it is usually not possible to gather information on all the variables for all the units selected in the sample. In this paper we study the situation in which non-response occurs only on the study variable $y$, whereas the two auxiliary variables $(x, z)$ are observed with complete response.
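Let $n_{1h}$ units of the sample of size $n_h$ drawn from the $h$th stratum respond and $n_{2h} = n_h - n_{1h}$ units do not. Employing the Hansen and Hurwitz (1946) method of sub-sampling the non-respondents, a sub-sample of size $r_h = n_{2h}/k_h$ $(k_h > 1)$ is selected at random from the non-respondent group, where $1/k_h$ denotes the sampling fraction among the non-respondent group in the $h$th stratum. In practice, $r_h$ is generally not an integer and has to be rounded. In accordance with most of the current literature on this topic, we suppose that the followed-up units respond on the second call. Further, let $d$ denote a dummy variable taking the value $d_{hi}$ on the $i$th population unit of stratum $h$, with stratum population mean $\bar{D}_h$. Hereafter, $d$ may stand for $y$, $x$ or $z$; since $x$ and $z$ are observed on all $n_h$ sample units, the estimators for them reduce to the ordinary sample means $\bar{x}_h$ and $\bar{z}_h$. Let

$$w_{1h} = \frac{n_{1h}}{n_h}, \qquad w_{2h} = \frac{n_{2h}}{n_h}$$

and

$$\bar{d}_h^{*} = w_{1h}\,\bar{d}_{1h} + w_{2h}\,\bar{d}_{2h(r)}, \tag{1}$$

where $\bar{d}_{1h}$ is the mean of the $n_{1h}$ respondents on the first call, $\bar{d}_{2h(r)}$ is the mean of the $r_h$ units that respond on the second call, and $\bar{d}_h^{*}$ denotes the unbiased Hansen and Hurwitz (1946) estimator of $\bar{D}_h$ for stratum $h$.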
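To make the sub-sampling scheme concrete, here is a minimal Python sketch of the Hansen and Hurwitz (1946) estimator (1) for a single stratum, followed by a small Monte Carlo check on a hypothetical population (not the paper's data) in which units with large y never answer the first call. The function name `hansen_hurwitz_mean`, the rounding rule `max(1, round(n2 / k))` and the assumption that every followed-up unit responds on the second call are illustrative choices, not taken from the paper.

```python
import random
from statistics import fmean


def hansen_hurwitz_mean(values, responded, k, rng):
    """Sketch of the Hansen-Hurwitz estimate of one stratum mean.

    values    -- y-values of the n sampled units (a non-respondent's value is
                 only used if that unit is drawn into the follow-up sub-sample)
    responded -- True where the unit answered on the first call
    k         -- k > 1; roughly n2 / k non-respondents are followed up
    rng       -- a random.Random instance
    """
    first_call = [v for v, r in zip(values, responded) if r]
    non_resp = [v for v, r in zip(values, responded) if not r]
    n, n1, n2 = len(values), len(first_call), len(non_resp)
    estimate = (n1 / n) * fmean(first_call) if n1 else 0.0
    if n2:
        # n2 / k is rarely an integer, so round it (keeping at least one unit)
        r = max(1, round(n2 / k))
        # assumption: every followed-up unit responds on the second call
        follow_up = rng.sample(non_resp, r)
        estimate += (n2 / n) * fmean(follow_up)
    return estimate


# Hypothetical population: units with y > 150 never answer the first call,
# so respondents differ systematically from non-respondents.
rng = random.Random(2021)
population = [float(v) for v in range(1, 201)]
true_mean = fmean(population)

hh_estimates, respondent_means = [], []
for _ in range(10_000):
    sample = rng.sample(population, 40)
    responded = [v <= 150 for v in sample]
    hh_estimates.append(hansen_hurwitz_mean(sample, responded, k=2, rng=rng))
    respondent_means.append(fmean([v for v, r in zip(sample, responded) if r]))

print(f"true mean           {true_mean:.2f}")
print(f"Hansen-Hurwitz mean {fmean(hh_estimates):.2f}")
print(f"respondents only    {fmean(respondent_means):.2f}")
```

Averaged over repeated samples, the Hansen and Hurwitz estimate should sit close to the true mean, whereas the mean of first-call respondents alone is pulled well below it; removing that bias is what the sub-sampling of non-respondents is for.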
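Thus we define an unbiased estimator of the population mean $\bar{D} = \sum_{h=1}^{L} W_h \bar{D}_h$ as

$$\bar{d}_{st}^{*} = \sum_{h=1}^{L} W_h\,\bar{d}_h^{*}, \tag{2}$$

and the variance (MSE) of $\bar{d}_{st}^{*}$ is given by

$$V(\bar{d}_{st}^{*}) = \sum_{h=1}^{L} W_h^{2}\left(\lambda_h S_{dh}^{2} + \theta_h S_{dh(2)}^{2}\right), \tag{3}$$

where $S_{dh}^{2}$ and $S_{dh(2)}^{2}$ are, respectively, the mean squares of the entire group and of the non-response group of the variable $d$ in the population for the $h$th stratum, $W_h = N_h/N$, $f_h = n_h/N_h$, $\lambda_h = (1 - f_h)/n_h$, $\theta_h = W_{2h}(k_h - 1)/n_h$, $W_{2h} = N_{2h}/N_h$, and $N_{2h}$ is the size of the non-response group of the population in the $h$th stratum.

To obtain the bias and mean squared errors (MSEs) of the proposed estimators, we need the following expectations. With $\bar{y}_{st}^{*}$ denoting (2) for $d = y$, we write

$$\bar{y}_{st}^{*} = \bar{Y}(1 + e_0), \qquad \bar{x}_{st} = \sum_{h=1}^{L} W_h \bar{x}_h = \bar{X}(1 + e_1), \qquad \bar{z}_{st} = \sum_{h=1}^{L} W_h \bar{z}_h = \bar{Z}(1 + e_2),$$

such that $E(e_0) = E(e_1) = E(e_2) = 0$ and

$$E(e_0^{2}) = \frac{1}{\bar{Y}^{2}}\sum_{h=1}^{L} W_h^{2}\left(\lambda_h S_{yh}^{2} + \theta_h S_{yh(2)}^{2}\right), \quad E(e_1^{2}) = \frac{1}{\bar{X}^{2}}\sum_{h=1}^{L} W_h^{2}\lambda_h S_{xh}^{2}, \quad E(e_2^{2}) = \frac{1}{\bar{Z}^{2}}\sum_{h=1}^{L} W_h^{2}\lambda_h S_{zh}^{2},$$

$$E(e_0 e_1) = \frac{1}{\bar{Y}\bar{X}}\sum_{h=1}^{L} W_h^{2}\lambda_h \rho_{yxh} S_{yh} S_{xh}, \quad E(e_0 e_2) = \frac{1}{\bar{Y}\bar{Z}}\sum_{h=1}^{L} W_h^{2}\lambda_h \rho_{yzh} S_{yh} S_{zh}, \quad E(e_1 e_2) = \frac{1}{\bar{X}\bar{Z}}\sum_{h=1}^{L} W_h^{2}\lambda_h \rho_{xzh} S_{xh} S_{zh},$$

where $\rho_{yxh}$, $\rho_{yzh}$ and $\rho_{xzh}$ are the correlation coefficients between the subscripted variables of the entire population of the $h$th stratum.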
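The following Python sketch evaluates the stratified Hansen and Hurwitz variance and the first-order expectations $E(e_ie_j)$ from stratum-level parameters. It assumes simple random sampling without replacement within strata, $x$ and $z$ fully observed, and the standard variance $\sum_h W_h^2[(1-f_h)S_{yh}^2/n_h + W_{2h}(k_h-1)S_{yh(2)}^2/n_h]$ (Cochran, 1977). The `Stratum` class, its field names and the two strata in the usage example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """Hypothetical container for the parameters of stratum h."""
    N: int          # stratum size N_h
    n: int          # sample size n_h
    k: float        # k_h > 1; 1/k_h is the sub-sampling fraction of non-respondents
    W2: float       # non-response rate W_2h = N_2h / N_h
    Sy: float       # std. deviation of y in the whole stratum
    Sy2: float      # std. deviation of y in the non-response group
    Sx: float       # std. deviation of x in the whole stratum
    Sz: float       # std. deviation of z in the whole stratum
    rho_yx: float
    rho_yz: float
    rho_xz: float


def first_order_moments(strata, Ybar, Xbar, Zbar):
    """Return E(e_i e_j) for e0 (sub-sampled y), e1 (x) and e2 (z)."""
    N = sum(s.N for s in strata)
    m = {"e0e0": 0.0, "e1e1": 0.0, "e2e2": 0.0,
         "e0e1": 0.0, "e0e2": 0.0, "e1e2": 0.0}
    for s in strata:
        w2 = (s.N / N) ** 2                 # W_h squared
        lam = (1 - s.n / s.N) / s.n         # (1 - f_h) / n_h
        theta = s.W2 * (s.k - 1) / s.n      # extra term from sub-sampling
        # only y carries the non-response term; x and z are fully observed
        m["e0e0"] += w2 * (lam * s.Sy ** 2 + theta * s.Sy2 ** 2) / Ybar ** 2
        m["e1e1"] += w2 * lam * s.Sx ** 2 / Xbar ** 2
        m["e2e2"] += w2 * lam * s.Sz ** 2 / Zbar ** 2
        m["e0e1"] += w2 * lam * s.rho_yx * s.Sy * s.Sx / (Ybar * Xbar)
        m["e0e2"] += w2 * lam * s.rho_yz * s.Sy * s.Sz / (Ybar * Zbar)
        m["e1e2"] += w2 * lam * s.rho_xz * s.Sx * s.Sz / (Xbar * Zbar)
    return m


# Two hypothetical, identical strata (illustrative numbers only)
stratum = dict(N=100, n=20, k=2, W2=0.2, Sy=10, Sy2=5, Sx=20, Sz=4,
               rho_yx=0.9, rho_yz=0.8, rho_xz=0.7)
strata = [Stratum(**stratum), Stratum(**stratum)]
Ybar, Xbar, Zbar = 50.0, 100.0, 10.0
m = first_order_moments(strata, Ybar, Xbar, Zbar)
var_hh = Ybar ** 2 * m["e0e0"]   # variance of the stratified Hansen-Hurwitz mean of y
print(round(var_hh, 4))          # prints 2.125
```

Keeping the moments in relative form (divided by the squared means) matches the way first-order bias and MSE expressions are usually written, so the same dictionary can feed any first-order MSE formula.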
When non-response occurs only on the study variable $y$ (i.e., incomplete information is available on $y$ in the $h$th stratum, while complete information on the sample of size $n_h$ is available for the auxiliary variables $(x, z)$), we define the following class of estimators of the population mean $\bar{Y}$:
(4) |
where are suitably chosen scalars and are constants to be determined such that MSE of is minimum. For the class of estimators reduces to the family of estimators due to Chaudhary et al. (2009).
Using the standard procedure we obtained the bias and MSE of to the first degree of approximation, respectively given by
(5) | ||
(6) |
where
The MSE at (6) is minimized for
(7) |
Substituting (7) into (6) yields the minimum MSE of the class of estimators (4) as
(8) |
Thus we arrive at the following theorem.
Theorem 2.1. The MSE of the suggested class of estimators is greater than or equal to its minimum MSE, i.e.
with equality holding if
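The explicit forms of (6) to (8) are not reproduced here. As a hedged illustration of the kind of optimisation involved, suppose (an assumption, not a statement about (6)) that to first order an estimator satisfies $T - \bar{Y} \approx \bar{Y}(e_0 - a_1 e_1 - a_2 e_2)$ for two scalars $a_1, a_2$. Then $\mathrm{MSE}(T) \approx \bar{Y}^2 E(e_0 - a_1e_1 - a_2e_2)^2$, which is minimised by solving the $2 \times 2$ normal equations, giving a minimum of $\bar{Y}^2[E(e_0^2) - \mathbf{b}^\top \Sigma^{-1}\mathbf{b}]$, where $\Sigma$ is the covariance matrix of $(e_1, e_2)$ and $\mathbf{b} = (E(e_0e_1), E(e_0e_2))^\top$. A Python sketch with hypothetical moment values and hypothetical function names:

```python
def optimum_two_aux(m):
    """Minimise Q(a1, a2) = E(e0 - a1*e1 - a2*e2)^2 over the two scalars.

    m holds the relative moments E(e_i e_j) (hypothetical key names).
    Returns (a1*, a2*, Q_min); Ybar**2 * Q_min is the first-order minimum MSE
    for an estimator whose error has the assumed linear form.
    """
    s11, s22, s12 = m["e1e1"], m["e2e2"], m["e1e2"]
    b1, b2 = m["e0e1"], m["e0e2"]
    det = s11 * s22 - s12 ** 2
    if det <= 0:
        raise ValueError("e1 and e2 must not be perfectly correlated")
    # normal equations: [[s11, s12], [s12, s22]] @ [a1, a2] = [b1, b2]
    a1 = (b1 * s22 - b2 * s12) / det
    a2 = (b2 * s11 - b1 * s12) / det
    q_min = m["e0e0"] - (a1 * b1 + a2 * b2)   # E(e0^2) - b' Sigma^{-1} b
    return a1, a2, q_min


def q(m, a1, a2):
    """Q(a1, a2) expanded in terms of the moments."""
    return (m["e0e0"] + a1 ** 2 * m["e1e1"] + a2 ** 2 * m["e2e2"]
            - 2 * a1 * m["e0e1"] - 2 * a2 * m["e0e2"]
            + 2 * a1 * a2 * m["e1e2"])


# Hypothetical relative moments, chosen only to exercise the formulas
m = {"e0e0": 1.0, "e1e1": 1.0, "e2e2": 1.0,
     "e0e1": 0.8, "e0e2": 0.6, "e1e2": 0.5}
a1, a2, q_min = optimum_two_aux(m)
print(round(a1, 4), round(a2, 4), round(q_min, 4))   # prints 0.6667 0.2667 0.3067
```

Moving either scalar away from its optimum can only increase the quadratic, which is the content of statements such as Theorem 2.1: the MSE of any member is bounded below by the minimum MSE, with equality at the optimum values.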
A large number of estimators can be generated from the proposed class of estimators for suitable values of scalars involved in it. Some members of the proposed class of estimators are discussed below.
Case I. Putting and in (4) we get a class of estimators for population mean as
(9) |
Inserting and in (5) and (6) we get the bias and MSE of to the first degree of approximation as
(10) | ||
(11) |
where
The MSE at (11) is minimized when
(12) |
Thus the resulting minimum MSE of is given by
(13) |
We thus arrive at the following theorem.
Theorem 2.2. To the first degree of approximation,
with equality holding if
Case II. If we set in (4) we get the class of estimators for as
(14) |
Putting in (5) and (6) we get the bias and MSE of to the first degree of approximation as
(15) | ||
(16) |
The MSE at (16) is minimized when
(17) |
Thus the resulting minimum MSE of is given by
(18) |
We thus arrive at the following theorem.
Theorem 2.3. The MSE of the class of estimators (14) is greater than or equal to its minimum MSE, i.e.
with equality holding if
Case III. If we put in (4), we get an improved version of the Chaudhary et al. (2009) class of estimators as
(19) |
Inserting in (15) and (16) we get the bias and MSE of to the first degree of approximation as
(20) | ||
(21) |
where
The MSE at (21) is minimized when
Thus the resulting minimum MSE of is given by
(22) |
We have thus established the following theorem.
Theorem 2.4. The MSE of the class of estimators (19) is greater than or equal to its minimum MSE, i.e.
with equality holding if
• From (8) and (13) we have that
if
(23) |
This condition is always met in survey situations. Thus the proposed class of estimators is more efficient than the class of estimators (9).
• From (8) and (18) we note that
(24) |
from which it follows that the proposed class of estimators is better than the -family of estimators, and hence more efficient than the -family of estimators.
If we set in (19), we get a class of estimators due to Chaudhary et al. (2009):
(25) |
To the first degree of approximation the bias and MSE of are respectively given as
(26) | ||
(27) |
From (22) and (27) we have
(28) |
from which it follows that the proposed family of estimators (19) is more efficient than the Chaudhary et al. (2009) class of estimators (25).
Finally, it follows from (24) and (28) that the proposed -family of estimators is better than and -families of estimators.
For numerical illustration we consider a data set [Source: Koyuncu and Kadilar (2009)] in which y is the number of teachers, x the number of students and z the number of classes in primary and secondary schools, for 923 districts in 6 regions (strata) of Turkey in 2007.
Stratum (h) | 1 | 2 | 3 | 4 | 5 | 6
Stratified means, standard deviations and correlation coefficients (entire population)
Stratum size | 127 | 117 | 103 | 170 | 205 | 201
 | 31 | 21 | 29 | 38 | 22 | 39
 | 70 | 50 | 75 | 95 | 70 | 90
Standard deviation of y | 883.84 | 644.92 | 1033.40 | 810.58 | 403.65 | 711.72
Standard deviation of x | 30486.70 | 15180.77 | 27549.69 | 18218.93 | 8497.77 | 23094.14
Standard deviation of z | 555.58 | 365.46 | 612.95 | 458.03 | 260.85 | 397.05
Mean of y | 703.74 | 413.00 | 573.17 | 424.66 | 267.03 | 393.84
Mean of x | 20804.59 | 9211.79 | 14309.30 | 9478.85 | 5569.95 | 12997.59
Mean of z | 498.28 | 318.33 | 431.36 | 311.32 | 227.20 | 313.71
Correlation coefficient | 0.94 | 1.00 | 0.99 | 0.98 | 0.99 | 0.97
Correlation coefficient | 0.94 | 0.97 | 0.98 | 0.96 | 0.97 | 1.00
Correlation coefficient | 0.98 | 0.98 | 0.98 | 0.98 | 0.96 | 0.98
Non-response group, W = 10% non-response
Standard deviation of y | 510.57 | 386.77 | 1872.88 | 1603.30 | 264.19 | 497.84
Standard deviation of x | 9446.93 | 9198.29 | 52429.99 | 34794.90 | 4972.56 | 12485.10
Standard deviation of z | 303.92 | 278.51 | 960.71 | 821.29 | 190.85 | 287.99
Correlation coefficient | 1.00 | 1.00 | 1.00 | 0.97 | 1.00 | 0.93
Correlation coefficient | 0.99 | 0.99 | 1.00 | 0.96 | 0.99 | 0.98
Correlation coefficient | 0.99 | 0.99 | 1.00 | 0.99 | 0.99 | 0.96
Non-response group, W = 20% non-response
Standard deviation of y | 396.77 | 406.15 | 1654.40 | 1333.35 | 335.83 | 903.91
Standard deviation of x | 7439.16 | 8880.46 | 45784.78 | 29219.30 | 6540.43 | 28411.44
Standard deviation of z | 244.56 | 274.42 | 965.42 | 680.28 | 214.49 | 469.86
Correlation coefficient | 1.00 | 0.99 | 1.00 | 0.98 | 1.00 | 0.99
Correlation coefficient | 0.99 | 0.99 | 0.98 | 0.96 | 0.98 | 0.98
Correlation coefficient | 0.99 | 0.98 | 0.98 | 0.99 | 0.98 | 0.99
Non-response group, W = 30% non-response
Standard deviation of y | 500.26 | 356.95 | 1383.70 | 1193.47 | 289.41 | 825.24
Standard deviation of x | 14017.99 | 7812.00 | 38379.77 | 26090.60 | 5611.32 | 24571.95
Standard deviation of z | 284.44 | 247.63 | 811.21 | 631.28 | 188.30 | 437.90
Correlation coefficient | 0.96 | 0.99 | 1.00 | 0.98 | 1.00 | 0.97
Correlation coefficient | 0.91 | 0.98 | 0.98 | 0.97 | 0.98 | 0.96
Correlation coefficient | 0.97 | 0.98 | 0.98 | 0.99 | 0.98 | 0.98
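As a quick arithmetic check on the population data, the Python snippet below reads the row beginning 127 as the stratum sizes $N_h$ (they sum to 923, the number of districts) and the rows beginning 703.74, 20804.59 and 498.28 as the stratum means of y, x and z. This reading of rows whose labels were damaged in extraction is an assumption. The snippet computes the weights $W_h = N_h/N$ and the population means as weighted sums of the stratum means, e.g. $\bar{Y} = \sum_h W_h \bar{Y}_h$.

```python
# Stratum sizes and stratum means read from the population table
# (the row interpretation is an assumption, see the lead-in)
N_h = [127, 117, 103, 170, 205, 201]
Ybar_h = [703.74, 413.00, 573.17, 424.66, 267.03, 393.84]            # teachers
Xbar_h = [20804.59, 9211.79, 14309.30, 9478.85, 5569.95, 12997.59]   # students
Zbar_h = [498.28, 318.33, 431.36, 311.32, 227.20, 313.71]            # classes

N = sum(N_h)
W_h = [size / N for size in N_h]   # stratum weights W_h = N_h / N


def stratified_mean(weights, means):
    """Population mean as the weighted sum of stratum means."""
    return sum(w * mu for w, mu in zip(weights, means))


Ybar = stratified_mean(W_h, Ybar_h)
Xbar = stratified_mean(W_h, Xbar_h)
Zbar = stratified_mean(W_h, Zbar_h)
print(N, round(Ybar, 2), round(Xbar, 2), round(Zbar, 2))
```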
Table 1 PRE of the Chaudhary et al. (2009) class of estimators for 10%, 20% and 30% non-response and different values of the constants
10% | 20% | 30% | |||||
0.5 | 0.20 | 0.75 | 1 | 1 | 118.84 | 115.69 | 113.44 |
0.5 | 0.25 | 0.75 | 1 | 1 | 124.26 | 145.26 | 166.26 |
0.5 | 0.30 | 0.75 | 1 | 1 | 130.01 | 151.98 | 173.95 |
0.5 | 0.40 | 0.75 | 1 | 1 | 142.56 | 166.65 | 190.73 |
0.5 | 0.50 | 0.75 | 1 | 1 | 156.64 | 183.11 | 209.58 |
0.5 | 0.60 | 0.75 | 1 | 1 | 172.43 | 201.56 | 230.7 |
Table 2 PRE of when and 30% non-response for different values of the constants
10% | 20% | 30% | ||||||||||
0.5 | 0.5 | 0.2 | 0.2 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 841 | 406.07 | 292.87 |
0.5 | 0.5 | 0.3 | 0.3 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 652.71 | 764.80 | 875.35 |
0.5 | 0.5 | 0.25 | 0.25 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 715.58 | 838.56 | 959.76 |
0.5 | 0.5 | 0.4 | 0.3 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 841.53 | 986.46 | 1129.05 |
0.5 | 0.5 | 0.3 | 0.25 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 928.01 | 1088.16 | 1245.45 |
0.5 | 0.5 | 0.5 | 0.3 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 936.55 | 1097.88 | 1256.58 |
0.5 | 0.5 | 0.4 | 0.25 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 1304.33 | 1530.74 | 1751.99 |
0.5 | 0.5 | 0.5 | 0.25 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 1409.92 | 1654.48 | 1893.63 |
0.5 | 0.5 | 0.25 | 0.2 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 1515.89 | 1780.35 | 2037.68 |
0.5 | 0.5 | 0.6 | 0.2 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 2619.87 | 3079.66 | 3524.80 |
0.5 | 0.5 | 0.3 | 0.2 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 2854.18 | 3364.82 | 3851.19 |
0.5 | 0.5 | 0.5 | 0.2 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 4795.72 | 5670.80 | 6490.48 |
Table 3 PRE of when and 30% non-response for different values of the constants
10% | 20% | 30% | ||||||||||
0.5 | 0.5 | 0.2 | 0.2 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 137.09 | 130.58 | 126.19 |
0.5 | 0.5 | 0.25 | 0.2 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 143.51 | 167.78 | 192.03 |
0.5 | 0.5 | 0.25 | 0.25 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 148.61 | 173.73 | 198.85 |
0.5 | 0.5 | 0.3 | 0.2 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 150.33 | 175.75 | 201.15 |
0.5 | 0.5 | 0.3 | 0.25 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 155.75 | 182.09 | 208.41 |
0.5 | 0.5 | 0.3 | 0.3 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 161.42 | 188.73 | 216.01 |
0.5 | 0.5 | 0.4 | 0.25 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 171.36 | 200.35 | 229.31 |
0.5 | 0.5 | 0.4 | 0.3 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 177.79 | 207.87 | 237.92 |
0.5 | 0.5 | 0.5 | 0.2 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 181.97 | 212.74 | 243.49 |
0.5 | 0.5 | 0.5 | 0.25 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 188.91 | 220.86 | 252.79 |
0.5 | 0.5 | 0.5 | 0.3 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 196.17 | 229.37 | 262.52 |
0.5 | 0.5 | 0.6 | 0.2 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 200.74 | 234.69 | 268.61 |
Table 4 PRE of the proposed estimator for 10%, 20% and 30% non-response and different values of the constants
10% | 20% | 30% | ||||||||||
0.5 | 0.5 | 0.2 | 0.2 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 919.84 | 415.39 | 295.18 |
0.5 | 0.5 | 0.3 | 0.3 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 693.78 | 813.14 | 930.68 |
0.5 | 0.5 | 0.25 | 0.25 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 767.55 | 899.74 | 1029.80 |
0.5 | 0.5 | 0.4 | 0.3 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 931.43 | 1092.35 | 1250.25 |
0.5 | 0.5 | 0.3 | 0.25 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 1038.48 | 1218.37 | 1394.48 |
0.5 | 0.5 | 0.5 | 0.3 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 1065.42 | 1249.68 | 1430.31 |
0.5 | 0.5 | 0.4 | 0.25 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 1598.60 | 1878.31 | 2149.81 |
0.5 | 0.5 | 0.5 | 0.25 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 1802.76 | 2118.4 | 2424.60 |
0.5 | 0.5 | 0.25 | 0.2 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 1906.51 | 2242.58 | 2566.73 |
0.5 | 0.5 | 0.3 | 0.2 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 5238.85 | 6226.55 | 7126.56 |
0.5 | 0.5 | 0.6 | 0.2 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 5327.30 | 6303.86 | 7215.05 |
0.5 | 0.5 | 0.5 | 0.2 | 0.75 | 1.25 | 1 | 1 | 1 | 1 | 68342.8 | 96766.1 | 110753 |
Table 1 gives the PRE of the Chaudhary et al. (2009) class of estimators for 10%, 20% and 30% non-response and for different values of the constants.
Table 2 gives the PRE of estimator for 10%, 20% and 30% non-response and for various values of the constants.
Table 3 gives the PRE of estimator for 10%, 20% and 30% non-response and for various values of the constants.
Table 4 shows the PRE of the proposed estimator for 10%, 20% and 30% non-response and for various values of the constants.
It is observed from Tables 2–4 that, for the values of the constants considered in these tables, the PREs of the estimators considered are larger than 100%. So these estimators are more efficient than the usual unbiased estimator, which does not utilize auxiliary information. This shows that the use of auxiliary variable(s) at the estimation stage is advantageous. For all the choices of constants, the PREs increase as the non-response rate increases from 10% to 30%, except for the values of the constants given in the first row of Tables 2–4, where the PREs decrease as the non-response rate increases. The gain in efficiency over the usual unbiased estimator is largest for the proposed class of estimators (Table 4), compared with the estimators in Tables 2 and 3. From the results of Table 4 it is clear that there is ample scope for selecting values of the constants so as to obtain estimators from the suggested class that are better than the other estimators considered. Thus the proposal of the class of estimators is justified.
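The PRE values in Tables 1 to 4 are presumably computed in the conventional way, as 100 times the variance of the usual unbiased estimator (which ignores the auxiliary variables) divided by the MSE of the estimator under study; the defining formula did not survive extraction, so this is an assumption. A value above 100 then means a smaller MSE. A minimal Python sketch with a hypothetical function name and hypothetical inputs:

```python
def percent_relative_efficiency(var_unbiased, mse_estimator):
    """PRE = 100 * V(usual unbiased estimator) / MSE(estimator under study)."""
    if mse_estimator <= 0:
        raise ValueError("MSE must be positive")
    return 100.0 * var_unbiased / mse_estimator


# Hypothetical numbers: an estimator with half the MSE of the unbiased one
print(percent_relative_efficiency(2.0, 1.0))   # prints 200.0
```

Read this way, a PRE of about 200 in the tables would mean the estimator's MSE is roughly half the variance of the usual unbiased estimator.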
In this article we have developed a generalized version of the Chaudhary et al. (2009) estimator that uses information on two auxiliary variables in the presence of non-response under stratified sampling. In addition to the Chaudhary et al. (2009) estimator, a large number of estimators can be identified as members of the suggested class. We have obtained the bias and MSE of the envisaged class of estimators up to the first order of approximation, together with the conditions under which the class attains its minimum MSE. This study thus unifies several results in one place, which should be useful to researchers working in this area. It has been demonstrated both theoretically and numerically that the proposed class of estimators is more efficient than the Chaudhary et al. (2009) estimator. We therefore recommend the proposed class of estimators for use in practice.
We are grateful to the two learned referees, the associate editor and the editor-in-chief for their valuable suggestions, which helped to improve our paper.
[1] Chaudhary, M. K., Singh, R., Shukla, R. K., Kumar, M. and Smarandache, F. (2009). A family of estimators for estimating population mean in stratified sampling under non-response, Pakistan Journal of Statistics and Operation Research, 5(1), 47–54.
[2] Chaudhary, M. K., Singh, V. K., Singh, R. and Smarandache, F. (2011). Estimating the population mean in stratified population using auxiliary information under non-response, in Studies in Sampling Techniques and Time Series Analysis, Zip Publishing, USA.
[3] Cochran, W. G. (1977). Sampling techniques, 3rd Edition, John Wiley: New York.
[4] Hansen, M. H. and Hurwitz, W. N. (1946). The problem of non-response in sample surveys, Journal of the American Statistical Association, 41, 517–529.
[5] Haq, A. and Shabbir, J. (2013). Improved family of ratio estimators in simple and stratified random sampling, Communications in Statistics-Theory and Methods, 42(5), 782–799.
[6] Khare, B. B. and Srivastava, S. (1993). Estimation of population mean using auxiliary character in presence of non-response, National Academy Science Letters, 16, 111–114.
[7] Khare, B. B. and Srivastava, S. (1997). Transformed ratio-type estimators for the population mean in the presence of non-response, Communications in Statistics-Theory and Methods, 26(7), 1779–1791.
[8] Khare, B. B., Srivastava, U. and Kamlesh, K. (2013). Generalized chain type estimators for ratio of two population means using two auxiliary characters in the presence of non-response, International Journal of Statistics and Economics, 10(1).
[9] Koyuncu, N. and Kadilar, C. (2009). Ratio and product estimators in stratified random sampling, Journal of Statistical Planning and Inference, 139, 2552–2558.
[10] Okafor, F. C. and Lee, H. (2000). Double sampling for ratio and regression estimation with sub-sampling the non-respondent, Survey Methodology, 26(2), 183–188.
[11] Rao, P. S. R. S. (1986). Ratio estimation with sub-sampling the non-respondents, Survey Methodology, 12(2), 217–230.
[12] Saleem, I., Sanaullah, A. and Hanif, M. (2018). A generalized class of estimators for estimating population mean in the presence of non-response, Journal of Statistical Theory and Applications, 17(4), 616–626.
[13] Sanaullah, A., Noor-ul-Amin, M. and Hanif, M. (2015). Generalized exponential-type ratio-cum-ratio and product-cum-product estimators for population mean in the presence of non-response under stratified two-phase random sampling, Pakistan Journal of Statistics and Operation Research, 31(1), 71–94.
[14] Singh, G. N. and Khalid, M. (2015). Exponential chain dual to ratio and regression type estimators of population mean in two-phase sampling, Statistica, 75(4), 379–389.
[15] Singh, H. P. and Kumar, S. (2008). A general family of estimators of finite population ratio, product and mean using two-phase sampling scheme in presence of non-response, Journal of Statistical Theory and Practice, 2(4), 677–692.
[16] Singh, H. P. and Kumar, S. (2009). A general procedure of estimating the population mean in the presence of non-response under double sampling using auxiliary information, SORT, 33(1), 71–84.
[17] Singh, H. P., Kumar, S. and Kozak, M. (2010). Improved estimation of finite population mean using sub-sampling to deal with non-response in two-phase sampling scheme, Communications in Statistics-Theory and Methods, 39(5), 791–802.
[18] Sukhatme, P. V., Sukhatme, B. V., Sukhatme, S. and Asok, C. (1984). Sampling theory of surveys with applications, Iowa State University Press, Ames.
[19] Tabasum, R. and Khan, I. A. (2004). Double sampling for ratio estimation with non-response, Journal of Indian Society of Agricultural Statistics, 53(3), 300–306.
[20] Tabasum, R. and Khan, I. A. (2006). Double sampling ratio estimator for population mean in presence of non-response. Assam Statistical Review, 20, 73–83.
[21] Tripathi, T. P. and Khare, B. B. (1997). Estimation of mean vector in presence of non-response, Communications in Statistics-Theory and Methods, 26(9), 2255–2269.
Housila P. Singh is Professor of Statistics at School of Studies in Statistics, Vikram University, Ujjain. He has guided 22 Ph.D. scholars and has published more than 500 research papers in national and international journals of repute.
Pragati Nigam is Assistant Professor of Statistics at Mandsaur University, Mandsaur. She completed her Ph.D. at Vikram University, Ujjain, in 2021 under the guidance of Prof. H. P. Singh. Dr. Pragati has published four research papers in national and international journals of repute.
Journal of Reliability and Statistical Studies, Vol. 14, Issue 1 (2021), 223–242.
doi: 10.13052/jrss0974-8024.14111
© 2021 River Publishers