A Family of Estimators for Population Mean Under Model Approach in Presence of Non-Response
Ajeet Kumar Singh1,* and V. K. Singh2
1Department of Statistics, University of Rajasthan, Jaipur, India
2Department of Statistics, Banaras Hindu University, Varanasi, India
E-mail: ajeetvns.singh@gmail.com; vijay_usha_2000@yahoo.com
Corresponding Author
Received 01 August 2021; Accepted 17 December 2021; Publication 28 January 2022
We define a class of estimators of the population mean under non-response, based on the concept of sub-sampling the non-respondents and utilizing an auxiliary variable. It is a one-parameter class based on the idea of exponential-type estimators (ETE). The model bias and model mean square error of the class and of some of its important members are derived under the polynomial regression model (PRM). The effect of variations in the PRM specification on the efficiency of the estimators is discussed on the basis of empirical results.
Keywords: Non-response, families of estimators, polynomial regression model, mean square error.
Errors that arise from drawing inferences about the population on the basis of only a part of it, the sample, are called ‘sampling errors’. A second source of error is the failure to measure some of the units in the selected sample, failure to ascertain information from some of the units, or wrong reporting, recording, tabulation or processing of the data. All of these types, grouped together, are termed ‘non-sampling errors’. Non-sampling errors have been studied, with illustrations, by Deming (1944), Mahalanobis (1946), Moser (1958) and Zarkovich (1966). Several works, including Basu (1958), Brewer (1963), Royall (1971), Royall and Herson (1973a, 1973b), Cassel et al. (1976), Singh et al. (2009b) and Shukla (2010), state that model-based inference is not only desirable but almost necessary.
Non-sampling errors are sometimes more serious than sampling errors, so efforts are made to reduce both while conducting sample surveys. The main sources of non-sampling error are measurement error and non-response. Hansen and Hurwitz (1946) were the first to deal with the problem of non-response in mail surveys, through the device of sub-sampling the non-respondents, and developed an unbiased estimator of the population mean based on the respondents and the sub-sampled non-respondents.
In contrast to classical survey sampling theory, which treats the values of the population units as fixed, another school of thought regards the finite population values as realized outcomes of a set of random variables generated by a probability distribution. This approach is known as the Super-population Model Approach (SPMA), as opposed to the classical Fixed Population Approach (FPA). The SPMA has its own advantages and disadvantages, and a good number of studies based on it are available in the literature. In most of these studies the main aim has been to investigate the robustness of estimators under misspecification of the model, that is, how stable the variance or mean square error of an estimator remains as the model changes.
A special case of the super-population model, named the Polynomial Regression Model (PRM), was proposed by Royall and Herson (1973a, 1973b); it is described as
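$$Y_k = \sum_{j=0}^{J} \delta_j \beta_j x_k^{j} + \sqrt{v(x_k)}\,\varepsilon_k, \qquad k = 1, 2, \ldots, N, \qquad (1)$$
with
$$E_\xi(\varepsilon_k) = 0, \qquad V_\xi(\varepsilon_k) = \sigma^2, \qquad E_\xi(\varepsilon_k \varepsilon_{k'}) = 0 \quad (k \neq k'),$$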
where $Y_k$ is the random variable associated with the $k$th unit of the finite population of size N, $x_k$ is the value of the $k$th unit on the known auxiliary variable X, typically referred to as its measure of size ($x_k > 0$ for $k = 1, 2, \ldots, N$), the $\varepsilon_k$ are independent random variables each having mean zero and variance $\sigma^2$, $\delta_j$ is zero or one according as the term $\beta_j x^j$ is absent or present in model (1), $v(x)$ is a known function of the x-values and the $\beta_j$ are unknown model parameters. Royall and Herson (1973a) denoted this model as $\xi(\delta_0, \delta_1, \ldots, \delta_J : v(x))$. Chambers (1986) noted that, in both sample survey theory and practice, models in which the expectation of $Y_k$ is proportional to $x_k$ and the variance of $Y_k$ is proportional to a known function of $x_k$ are of considerable interest.
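To make the model concrete, the following Python sketch draws one realization of the finite population under $\xi(\delta_0, \delta_1, \ldots, \delta_J : v(x))$. It is an illustration only: the function name `simulate_prm` and its arguments are hypothetical, and drawing $\varepsilon_k$ from a normal distribution is an added assumption, since model (1) fixes only the mean and variance of the errors.

```python
import math
import random


def simulate_prm(x, betas, deltas, v, sigma, rng):
    """Draw one realization Y_1, ..., Y_N from the PRM
    Y_k = sum_j delta_j * beta_j * x_k**j + sqrt(v(x_k)) * eps_k.

    x      -- auxiliary (size) values x_1..x_N, all > 0
    betas  -- model parameters beta_0..beta_J
    deltas -- 0/1 indicators delta_0..delta_J (term absent/present)
    v      -- variance function v(x)
    sigma  -- standard deviation of eps_k (errors drawn as normal: an assumption)
    """
    if len(betas) != len(deltas):
        raise ValueError("betas and deltas must both have length J + 1")
    ys = []
    for xk in x:
        mean_k = sum(d * b * xk ** j for j, (b, d) in enumerate(zip(betas, deltas)))
        ys.append(mean_k + math.sqrt(v(xk)) * rng.gauss(0.0, sigma))
    return ys


# xi(0, 1 : x): intercept switched off, slope 2, variance proportional to x
rng = random.Random(2022)
x = [float(i) for i in range(1, 11)]
y = simulate_prm(x, betas=[5.0, 2.0], deltas=[0, 1], v=lambda t: t, sigma=1.0, rng=rng)
```

Because `deltas[0] = 0`, the value given for $\beta_0$ is ignored, which is exactly how the indicators $\delta_j$ switch terms of (1) on and off.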
Let a finite population of size N consist of $N_1$ respondents and $N_2$ non-respondents, and let a sample of size n drawn from the population contain $n_1$ respondents and $n_2$ non-respondents. Further, let a random sub-sample of the $n_2$ non-respondents be selected, and suppose every effort is made to obtain information from the units in this sub-sample.
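Let the sample of size n and the sub-sample of non-respondents be denoted by $s$ and $s_h$, respectively. Let $\Omega = s \cup \bar{s}$, where $s$ and $\bar{s}$ are two disjoint sets of units, so that $\bar{s}$ is the non-observed part of the population $\Omega$. Similarly, let $s = s_1 \cup s_2$, where $s_1$ and $s_2$ are the disjoint sets of respondents and non-respondents, and let $s_2 = s_h \cup \bar{s}_h$, where $\bar{s}_h$ is the remaining part of $s_2$. Let $\sum_p$ stand for summation over the set (subset) $p$ of units. We define the following notation: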
Let Z denote either of the variables Y and X.
Sample means:
Further let for the variable X
Obviously we observed that
Hansen and Hurwitz (1946) proposed the following unbiased estimator of the population mean:
(2) |
Obviously, we have
(3) |
and
(4) |
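The Hansen–Hurwitz estimator weights the respondents' mean and the mean of the followed-up sub-sample of non-respondents by their shares of the first-phase sample. A minimal Python sketch of this standard form, $(n_1 \bar y_1 + n_2 \bar y_{2,\mathrm{sub}})/n$, is given below; the symbols and the function name are illustrative labels of ours, and the notation of (2) may differ.

```python
def hansen_hurwitz_mean(y_resp, y_sub, n2):
    """Hansen-Hurwitz (1946) estimate of the population mean.

    y_resp -- Y-values of the n1 units that responded at the first attempt
    y_sub  -- Y-values obtained from the sub-sample of non-respondents
    n2     -- number of non-respondents in the first-phase sample
    """
    if not y_sub and n2 > 0:
        raise ValueError("a non-empty sub-sample is needed when n2 > 0")
    n1 = len(y_resp)
    ybar_sub = sum(y_sub) / len(y_sub) if y_sub else 0.0
    # n1 * (respondents' mean) equals sum(y_resp), so no division by n1 is needed
    return (sum(y_resp) + n2 * ybar_sub) / (n1 + n2)


# Hypothetical numbers: 6 respondents, 4 non-respondents, 2 of them followed up
print(hansen_hurwitz_mean([12, 15, 9, 11, 14, 10], [20, 24], n2=4))
```

With six respondents totalling 71, four non-respondents and a followed-up mean of 22, the example gives (71 + 4 × 22)/10 = 15.9.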
If information on an auxiliary character is available for every unit of the population, it is also known for every unit in the sample and in the sub-sample of non-respondents. We can therefore use this information to define the estimator
(5) |
for estimating the unknown mean of the non-respondents in the sample. Combining this estimator with the sample mean of the respondents, we now define the estimator
(6) |
which is an alternative to the Hansen–Hurwitz estimator defined in (2) and may be regarded as an extension of it.
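One familiar way of using the auxiliary variable for the non-respondents is a ratio adjustment of the sub-sample mean, $\bar y_{\mathrm{sub}}\,\bar x_2 / \bar x_{\mathrm{sub}}$, where $\bar x_2$ is the mean of X over all sampled non-respondents and $\bar x_{\mathrm{sub}}$ its mean over the followed-up sub-sample. Whether (5) takes exactly this form is an assumption on our part; the sketch below only illustrates the idea of substituting such an estimate into the Hansen–Hurwitz form, and both function names are hypothetical.

```python
def ratio_adjusted_nonresp_mean(y_sub, x_sub, x_nonresp):
    """Assumed ratio-type estimate of the mean of Y over all sampled
    non-respondents: sub-sample mean of Y scaled by (mean of X over all
    non-respondents) / (mean of X over the followed-up sub-sample)."""
    ybar_sub = sum(y_sub) / len(y_sub)
    xbar_sub = sum(x_sub) / len(x_sub)
    xbar_nonresp = sum(x_nonresp) / len(x_nonresp)
    return ybar_sub * xbar_nonresp / xbar_sub


def hh_type_with_auxiliary(y_resp, y_sub, x_sub, x_nonresp):
    """Hansen-Hurwitz form with the sub-sample mean replaced by the
    ratio-adjusted estimate of the non-respondents' mean."""
    n1, n2 = len(y_resp), len(x_nonresp)
    adjusted = ratio_adjusted_nonresp_mean(y_sub, x_sub, x_nonresp)
    return (sum(y_resp) + n2 * adjusted) / (n1 + n2)
```

If Y happens to be exactly proportional to X, the ratio adjustment recovers the non-respondents' mean without error, which is the intuition for bringing X in at this step.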
We now define a one-parameter family of estimators of the population mean in the presence of non-response as
(7) |
where the function appearing in (7) depends on the parameter of the family, the population mean $\bar{X}$ and the sample mean $\bar{x}$ of the auxiliary variable based on the sample of size n, and is given by
(8) |
Remark 1: The function in (8) is developed along the lines of the exponential-type estimators suggested by Bahl and Tuteja (1991).
Remark 2: Some special cases of the family (7) are of particular importance. For a suitable choice of the parameter, we have
(9) |
which are similar to the estimators defined by Hansen and Hurwitz (1946). Further, two other members of the family, the parameter value 1 giving the second of them, yield respectively the exponential-type ratio and product estimators proposed by Bahl and Tuteja (1991), adapted to the situation in which the Hansen and Hurwitz (1946) technique of sub-sampling the non-respondents is used to tackle non-response.
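For reference, Bahl and Tuteja's (1991) exponential ratio and product estimators are $\bar y \exp\{(\bar X - \bar x)/(\bar X + \bar x)\}$ and $\bar y \exp\{(\bar x - \bar X)/(\bar x + \bar X)\}$. A convenient way to hold both in one function is a multiplier $\exp\{a(\bar x - \bar X)/(\bar x + \bar X)\}$, with $a = -1$ giving the ratio form, $a = 1$ the product form and $a = 0$ no adjustment. This parameterization is our own illustrative assumption and is not claimed to be the function in (8); in the non-response setting $\bar y$ would be a Hansen–Hurwitz-type mean and $\bar x$ the sample mean of X.

```python
import math


def exp_type_estimate(ybar, xbar, Xbar, a):
    """ybar * exp(a * (xbar - Xbar) / (xbar + Xbar)).

    a = -1 -> Bahl-Tuteja exponential ratio estimator
    a =  1 -> Bahl-Tuteja exponential product estimator
    a =  0 -> ybar unchanged
    (the single parameter a is an assumed, illustrative parameterization)
    """
    return ybar * math.exp(a * (xbar - Xbar) / (xbar + Xbar))


# Hypothetical inputs: the sample under-represents X (xbar < Xbar)
ratio_est = exp_type_estimate(ybar=15.9, xbar=48.0, Xbar=52.0, a=-1)
product_est = exp_type_estimate(ybar=15.9, xbar=48.0, Xbar=52.0, a=1)
```

When $\bar x < \bar X$ the ratio form scales $\bar y$ up and the product form scales it down, which is the behaviour one expects for positively and negatively correlated Y and X respectively.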
We now obtain the $\xi$-bias and $\xi$-MSE of the general family (7) under the super-population approach, assuming the PRM (1).
Theorem 1: The $\xi$-bias of the estimator (7) is
(10) |
The proof of the expression (10) is presented in the Appendix, Section-I.
Theorem 2: The $\xi$-MSE of the estimator (7) is given by
(11) |
The proof of the above expression is presented in the Appendix, Section-II.
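Because (10) and (11) are model-based quantities, they can be checked numerically for a fixed sample by repeatedly generating the population values from the PRM and averaging the error $t - \bar Y$, where $\bar Y$ is the realized population mean. The sketch below does this for any estimator passed in as a function; the names, the normal errors and the two example estimators (ratio estimator and sample mean) are illustrative assumptions rather than the family (7) itself.

```python
import math
import random


def model_bias_mse(estimator, x, sample, betas, deltas, v, sigma, reps, seed=0):
    """Monte Carlo approximation of the xi-bias E(t - Ybar) and the xi-MSE
    E(t - Ybar)**2 for a FIXED sample (list of unit indices) under the PRM
    Y_k = sum_j delta_j*beta_j*x_k**j + sqrt(v(x_k))*eps_k, with normal errors
    (normality is an assumption made only for the simulation)."""
    rng = random.Random(seed)
    N = len(x)
    bias_sum = 0.0
    sq_sum = 0.0
    for _ in range(reps):
        ys = [sum(d * b * xk ** j for j, (b, d) in enumerate(zip(betas, deltas)))
              + math.sqrt(v(xk)) * rng.gauss(0.0, sigma) for xk in x]
        err = estimator(ys, x, sample) - sum(ys) / N  # Ybar is the realized mean
        bias_sum += err
        sq_sum += err * err
    return bias_sum / reps, sq_sum / reps


def ratio_estimator(ys, x, sample):
    """Classical ratio estimator (ybar_s / xbar_s) * Xbar."""
    ybar_s = sum(ys[i] for i in sample) / len(sample)
    xbar_s = sum(x[i] for i in sample) / len(sample)
    return ybar_s / xbar_s * (sum(x) / len(x))


def sample_mean(ys, x, sample):
    """Plain sample mean of Y (x is not used)."""
    return sum(ys[i] for i in sample) / len(sample)
```

For instance, with x = 1, ..., 10, the first three units sampled and $\sigma = 0$ under $\xi(0, 1 : x)$ with $\beta_1 = 2$, the ratio estimator has zero error, while the sample mean has $\xi$-bias $2(2 - 5.5) = -7$.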
Particular cases of the general PRM may be obtained by changing either of its two components, namely the regression function $\sum_{j} \delta_j \beta_j x^{j}$ and the variance function $v(x)$. For example, some particular forms of the general model may be taken as
(12) | ||
(13) | ||
(14) |
Models (12), (13) and (14) may each be written in the Royall and Herson notation $\xi(\delta_0, \delta_1, \ldots, \delta_J : v(x))$. Cochran (1953) and Brewer (1963) have shown that, as far as the variance function of the model is concerned, taking $v(x) = x^g$ for a suitable g covers the majority of situations occurring in practice. We therefore consider the following six PRMs, with $g = 0, 1, 2$, for further analysis:
Model I | (15) | ||
Model II | (16) | ||
Model III | (17) | ||
Model IV | (18) | ||
Model V | (19) | ||
Model VI | (20) |
Royall and Herson (1973a) have shown that under Model II the conventional ratio estimator is model-unbiased, although it is biased under the design-based approach. On the other hand, contrary to the fixed population approach, the sample mean is not model-unbiased under this model. The other models are also important in different practical situations.
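The Royall and Herson result can be checked in two lines. Conditional on the selected sample s, let $\bar y_s$ and $\bar x_s$ be the sample means of Y and X, $\bar X$ the known population mean of X, and $\bar Y$ the (random) finite-population mean of Y; these symbols are introduced here for the sketch. Under any model of the form $\xi(0, 1 : v(x))$, which includes $\xi(0, 1 : x)$ as the case $v(x) = x$, we have $E_\xi(Y_k) = \beta_1 x_k$, so that

$$
\begin{aligned}
E_\xi\!\left(\frac{\bar y_s}{\bar x_s}\,\bar X\right) &= \frac{\beta_1 \bar x_s}{\bar x_s}\,\bar X = \beta_1 \bar X = E_\xi(\bar Y),\\
E_\xi(\bar y_s - \bar Y) &= \beta_1 (\bar x_s - \bar X).
\end{aligned}
$$

Thus the ratio estimator is model-unbiased for every sample, while the sample mean is model-biased unless the sample happens to be balanced on X ($\bar x_s = \bar X$). Neither expectation depends on the variance function $v(x)$.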
The expressions for the $\xi$-MSE of the proposed family under the six models are given as
(21) | ||
(22) | ||
(23) | ||
(24) | ||
(25) | ||
(26) |
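For estimators that are linear in the observed Y-values (for instance the sample mean, or the Hansen–Hurwitz estimator once the sample and the sub-sample are fixed), the $\xi$-bias and $\xi$-MSE under a PRM with $v(x) = x^g$ can be computed exactly, which makes the effect of changing g easy to see. Writing $t - \bar Y = \sum_k c_k Y_k$ with $c_k = w_k - 1/N$, the bias is $\sum_k c_k E_\xi(Y_k)$ and the variance is $\sigma^2 \sum_k c_k^2 x_k^g$. The helper below is a hypothetical sketch of that calculation for a generic linear estimator, not a reproduction of (21)–(26).

```python
def linear_xi_bias_mse(weights, x, betas, deltas, g, sigma2):
    """Exact xi-bias and xi-MSE of t = sum_k weights[k] * Y_k as an estimator of
    the finite-population mean Ybar, under the PRM with v(x) = x**g.
    weights[k] must be 0 for units that are not observed."""
    N = len(x)
    bias = 0.0
    var = 0.0
    for wk, xk in zip(weights, x):
        ck = wk - 1.0 / N  # coefficient of Y_k in t - Ybar
        mean_k = sum(d * b * xk ** j for j, (b, d) in enumerate(zip(betas, deltas)))
        bias += ck * mean_k
        var += ck * ck * sigma2 * xk ** g
    return bias, bias * bias + var


# Sample mean of the first two of four units: weights 1/2, 1/2, 0, 0
x = [1.0, 2.0, 3.0, 4.0]
w = [0.5, 0.5, 0.0, 0.0]
for g in (0, 1, 2):
    print(g, linear_xi_bias_mse(w, x, betas=[0.0, 2.0], deltas=[0, 1], g=g, sigma2=1.0))
```

In this small example under $\xi(0, 1 : x^g)$ with $\beta_1 = 2$ and $\sigma^2 = 1$, the bias is -2 for every g, while the MSE grows from 4.25 (g = 0) to 4.625 (g = 1) and 5.875 (g = 2): only the variance part responds to the change in the variance function.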
Singh et al. (2017) defined two families of exponential-type estimators under non-response and compared them under different polynomial regression models. These estimators and their MSEs under the general PRM are as follows:
(i) | (27) | ||||
(28) | |||||
(ii) | (29) |
where
(30) | ||
(31) |
with
(32) |
We studied the proposed family of estimators with respect to the robustness criterion and compared its efficiency with that of the two families of Singh et al. (2017). All these estimators use sub-sampling of the non-respondents to cope with the problem of non-response inherent in the population, and their $\xi$-bias and $\xi$-MSE are obtained under the general PRM for different specifications of the polynomial regression function and the variance function $v(x)$.
Since a theoretical comparison of the $\xi$-MSEs of the estimators is not straightforward and no concrete conclusion can be drawn from it, we use empirical data for this purpose. To compare numerically the robustness of the proposed estimator and the MSEs of the three estimators, we consider the data presented in Singh et al. (2017), which were taken from Kish (1967); details of the data are given in Appendix E of Kish (1967). Here X represents the number of dwellings and Y the number of dwellings occupied by renters. Non-response rates of 15 and 30 percent are considered.
For the data set, we obtained the following values:
(i) For 15 percent non-response rate:
(ii) For 30 percent non-response rate:
Tables 1–2, 3–4 and 5–6 show how the MSEs of the three estimators vary over Models I–VI, each pair of tables corresponding to one of the three parameter values considered (the last being 1) and covering non-response rates of 15 and 30 percent.
Table 1 Comparisons of MSEs of estimators with 15% non-response rate for over Models I–VI
g | Model | |||
0 | Model-I | 221.9701 | 33.63682 | 221.9745 |
1 | Model-II | 223.2496 | 35.04206 | 223.3812 |
2 | Model-III | 291.1687 | 109.2985 | 295.6571 |
0 | Model-IV | 221.9701 | 34.13085 | 221.9745 |
1 | Model-V | 223.2496 | 35.53468 | 223.3812 |
2 | Model-VI | 291.1687 | 109.7911 | 295.6571 |
Table 2 Comparisons of MSEs of estimators with 30% non-response rate for over Models I–VI
g | Model | |||
0 | Model-I | 19.85436 | 26.58954 | 19.8524 |
1 | Model-II | 21.38647 | 28.27287 | 21.28827 |
2 | Model-III | 110.8182 | 126.3833 | 104.4577 |
0 | Model-IV | 19.85436 | 26.15504 | 19.8524 |
1 | Model-V | 21.38647 | 27.83837 | 21.28827 |
2 | Model-VI | 110.8179 | 125.9488 | 104.4577 |
Table 3 Comparisons of MSEs of estimators with 15% non-response rate for d over Models I–VI
g | Model | |||
0 | Model-I | 108.2256 | 9.458615 | 108.2302 |
1 | Model-II | 109.5656 | 10.9243 | 109.704 |
2 | Model-III | 180.5684 | 87.10091 | 185.2896 |
0 | Model-IV | 108.6612 | 9.820009 | 108.6657 |
1 | Model-V | 110.0011 | 11.28569 | 110.1395 |
2 | Model-VI | 181.004 | 87.65979 | 185.7251 |
Table 4 Comparisons of MSEs of estimators with 30% non-response rate for over Models I–VI
g | Model | |||
0 | Model-I | 0.124783 | 6.455189 | 0.1227161 |
1 | Model-II | 1.730512 | 8.08324 | 1.627152 |
2 | Model-III | 95.3875 | 102.5966 | 88.69285 |
0 | Model-IV | 0.112961 | 6.28924 | 0.1108941 |
1 | Model-V | 1.71869 | 7.917298 | 1.615329 |
2 | Model-VI | 95.3756 | 102.4307 | 88.68103 |
Table 5 Comparisons of MSEs of estimators with 15% non-response rate for over Models I–VI
g | Model | |||
0 | Model-I | 372.1006 | 67.9551 | 372.1047 |
1 | Model-II | 373.3227 | 69.31012 | 373.4478 |
2 | Model-III | 438.3187 | 141.6982 | 442.5852 |
0 | Model-IV | 371.3129 | 68.71755 | 371.317 |
1 | Model-V | 372.535 | 69.77258 | 372.6601 |
2 | Model-VI | 437.531 | 142.3581 | 441.7975 |
Table 6 Comparisons of MSEs of estimators with 30% non-response rate for over Models I–VI
g | Model | |||
0 | Model-I | 82.59208 | 61.88629 | 82.59022 |
1 | Model-II | 84.05434 | 63.62918 | 83.96106 |
2 | Model-III | 169.4755 | 165.6155 | 163.4338 |
0 | Model-IV | 82.22125 | 61.0706 | 82.21938 |
1 | Model-V | 83.68351 | 62.81353 | 83.59023 |
2 | Model-VI | 169.1047 | 164.7998 | 163.0629 |
We also carried out a simulation study on the data described above: 30,000 samples of size 20 were drawn from the population of size 90 with a 15% non-response rate, and the MSEs of the estimators were computed. The results are given in Tables 7–9.
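A repeated-sampling simulation of this kind can be sketched as follows for the Hansen–Hurwitz estimator. The Kish (1967) values are not reproduced here, so the population passed in is a placeholder; the way the 15% non-response is imposed (a fixed random set of population units that do not respond at the first attempt), the integer sub-sampling divisor `k` and all function names are our assumptions.

```python
import random


def hh_estimate(pop, nonresp, n, k, rng):
    """One Hansen-Hurwitz estimate from a two-phase draw: SRSWOR sample of size n,
    then an SRSWOR follow-up sub-sample of ceil(n2 / k) sampled non-respondents
    (assumed to respond fully on follow-up)."""
    sample = rng.sample(range(len(pop)), n)
    resp_total = sum(pop[i] for i in sample if not nonresp[i])
    missing = [i for i in sample if nonresp[i]]
    n2 = len(missing)
    if n2 == 0:
        return resp_total / n
    r = -(-n2 // k)  # ceil(n2 / k), at least 1 because n2 >= 1
    sub = rng.sample(missing, r)
    ybar_sub = sum(pop[i] for i in sub) / r
    return (resp_total + n2 * ybar_sub) / n


def simulated_mse(pop, nonresp_rate, n, k, reps, seed=1):
    """Empirical MSE of the H-H estimator over `reps` repeated samples.
    A fixed random set of round(nonresp_rate * N) population units is treated
    as the non-response stratum (an assumption about how the rate is imposed)."""
    rng = random.Random(seed)
    N = len(pop)
    nonresp = [False] * N
    for i in rng.sample(range(N), round(nonresp_rate * N)):
        nonresp[i] = True
    Ybar = sum(pop) / N
    total = 0.0
    for _ in range(reps):
        total += (hh_estimate(pop, nonresp, n, k, rng) - Ybar) ** 2
    return total / reps
```

For example, `simulated_mse(pop, nonresp_rate=0.15, n=20, k=2, reps=30000)` with a population list of length 90 mirrors the sample sizes quoted above; other estimators can be studied by replacing `hh_estimate`.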
Table 7 Comparisons of MSEs of estimators with 15% non-response rate for over Models I–VI
g | Model | |||
0 | Model-I | 1927.238 | 1727.755 | 558.0646 |
1 | Model-II | 1928.518 | 1729.029 | 559.3441 |
2 | Model-III | 1996.437 | 1796.658 | 627.2632 |
0 | Model-IV | 1927.238 | 1727.589 | 558.0646 |
1 | Model-V | 1928.518 | 1728.863 | 559.3441 |
2 | Model-VI | 1996.437 | 1796.492 | 627.2632 |
Table 8 Comparisons of MSEs of estimators with 15% non-response rate for over Models I–VI
g | Model | |||
0 | Model-I | 1724.227 | 1124.534 | 53.7949 |
1 | Model-II | 1725.227 | 1125.977 | 55.31391 |
2 | Model-III | 1795.378 | 1199.191 | 135.4364 |
0 | Model-IV | 1725.312 | 1127.193 | 54.9664 |
1 | Model-V | 1726.629 | 1128.636 | 56.48532 |
2 | Model-VI | 1780.501 | 1202.047 | 136.6079 |
Table 9 Comparisons of MSEs of estimators with 15% non-response rate for over Models I–VI
g | Model | |||
0 | Model-I | 2138.325 | 2272.202 | 1487.107 |
1 | Model-II | 2139.568 | 2273.383 | 1488.19 |
2 | Model-III | 2205.636 | 2337.634 | 1546.085 |
0 | Model-IV | 2137.135 | 2268.999 | 1481.493 |
1 | Model-V | 2138.378 | 2270.18 | 1482.576 |
2 | Model-VI | 2204.446 | 2334.629 | 1540.47 |
From the tables, the following conclusions can be drawn about the robustness and precision of the family, for the parameter values considered, in comparison with the other estimators:
(i) As far as robustness is concerned, the data suggest that the proposed family is reasonably robust under Models I, II, IV and V, in which the polynomial regression function changes and the variance function is of the form $x^g$ with $g = 0, 1$. Similarly, the estimator appears robust under Models III and VI, where the variance function is proportional to $x^2$. This property holds irrespective of the non-response rate and the choice of the parameter.
(ii) However, as g increases, the MSEs of all the estimators increase, that is, their performance deteriorates, irrespective of the regression function and the parameter value.
(iii) The simulation results for the 15% non-response rate in Tables 7, 8 and 9 show the same pattern across models as Tables 1, 3 and 5, although the MSE values themselves differ.
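The robustness reading in (i) and (ii) can be checked directly against the tabulated values. The sketch below uses the Table 1 MSEs as printed and computes the ratio of the largest to the smallest MSE across a chosen set of models; this max/min ratio is a simple stability summary of our own choosing, not a criterion defined in the paper, and the columns are labelled only by position. For the first column it is about 1.006 over Models I, II, IV and V, but about 1.31 once Models III and VI (g = 2) are included, in line with (ii).

```python
# MSE values as printed in Table 1 (15% non-response), Models I-VI in order.
# The columns are labelled by position only.
TABLE1 = {
    "column 1": [221.9701, 223.2496, 291.1687, 221.9701, 223.2496, 291.1687],
    "column 2": [33.63682, 35.04206, 109.2985, 34.13085, 35.53468, 109.7911],
    "column 3": [221.9745, 223.3812, 295.6571, 221.9745, 223.3812, 295.6571],
}
MODEL_INDEX = {"I": 0, "II": 1, "III": 2, "IV": 3, "V": 4, "VI": 5}


def max_min_ratio(mses, models):
    """Largest MSE divided by smallest MSE over the listed models; values
    close to 1 indicate an MSE that is stable across those models."""
    vals = [mses[MODEL_INDEX[m]] for m in models]
    return max(vals) / min(vals)


for name, mses in TABLE1.items():
    low_g = max_min_ratio(mses, ["I", "II", "IV", "V"])  # g = 0, 1
    all_models = max_min_ratio(mses, list(MODEL_INDEX))  # g = 0, 1, 2
    print(f"{name}: {low_g:.4f} over I, II, IV, V; {all_models:.4f} over I-VI")
```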
Section I: We have
(B1) |
Now, using the PRM given in (1), we can write
(B2) |
Since for all k, we have
Thus expression (10) follows.
Section II: The $\xi$-MSE of the estimator (7) under the model (1) is derived as follows:
We have
(B4) |
Now, realizing that $E_\xi(\varepsilon_k \varepsilon_{k'}) = 0$ for $k \neq k'$ and $E_\xi(\varepsilon_k^2) = \sigma^2$, we have
(B5) |
Expression (B5) can further be written as
(B6) |
Hence the expression (11) follows.
Bahl, S. and Tuteja, R. K. (1991): Ratio and product type exponential estimator, Information and Optimization Sciences, 12(1), 159–163.
Basu, D. (1958): On sampling with and without replacement, Sankhya, 20 A, 287–294.
Brewer, K. R. W. (1963): Ratio estimation and finite populations: Some results deducible from the assumption of an underlying stochastic process, Australian Journal of Statistics, 5, 93–105.
Cassel, C. M., Sarndal, C. E. and Wretman, J. H. (1976): Some results on generalized difference estimation and generalized regression estimation for finite populations, Biometrika, 63, 615–620.
Chambers, R. L. (1986): Outlier robust finite population estimation, Journal of the American Statistical Association, 81(396), 1063–1069.
Cochran, W. G. (1953): Sampling Techniques, John Wiley and Sons, Inc., New York, I Edition.
Deming, W. E. (1944): On errors in surveys, American Sociological Review, 9, 359–369.
Hansen, M. H. and Hurwitz, W. N. (1946): The problem of non-response in sample surveys, Journal of the American Statistical Association, 41, 517–529.
Kish, L. (1967): Survey Sampling. John Wiley and Sons, Inc., New York, II Edition.
Mahalanobis, P. C. (1946): Recent experiments in statistical sampling in The Indian Statistical Institute, Journal of The Royal Statistical Society, 109A, 325–378.
Moser, C. A. (1958): Survey Methods in Social Investigation, Heinemann, London.
Royall, R. M. (1971): Linear regression models in finite population sampling theory, in Foundations of Statistical Inference, V. P. Godambe and D. A. Sprott (eds.), Toronto: Holt, Rinehart and Winston, 259–274.
Royall, R. M. and Herson, J. (1973a): Robust estimation in finite populations I, Journal of the American Statistical Association, 68(344), 880–889.
Royall, R. M. and Herson, J. (1973b): Robust estimation in finite populations II: Stratification on a size variable, Journal of the American Statistical Association, 68(344), 890–893.
Singh, H. P. and Solanki, R. S. (2011): Estimation of finite population mean using random non-response in survey sampling, Pakistan Journal of Statistics and Operation Research, 7(1), 21–41.
Singh, H. P. and Solanki, R. S. (2011): A general procedure for estimating the population parameter in the presence of random non-response, Pakistan Journal of Statistics, 27(4), 427–465.
Singh, V. K., Singh, R. V. K. and Shukla, R. K. (2009b): Model-based study of some estimators in the presence of non-response, in Population, Poverty and Health: Analytical Approaches (Eds. K. K. Singh, R. C. Yadava and Arvind Pandey), Hindustan Publishing Corporation, New Delhi, India, 360–365.
Singh, A. K., Singh, P. and Singh, V. K. (2017): Model based study of families of exponential type estimators in presence of nonresponse, Communications in Statistics – Theory and Methods, 46(13), 6478–6490.
Shukla, R. K. (2010): Model-Based Efficiencies of Some Families of Estimators in Presence of Non-Response and Measurement Errors, unpublished Ph.D. thesis, Banaras Hindu University, Varanasi, India.
Zarkovich, S. S. (1966): Quality of Statistical Data, Food and Agricultural Organization of the United Nations, Rome.
Ajeet Kumar Singh is an Assistant Professor in the Department of Statistics, University of Rajasthan, Jaipur. He received his Ph.D. degree in Statistics from Banaras Hindu University. He has published more than 20 research articles in reputed international journals. His field of specialization is sampling theory.
V. K. Singh is a retired Professor (Professor since 2000) of the Department of Statistics, Institute of Science, Banaras Hindu University, Varanasi, India. He joined the Department as Assistant Professor in 1972 and obtained his M.Sc. (Statistics) and Ph.D. (Statistics) degrees from Banaras Hindu University in 1972 and 1979, respectively. He has 45 years of teaching experience and 43 years of research experience. His fields of specialization are sampling theory, stochastic modelling, mathematical demography and operations research. He has published 92 research papers in reputed international and national journals and has guided 15 Ph.D. scholars. He has visited the United Kingdom, Australia and Sri Lanka to attend international conferences and organize symposiums, and has convened 2 national conferences. He is a life member of the Indian Statistical Association, a member of the International Association of Survey Statisticians (IASS), and an Associate Editor of Assam Statistical Review, India.
Journal of Reliability and Statistical Studies, Vol. 15, Issue 1 (2022), 1–20.
doi: 10.13052/jrss0974-8024.1511
© 2022 River Publishers