Families of Estimators for Estimating Mean Using Information of Auxiliary Variate Under Response and Non-Response
R. R. Sinha* and Bharti
Department of Mathematics, Dr. B. R. Ambedkar National Institute of Technology, Jalandhar, India
E-mail: raghawraman@gmail.com; bhartikhanna_512@yahoo.com
*Corresponding Author
Received 20 July 2019; Accepted 25 March 2020; Publication 10 September 2020
This research article is concerned with improving the efficiency of estimators of the finite population mean under complete and incomplete information arising as a result of non-response. Different families of estimators for estimating the mean of the study variate using the known population mean, proportion and rank of the auxiliary variate are proposed under different situations, along with their bias and mean square error. Optimum conditions are suggested to attain the minimum mean square error of the proposed families of estimators. Further, the problem is extended to the situation of unknown parameters of the auxiliary variate, and two-phase sampling families of estimators are suggested along with their properties under fixed cost and fixed precision. Using real data sets, theoretical and empirical comparisons are carried out to demonstrate the efficiency of the proposed families of estimators.
Keywords: Non-response, bias, mean square error, rank, auxiliary variate
It is well known that the efficiency of estimators often increases when the available information on an auxiliary variate is utilised [see Cochran (1940), Tripathi et al. (1994), Khare (2003)]. In many situations, however, the required information may not be available, or may be available only partially, on the different variates. For instance, respondents are sometimes reluctant to answer the questionnaire or do not provide complete information; this circumstance is called non-response, and it reduces the efficiency of the estimators. To resolve the problem of non-response, Hansen and Hurwitz (1946) introduced a method of sub-sampling from the non-responding group of the population and suggested an unbiased estimator of the population mean. Several authors, including Rao (1986, 1990), Khare and Srivastava (1993, 1995, 1997, 2000), Khare and Sinha (2002, 2009), Singh and Kumar (2009) and Sinha and Kumar (2011, 2013, 2014), have suggested various estimators and classes of estimators for the population mean using the method of sub-sampling the non-respondents with known and unknown population mean of the auxiliary variate(s).
Let $U = (U_1, U_2, \ldots, U_N)$ be a finite population of $N$ units. A sample of size $n$ is drawn from $U$ using simple random sampling without replacement (SRSWOR), and the values of the study and auxiliary variates for the $i$th unit of the population are denoted by $y_i$ and $x_i$ respectively. Let the population be divided into two non-overlapping strata of responding ($N_1$ units) and non-responding ($N_2$ units) groups, i.e. $N = N_1 + N_2$. Further, in the sample of $n$ units, the $n_1$ responding and $n_2$ non-responding units are supposed to come from the responding and non-responding groups of the population. Let a subsample of size $r = n_2/k$, $k > 1$, be drawn at random from the $n_2$ non-responding units, where $1/k$ denotes the sub-sampling fraction at the second phase. If $w_1 = n_1/n$ and $w_2 = n_2/n$ are the estimates of the unknown proportions $W_1 = N_1/N$ and $W_2 = N_2/N$ respectively, then on the basis of the available information on $(n_1 + r)$ units, Hansen and Hurwitz (1946) suggested an unbiased estimator of the population mean $\bar{Y}$ as
$$\bar{y}^{*} = w_1 \bar{y}_1 + w_2 \bar{y}_2' \qquad (1)$$
whose variance is given by
$$V(\bar{y}^{*}) = \bar{Y}^2 \left[ \frac{1-f}{n} C_y^2 + \frac{W_2 (k-1)}{n} C_{y(2)}^2 \right] \qquad (2)$$
where $f = n/N$, $C_y = S_y/\bar{Y}$ and $C_{y(2)} = S_{y(2)}/\bar{Y}$ are the coefficients of variation of the complete and non-responding groups of the study variate, while $\bar{y}_1$ and $\bar{y}_2'$ are the sample means of the study variate based on $n_1$ and $r$ units respectively.
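As an illustration of the Hansen and Hurwitz (1946) sub-sampling approach described above, the following is a minimal Python sketch. It assumes the textbook form of the estimator (respondent mean and sub-sample mean weighted by the sample proportions of respondents and non-respondents) and the textbook variance written with population variances rather than coefficients of variation. The function and argument names are hypothetical and are not taken from the paper.

```python
from statistics import mean


def hansen_hurwitz_mean(y_respondents, y_subsample, n):
    """Hansen-Hurwitz estimate of the population mean.

    y_respondents: study-variate values of the n1 responding units
    y_subsample:   study-variate values of the r sub-sampled non-respondents
    n:             size of the initial SRSWOR sample (n = n1 + n2)
    """
    n1 = len(y_respondents)
    n2 = n - n1  # non-respondents in the initial sample
    if n2 < 0:
        raise ValueError("more respondents than sampled units")
    if n2 > 0 and len(y_subsample) == 0:
        raise ValueError("need at least one sub-sampled non-respondent")
    w1, w2 = n1 / n, n2 / n  # sample estimates of W1 and W2
    resp_mean = mean(y_respondents) if n1 > 0 else 0.0
    sub_mean = mean(y_subsample) if n2 > 0 else 0.0
    return w1 * resp_mean + w2 * sub_mean


def hansen_hurwitz_variance(N, n, k, W2, S2_y, S2_y2):
    """Textbook variance: (1 - f)/n * S_y^2 + W2 (k - 1)/n * S_y(2)^2, f = n/N.

    S2_y and S2_y2 are the population variances of the study variate in the
    whole population and in the non-responding group (assumed known here).
    """
    f = n / N
    return (1.0 - f) / n * S2_y + W2 * (k - 1.0) / n * S2_y2


# Hypothetical toy numbers, not data from the paper.
estimate = hansen_hurwitz_mean([10, 12, 14], [20], n=5)
variance = hansen_hurwitz_variance(N=100, n=5, k=2, W2=0.4, S2_y=16, S2_y2=9)
print(estimate, variance)
```

The second variance term vanishes when $k = 1$, that is, when every non-respondent is followed up, which is why the mean square errors in the empirical section fall as the sub-sampling fraction $1/k$ rises.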
Following the strategies of Rao (1986, 90), Khare and Srivastava (1993), and Khare and Srivastava (2000), the conventional ratio, product, regression, generalized and class of estimators for estimating the population mean with known mean and proportion of auxiliary variate under unit non-response on study as well as auxiliary variates may be defined as
, | , |
, | , |
, | , |
, | , |
where , | where , |
The bias and mean square error (MSE) of all the above estimators, to the first order of approximation, are given as:
(3) | ||
(4) | ||
(5) | ||
(6) | ||
(7) |
where , and
(8) |
at
(9) | ||
(10) |
at
(11) |
where
(12) |
at
(13) | ||
(14) | ||
(15) | ||
(16) | ||
(17) |
where , and ,
(18) |
at
(19) | ||
(20) |
at
(21) |
where
and
(22) |
at
However, in some cases complete information is available on the auxiliary variate but not on the study variate [see Rao (1990)]. In this circumstance, the alternative ratio, product, regression and generalized estimators and the class of estimators using the known population mean and proportion of the auxiliary variate may be defined as
, | , |
, | , |
, | , |
, | , |
where , | where , |
The bias and mean square error of all the above estimators, to the first order of approximation, are given as follows:
(23) | ||
(24) | ||
(25) | ||
(26) | ||
(27) |
where and
(28) | ||
(29) | ||
(30) | ||
(31) |
where , , ,
(32) | ||
(33) | ||
(35) | ||
(36) | ||
(37) |
where , and
(38) | ||
(39) | ||
(40) | ||
(41) |
where
and
(42) |
Further, Sinha and Kumar (2014) extended this problem to propose the classes of estimators using population mean and proportion of auxiliary variate when non-response occurs only on study variate and studied their properties.
Following the approach of Sinha and Kumar (2014) and using the information of known mean, proportion and rank of auxiliary variable, two situations are considered for wider families of estimators to estimate the population mean of study variate.
Situation I: When non-response takes place on both the study and auxiliary variates, the families of estimators proposed for this situation are
(43) |
and
(44) |
such that .
Situation II: When non-response takes place only on the study variate but complete response is available on the auxiliary variate, the families of estimators proposed for this situation are
(45) |
and
(46) |
such that .
The families of estimators proposed under Situations I and II may be combined to study their properties, and they may be defined as
(47) |
and
(48) |
such that .
It is to be noted here that the continuous functions and assume positive values in a bounded subsets and containing the point on the real line. The first and second order partial derivatives of functions and with respect to and are assumed to be continuous and bounded in and respectively.
To obtain the bias and mean square error of the proposed families, we assume for simplicity that the population size is large compared with the sample size, so that the finite population correction terms may be ignored. Therefore, we define the following terms under the large sample approximation as
such that .
and the following terms are obtained up to the first order of approximation
where
Here and , , denote the coefficient of correlation between , respectively for complete and non-responding group of population while and denote the coefficient of bi-serial correlation between , , for complete and non-responding group of population.
Now, expanding the functions and about the point by Taylor’s series up to the partial derivatives of second order and applying the condition , we have
(49) | ||
(50) | ||
(51) | ||
(52) |
where
Since the sample size is assumed to be large enough to justify the first order of approximation, and under the regularity conditions imposed on the functions, the bias and mean square error of the proposed families will always exist. Therefore, the bias and mean square error of the proposed families up to the first order of approximation are as follows
(53) | ||
(54) | ||
(55) | ||
(56) | ||
(57) | ||
(58) | ||
(59) | ||
(60) |
Using the principle of maxima and minima, the optimum values of the functions can be obtained by partially differentiating the mean square errors of the proposed families with respect to the corresponding parameters. Hence, the minimum mean square errors of the proposed families, along with the optimum conditions, are given by
(61) |
if
and
(62) |
if
and
(63) |
if
and
(64) |
if
and
Since the optimum values involve unknown parameters, they can be obtained from prior data or by replacing the parameters with their consistent estimates. Reddy (1978), Srivastava and Jhajj (1983) and Koyuncu and Kadilar (2009) have shown that such estimates do not affect the minimum mean square error of the estimators to the first order of approximation.
In this section, the problem is extended to two-phase sampling for estimating the population mean when non-response occurs only on the study variate while the mean and rank of the auxiliary variate are unknown. In this situation, a larger first-phase sample is selected from the population using SRSWOR to estimate the unknown population mean and rank of the auxiliary variate. These estimates are based on the complete information available on the first-phase units of the auxiliary variate. A second-phase sample is then drawn from the first-phase sample and is used to obtain the required information on the study variate. The three different families of estimators under these situations are as follows
(65) | ||
(66) |
and
[Sinha and Kumar (2014)] | (67) |
Following Khare and Sinha (2002) and Sinha and Kumar (2014), the proposed family of two-phase sampling estimators for estimating the population mean, using the first-phase estimates of the mean and rank of the auxiliary variate, is given by
(68) |
such that and the function satisfy some regularity conditions required for the its expansion by Taylor’s series.
As in the previous section, the following large sample approximations under SRSWOR are defined to obtain the bias and mean square error of the proposed two-phase family:
such that ,
where .
Now, expanding the function by Taylor’s series about the point and using the condition , we have
(69) |
where
and
The bias and mean square error of the proposed family of estimators, to the first order of approximation, are given as
(70) | ||
(71) |
The mean square error of the proposed family of estimators attains its minimum value when
and the minimum mean square error of the proposed family is given by
(72) |
Suppose that the total cost of the survey, apart from the overhead charges, is fixed. The cost function is
(73) |
Here -Cost for observing and identifying an auxiliary variate.
-Cost of sending a questionnaire or visiting the unit at second phase.
-Cost for processing and collecting information on a unit of study variate obtained from responding units and -Cost for processing and collecting information on a sub-sampled unit of study variate by interview basis.
The mean square errors of the estimators can be expressed as given below:
(74) |
where coefficient of terms, coefficient of terms, coefficient of terms.
In order to minimize the mean square errors of the estimators for the fixed cost, a function can be defined as
(75) |
where is a Lagrange multiplier.
Differentiating the function with respect to , and and equating the derivatives to zero, we have
(76) | ||
(77) |
and
(78) |
(79) |
Substituting the obtained values of , and in (73), we get
(80) |
Finally, the optimum value of is given by:
(81) |
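The fixed-cost optimisation above can be sketched numerically. The Python sketch below is not the authors' derivation. It assumes the mean square error has the form $V_0/n + V_1 k/n + V_2/n'$, with the three coefficients playing the role of the coefficients of terms mentioned in the text, and that the expected cost is $c_1' n' + n\,(c_1 + c_2 W_1 + c_3 W_2/k)$, matching the four cost components listed above. All names are hypothetical, and the coefficients are assumed positive.

```python
from math import sqrt


def optimum_fixed_cost(C0, c1p, c1, c2, c3, W1, W2, V0, V1, V2):
    """Minimise V = V0/n + V1*k/n + V2/n_first subject to the expected cost
    C0 = c1p*n_first + n*(c1 + c2*W1 + c3*W2/k).

    Returns (n_first, n, k, V_min). Assumes V0, V1, V2 and all costs are > 0;
    the constraint k > 1 is NOT enforced and should be checked by the caller.
    """
    a = c1 + c2 * W1  # second-phase cost per unit that does not depend on k
    b = c3 * W2       # interview cost per unit, before dividing by k
    # For fixed k the problem is a Cauchy-Schwarz allocation; the remaining
    # product (V0 + V1*k) * (a + b/k) is smallest at this k.
    k = sqrt(V0 * b / (V1 * a))
    A = V0 + V1 * k   # numerator of the 1/n term at the optimum k
    g = a + b / k     # expected cost per second-phase unit at the optimum k
    s = sqrt(A * g) + sqrt(V2 * c1p)
    n = C0 * sqrt(A / g) / s
    n_first = C0 * sqrt(V2 / c1p) / s
    return n_first, n, k, s * s / C0


# Hypothetical inputs for illustration only (not the paper's data).
n_first, n, k, v_min = optimum_fixed_cost(
    C0=1000.0, c1p=1.0, c1=2.0, c2=5.0, c3=30.0,
    W1=0.75, W2=0.25, V0=400.0, V1=100.0, V2=200.0,
)
print(n_first, n, k, v_min)
```

In practice the returned sample sizes would be rounded, and a result with $k \le 1$ or with a first-phase size smaller than the second-phase size would indicate that the assumed coefficients do not suit this closed form.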
For , the expected total cost is considered as
(82) |
and
(83) |
Suppose is the fixed variance for the estimator , and let
(84) |
Now consider a function to optimize the average total cost for the fixed variance of the estimator
(85) |
where is a Lagrange multiplier.
In order to minimize the cost function, we differentiate it with respect to and and equate the derivatives to zero, which gives
(86) | ||
(87) |
and
(88) |
(89) |
Putting the values of and in (84), we have
(90) |
The optimum expected total cost incurred in attaining the fixed variance for the families of estimators is given by:
(91) |
For the optimum cost for fixed variance is given by
(92) |
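The fixed-variance problem is the dual of the fixed-cost problem: the expected cost is minimised while the mean square error is held at a prescribed value. The following Python sketch uses the same assumed forms as before, a mean square error of the form $V_0/n + V_1 k/n + V_2/n'$ and an expected cost $c_1' n' + n\,(c_1 + c_2 W_1 + c_3 W_2/k)$. These forms and all names are assumptions for illustration and are not taken from the paper's equations.

```python
from math import sqrt


def optimum_fixed_variance(V_fixed, c1p, c1, c2, c3, W1, W2, V0, V1, V2):
    """Minimise the expected cost c1p*n_first + n*(c1 + c2*W1 + c3*W2/k)
    subject to V0/n + V1*k/n + V2/n_first = V_fixed.

    Returns (n_first, n, k, minimum_expected_cost). Assumes positive
    coefficients; the constraint k > 1 is not enforced here.
    """
    a = c1 + c2 * W1
    b = c3 * W2
    k = sqrt(V0 * b / (V1 * a))  # same optimum k as in the fixed-cost case
    A = V0 + V1 * k
    g = a + b / k
    s = sqrt(A * g) + sqrt(V2 * c1p)
    n = sqrt(A / g) * s / V_fixed
    n_first = sqrt(V2 / c1p) * s / V_fixed
    return n_first, n, k, s * s / V_fixed


# Hypothetical inputs for illustration only (not the paper's data).
n_first, n, k, cost = optimum_fixed_variance(
    V_fixed=8.0, c1p=1.0, c1=2.0, c2=5.0, c3=30.0,
    W1=0.75, W2=0.25, V0=400.0, V1=100.0, V2=200.0,
)
print(n_first, n, k, cost)
```

Under these assumptions the optimum $k$ does not depend on whether cost or variance is fixed; only the scale of the two sample sizes changes.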
(i) From Equations (2) and (2), we get
(ii) From Equations (2) and (12), we get
(iii) From Equations (2) and (22), we get
if
(iv) From Equations (62) and (2), we get
(v) From Equations (62) and (12), we get
(vi) From Equations (62) and (22), we get
(vii) From Equations (63) and (2), we get
(viii) From Equations (63) and (12), we get
(ix) From Equations (63) and (22), we get
if
(x) From Equations (3) and (2), we get
(xi) From Equations (3) and (12), we get
if
(xii) From Equations (3) and (22), we get
where
For the efficiency comparisons of the proposed two-phase family with respect to the relevant estimators, the minimum mean square errors of the estimators can be defined as follows
at
(93) | ||
(94) |
at
(95) | ||
(96) |
at
and
(xiii) From Equations (3) and (2), we get
(xiv) From Equations (3) and (12), we get
if
(xv) From Equations (3) and (94), we get
if
(xvi) From Equations (3) and (12), we get
if
An empirical study is carried out using real data on the village-wise population of 109 villages of Baria (Urban) Police Station, Champua Tahsil, District-Orissa, India, taken from the Census Handbook of Orissa, 1981, published by the Government of India. To show the efficiency of the suggested families of estimators, 25% of the villages (i.e. 27 villages) from the upper part are considered to constitute the non-respondents of the population.
Data 1: The study and auxiliary variates are as follows:
Agricultural labours, | Occupied houses |
Occupied houses more than 70, | Rank of |
The parameters for data 1 are:
6 | 0.706 | |
Data 2: The study and auxiliary variates are as follows:
Agricultural labours, | Total population |
Population greater than 500, | Rank of |
For Data 1 |
|||
Estimators | |||
89.1518 (100%)* | 76.9587 (100%) | 64.7656 (100%) | |
99.5688 (89.5%) | 80.5597 (95.5%) | 61.5506 (105.2%) | |
164.3870 (54.2%) | 142.5310 (53.9%) | 120.6750 (53.7%) | |
83.0203 (107.4%) | 70.0188 (109.9%) | 56.4732 (114.7%) | |
83.0203 (107.4%) | 70.0188 (109.9%) | 56.4732 (114.7%) | |
; | 83.0203 (107.4%) | 70.0188 (109.9%) | 56.4732 (114.7%) |
106.4870 (83.7%) | 88.4612 (86.9%) | 70.4351 (92%) | |
214.2020 (41.6%) | 185.6030 (41.5%) | 157.0040 (41.3%) | |
78.9660 (112.9%) | 67.1410 (114.6%) | 55.1977 (117.3%) | |
78.9660 (112.9%) | 67.1410 (114.6%) | 55.1977 (117.3%) | |
; | 78.9660 (112.9%) | 67.1410 (114.6%) | 55.1977 (117.3%) |
; , | 73.0153 (121.5%) | 61.7334 (124.6%) | 50.4251 (128.4%) |
; , | 73.2730 (121.7%) | 61.9016 (124.3%) | 50.5046 (128.2%) |
For Data 2 |
|||
Estimators | |||
89.1518 (100%) | 76.9587 (100%) | 64.7656 (100%) | |
95.5120 (93.3%) | 77.9377 (98.7%) | 60.3634 (107.3%) | |
165.5510 (53.9%) | 143.0470 (53.8%) | 120.5440 (53.7%) | |
81.7425 (109.1%) | 69.0576 (111.4%) | 55.9540 (115.7%) | |
81.7425 (109.1%) | 69.0576 (111.4%) | 55.9540 (115.7%) | |
; | 81.7425 (109.1%) | 69.0576 (111.4%) | 55.9540 (115.7%) |
159.7890 (55.8%) | 129.9020 (59.2%) | 100.0150 (64.8%) | |
310.5190 (28.7%) | 264.5480 (29.1%) | 218.5760 (29.6%) | |
79.4260 (112.2%) | 67.5372 (114%) | 55.4718 (116.8%) | |
79.4260 (112.2%) | 67.5372 (114%) | 55.4718 (116.8%) | |
; | 79.4260 (112.2%) | 67.5372 (114%) | 55.4718 (116.8%) |
; , | 73.9494 (120.6%) | 62.5112 (123.1%) | 51.0590 (126.8%) |
; , | 73.7819 (120.8%) | 62.4101 (123.3%) | 51.0242 (126.9%) |
For Data 1 |
|||
Estimators | |||
89.1518 (100%)* | 76.9587 (100%) | 64.7656 (100%) | |
79.1207 (112.7%) | 66.9277 (115%) | 54.7346 (118.2%) | |
135.3990 (65.8%) | 123.2060 (62.5%) | 111.0130 (58.3%) | |
78.2200 (114%) | 66.0270 (116.6%) | 53.8339 (120.3%) | |
78.2200 (114%) | 66.0270 (116.6%) | 53.8339 (120.3%) | |
; | 78.2200 (114%) | 66.0270 (116.6%) | 53.8339 (120.3%) |
88.9882 (100.2%) | 76.7952 (100.2%) | 64.6021 (100.2%) | |
164.9850 (54%) | 152.7920 (50.4%) | 140.5990 (46.1%) | |
79.6111 (112%) | 67.4180 (114.2%) | 55.2250 (117.2%) | |
79.6111 (112%) | 67.4180 (114.2%) | 55.2250 (117.2%) | |
; | 79.6111 (112%) | 67.4180 (114.2%) | 55.2250 (117.2%) |
; , | 75.6380 (117.9%) | 63.4450 (121.3%) | 51.2519 (126.4%) |
= -0.0018 | |||
; , | 75.6022 (117.9%) | 63.4091 (121.4%) | 51.2160 (126.4%) |
For Data 2 |
|||
Estimators | |||
89.1518 (100%) | 89.1518 (100%) | 89.1518 (100%) | |
79.3684 (112.3%) | 79.3684 (112.3%) | 79.3684 (112.3%) | |
134.6190 (66.2%) | 134.6190 (66.2%) | 134.6190 (66.2%) | |
78.4585 (113.7%) | 78.4585 (113.7%) | 78.4585 (113.7%) | |
78.4585 (113.7%) | 78.4585 (113.7%) | 78.4585 (113.7%) | |
; | 78.4585 (113.7%) | 78.4585 (113.7%) | 78.4585 (113.7%) |
103.7000 (86%) | 103.7000 (86%) | 103.7000 (86%) | |
212.1900 (42%) | 212.1900 (42%) | 212.1900 (42%) | |
78.4585 (113.6%) | 78.4585 (113.6%) | 78.4585 (113.6%) | |
78.4585 (113.6%) | 78.4585 (113.6%) | 78.4585 (113.6%) | |
; | 78.4585 (113.6%) | 78.4585 (113.6%) | 78.4585 (113.6%) |
; , | 76.1586 (117.1%) | 76.1586 (117.1%) | 76.1586 (117.1%) |
; , | 75.9604 (117.4%) | 75.9604 (117.4%) | 75.9604 (117.4%) |
For Data 1 |
|||
Estimators | |||
89.1518 (100%)* | 76.9587 (100%) | 64.7656 (100%) | |
, | 79.7253 (112.5%) | 67.5322 (114%) | 55.3391 (117%) |
where | 80.9248 (110.2%) | 68.7317 (112%) | 56.5387 (115%) |
; and | 78.7465 (113%) | 66.5534 (115.6%) | 54.3603 (119.1%) |
; and | 77.4988 (115%) | 65.3057 (117.8%) | 53.1126 (121.9%) |
For Data 2 |
|||
Estimators | |||
89.1518 (100%) | 76.9587 (100%) | 64.7656 (100%) | |
, | 79.9309 (111.5%) | 67.7378 (113.6%) | 55.5447 (116.6%) |
where | 80.9248 (110.2%) | 68.7317 (112%) | 56.5387 (114.6%) |
; and | 78.8845 (113%) | 66.6915 (115.4%) | 54.4984 (118.8%) |
; and | 77.9477 (114.4%) | 65.7546 (117%) | 53.5616 (120.9%) |
, , ,
|
||||||
(fixed) |
||||||
Estimators | (Approx.) | (Approx.) | (Approx.) | (Approx.) | ||
For Data 1 | 27 | 1.75 | – | 69.8493 | 100% | |
25 | 1.51 | 83 | 63.4754 | 110% | ||
25 | 1.55 | 77 | 65.0922 | 107.3% | ||
24 | 1.49 | 88 | 62.0982 | 112.5% | ||
24 | 1.45 | 94 | 60.2736 | 115.9% | ||
27 | 1.75 | – | 69.8493 | 100% | ||
25 | 1.51 | 82 | 63.7582 | 110% | ||
For Data 2 | 25 | 1.52 | 82 | 63.7582 | 110% | |
24 | 1.49 | 88 | 62.2119 | 112.3% | ||
24 | 1.47 | 92 | 60.9387 | 114.6% |
, , ,
|
|||||
(fixed) |
|||||
Expected | |||||
Estimators | (Approx.) | (Approx.) | (Approx.) | cost in Rs. | |
For Data 1 | 27 | 1.75 | – | 3613.47 | |
22 | 1.36 | 75 | 3357.02 | ||
23 | 1.49 | 71 | 3422.08 | ||
22 | 1.27 | 79 | 3301.62 | ||
21 | 1.18 | 82 | 3228.21 | ||
27 | 1.75 | – | 3613.47 | ||
22 | 1.38 | 75 | 3368.40 | ||
For Data 2 | 22 | 1.38 | 75 | 3368.40 | |
22 | 1.28 | 78 | 3306.19 | ||
21 | 1.21 | 81 | 3254.97 |
The parameters for data 2 are:
Two different data sets are considered to demonstrate the efficiency of the suggested families of estimators. Their minimum mean square errors are calculated, along with those of the relevant estimators, at various levels of the sub-sampling fraction. The percentage relative efficiency (PRE) of each estimator with respect to the corresponding reference estimator is calculated by
$$\mathrm{PRE} = \frac{\mathrm{MSE}(\text{reference estimator})}{\mathrm{MSE}(\text{estimator})} \times 100$$
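As a quick check, the percentage relative efficiency can be computed as the ratio of the reference estimator's mean square error to that of the estimator under study, multiplied by 100. This ratio form is an assumption, but it reproduces the percentages tabulated for Data 1. The function name in this short Python sketch is hypothetical.

```python
def percent_relative_efficiency(mse_reference, mse_estimator):
    """PRE (%) of an estimator relative to a reference estimator."""
    if mse_estimator <= 0:
        raise ValueError("mean square error must be positive")
    return 100.0 * mse_reference / mse_estimator


# Values from the first table for Data 1: reference MSE 89.1518, estimator MSE 83.0203.
print(round(percent_relative_efficiency(89.1518, 83.0203), 1))  # 107.4
```

A value above 100% means the estimator has a smaller mean square error than the reference, which is how the tables should be read.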
The minimum mean square errors and percentage relative efficiencies of the estimators for data 1 and 2 are given in Tables 1–3, while the analysis of the cost functions is given in Table 4.
Tables 1 and 2 show that the suggested families of estimators are more efficient than the corresponding existing estimators at all sub-sampling fractions for both data 1 and data 2. The mean square errors of the suggested families of estimators decrease as the sub-sampling fraction increases for both data sets. Similarly, in the case of two-phase sampling, the proposed family is more efficient than the existing estimators. From Table 4, it is observed that the proposed two-phase family is more efficient than the existing estimators for fixed cost, while the expected cost incurred for it is less than that incurred for the existing estimators. Therefore, the suggested families of estimators can be recommended on the basis of the theoretical and empirical studies discussed in the text.
[1] Cochran, W.G. (1940). The estimation of the yields of the cereal experiments by sampling for the ratio of grain to total produce, The Journal of Agricultural Science, 30(2), pp. 262–275.
[2] Hansen, M.H. and Hurwitz, W.N. (1946). The problem of non-response in sample surveys, Journal of the American Statistical Association, 41, pp. 517–529.
[3] Khare, B.B. (2003). Use of auxiliary information in sample surveys up to 2000- A review, Proc. Biotechnology and Science, India: M/S Centre of Bio-Mathematical Studies, pp. 76–87.
[4] Khare, B.B. and Srivastava, S. (1993). Estimation of population mean using auxiliary character in presence of non-response, National Academy Science Letters, 16, pp. 111–114.
[5] Khare, B.B. and Srivastava, S. (1995). Study of conventional and alternative two phase sampling ratio, product and regression estimators in presence of non-response, Proceedings of the National Academy of Sciences, 65(A) II, pp. 195–203.
[6] Khare, B.B. and Srivastava, S. (1997). Transformed ratio type estimators for the population mean in the presence of non-response, Communications in Statistics – Theory and Methods, 26(7), pp. 1779–1791.
[7] Khare, B.B. and Srivastava, S. (2000). Generalized estimators for population mean in presence of no response, International Journal of Mathematics and Statistics, 9, pp. 75–87.
[8] Khare, B.B. and Sinha, R.R. (2002). Estimation of the ratio of two population means using auxiliary character with unknown population mean in presence of no-response, Progress of Mathematics, B. H. U., 36, pp. 337–348.
[9] Khare, B.B. and Sinha, R.R. (2009). On class of estimators for population mean using multi-auxiliary characters in the presence of non-response, Statistics in Transition New Series, 10(1), pp. 3–14.
[10] Koyuncu, N. and Kadilar, C. (2009). Efficient Estimators for the Population mean, Hacettepe Journal of Mathematics and Statistics, 38(2), pp. 217–225.
[11] Rao, P.S.R.S. (1986). Ratio estimation with sub-sampling the non-respondents, Survey Methodology, 12, pp. 217–230.
[12] Rao, P.S.R.S. (1990). Regression estimators with sub-sampling of non-respondents, In-Data Quality Control, Theory and Pragmatics, (Eds.) Gunar E. Liepins and V.R.R. Uppuluri, Marcel Dekker, New York, pp. 191–208.
[13] Reddy, V.N. (1978). A study on the use of prior knowledge on certain population parameters in estimation, Sankhya C, 40, pp. 29–37.
[14] Singh, H.P. and Kumar, S. (2009). A general class of estimators of the population mean in survey sampling using auxiliary information with sub-sampling the non-respondents, The Korean Journal of Applied Statistics, 22(2), pp. 387–402.
[15] Sinha, R.R. and Kumar, V. (2011). Generalized Estimators for Population Mean with Sub Sampling the Non-Respondents, Aligarh Journal of Statistics, 31, pp. 53–62.
[16] Sinha, R.R. and Kumar, V. (2013). Improved Estimators for Population Mean using Attributes and Auxiliary Characters under Incomplete Information, International Journal of Mathematics and Statistics, 14, pp. 43–54.
[17] Sinha, R.R. and Kumar, V. (2014). Improved classes of estimators for population mean using information on auxiliary character under double sampling the non-respondents, National Academy Science Letters, 37(1), pp. 71–79.
[18] Srivastava, S.K. and Jhajj H.S. (1983). A class of estimators of population mean using multi-auxiliary information, Calcutta Statistical Association Bulletin, 32, pp. 47–56.
[19] Tripathi, T.P., Das, A.K. and Khare, B.B. (1994). Use of auxiliary information in sample surveys – A review, Aligarh Journal of Statistics, 14, pp. 79–134.
R. R. Sinha is an Assistant Professor in the Department of Mathematics, Dr. B. R. Ambedkar National Institute of Technology, Jalandhar, India and obtained his Ph. D. Degree in “Sampling Techniques” from the Department of Statistics, Banaras Hindu University, Varanasi, India in 2001. He has guided one Ph. D. and three M. Phil. candidates. He has life membership of Indian Statistical Association and International Indian Statistical Association. Dr. Sinha has published more than 25 research papers in international/national journals and conferences and presented more than 22 research papers in international/national conferences. His area of specialization is Sampling Theory, Data Analysis and Inference. ORCID identifier number of Dr. R. R. Sinha is 0000-0001-6386-1973.
Bharti is a Ph. D. student at Dr. B. R. Ambedkar National Institute of Technology, Jalandhar since 2018. She has done her B.Sc. in Computer Science in 2015 from DAV College, Jalandhar (GNDU) and completed her M.Sc. in Mathematics in 2017 from DAV College, Jalandhar (GNDU). She has one year of teaching experience. Bharti is pursuing her doctoral degree in Mathematics at Dr. B. R. Ambedkar, National Institute of Technology, Jalandhar. Her doctoral degree is on Estimation of Parameters using Auxiliary Character under Complete and Incomplete Information.
Journal of Reliability and Statistical Studies, Vol. 13_1, 21–60.
doi: 10.13052/jrss0974-8024.1312
© 2020 River Publishers