An Improvement in Estimation of Population Mean using Two Auxiliary Variables and Two-Phase Sampling Scheme under Non-Response

Manoj K. Chaudhary1 and Amit Kumar2,*

1Department of Statistics, Institute of Science, Banaras Hindu University, Varanasi, India

2Department of Statistics, University of Allahabad, Prayagraj, India

E-mail: ritamanoj15@gmail.com; amit.stat40@gmail.com

*Corresponding Author

Received 02 July 2020; Accepted 16 November 2020; Publication 04 January 2021

Abstract

In the present paper, we propose some improved ratio and regression-type estimators of the finite population mean that utilize the information on two auxiliary variables in the presence of non-response. The two-phase sampling scheme is used to estimate the desired parameter. The expressions for the bias and mean square error (MSE) of the proposed estimators are derived up to the first order of approximation. A comparative study of the proposed estimators with some existing estimators is also carried out on a real data set.

Keywords: Ratio and regression-type estimators, population mean, two-phase sampling scheme, auxiliary information, non-response.

1 Introduction

There are several practical situations in which the survey (study) variable is highly correlated with more than one auxiliary variable. In such situations, the information on the auxiliary variables can be utilized to provide improved estimates of the population parameters. Olkin (1958) proposed a multivariate ratio estimator utilizing the information on a number of auxiliary variables, which was further extended by Srivastava (1966). Chand (1975) and Kiregyera (1980) suggested some ratio-type estimators for estimating the population mean with the help of two or more auxiliary variables.

The problem of non-response is very common in sample surveys. Non-response arises when the surveyor fails to obtain the information from the respondents. It is of great importance to deal with this problem through suitable techniques. Hansen and Hurwitz (1946) addressed the problem of non-response by pioneering a technique of sub-sampling the non-respondents. Since then, several authors have contributed to the treatment of non-response. In particular, Khare and Srivastava (1995), Singh and Kumar (2009), Singh et al. (2010), Shabbir and Nasir (2013), Verma et al. (2014) and Chaudhary and Kumar (2016) have tackled the problem of non-response under the two-phase sampling scheme.

In the subsequent sections, we suggest some improved ratio and regression-type estimators of the finite population mean for the survey variable under non-response. We utilize the information on two auxiliary variables with unknown means in the situation where the first auxiliary variable suffers from non-response, whereas the second (additional) auxiliary variable is free from non-response. The basic properties of the proposed estimators are discussed in detail. An empirical study is also carried out to compare the efficiency of the suggested estimators with that of some existing estimators.

2 Sampling Strategy

Let $U=\{1,2,\ldots,N\}$ be a population of $N$ units. Let $Y$ be the study variable with population mean $\bar{Y}$. Let $X$ and $Z$ be the auxiliary variables with respective population means $\bar{X}$ and $\bar{Z}$. Let $y_i$, $x_i$ and $z_i$ be the observations on the $i$th unit for the variables $Y$, $X$ and $Z$ respectively. The aim of the present work is to estimate the population mean $\bar{Y}$ using the information on two auxiliary variables $X$ and $Z$ with unknown means $\bar{X}$ and $\bar{Z}$ under the assumption that non-response occurs on the study variable $Y$ and the auxiliary variable $X$, whereas the auxiliary variable $Z$ is free from non-response. Adopting the two-phase (or double) sampling scheme, we select a first-phase sample of $n'$ units using simple random sampling without replacement (SRSWOR) and collect the information on the auxiliary variables $X$ and $Z$. A second-phase sample of $n$ units is then selected from the first-phase sample of $n'$ units by SRSWOR and the information is collected on $Y$. Out of the $n'$ units at the first phase, there are $n'_1$ responding units and $n'_2$ non-responding units for the auxiliary variable $X$. Further, we select a sub-sample of $h'_2$ $(h'_2=n'_2/L';\ L'>1)$ units from the $n'_2$ non-responding units and collect the information on all $h'_2$ units. Following Hansen and Hurwitz (1946), the unbiased estimator of $\bar{X}$ at the first phase is given by

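$$\bar{x}'^{*}=\frac{n'_1\bar{x}_{n'_1}+n'_2\bar{x}_{h'_2}}{n'} \tag{1}$$

The unbiased estimator of $\bar{Z}$ at the first phase is given as

$$\bar{z}'=\frac{1}{n'}\sum_{i=1}^{n'}z_i \tag{2}$$

where $\bar{x}_{n'_1}$ and $\bar{x}_{h'_2}$ are the means based on the $n'_1$ responding units and the $h'_2$ sub-sampled non-responding units respectively for the auxiliary variable $X$.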
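The Hansen and Hurwitz (1946) estimator in (1) is a weighted combination of the respondent mean and the mean of the sub-sampled non-respondents. A minimal Python sketch of that calculation is given below; the function name `hansen_hurwitz_mean`, its argument names and the sample values are illustrative assumptions and do not come from the paper.

```python
from statistics import fmean


def hansen_hurwitz_mean(respondents, subsample, n_nonrespondents):
    """Hansen-Hurwitz mean: (n1 * mean_resp + n2 * mean_sub) / (n1 + n2).

    respondents      -- values observed on the n1 responding units
    subsample        -- values observed on the h2 sub-sampled non-respondents
    n_nonrespondents -- n2, the number of non-responding units in the sample
    """
    n1 = len(respondents)
    n2 = n_nonrespondents
    if n1 + n2 == 0:
        raise ValueError("the sample is empty")
    if n2 > 0 and not subsample:
        raise ValueError("a sub-sample of non-respondents is required")
    # Each group mean is weighted by the size of the group it represents.
    resp_total = n1 * fmean(respondents) if n1 else 0.0
    nonresp_total = n2 * fmean(subsample) if n2 else 0.0
    return (resp_total + nonresp_total) / (n1 + n2)


# Hypothetical data: 3 respondents, 2 non-respondents, 1 of them re-contacted (L = 2).
estimate = hansen_hurwitz_mean([10.0, 12.0, 14.0], [20.0], n_nonrespondents=2)
print(round(estimate, 2))  # 15.2
```

The same function applies at either phase: only the sample passed in changes.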

The expressions for the variance of the estimators $\bar{x}'^{*}$ and $\bar{z}'$ are respectively given by

$$V(\bar{x}'^{*})=\left(\frac{1}{n'}-\frac{1}{N}\right)S_X^2+\frac{(L'-1)}{n'}W_2S_{X2}^2 \tag{3}$$

$$V(\bar{z}')=\left(\frac{1}{n'}-\frac{1}{N}\right)S_Z^2 \tag{4}$$

where $S_X^2$ and $S_Z^2$ are the population mean squares of the entire group for the auxiliary variables $X$ and $Z$ respectively, $S_{X2}^2$ is the population mean square of the non-response group for the auxiliary variable $X$, and $W_2$ is the non-response rate in the population.

At the second phase, $n_1$ units respond and $n_2$ units do not respond on the study variable $Y$ and the auxiliary variable $X$, such that $n_1+n_2=n$. A sub-sample of $h_2$ $(h_2=n_2/L;\ L>1)$ units is now selected from the $n_2$ non-responding units and the information is collected from all $h_2$ units on the variables $Y$ and $X$. Thus, the unbiased estimators of $\bar{Y}$, $\bar{X}$ and $\bar{Z}$ at the second phase are respectively given as [see Hansen and Hurwitz (1946)]

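$$\bar{y}^{*}=\frac{n_1\bar{y}_{n_1}+n_2\bar{y}_{h_2}}{n} \tag{5}$$

$$\bar{x}^{*}=\frac{n_1\bar{x}_{n_1}+n_2\bar{x}_{h_2}}{n} \tag{6}$$

$$\bar{z}^{*}=\frac{n_1\bar{z}_{n_1}+n_2\bar{z}_{h_2}}{n} \tag{7}$$

where $\bar{y}_{n_1}$, $\bar{x}_{n_1}$ and $\bar{z}_{n_1}$ are the means based on the $n_1$ responding units for the variables $Y$, $X$ and $Z$ respectively, and $\bar{y}_{h_2}$, $\bar{x}_{h_2}$ and $\bar{z}_{h_2}$ are respectively the means based on the sub-sample of $h_2$ non-responding units for the variables $Y$, $X$ and $Z$.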

Note: The estimator $\bar{z}^{*}$ is written in the same functional form as $\bar{y}^{*}$ and $\bar{x}^{*}$ only to obtain an estimator of similar structure; non-response does not actually occur on the auxiliary variable $Z$.

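The expressions for the variance of the estimators $\bar{y}^{*}$, $\bar{x}^{*}$ and $\bar{z}^{*}$ are respectively given by

$$V(\bar{y}^{*})=\left(\frac{1}{n}-\frac{1}{N}\right)S_Y^2+\frac{(L-1)}{n}W_2S_{Y2}^2 \tag{8}$$

$$V(\bar{x}^{*})=\left(\frac{1}{n}-\frac{1}{N}\right)S_X^2+\frac{(L-1)}{n}W_2S_{X2}^2 \tag{9}$$

$$V(\bar{z}^{*})=\left(\frac{1}{n}-\frac{1}{N}\right)S_Z^2+\frac{(L-1)}{n}W_2S_{Z2}^2 \tag{10}$$

where $S_Y^2$ and $S_{Y2}^2$ are respectively the population mean squares of the entire group and the non-response group for the study variable $Y$, and $S_{Z2}^2$ is the population mean square of the non-response group for the auxiliary variable $Z$.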
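The variance expressions (3), (4) and (8) to (10) share one form: an SRSWOR term plus a term contributed by sub-sampling the non-respondents. The short Python sketch below evaluates that form. The helper name `hh_variance` is hypothetical. The inputs are the summary values reported in Section 4, with $S_Y$ recovered as $C_Y\bar{Y}$ and the reported value 2.3542 read as the non-response-group standard deviation $S_{Y2}$, which is an interpretation of the notation rather than something the paper states explicitly.

```python
def hh_variance(n, N, s2_all, L, w2, s2_nonresp):
    """(1/n - 1/N) * S^2 + ((L - 1) / n) * W2 * S_2^2 for a Hansen-Hurwitz mean.

    Passing s2_nonresp = 0.0 gives the plain SRSWOR variance, as in (4).
    """
    return (1.0 / n - 1.0 / N) * s2_all + (L - 1.0) / n * w2 * s2_nonresp


# Values reported in Section 4 (second phase: n = 35, N = 95, W2 = 0.25).
y_mean, c_y, s_y2 = 19.4968, 0.15613, 2.3542
s_y = c_y * y_mean  # S_Y recovered from the coefficient of variation

for L in (1.5, 2.0, 2.5, 3.0):
    v = hh_variance(n=35, N=95, s2_all=s_y**2, L=L, w2=0.25, s2_nonresp=s_y2**2)
    print(L, round(v, 4))
```

For $L=1.5$ this gives about 0.187, in line with the value reported for $\bar{y}^{*}$ in Table 1.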

Chaudhary and Kumar (2016) have suggested some ratio and regression-type estimators of the finite population mean $\bar{Y}$ utilizing the information on the auxiliary variable $X$ with unknown mean in the situation where both the variables $Y$ and $X$ suffer from non-response, as follows:

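$$T_1^{*}=\frac{\bar{y}^{*}}{\bar{x}^{*}}\,\bar{x}'^{*} \tag{11}$$

$$T_2^{*}=\bar{y}^{*}+b^{*}\left(\bar{x}'^{*}-\bar{x}^{*}\right) \tag{12}$$

where $b^{*}=s_{xy}^{*}/s_{x}^{*2}$; $s_{xy}^{*}$ and $s_{x}^{*2}$ are the unbiased estimators of $S_{XY}$ and $S_X^2$ respectively, based on the $(n_1+h_2)$ units; $S_{XY}=\rho_{XY}S_XS_Y$; and $\rho_{XY}$ is the population correlation coefficient between $Y$ and $X$.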

The expressions for the mean square error (MSE) of the estimators T1* and T2* up to the first order of approximation are respectively given as

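$$\begin{aligned}
\mathrm{MSE}(T_1^{*})={}&\left(\frac{1}{n'}-\frac{1}{N}\right)S_Y^2+\left(\frac{1}{n}-\frac{1}{n'}\right)\left(S_Y^2+R^2S_X^2-2\rho_{XY}RS_XS_Y\right)\\
&+\frac{(L-1)}{n}W_2S_{Y2}^2\\
&+W_2\left(R^2S_{X2}^2-2\rho_{XY2}RS_{X2}S_{Y2}\right)\left[\frac{(L-1)}{n}-\frac{(L'-1)}{n'}\right]
\end{aligned}\tag{13}$$

$$\begin{aligned}
\mathrm{MSE}(T_2^{*})={}&\left[\left(\frac{1}{n'}-\frac{1}{N}\right)+\left(\frac{1}{n}-\frac{1}{n'}\right)\left(1-\rho_{XY}^2\right)\right]S_Y^2\\
&+\frac{(L-1)}{n}W_2S_{Y2}^2\\
&+W_2\left(\beta_1^2S_{X2}^2-2\beta_1\rho_{XY2}S_{X2}S_{Y2}\right)\left[\frac{(L-1)}{n}-\frac{(L'-1)}{n'}\right]
\end{aligned}\tag{14}$$

where $\beta_1$ is the population regression coefficient of $Y$ on $X$, $\rho_{XY2}$ is the population correlation coefficient between $Y$ and $X$ in the non-response group, and $R=\bar{Y}/\bar{X}$.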
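The first-order argument behind (14) can be sketched as follows, using the error terms $e_0$, $e_1$, $e_1'$ and their expectations as listed in Section 3. The shorthand $G=(L-1)/n-(L'-1)/n'$ is introduced here only for brevity and is not part of the paper's notation.

```latex
\begin{align*}
T_2^{*}-\bar{Y} &\approx \bar{Y}e_0+\beta_1\bar{X}\,(e_1'-e_1),\\
\mathrm{MSE}(T_2^{*}) &\approx \bar{Y}^2E(e_0^2)
  +\beta_1^2\bar{X}^2\{E(e_1^2)-E(e_1'^2)\}
  +2\beta_1\bar{Y}\bar{X}\{E(e_0e_1')-E(e_0e_1)\}\\
&= V(\bar{y}^{*})
  +\left(\tfrac{1}{n}-\tfrac{1}{n'}\right)\left(\beta_1^2S_X^2-2\beta_1\rho_{XY}S_XS_Y\right)
  +W_2\,G\left(\beta_1^2S_{X2}^2-2\beta_1\rho_{XY2}S_{X2}S_{Y2}\right).
\end{align*}
```

The second line uses $E(e_1e_1')=E(e_1'^2)$. With $\beta_1=\rho_{XY}S_Y/S_X$ the middle bracket equals $-\rho_{XY}^2S_Y^2$, which combines with $V(\bar{y}^{*})$ from (8) to give the first two lines of (14).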

3 Proposed Estimators

We now propose some improved ratio and regression-type estimators of the finite population mean Y¯ utilizing the information on two auxiliary variables X and Z with unknown means under the given situation of non-response as

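$$T_1=\frac{\bar{y}^{*}}{\bar{x}^{*}}\,\bar{x}'_{R} \tag{15}$$

$$T_2=\bar{y}^{*}+b^{*}\left(\bar{x}'_{lr}-\bar{x}^{*}\right) \tag{16}$$

where $\bar{x}'_{R}=\dfrac{\bar{x}'^{*}}{\bar{z}^{*}}\,\bar{z}'$, $\bar{x}'_{lr}=\bar{x}'^{*}+b^{**}\left(\bar{z}'-\bar{z}^{*}\right)$ and $b^{**}=s_{xz}^{*}/s_{z}^{*2}$; $s_{xz}^{*}$ and $s_{z}^{*2}$ are respectively the unbiased estimators of $S_{XZ}$ and $S_Z^2$ based on the $(n_1+h_2)$ units; $S_{XZ}=\rho_{XZ}S_XS_Z$; and $\rho_{XZ}$ is the population correlation coefficient between $X$ and $Z$.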
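To make the structure of (15) and (16) concrete, the following Python sketch computes both proposed estimators from already-computed sample summaries. The function and argument names are hypothetical, the numerical inputs are invented for illustration, and the two slope arguments stand for the sample regression coefficients of $Y$ on $X$ and of $X$ on $Z$.

```python
def chain_ratio_estimate(y_star, x_star, x_first, z_star, z_first):
    """T1 = (y* / x*) * x'_R, with x'_R = x' * z' / z*.

    *_star  -- second-phase Hansen-Hurwitz means
    *_first -- first-phase means
    """
    x_ratio = x_first * z_first / z_star
    return y_star / x_star * x_ratio


def chain_regression_estimate(y_star, x_star, x_first, z_star, z_first,
                              b_yx, b_xz):
    """T2 = y* + b_yx * (x'_lr - x*), with x'_lr = x' + b_xz * (z' - z*)."""
    x_lr = x_first + b_xz * (z_first - z_star)
    return y_star + b_yx * (x_lr - x_star)


# Hypothetical sample summaries (not taken from the paper's data set).
t1 = chain_ratio_estimate(20.0, 50.0, 51.0, 55.0, 56.0)
t2 = chain_regression_estimate(20.0, 50.0, 51.0, 55.0, 56.0,
                               b_yx=0.6, b_xz=0.15)
print(round(t1, 3), round(t2, 3))
```

Passing `z_first` equal to `z_star` switches off the second auxiliary variable, in which case the two functions return the single-auxiliary estimators (11) and (12).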

To obtain the biases and mean square errors of the proposed estimators T1 and T2, we use the concept of large sample approximations. Let

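$$\begin{gathered}
\bar{y}^{*}=\bar{Y}(1+e_0),\quad \bar{x}^{*}=\bar{X}(1+e_1),\quad \bar{z}^{*}=\bar{Z}(1+e_2),\quad \bar{x}'^{*}=\bar{X}(1+e_1'),\\
\bar{z}'=\bar{Z}(1+e_2'),\quad s_{xy}^{*}=S_{XY}(1+e_3),\\
s_{x}^{*2}=S_X^2(1+e_4),\quad s_{xz}^{*}=S_{XZ}(1+e_5),\quad s_{z}^{*2}=S_Z^2(1+e_6)
\end{gathered}$$

such that $E(e_0)=E(e_1)=E(e_2)=E(e_1')=E(e_2')=E(e_3)=E(e_4)=E(e_5)=E(e_6)=0$,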

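$$\begin{aligned}
E(e_0^2)&=\left(\frac{1}{n}-\frac{1}{N}\right)C_Y^2+\frac{(L-1)}{n}W_2\frac{S_{Y2}^2}{\bar{Y}^2},\\
E(e_1^2)&=\left(\frac{1}{n}-\frac{1}{N}\right)C_X^2+\frac{(L-1)}{n}W_2\frac{S_{X2}^2}{\bar{X}^2},\\
E(e_2^2)&=\left(\frac{1}{n}-\frac{1}{N}\right)C_Z^2+\frac{(L-1)}{n}W_2\frac{S_{Z2}^2}{\bar{Z}^2},\\
E(e_1'^2)&=E(e_1e_1')=\left(\frac{1}{n'}-\frac{1}{N}\right)C_X^2+\frac{(L'-1)}{n'}W_2\frac{S_{X2}^2}{\bar{X}^2},\\
E(e_2'^2)&=E(e_2e_2')=\left(\frac{1}{n'}-\frac{1}{N}\right)C_Z^2,\\
E(e_0e_1)&=\left(\frac{1}{n}-\frac{1}{N}\right)\rho_{XY}C_XC_Y+\frac{(L-1)}{n}W_2\rho_{XY2}\frac{S_{X2}}{\bar{X}}\frac{S_{Y2}}{\bar{Y}},\\
E(e_0e_2)&=\left(\frac{1}{n}-\frac{1}{N}\right)\rho_{YZ}C_YC_Z+\frac{(L-1)}{n}W_2\rho_{YZ2}\frac{S_{Y2}}{\bar{Y}}\frac{S_{Z2}}{\bar{Z}},\\
E(e_0e_1')&=\left(\frac{1}{n'}-\frac{1}{N}\right)\rho_{XY}C_XC_Y+\frac{(L'-1)}{n'}W_2\rho_{XY2}\frac{S_{X2}}{\bar{X}}\frac{S_{Y2}}{\bar{Y}},\\
E(e_0e_2')&=\left(\frac{1}{n'}-\frac{1}{N}\right)\rho_{YZ}C_YC_Z,\\
E(e_1e_2)&=\left(\frac{1}{n}-\frac{1}{N}\right)\rho_{XZ}C_XC_Z+\frac{(L-1)}{n}W_2\rho_{XZ2}\frac{S_{X2}}{\bar{X}}\frac{S_{Z2}}{\bar{Z}},\\
E(e_1e_2')&=E(e_1'e_2')=\left(\frac{1}{n'}-\frac{1}{N}\right)\rho_{XZ}C_XC_Z,\\
E(e_1e_3)&=\frac{N(N-n)}{(N-1)(N-2)}\frac{\mu_{21}}{n\bar{X}S_{XY}}+W_2\frac{(L-1)}{n}\frac{\mu_{21(2)}}{\bar{X}S_{XY}},\\
E(e_1e_4)&=\frac{N(N-n)}{(N-1)(N-2)}\frac{\mu_{30}}{n\bar{X}S_X^2}+W_2\frac{(L-1)}{n}\frac{\mu_{30(2)}}{\bar{X}S_X^2},\\
E(e_1e_5)&=\frac{N(N-n)}{(N-1)(N-2)}\frac{\mu'_{21}}{n\bar{X}S_{XZ}}+W_2\frac{(L-1)}{n}\frac{\mu'_{21(2)}}{\bar{X}S_{XZ}},\\
E(e_2e_1')&=\left(\frac{1}{n'}-\frac{1}{N}\right)\rho_{XZ}C_XC_Z+\frac{(L'-1)}{n'}W_2\rho_{XZ2}\frac{S_{X2}}{\bar{X}}\frac{S_{Z2}}{\bar{Z}},\\
E(e_2e_5)&=E(e_2e_4)=E(e_2e_3)=E(e_1'e_5)=\frac{N(N-n)}{(N-1)(N-2)}\frac{\mu'_{21}}{n\bar{Z}S_{XZ}}+W_2\frac{(L-1)}{n}\frac{\mu'_{21(2)}}{\bar{Z}S_{XZ}},\\
E(e_2e_6)&=\frac{N(N-n)}{(N-1)(N-2)}\frac{\mu'_{30}}{n\bar{Z}S_Z^2}+W_2\frac{(L-1)}{n}\frac{\mu'_{30(2)}}{\bar{Z}S_Z^2},\\
E(e_1'e_3)&=\frac{N(N-n')}{(N-1)(N-2)}\frac{\mu_{21}}{n'\bar{X}S_{XY}}+W_2\frac{(L'-1)}{n'}\frac{\mu_{21(2)}}{\bar{X}S_{XY}},\\
E(e_1'e_4)&=\frac{N(N-n')}{(N-1)(N-2)}\frac{\mu_{30}}{n'\bar{X}S_X^2}+W_2\frac{(L'-1)}{n'}\frac{\mu_{30(2)}}{\bar{X}S_X^2},\\
E(e_2'e_3)&=E(e_2'e_4)=E(e_2'e_5)=\frac{N(N-n')}{(N-1)(N-2)}\frac{\mu'_{30}}{n'\bar{Z}S_Z^2},
\end{aligned}$$

$$\begin{aligned}
\mu_{rs}&=\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{X})^r(y_i-\bar{Y})^s,&
\mu_{rs(2)}&=\frac{1}{N_2}\sum_{i=1}^{N_2}(x_i-\bar{X}_{(2)})^r(y_i-\bar{Y}_{(2)})^s,\\
\mu'_{rs}&=\frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{X})^r(z_i-\bar{Z})^s,&
\mu'_{rs(2)}&=\frac{1}{N_2}\sum_{i=1}^{N_2}(x_i-\bar{X}_{(2)})^r(z_i-\bar{Z}_{(2)})^s,
\end{aligned}$$

$$\bar{Y}_{(2)}=\frac{1}{N_2}\sum_{i=1}^{N_2}y_i,\quad \bar{X}_{(2)}=\frac{1}{N_2}\sum_{i=1}^{N_2}x_i,\quad \bar{Z}_{(2)}=\frac{1}{N_2}\sum_{i=1}^{N_2}z_i,$$

$$C_Y=\frac{S_Y}{\bar{Y}},\quad C_X=\frac{S_X}{\bar{X}}\quad\text{and}\quad C_Z=\frac{S_Z}{\bar{Z}}$$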

where $\rho_{YZ}$ is the population correlation coefficient between $Y$ and $Z$; $\rho_{XY2}$, $\rho_{YZ2}$ and $\rho_{XZ2}$ are the population correlation coefficients between $(Y, X)$, $(Y, Z)$ and $(X, Z)$ respectively for the non-response group; and $N_1$ and $N_2$ are respectively the numbers of responding and non-responding units in the population such that $N=N_1+N_2$.

Expressing equation (15) in terms of $e_0$, $e_1$, $e_1'$, $e_2$, $e_2'$ and then removing the terms having powers of $e_0$, $e_1$, $e_1'$, $e_2$, $e_2'$ greater than two, we get

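$$\begin{aligned}
T_1-\bar{Y}={}&\bar{Y}\,(e_0-e_1-e_2+e_1'+e_2'+e_0^2+e_1^2+e_1'^2+e_2^2+e_2'^2-e_0e_1\\
&-e_0e_2+e_0e_1'+e_0e_2'+e_1e_2-e_1e_2'-e_1e_1'+e_1'e_2'-e_1'e_2\\
&-e_2e_2')
\end{aligned}\tag{17}$$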

Now, taking expectation on both sides of the equation (17), we find the expression for the bias of T1 up to the first order of approximation as

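$$\begin{aligned}
\mathrm{Bias}(T_1)={}&\bar{Y}\Bigg[(f+f')C_X^2+fC_Z^2+(f-f')\left(-\rho_{XY}C_XC_Y-\rho_{YZ}C_YC_Z+\rho_{XZ}C_XC_Z\right)\\
&-f\rho_{XZ}C_XC_Z+\frac{(L-1)}{n}W_2\frac{S_{Z2}^2}{\bar{Z}^2}\\
&-\frac{(L-1)}{n}W_2\rho_{YZ2}\frac{S_{Y2}S_{Z2}}{\bar{Y}\bar{Z}}-\frac{(L-1)}{n}W_2\rho_{XZ2}\frac{S_{X2}S_{Z2}}{\bar{X}\bar{Z}}\\
&+\left(\frac{S_{X2}^2}{\bar{X}^2}-\rho_{XY2}\frac{S_{X2}S_{Y2}}{\bar{X}\bar{Y}}+\rho_{XZ2}\frac{S_{X2}S_{Z2}}{\bar{X}\bar{Z}}\right)W_2\left[\frac{(L-1)}{n}-\frac{(L'-1)}{n'}\right]\Bigg]
\end{aligned}\tag{18}$$

where $f=\left(\dfrac{1}{n}-\dfrac{1}{N}\right)$ and $f'=\left(\dfrac{1}{n'}-\dfrac{1}{N}\right)$.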

Squaring both sides of equation (17) and then taking expectation, ignoring the terms having powers of $e_0$, $e_1$, $e_1'$, $e_2$, $e_2'$ higher than two, we get

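$$\begin{aligned}
E[T_1-\bar{Y}]^2={}&\bar{Y}^2\big[E(e_0^2)+E(e_1^2)+E(e_2^2)+E(e_1'^2)+E(e_2'^2)\\
&-2E(e_0e_1)-2E(e_0e_2)+2E(e_0e_1')+2E(e_0e_2')\\
&+2E(e_1e_2)-2E(e_1e_2')-2E(e_1e_1')-2E(e_1'e_2)\\
&-2E(e_2e_2')+2E(e_1'e_2')\big]
\end{aligned}$$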

Thus, the expression for the MSE of T1 up to the first order of approximation is given as

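$$\begin{aligned}
\mathrm{MSE}(T_1)={}&\bar{Y}^2\Bigg[\left(\frac{1}{n}-\frac{1}{N}\right)C_Y^2+\left(\frac{1}{n}-\frac{1}{n'}\right)\\
&\times\left(C_X^2+C_Z^2-2\rho_{XY}C_XC_Y-2\rho_{YZ}C_YC_Z+2\rho_{XZ}C_XC_Z\right)\\
&+W_2\frac{(L-1)}{n}\left(\frac{S_{Y2}^2}{\bar{Y}^2}+\frac{S_{Z2}^2}{\bar{Z}^2}-2\rho_{YZ2}\frac{S_{Y2}S_{Z2}}{\bar{Y}\bar{Z}}\right)\\
&+W_2\left(\frac{S_{X2}^2}{\bar{X}^2}-2\rho_{XY2}\frac{S_{X2}S_{Y2}}{\bar{X}\bar{Y}}+2\rho_{XZ2}\frac{S_{X2}S_{Z2}}{\bar{X}\bar{Z}}\right)\left[\frac{(L-1)}{n}-\frac{(L'-1)}{n'}\right]\Bigg]
\end{aligned}\tag{19}$$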

Now, expressing equation (16) in terms of $e_0$, $e_1$, $e_1'$, $e_2$, $e_2'$, $e_3$, $e_4$, $e_5$, $e_6$ and then neglecting the terms involving powers of $e_0$, $e_1$, $e_1'$, $e_2$, $e_2'$, $e_3$, $e_4$, $e_5$, $e_6$ greater than two, we get

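$$\begin{aligned}
T_2-\bar{Y}={}&\bar{Y}e_0+\beta_1\bar{X}\,(e_1'-e_1'e_4+e_1'e_3-e_1+e_1e_4-e_1e_3)\\
&+\beta_1\beta_2\bar{Z}\,(e_2'-e_2-e_2'e_6+e_2e_6-e_4e_2'-e_4e_2\\
&+e_3e_2'-e_3e_2+e_2'e_5-e_2e_5)
\end{aligned}\tag{20}$$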

where β2 is the population regression coefficient of X on Z.

The expression for the bias of $T_2$ up to the first order of approximation can be obtained by taking expectation of both sides of equation (20). Thus, we have

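$$\begin{aligned}
\mathrm{Bias}(T_2)={}&\beta_1\Bigg\{\left(\frac{N(N-n')}{n'(N-1)(N-2)}-\frac{N(N-n)}{n(N-1)(N-2)}\right)\left(\frac{\mu_{21}}{S_{YX}}-\frac{\mu_{30}}{S_X^2}\right)\\
&-W_2\left(\frac{\mu_{21(2)}}{S_{YX}}-\frac{\mu_{30(2)}}{S_X^2}\right)\left(\frac{(L-1)}{n}-\frac{(L'-1)}{n'}\right)\Bigg\}\\
&+\beta_1\beta_2\Bigg\{\frac{N(N-n')}{n'(N-1)(N-2)}\left(\frac{\mu'_{21}}{S_{XZ}}-\frac{\mu'_{30}}{S_Z^2}\right)\\
&-\frac{N(N-n)}{n(N-1)(N-2)}\left(\frac{3\mu'_{21}}{S_{XZ}}-\frac{\mu'_{30}}{S_Z^2}\right)\\
&-W_2\frac{(L-1)}{n}\left(\frac{3\mu'_{21(2)}}{S_{XZ}}-\frac{\mu'_{30(2)}}{S_Z^2}\right)\Bigg\}
\end{aligned}\tag{21}$$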

Squaring both sides of equation (20) and then taking expectation, neglecting the terms having powers of $e_0$, $e_1$, $e_1'$, $e_2$, $e_2'$, $e_3$, $e_4$, $e_5$, $e_6$ larger than two, we get

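$$\begin{aligned}
E(T_2-\bar{Y})^2={}&\bar{Y}^2E(e_0^2)+\beta_1^2\bar{X}^2\left(E(e_1^2)+E(e_1'^2)-2E(e_1e_1')\right)\\
&+\beta_1^2\beta_2^2\bar{Z}^2\left(E(e_2^2)+E(e_2'^2)-2E(e_2e_2')\right)+2\bar{Y}\bar{X}\beta_1\\
&\times\left(E(e_0e_1')-E(e_0e_1)\right)+2\beta_1\beta_2\bar{Y}\bar{Z}\left(E(e_0e_2')-E(e_0e_2)\right)\\
&+2\beta_1^2\beta_2\bar{X}\bar{Z}\left(E(e_1'e_2')-E(e_1'e_2)-E(e_1e_2')+E(e_1e_2)\right)
\end{aligned}$$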

Therefore, the expression for the MSE of T2 up to the first order of approximation is given by

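$$\begin{aligned}
\mathrm{MSE}(T_2)={}&\left(\frac{1}{n}-\frac{1}{N}\right)S_Y^2+\left(\frac{1}{n}-\frac{1}{n'}\right)\\
&\times\left(\beta_1^2S_X^2+\beta_1^2\beta_2^2S_Z^2-2\beta_1\rho_{XY}S_XS_Y\right.\\
&\left.\quad-2\beta_1\beta_2\rho_{YZ}S_YS_Z+2\beta_1^2\beta_2\rho_{XZ}S_XS_Z\right)\\
&+\frac{(L-1)}{n}W_2\left(S_{Y2}^2+\beta_1^2\beta_2^2S_{Z2}^2-2\beta_1\beta_2\rho_{YZ2}S_{Y2}S_{Z2}\right)\\
&+W_2\left(\beta_1^2S_{X2}^2-2\beta_1\rho_{XY2}S_{X2}S_{Y2}+2\beta_1^2\beta_2\rho_{XZ2}S_{X2}S_{Z2}\right)\\
&\times\left[\frac{(L-1)}{n}-\frac{(L'-1)}{n'}\right]
\end{aligned}\tag{22}$$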
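Written in absolute rather than relative terms, the first-order expressions (19) and (22) have the same structure and differ only in the coefficients attached to $X$ and $Z$: $(R,\ \bar{Y}/\bar{Z})$ for $T_1$ and $(\beta_1,\ \beta_1\beta_2)$ for $T_2$. The Python sketch below uses that observation. It is an illustration under stated assumptions, not the authors' code: the class `Population`, the helper `_chain_mse` and all field names are hypothetical, the standard deviations of the non-response group are passed directly, and the $\rho_{XZ}S_XS_Z$ cross term is taken with a positive sign, which is what the last line of the expansion of $E(T_2-\bar{Y})^2$ yields.

```python
from dataclasses import dataclass


@dataclass
class Population:
    """Population quantities entering (19) and (22); field names are illustrative."""
    N: int
    y_mean: float
    x_mean: float
    z_mean: float
    s_y: float      # S_Y, S_X, S_Z: whole-population standard deviations
    s_x: float
    s_z: float
    s_y2: float     # S_Y2, S_X2, S_Z2: non-response-group standard deviations
    s_x2: float
    s_z2: float
    rho_xy: float   # whole-population correlations
    rho_yz: float
    rho_xz: float
    rho_xy2: float  # non-response-group correlations
    rho_yz2: float
    rho_xz2: float
    w2: float       # non-response rate W2


def _chain_mse(p, cx, cz, n, n_first, L, L_first):
    """Common first-order form with coefficient cx on X and cz on Z."""
    f = 1.0 / n - 1.0 / p.N                       # (1/n - 1/N)
    d = 1.0 / n - 1.0 / n_first                   # (1/n - 1/n')
    a = p.w2 * (L - 1.0) / n                      # W2 (L - 1) / n
    g = p.w2 * ((L - 1.0) / n - (L_first - 1.0) / n_first)
    whole = (cx**2 * p.s_x**2 + cz**2 * p.s_z**2
             - 2.0 * cx * p.rho_xy * p.s_x * p.s_y
             - 2.0 * cz * p.rho_yz * p.s_y * p.s_z
             + 2.0 * cx * cz * p.rho_xz * p.s_x * p.s_z)
    nonresp_yz = (p.s_y2**2 + cz**2 * p.s_z2**2
                  - 2.0 * cz * p.rho_yz2 * p.s_y2 * p.s_z2)
    nonresp_x = (cx**2 * p.s_x2**2
                 - 2.0 * cx * p.rho_xy2 * p.s_x2 * p.s_y2
                 + 2.0 * cx * cz * p.rho_xz2 * p.s_x2 * p.s_z2)
    return f * p.s_y**2 + d * whole + a * nonresp_yz + g * nonresp_x


def mse_t1(p, n, n_first, L, L_first):
    """First-order MSE of the ratio-type estimator T1, following (19)."""
    return _chain_mse(p, p.y_mean / p.x_mean, p.y_mean / p.z_mean,
                      n, n_first, L, L_first)


def mse_t2(p, n, n_first, L, L_first):
    """First-order MSE of the regression-type estimator T2, following (22)."""
    beta1 = p.rho_xy * p.s_y / p.s_x   # regression coefficient of Y on X
    beta2 = p.rho_xz * p.s_x / p.s_z   # regression coefficient of X on Z
    return _chain_mse(p, beta1, beta1 * beta2, n, n_first, L, L_first)


# Hypothetical population, for illustration only.
pop = Population(N=1000, y_mean=50.0, x_mean=40.0, z_mean=60.0,
                 s_y=10.0, s_x=8.0, s_z=12.0, s_y2=9.0, s_x2=7.0, s_z2=11.0,
                 rho_xy=0.8, rho_yz=0.7, rho_xz=0.6,
                 rho_xy2=0.75, rho_yz2=0.65, rho_xz2=0.55, w2=0.2)
print(mse_t1(pop, n=100, n_first=300, L=2.0, L_first=2.0))
print(mse_t2(pop, n=100, n_first=300, L=2.0, L_first=2.0))
```

Sharing one helper keeps the two formulas term-by-term comparable, so any change to the design terms applies to both estimators at once.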

4 Empirical Study

Here, the data considered by Khare and Sinha (2007) have been used to demonstrate the theoretical results. The data relate to the physical growth of upper socioeconomic group of 95 school children of Varanasi district under an ICMR study, Department of Pediatrics, B.H.U., during 1983–84. In this data set, we have Y = Weight (in kg.) of the children, X = Skull circumference (in cm.) of the children and Z = Chest circumference (in cm.) of the children. The first 25% units have been considered as the non-responding units for all the variables Y, X and Z. The details are given below:

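$$\begin{gathered}
N=95,\quad n'=70,\quad n=35,\quad \bar{Y}=19.4968,\quad \bar{X}=51.1726,\quad \bar{Z}=55.8611,\\
C_Y=0.15613,\quad C_X=0.03006,\quad C_Z=0.05860,\\
S_{Y2}=2.3542,\quad S_{X2}=1.2681,\quad S_{Z2}=3.0176,\\
\rho_{YX}=0.328,\quad \rho_{YX2}=0.477,\quad \rho_{YZ}=0.846,\quad \rho_{YZ2}=0.729,\\
\rho_{XZ}=0.297,\quad \rho_{XZ2}=0.570,\quad W_2=0.25.
\end{gathered}$$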

Table 1 shows the variance/MSE of the estimators $\bar{y}^{*}$, $T_1^{*}$, $T_2^{*}$, $T_1$ and $T_2$ along with their percentage relative efficiency (PRE). The PRE is computed with respect to $\bar{y}^{*}$.

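Table 1 Variance/MSE and PRE of $\bar{y}^{*}$, $T_1^{*}$, $T_2^{*}$, $T_1$, $T_2$

| $L'=L$ | Var($\bar{y}^{*}$) | MSE($T_1^{*}$) | MSE($T_1$) | MSE($T_2^{*}$) | MSE($T_2$) | PRE($\bar{y}^{*}$) | PRE($T_1^{*}$) | PRE($T_1$) | PRE($T_2^{*}$) | PRE($T_2$) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1.5 | 0.187 | 0.171 | 0.150 | 0.146 | 0.106 | 100.00 | 109.57 | 124.67 | 127.65 | 176.36 |
| 2.0 | 0.207 | 0.188 | 0.165 | 0.165 | 0.116 | 100.00 | 109.78 | 125.27 | 125.51 | 177.66 |
| 2.5 | 0.226 | 0.206 | 0.180 | 0.183 | 0.127 | 100.00 | 109.96 | 125.77 | 123.79 | 178.74 |
| 3.0 | 0.246 | 0.224 | 0.195 | 0.201 | 0.137 | 100.00 | 110.10 | 126.19 | 122.39 | 179.65 |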

The above table shows that the MSE of the proposed ratio-type estimator $T_1$ is smaller than that of the existing ratio-type estimator $T_1^{*}$, and hence the PRE of $T_1$ is higher than that of $T_1^{*}$. The same pattern is seen for the proposed regression-type estimator $T_2$ and the existing regression-type estimator $T_2^{*}$. The table also shows that the proposed estimators $T_1$ and $T_2$ perform better than the usual mean estimator $\bar{y}^{*}$.
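The PRE columns follow directly from the variance/MSE columns as $100\times V(\bar{y}^{*})/\mathrm{MSE}$. As a small illustration, the Python sketch below recomputes the PREs for the row $L'=L=1.5$ from the entries of Table 1 as printed; the function name `percent_relative_efficiency` is an illustrative choice.

```python
def percent_relative_efficiency(var_baseline, mse_estimator):
    """PRE of an estimator with respect to the baseline: 100 * V(baseline) / MSE."""
    return 100.0 * var_baseline / mse_estimator


# Row L' = L = 1.5 of Table 1, copied as printed (three decimals).
row = {"ybar*": 0.187, "T1*": 0.171, "T1": 0.150, "T2*": 0.146, "T2": 0.106}

for name, mse in row.items():
    print(name, round(percent_relative_efficiency(row["ybar*"], mse), 2))
```

Because the inputs are rounded to three decimals, the recomputed values agree with the printed PRE columns only to within about half a percentage point.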

5 Concluding Remarks

We have proposed some improved ratio and regression-type estimators for estimating the finite population mean utilizing the information on two auxiliary variables in the presence of non-response. We have considered the situation in which the population means of both auxiliary variables are unknown, and hence we have adopted the two-phase sampling scheme to obtain the desired information. A theoretical study of the basic properties of the suggested estimators has been presented. To support the theoretical results, we have carried out an empirical study on a real data set. In Table 1, we have presented a comparative study of the proposed estimators with the usual mean estimator and some other existing estimators. Table 1 shows that the proposed ratio and regression-type estimators $T_1$ and $T_2$ provide better estimates than the usual mean estimator $\bar{y}^{*}$ and the existing ratio and regression-type estimators $T_1^{*}$ and $T_2^{*}$. Thus, the proposed estimators may be preferred over the usual mean estimator and the existing estimators.

References

[1] Chand, L. (1975). Some ratio-type estimators based on two or more auxiliary variables, Unpublished Ph. D. Thesis, Iowa State University, Ames, Iowa, U.S.A.

[2] Chaudhary, M. K. and Kumar, A. (2016). Estimation of mean of finite population using double sampling scheme under non-response, Journal of Statistics Application and Probability (JSAP), Vol. 5, No. 2, 287–297.

[3] Hansen, M. H. and Hurwitz, W. N. (1946). The problem of non-response in sample surveys, Journal of the American Statistical Association, 41, 517–529.

[4] Khare, B. B. and Sinha, R. R. (2007). Estimation of the ratio of the two population means using multi auxiliary characteristics in the presence of non-response, In Statistical Techniques in Life Testing, Reliability, Sampling Theory and Quality Control, 163–171.

[5] Khare, B. B. and Srivastava, S. (1995). Study of conventional and alternative two-phase sampling ratio, product and regression estimators in presence of non-response, Nat. Acad. Sci. Lett. India, 65, 195–203.

[6] Kiregyera, B. (1980). A chain ratio-type estimator in finite population double sampling using two auxiliary variables, Metrika, 31, 215–226.

[7] Olkin, I. (1958). Multivariate ratio estimation for finite populations, Biometrika, 45, 154–165.

[8] Shabbir, J. and Nasir, S. (2013). On estimating the finite population mean using two auxiliary variables in two phase sampling in the presence of non response, Communication in Statistics-Theory and Methods, 42, 4127–4145.

[9] Singh, H. P. and Kumar, S. (2009). A general procedure of estimating the population mean in the presence of non-response under double sampling using auxiliary information, SORT, 33(2.1), 71–84.

[10] Singh, H. P., Kumar, S. and Kozak, M. (2010). Improved estimation of finite population mean using sub-sampling to deal with non response in two phase sampling scheme, Communication in Statistics-Theory and Method, 39(5), 791–802.

[11] Srivastava, S. K. (1966). A note on Olkin’s multivariate ratio estimator, Journal of the Indian Statistical Association, 4, 202–208.

[12] Verma, H. K., Sharma, P. and Singh, R. (2014). Some ratio cum product type estimators for population mean under double sampling in the presence of non-response, J. Stat. Appl. Pro. 3, No. 3, 379–385.

Biographies


Manoj K. Chaudhary is a Professor (Associate) in the Department of Statistics at Banaras Hindu University, Varanasi, India. He has made significant contributions to the survey methodology and methods of estimation. He has published more than 45 research papers in internationally and nationally renowned journals.


Amit Kumar is a Guest Faculty in the Department of Statistics at University of Allahabad, Prayagraj, India. He has made significant contributions to the survey methodology and methods of estimation. He has published more than 8 research papers in internationally and nationally renowned journals.
