Generalized Estimator of Population Mean Using Auxiliary Information in Presence of Measurement Errors

Peeyush Misra

Department of Statistics, D.A.V.(P.G.) College, Dehradun – 248001, Uttarakhand, India
E-mail: dr.pmisra.dav@gmail.com

Received 07 March 2023; Accepted 02 October 2023; Publication 23 January 2024

Abstract

Survey research generally assumes that respondents report their values precisely. In practice, for reasons such as prestige bias, the reported values often differ substantially from the true values. The resulting measurement error carries over into the sample estimates and may affect the conclusions drawn from them. This study therefore proposes an improved generalized estimator of the population mean that uses auxiliary information in the presence of measurement errors. A numerical study is carried out to establish its effectiveness.

Keywords: Auxiliary variable, bias, mean squared error, efficiency, measurement errors.

1 Introduction

Many survey statisticians have studied parameter estimation in the presence of measurement errors. The properties of estimators in survey sampling are normally derived on the presumption that the observations on the characteristics under study are accurate. This premise is not always met in practice, as measurement problems such as non-response errors, recording errors and calculation errors contaminate the data and can lead to invalid results. The statistical conclusions drawn from the observed data remain valid if the measurement errors are negligibly small. If they are not, the inferences may be not only inaccurate and invalid but may also have unintended and undesirable consequences. For more details, one may refer to Cochran (1968), Lessler and Kalsbeek (1992), Biemer, Groves, Lyberg, Mathiowetz and Sudman (1991) and Sukhatme and Seth (1952). Several authors have addressed the estimation of the population mean in the presence of measurement errors, including Shalabh (1997), Maneesha and Singh (2002), Singh and Karpe (2009) and Misra and Yadav (2015).

Let $U = \{U_1, U_2, \ldots, U_N\}$ be a finite population of $N$ distinct and identifiable units, with $Y$ the study variable and $X$ the auxiliary variable taking the values $Y_i$ and $X_i$, respectively, on the $i$th unit of $U$. Suppose that $n$ paired observations on $X$ and $Y$ are obtained by simple random sampling without replacement. For this sample, let $(x_i, y_i)$ be the observed values, rather than the true values $(X_i, Y_i)$, for the $i$th $(i = 1, 2, \ldots, n)$ sampling unit. Write $u_i = y_i - Y_i$ and $v_i = x_i - X_i$, where $u_i$ and $v_i$ are the associated measurement errors, stochastic in nature with mean zero and variances $\sigma_u^2$ and $\sigma_v^2$, respectively. Assume further that the $u_i$'s and $v_i$'s are uncorrelated, while the $X_i$'s and $Y_i$'s are correlated. Let $\mu_X$ and $\mu_Y$ denote the population means, $\sigma_X^2$ and $\sigma_Y^2$ the population variances, and $\rho$ the correlation coefficient between $X$ and $Y$.

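Let $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ denote the unbiased estimators of the population means $\mu_X$ and $\mu_Y$, i.e. $E(\bar{x}) = \mu_X$ and $E(\bar{y}) = \mu_Y$.

In the presence of measurement errors, however, $s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$ and $s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$ are biased estimators of the population variances $\sigma_X^2$ and $\sigma_Y^2$. The expected value of $s_y^2$ in the presence of measurement errors is $E(s_y^2) = \sigma_Y^2 + \sigma_u^2$.

When the error variances $\sigma_u^2$ and $\sigma_v^2$ are known in advance, $\hat{\sigma}_y^2 = s_y^2 - \sigma_u^2 > 0$ and $\hat{\sigma}_x^2 = s_x^2 - \sigma_v^2 > 0$ are unbiased estimators of the population variances under measurement errors.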
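The bias of the sample variance under measurement error can be illustrated with a small Monte Carlo sketch. It is only an illustration under stated assumptions: true values and errors are drawn from normal distributions (an infinite-population stand-in for simple random sampling without replacement), and the parameter values are borrowed from the empirical study later in the paper. The function names are hypothetical.

```python
import random


def sample_variance(values):
    """Usual sample variance with divisor n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def simulate_variance_bias(mu_y=127.0, var_y=1278.0, var_u=32.4,
                           n=10, reps=20000, seed=12345):
    """Return (average s_y^2, average of s_y^2 - var_u) over `reps` samples."""
    rng = random.Random(seed)
    sd_y, sd_u = var_y ** 0.5, var_u ** 0.5
    total = 0.0
    for _ in range(reps):
        # Observed value = true value + measurement error (both assumed normal).
        y_obs = [rng.gauss(mu_y, sd_y) + rng.gauss(0.0, sd_u) for _ in range(n)]
        total += sample_variance(y_obs)
    mean_s2 = total / reps
    return mean_s2, mean_s2 - var_u


mean_s2, corrected = simulate_variance_bias()
print(f"average s_y^2             : {mean_s2:.1f}")
print(f"average s_y^2 - sigma_u^2 : {corrected:.1f}")
```

Under these assumptions the first average should settle near $\sigma_Y^2 + \sigma_u^2$ and the second near $\sigma_Y^2$, which is the correction used for $\hat{\sigma}_y^2$.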

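Further, let

$$C_Y = \frac{\sigma_Y}{\mu_Y}, \quad C_X = \frac{\sigma_X}{\mu_X}, \quad \gamma_{2Y} = \beta_{2Y} - 3, \quad \gamma_{2X} = \beta_{2X} - 3, \quad \gamma_{2u} = \beta_{2u} - 3, \quad \gamma_{2v} = \beta_{2v} - 3,$$

$$\gamma_1(X) = \sqrt{\beta_1(X)}, \quad \beta_2(X) = \frac{\mu_4(X)}{\mu_2^2(X)}, \quad \beta_2(u) = \frac{\mu_4(u)}{\mu_2^2(u)}, \quad \beta_2(v) = \frac{\mu_4(v)}{\mu_2^2(v)},$$

$$\mu_{qrst} = E\left[(X - \mu_X)^q (Y - \mu_Y)^r v^s u^t\right],$$

$$\mu_{2000} = \sigma_X^2, \quad \mu_{0200} = \sigma_Y^2, \quad \mu_{0020} = \sigma_v^2 \quad \text{and} \quad \mu_{0002} = \sigma_u^2.$$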

Now suppose that the sample values are recorded with measurement errors, i.e. the observed values $(x_i, y_i)$ differ from the true values $(X_i, Y_i)$. The following generalized estimator is proposed for estimating the population mean in the presence of measurement errors.

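$$\hat{\bar{y}}_{gME} = g(\bar{y}, b, \bar{x}, \hat{\sigma}_x^2) \tag{1}$$

where $b$ is an estimate of the change in $y$ caused by a unit increase in $x$ (with population counterpart $\beta = \sigma_{XY}/\sigma_X^2$), and $g(\bar{y}, b, \bar{x}, \hat{\sigma}_x^2)$ is a bounded function such that, at the point $(\mu_Y, \beta, \mu_X, \sigma_X^2)$, we have

$$g(\mu_Y, \beta, \mu_X, \sigma_X^2) = \mu_Y \tag{2}$$

$$g_0 = \left.\frac{\partial}{\partial \bar{y}}\, g(\bar{y}, b, \bar{x}, \hat{\sigma}_x^2)\right|_{(\mu_Y, \beta, \mu_X, \sigma_X^2)} = 1 \tag{3}$$

$$g_1 = \left.\frac{\partial}{\partial b}\, g(\bar{y}, b, \bar{x}, \hat{\sigma}_x^2)\right|_{(\mu_Y, \beta, \mu_X, \sigma_X^2)} = 0 \tag{4}$$

$$g_2 = \left.\frac{\partial}{\partial \bar{x}}\, g(\bar{y}, b, \bar{x}, \hat{\sigma}_x^2)\right|_{(\mu_Y, \beta, \mu_X, \sigma_X^2)} = -\beta \tag{5}$$

$$g_{00} = \left.\frac{\partial^2}{\partial \bar{y}^2}\, g(\bar{y}, b, \bar{x}, \hat{\sigma}_x^2)\right|_{(\mu_Y, \beta, \mu_X, \sigma_X^2)} = 0 \tag{6}$$

$$g_{22} = \left.\frac{\partial^2}{\partial \bar{x}^2}\, g(\bar{y}, b, \bar{x}, \hat{\sigma}_x^2)\right|_{(\mu_Y, \beta, \mu_X, \sigma_X^2)} = 0 \tag{7}$$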
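As an illustration of conditions (2) to (7), the sketch below checks them numerically for one possible member of the class, the linear regression estimator $g = \bar{y} + b(\mu_X - \bar{x})$. This choice of $g$, the helper names and the value $\beta = 0.6$ are assumptions made for the example and are not taken from the paper; the derivatives are approximated by central differences.

```python
MU_X = 170.0  # assumed known population mean of x (illustrative value)


def g_regression(y_bar, b, x_bar, var_x_hat):
    """Linear regression estimator; it ignores var_x_hat, so g3 = 0 here."""
    return y_bar + b * (MU_X - x_bar)


def shifted(point, i, step):
    """Copy of `point` with coordinate i moved by `step`."""
    moved = list(point)
    moved[i] += step
    return moved


def first_partial(f, point, i, h=1e-3):
    """Central-difference estimate of the first partial derivative."""
    return (f(*shifted(point, i, h)) - f(*shifted(point, i, -h))) / (2 * h)


def second_partial(f, point, i, h=1e-3):
    """Central-difference estimate of the second partial derivative."""
    return (f(*shifted(point, i, h)) - 2 * f(*point)
            + f(*shifted(point, i, -h))) / h ** 2


beta = 0.6  # hypothetical regression coefficient
point = (127.0, beta, MU_X, 3300.0)  # (mu_Y, beta, mu_X, sigma_X^2)

checks = {
    "g": g_regression(*point),                       # condition (2): mu_Y
    "g0": first_partial(g_regression, point, 0),     # condition (3): 1
    "g1": first_partial(g_regression, point, 1),     # condition (4): 0
    "g2": first_partial(g_regression, point, 2),     # condition (5): -beta
    "g00": second_partial(g_regression, point, 0),   # condition (6): 0
    "g22": second_partial(g_regression, point, 2),   # condition (7): 0
}
for name, value in checks.items():
    print(f"{name:>3} = {value:.6f}")
```

Because this particular $g$ does not depend on $\hat{\sigma}_x^2$, it corresponds to $g_3 = 0$; other members of the class may use $\hat{\sigma}_x^2$ and so have a non-zero $g_3$.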

2 Bias and Mean Squared Error

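Here we use the approximations

$$\bar{y} = \mu_Y(1 + e_0), \quad \bar{x} = \mu_X(1 + e_1), \quad \hat{\sigma}_y^2 = \sigma_Y^2(1 + e_2), \quad \hat{\sigma}_x^2 = \sigma_X^2(1 + e_3), \quad \hat{\sigma}_{xy} = \sigma_{XY}(1 + e_4),$$

so that

$$E(e_i) = 0, \quad i = 0, 1, 2, 3, 4. \tag{8}$$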

Using the results from Singh and Karpe (2009), we have

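$$E(e_0^2) = \frac{C_Y^2}{n\theta_Y} \quad \text{and} \quad E(e_1^2) = \frac{C_X^2}{n\theta_X} \tag{9}$$

where $\theta_Y = \dfrac{\sigma_Y^2}{\sigma_Y^2 + \sigma_u^2}$ and $\theta_X = \dfrac{\sigma_X^2}{\sigma_X^2 + \sigma_v^2}$,

$$E(e_3^2) = \frac{A_X}{n}, \quad \text{where } A_X = \gamma_{2X} + \gamma_{2v}\frac{\sigma_v^4}{\sigma_X^4} + 2\left(1 + \frac{\sigma_v^2}{\sigma_X^2}\right)^2 \tag{10}$$

$$E(e_2^2) = \frac{A_Y}{n}, \quad \text{where } A_Y = \gamma_{2Y} + \gamma_{2u}\frac{\sigma_u^4}{\sigma_Y^4} + 2\left(1 + \frac{\sigma_u^2}{\sigma_Y^2}\right)^2 \tag{11}$$

$$E(e_1 e_3) = \frac{\mu_{3000}}{n\sigma_X^2\mu_X} \tag{12}$$

$$E(e_0 e_2) = \frac{\mu_{0300}}{n\sigma_Y^2\mu_Y} \tag{13}$$

$$E(e_0 e_3) = \frac{\mu_{2100}}{n\sigma_X^2\mu_Y} \tag{14}$$

$$E(e_1 e_2) = \frac{\mu_{1200}}{n\sigma_Y^2\mu_X} \tag{15}$$

$$E(e_0 e_1) = \frac{\sigma_{XY}}{n\mu_X\mu_Y} = \frac{\rho C_X C_Y}{n} \tag{16}$$

$$E(e_1 e_4) = \frac{\mu_{2100}}{n\sigma_{XY}\mu_X} \tag{17}$$

$$E(e_3 e_4) = \frac{\mu_{3100}}{n\sigma_X^2\sigma_{XY}} \tag{18}$$

$$E(e_2 e_3) = \frac{\delta - 1}{n}, \quad \text{where } \delta = \frac{\mu_{2200}}{\sigma_X^2\sigma_Y^2} \tag{19}$$

$$E(e_2 e_4) = \frac{\mu_{1200}}{n\sigma_Y^2\sigma_{XY}} \tag{20}$$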
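The quantities $\theta_Y$, $\theta_X$, $A_X$ and $A_Y$ in (9) to (11) are simple functions of the variances and kurtosis coefficients. The sketch below evaluates them for the parameter values reported in the empirical study later in the paper. The function names are hypothetical, and the results inherit the rounding of the reported parameters.

```python
def theta(var_true, var_error):
    """Reliability ratio theta = sigma^2 / (sigma^2 + error variance)."""
    return var_true / (var_true + var_error)


def e_squared_mean(cv, n, var_true, var_error):
    """E(e0^2) or E(e1^2) from (9): C^2 / (n * theta)."""
    return cv ** 2 / (n * theta(var_true, var_error))


def a_term(beta2_true, beta2_error, var_true, var_error):
    """A_X or A_Y from (10)-(11)."""
    ratio = var_error / var_true
    return ((beta2_true - 3) + (beta2_error - 3) * ratio ** 2
            + 2 * (1 + ratio) ** 2)


# Parameter values as reported in the empirical study.
n = 10
mu_y = 127.0
var_x, var_y = 3300.0, 1278.0
var_u, var_v = 32.4001, 32.3998

theta_y = theta(var_y, var_u)
theta_x = theta(var_x, var_v)
a_x = a_term(1.7758, 1.8409, var_x, var_v)  # beta_2X, beta_2v
a_y = a_term(1.9026, 1.7186, var_y, var_u)  # beta_2Y, beta_2u

# Consistency check: mu_Y^2 * E(e0^2) equals (sigma_Y^2 + sigma_u^2) / n.
cv_y = var_y ** 0.5 / mu_y
var_y_bar = mu_y ** 2 * e_squared_mean(cv_y, n, var_y, var_u)

print(f"theta_Y = {theta_y:.4f}, theta_X = {theta_x:.4f}")
print(f"A_X = {a_x:.4f}, A_Y = {a_y:.4f}")
print(f"Var(y_bar) = {var_y_bar:.4f}")
```

The last line recovers the variance of the sample mean under measurement error from (9), which is a quick way to confirm that $\theta_Y$ enters in the denominator.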

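Using Taylor's series expansion, we now expand $g(\bar{y}, b, \bar{x}, \hat{\sigma}_x^2)$ about the point $(\mu_Y, \beta, \mu_X, \sigma_X^2)$ as

$$
\begin{aligned}
\hat{\bar{y}}_{gME} ={}& g(\mu_Y, \beta, \mu_X, \sigma_X^2) + (\bar{y} - \mu_Y)g_0 + (b - \beta)g_1 + (\bar{x} - \mu_X)g_2 + (\hat{\sigma}_x^2 - \sigma_X^2)g_3 \\
&+ \frac{1}{2}\Big\{(\bar{y} - \mu_Y)^2 g_{00} + (b - \beta)^2 g_{11} + (\bar{x} - \mu_X)^2 g_{22} + (\hat{\sigma}_x^2 - \sigma_X^2)^2 g_{33} \\
&\qquad + 2(\bar{y} - \mu_Y)(b - \beta)g_{01} + 2(\bar{y} - \mu_Y)(\bar{x} - \mu_X)g_{02} + 2(\bar{y} - \mu_Y)(\hat{\sigma}_x^2 - \sigma_X^2)g_{03} \\
&\qquad + 2(b - \beta)(\bar{x} - \mu_X)g_{12} + 2(b - \beta)(\hat{\sigma}_x^2 - \sigma_X^2)g_{13} + 2(\bar{x} - \mu_X)(\hat{\sigma}_x^2 - \sigma_X^2)g_{23}\Big\} \\
&+ \frac{1}{3!}\Big\{(\bar{y} - \mu_Y)\frac{\partial}{\partial \bar{y}} + (b - \beta)\frac{\partial}{\partial b} + (\bar{x} - \mu_X)\frac{\partial}{\partial \bar{x}} + (\hat{\sigma}_x^2 - \sigma_X^2)\frac{\partial}{\partial \hat{\sigma}_x^2}\Big\}^3 g(\bar{y}^*, b^*, \bar{x}^*, \hat{\sigma}_x^{2*})
\end{aligned}
\tag{21}
$$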

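where $g_0$, $g_1$, $g_2$, $g_{00}$ and $g_{22}$ are already defined and

$$g_3 = \left.\frac{\partial}{\partial \hat{\sigma}_x^2}\, g(\bar{y}, b, \bar{x}, \hat{\sigma}_x^2)\right|_{(\mu_Y, \beta, \mu_X, \sigma_X^2)}, \qquad g_{02} = \left.\frac{\partial^2}{\partial \bar{y}\,\partial \bar{x}}\, g(\bar{y}, b, \bar{x}, \hat{\sigma}_x^2)\right|_{(\mu_Y, \beta, \mu_X, \sigma_X^2)},$$

$$g_{12} = \left.\frac{\partial^2}{\partial b\,\partial \bar{x}}\, g(\bar{y}, b, \bar{x}, \hat{\sigma}_x^2)\right|_{(\mu_Y, \beta, \mu_X, \sigma_X^2)}, \qquad g_{23} = \left.\frac{\partial^2}{\partial \bar{x}\,\partial \hat{\sigma}_x^2}\, g(\bar{y}, b, \bar{x}, \hat{\sigma}_x^2)\right|_{(\mu_Y, \beta, \mu_X, \sigma_X^2)},$$

$$g_{01} = \left.\frac{\partial^2}{\partial \bar{y}\,\partial b}\, g(\bar{y}, b, \bar{x}, \hat{\sigma}_x^2)\right|_{(\mu_Y, \beta, \mu_X, \sigma_X^2)}, \qquad g_{03} = \left.\frac{\partial^2}{\partial \bar{y}\,\partial \hat{\sigma}_x^2}\, g(\bar{y}, b, \bar{x}, \hat{\sigma}_x^2)\right|_{(\mu_Y, \beta, \mu_X, \sigma_X^2)},$$

$$g_{13} = \left.\frac{\partial^2}{\partial b\,\partial \hat{\sigma}_x^2}\, g(\bar{y}, b, \bar{x}, \hat{\sigma}_x^2)\right|_{(\mu_Y, \beta, \mu_X, \sigma_X^2)},$$

$$\bar{y}^* = \mu_Y + h(\bar{y} - \mu_Y), \quad b^* = \beta + h(b - \beta), \quad \bar{x}^* = \mu_X + h(\bar{x} - \mu_X), \quad \hat{\sigma}_x^{2*} = \sigma_X^2 + h(\hat{\sigma}_x^2 - \sigma_X^2), \quad \text{for } 0 < h < 1.$$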

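Under conditions (2) to (7), and writing $b - \beta = \beta(e_4 - e_3 + e_3^2 - e_3 e_4)$, which follows to the second order from $b = \hat{\sigma}_{xy}/\hat{\sigma}_x^2 = \beta(1 + e_4)(1 + e_3)^{-1}$, the expression (21) in terms of the $e_i$'s reduces to

$$
\begin{aligned}
\hat{\bar{y}}_{gME} ={}& \mu_Y + \mu_Y e_0 + \mu_X e_1(-\beta) + \sigma_X^2 e_3 g_3 \\
&+ \frac{1}{2}\Big\{\sigma_X^4 e_3^2 g_{33} + 2\mu_Y e_0 \sigma_X^2 e_3 g_{03} + 2\beta(e_4 - e_3 + e_3^2 - e_3 e_4)\mu_X e_1 g_{12} \\
&\qquad + 2\beta(e_4 - e_3 + e_3^2 - e_3 e_4)\sigma_X^2 e_3 g_{13} + 2\mu_X e_1 \sigma_X^2 e_3 g_{23}\Big\}
\end{aligned}
$$

or, retaining terms up to the second degree in the $e_i$'s,

$$
\begin{aligned}
\hat{\bar{y}}_{gME} - \mu_Y ={}& \mu_Y e_0 - \beta\mu_X e_1 + \sigma_X^2 e_3 g_3 \\
&+ \frac{1}{2}\Big\{\sigma_X^4 e_3^2 g_{33} + 2\mu_Y\sigma_X^2 e_0 e_3 g_{03} + 2\beta\mu_X(e_1 e_4 - e_1 e_3)g_{12} \\
&\qquad + 2\beta\sigma_X^2(e_3 e_4 - e_3^2)g_{13} + 2\mu_X\sigma_X^2 e_1 e_3 g_{23}\Big\}
\end{aligned}
\tag{22}
$$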
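The factor $\beta(e_4 - e_3 + e_3^2 - e_3 e_4)$ that stands in for $b - \beta$ is consistent with a second-order expansion of $(1 + e_4)/(1 + e_3) - 1$, which is what one obtains if $b$ is taken to be $\hat{\sigma}_{xy}/\hat{\sigma}_x^2$. That form of $b$ is an assumption inferred from the expansion, not something stated explicitly in the text. A minimal numerical check, with hypothetical function names and error values:

```python
def ratio_exact(e3, e4):
    """Exact relative error (b - beta) / beta when b = beta (1 + e4) / (1 + e3)."""
    return (1 + e4) / (1 + e3) - 1


def ratio_second_order(e3, e4):
    """Second-order approximation used in the expansion."""
    return e4 - e3 + e3 ** 2 - e3 * e4


e3, e4 = 0.02, -0.03  # small illustrative relative errors
exact = ratio_exact(e3, e4)
approx = ratio_second_order(e3, e4)
print(f"exact = {exact:.6f}, second order = {approx:.6f}, "
      f"difference = {exact - approx:.2e}")
```

The remaining difference is of third order in the errors, which is why it can be dropped when deriving the bias and MSE to the first order of approximation.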

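Taking expectations in (22) and using (8) to (20), the bias is obtained as

$$
\begin{aligned}
\mathrm{Bias}(\hat{\bar{y}}_{gME}) &= E(\hat{\bar{y}}_{gME} - \mu_Y) \\
&= \frac{1}{2n}\Big\{\sigma_X^4 A_X g_{33} + 2\mu_Y\sigma_X^2\frac{\mu_{2100}}{\sigma_X^2\mu_Y}g_{03} + 2\beta\mu_X\Big(\frac{\mu_{2100}}{\sigma_{XY}\mu_X} - \frac{\mu_{3000}}{\sigma_X^2\mu_X}\Big)g_{12} \\
&\qquad\quad + 2\beta\sigma_X^2\Big(\frac{\mu_{3100}}{\sigma_X^2\sigma_{XY}} - A_X\Big)g_{13} + 2\mu_X\sigma_X^2\frac{\mu_{3000}}{\sigma_X^2\mu_X}g_{23}\Big\}
\end{aligned}
\tag{23}
$$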

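Squaring both sides of (22), taking expectations and retaining terms to the first order of approximation, we have

$$
\begin{aligned}
\mathrm{MSE}(\hat{\bar{y}}_{gME}) &= E(\hat{\bar{y}}_{gME} - \mu_Y)^2 \\
&= E(\mu_Y e_0 - \beta\mu_X e_1 + \sigma_X^2 e_3 g_3)^2 \\
&= E\Big(\mu_Y^2 e_0^2 + \frac{\sigma_{XY}^2}{\sigma_X^4}\mu_X^2 e_1^2 + \sigma_X^4 e_3^2 g_3^2 - 2\mu_Y\frac{\sigma_{XY}}{\sigma_X^2}\mu_X e_0 e_1 + 2\mu_Y\sigma_X^2 e_0 e_3 g_3 - 2\frac{\sigma_{XY}}{\sigma_X^2}\mu_X\sigma_X^2 e_1 e_3 g_3\Big)
\end{aligned}
$$

On using the values of the expectations given in (8) to (20), the above expression becomes

$$
\mathrm{MSE}(\hat{\bar{y}}_{gME}) = (1 - \rho^2)\frac{\sigma_Y^2}{n} + \frac{1}{n}\Big(\sigma_u^2 + \rho^2\sigma_Y^2\frac{\sigma_v^2}{\sigma_X^2}\Big) + \Big(\frac{\sigma_X^4 A_X}{n}g_3^2 + \frac{2\mu_{2100}}{n}g_3 - \frac{2\sigma_{XY}}{\sigma_X^2 n}\mu_{3000}\,g_3\Big)
\tag{24}
$$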

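Minimizing (24) with respect to $g_3$, the optimum value of $g_3$ is given by

$$g_3 = \frac{\dfrac{\sigma_{XY}}{\sigma_X^2}\mu_{3000} - \mu_{2100}}{\sigma_X^4 A_X}. \tag{25}$$

Therefore, using the optimum value of $g_3$ from (25) in (24), we get

$$
\mathrm{MSE}(\hat{\bar{y}}_{gME})_{\min} = (1 - \rho^2)\frac{\sigma_Y^2}{n} + \frac{1}{n}\Big(\sigma_u^2 + \rho^2\sigma_Y^2\frac{\sigma_v^2}{\sigma_X^2}\Big) - \frac{\Big(\dfrac{\sigma_{XY}}{\sigma_X^2}\mu_{3000} - \mu_{2100}\Big)^2}{n\sigma_X^4 A_X}
\tag{26}
$$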
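Since (24) is a quadratic in $g_3$, the optimum (25) and the minimum (26) can be checked numerically. The sketch below does so with hypothetical inputs: the third-order moments `mu_3000` and `mu_2100`, the value of `beta` and the constant `base` (the part of the MSE free of $g_3$) are illustrative assumptions, since the paper does not report the moments.

```python
def mse_general(g3, base, var_x, a_x, mu_3000, mu_2100, beta, n):
    """MSE from (24) as a function of g3; `base` is the part free of g3."""
    return base + (var_x ** 2 * a_x * g3 ** 2
                   + 2 * mu_2100 * g3
                   - 2 * beta * mu_3000 * g3) / n


def g3_optimum(var_x, a_x, mu_3000, mu_2100, beta):
    """Optimum g3 from (25), with beta = sigma_XY / sigma_X^2."""
    return (beta * mu_3000 - mu_2100) / (var_x ** 2 * a_x)


def mse_minimum(base, var_x, a_x, mu_3000, mu_2100, beta, n):
    """Minimum MSE from (26)."""
    return base - (beta * mu_3000 - mu_2100) ** 2 / (n * var_x ** 2 * a_x)


# Hypothetical inputs chosen only to exercise the formulas.
params = dict(base=13.4, var_x=3300.0, a_x=0.815,
              mu_3000=5000.0, mu_2100=1000.0, beta=0.6, n=10)

g3_star = g3_optimum(params["var_x"], params["a_x"], params["mu_3000"],
                     params["mu_2100"], params["beta"])
at_optimum = mse_general(g3_star, **params)
closed_form = mse_minimum(**params)
nearby = [mse_general(g3_star + d, **params) for d in (-1e-5, 1e-5)]

print(f"g3* = {g3_star:.6e}")
print(f"MSE at g3* = {at_optimum:.6f}, closed form (26) = {closed_form:.6f}")
```

The two MSE values should agree, and moving $g_3$ slightly in either direction should increase the MSE, as expected when the coefficient $\sigma_X^4 A_X/n$ of $g_3^2$ is positive.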

3 Theoretical Comparison

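To establish the efficiency of the recommended estimator, we now compare it with the usual estimator of the mean in the presence of measurement errors,

$$\bar{y}_m = \frac{1}{n}\sum_{i=1}^{n} y_i \tag{27}$$

Expressing the above in terms of the $e_i$'s, we have

$$\bar{y}_m = \mu_Y(1 + e_0), \qquad \bar{y}_m - \mu_Y = \mu_Y e_0$$

Therefore

$$\mathrm{Bias}(\bar{y}_m) = 0 \tag{28}$$

and, from Shalabh (1997), we have

$$\mathrm{MSE}(\bar{y}_m) = \frac{\sigma_Y^2}{n}\Big(1 + \frac{\sigma_u^2}{\sigma_Y^2}\Big) \tag{29}$$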

In the presence of measurement errors, the proposed estimator $\hat{\bar{y}}_{gME}$ is more efficient than the usual estimator of the mean if

$$\mathrm{MSE}(\bar{y}_m) - \mathrm{MSE}(\hat{\bar{y}}_{gME})_{\min} > 0$$

i.e.

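$$
\frac{\sigma_Y^2}{n}\Big(1 + \frac{\sigma_u^2}{\sigma_Y^2}\Big) - (1 - \rho^2)\frac{\sigma_Y^2}{n} - \frac{1}{n}\Big(\sigma_u^2 + \rho^2\sigma_Y^2\frac{\sigma_v^2}{\sigma_X^2}\Big) + \frac{\Big(\dfrac{\sigma_{XY}}{\sigma_X^2}\mu_{3000} - \mu_{2100}\Big)^2}{n\sigma_X^4 A_X} > 0
$$

or

$$
h > \frac{\rho^2\sigma_Y^2}{n}\Big(\frac{\sigma_v^2}{\sigma_X^2} - 1\Big), \quad \text{where } h = \frac{\Big(\dfrac{\sigma_{XY}}{\sigma_X^2}\mu_{3000} - \mu_{2100}\Big)^2}{n\sigma_X^4 A_X}. \tag{30}
$$

Therefore, if condition (30) is met, the suggested estimator $\hat{\bar{y}}_{gME}$ is more efficient than the usual estimator of the mean in the presence of measurement errors.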
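The efficiency condition can be evaluated directly once $\rho$, the variances and $h$ are available. The sketch below uses the parameter values reported in the empirical study and, because the third-order moments are not reported, takes the most conservative case $h = 0$. The function names are hypothetical.

```python
def efficiency_threshold(rho, var_y, var_x, var_v, n):
    """Right-hand side of the condition: rho^2 sigma_Y^2 / n * (sigma_v^2 / sigma_X^2 - 1)."""
    return rho ** 2 * var_y / n * (var_v / var_x - 1)


def mse_gain(rho, var_y, var_x, var_v, h, n):
    """MSE(y_bar_m) - MSE_min, i.e. h minus the threshold."""
    return h - efficiency_threshold(rho, var_y, var_x, var_v, n)


def is_more_efficient(rho, var_y, var_x, var_v, h, n):
    """True when the proposed estimator has the smaller MSE."""
    return h > efficiency_threshold(rho, var_y, var_x, var_v, n)


# Values from the empirical study; h = 0 is a conservative assumption.
args = dict(rho=0.9641, var_y=1278.0, var_x=3300.0, var_v=32.3998, n=10)
threshold = efficiency_threshold(**args)
gain = mse_gain(h=0.0, **args)
holds = is_more_efficient(h=0.0, **args)

print(f"threshold = {threshold:.4f}, gain with h = 0: {gain:.4f}, holds: {holds}")
```

Because $h$ is a squared quantity divided by $n\sigma_X^4 A_X$, it is non-negative whenever $A_X > 0$, so the condition holds automatically when $\sigma_v^2 < \sigma_X^2$ and the threshold is negative.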

4 Empirical Study

In this section, the efficiency of the proposed estimator is compared with that of the usual mean estimator using a known population data set. The data are taken from Gujarati, Porter and Sangeetha (2012), with

$Y_i$ = true consumption expenditure,

$X_i$ = true income,

$y_i$ = measured consumption expenditure,

$x_i$ = measured income.

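The population characteristics obtained from the above data are as follows:

$$n = 10, \quad \mu_X = 170, \quad \mu_Y = 127, \quad \sigma_X^2 = 3300, \quad \sigma_Y^2 = 1278, \quad \sigma_u^2 = 32.4001, \quad \sigma_v^2 = 32.3998,$$

$$C_Y = 0.2815, \quad C_X = 0.3379, \quad \rho_{XY} = 0.9641, \quad \beta_{2Y} = 1.9026, \quad \beta_{2X} = 1.7758, \quad \beta_{2u} = 1.7186, \quad \beta_{2v} = 1.8409$$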

Substituting the above values in Equations (29) and (26), the MSEs of the usual and the recommended estimators under measurement errors are

$$\mathrm{MSE}(\bar{y}_m) = 131.033$$

$$\mathrm{MSE}(\hat{\bar{y}}_{gME})_{\min} = 13.372$$
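The reported values can be partly reproduced from the listed parameters. The sketch below evaluates (29) and the first two terms of (26). The last term of (26) needs the third-order moments $\mu_{3000}$ and $\mu_{2100}$, which are not reported, so the code only backs out the size that term would have to take to give 13.372. Small differences from the published figures are to be expected because the listed parameters are rounded. The function names are hypothetical.

```python
def mse_sample_mean(var_y, var_u, n):
    """MSE of the usual mean estimator under measurement error, Equation (29)."""
    return var_y / n * (1 + var_u / var_y)


def mse_before_g3_gain(rho, var_y, var_x, var_u, var_v, n):
    """First two terms of Equation (26), before subtracting the g3 gain."""
    return ((1 - rho ** 2) * var_y / n
            + (var_u + rho ** 2 * var_y * var_v / var_x) / n)


n = 10
var_x, var_y = 3300.0, 1278.0
var_u, var_v = 32.4001, 32.3998
rho = 0.9641

mse_mean = mse_sample_mean(var_y, var_u, n)
mse_partial = mse_before_g3_gain(rho, var_y, var_x, var_u, var_v, n)
implied_gain = mse_partial - 13.372  # size of the last term implied by the reported minimum

print(f"MSE of sample mean        : {mse_mean:.3f}  (reported 131.033)")
print(f"first two terms of (26)   : {mse_partial:.3f}")
print(f"implied gain from g3 term : {implied_gain:.3f}")
```

Read this way, most of the reduction in MSE comes from the regression adjustment through $\rho$, with the $g_3$ term contributing a comparatively small further gain for these data.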

These results confirm that the recommended estimator is more efficient than the usual estimator for this data set.

5 Conclusion

The mean squared error criterion has been used to assess the efficiency of the estimators in both the theoretical and the empirical study. The proposed estimator was compared with the usual mean estimator and found to be more efficient in terms of MSE. Based on the MSEs reported above, the percent relative efficiency (PRE) of the suggested estimator over the usual estimator of the mean under measurement errors is about 979.9 (that is, 131.033/13.372 × 100), which demonstrates its improved efficiency.
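The PRE quoted above is the ratio of the two reported MSEs expressed as a percentage. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def percent_relative_efficiency(mse_reference, mse_proposed):
    """PRE of the proposed estimator relative to the reference estimator."""
    return 100.0 * mse_reference / mse_proposed


# MSEs as reported in the empirical study.
pre = percent_relative_efficiency(131.033, 13.372)
print(f"PRE = {pre:.1f}")
```

A PRE above 100 indicates that the proposed estimator has the smaller MSE.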

Acknowledgement

I would like to extend my sincere thanks to the editor-in-chief and the anonymous referees for their invaluable input, expertise and insights in improving the paper.

References

[1] Cochran, W.G. (1968): Errors of Measurement in Statistics, Technometrics, 10, 637–666.

[2] Gujarati, D.N., Porter, D.C. and Gunasekar Sangeetha (2012): Basic Econometrics, Fifth Edition, McGraw-Hill Education (India) Private Limited, New Delhi.

[3] Lessler, J.T. and Kalsbeek, W.D. (1992): Non-Sampling Error in Surveys, John Wiley and Sons.

[4] Maneesha and Singh, R.K. (2002): Role of regression estimator involving measurement errors, Brazilian Journal of Probability and Statistics, 16, 39–46.

[5] Misra, S. and Yadav, D.K. (2015): Estimating population mean using known coefficient of variation under measurement errors, in the edited book “Statistics and Informatics in Agricultural Research”, edited by Indian Society of Agricultural Statistics (ISAS), Library Avenue, Pusa, New Delhi and published by Excel India Publishers, New Delhi, ISBN 978-93-84869-98-4, pp. 175–182.

[6] Biemer, P.P., Groves, R.M., Lyberg, L.E., Mathiowetz, N.A. and Sudman, S. (1991): Measurement Errors in Surveys, New York: Wiley.

[7] Sukhatme, P.V. and Seth, G.R. (1952): Non Sampling Errors in Surveys, Journal of the Indian Society of Agricultural Statistics, 4, 5–41.

[8] Shalabh (1997): Ratio Method of Estimation in the Presence of Measurement Errors, Journal of Indian Society of Agricultural Statistics, Vol. I, No. 2, 150–155.

[9] Singh, H.P. and Karpe, N. (2009): A General Procedure for Estimating the General Parameter using Auxiliary Information in Presence of Measurement Errors, Communication of the Korean Statistical Society, 16(5), 821–840.

Biography

Peeyush Misra received the bachelor’s degree in Science, the master’s degree in Statistics and the Ph.D. degree in Statistics from the University of Lucknow, Lucknow, Uttar Pradesh, India in 2001, 2004 and 2008, respectively. He is currently working as an Associate Professor at the Department of Statistics, DAV (PG) College, Dehradun, Uttarakhand, India. His field of specialization is sampling theory.
