Families of Estimators for Estimating Mean Using Information of Auxiliary Variate Under Response and Non-Response

R. R. Sinha* and Bharti

Department of Mathematics, Dr. B. R. Ambedkar National Institute of Technology, Jalandhar, India

E-mail: raghawraman@gmail.com; bhartikhanna_512@yahoo.com

*Corresponding Author

Received 20 July 2019; Accepted 25 March 2020; Publication 10 September 2020

Abstract

This research article is concerned with the efficiency improvement of estimators for the finite population mean under complete and incomplete information arising as a result of non-response. Different families of estimators for estimating the mean of the study variate via the known population mean, proportion and rank of an auxiliary variate under different situations are proposed, along with their bias and mean square error (MSE). Optimum conditions are derived to attain the minimum mean square error of the proposed families of estimators. Further, the problem is extended to the situation of unknown parameters of the auxiliary variate, and two-phase sampling families of estimators are suggested along with their properties under fixed cost and precision. Employing real data sets, theoretical and empirical comparisons are carried out to demonstrate the efficiency of the proposed families of estimators.

Keywords: Non-response, bias, mean square error, rank, auxiliary variate

1 Introduction

It is a well-known fact that the efficiency of estimators often increases by utilising the available information on an auxiliary variate [see Cochran (1940), Tripathi et al. (1994), Khare (2003)], but there are many situations in which the required information is not available, specifically or completely, on different variates. For instance, sometimes respondents are reluctant to answer the questionnaire or do not provide complete information; this circumstance is called non-response, and it causes a decrease in efficiency. To resolve the problem of non-response, Hansen and Hurwitz (1946) introduced a methodology of sub-sampling from the non-responding group of the population to suggest an unbiased estimator of the population mean. Several authors, including Rao (1986, 1990), Khare and Srivastava (1993, 1995, 1997, 2000), Khare and Sinha (2002, 2009), Singh and Kumar (2009) and Sinha and Kumar (2011, 2013, 2014), suggested various types of estimators/classes of estimators for estimating the population mean using the methodology of sub-sampling the non-respondents with known and unknown population mean of the auxiliary variate(s).

Let us suppose that 𝕊 = (𝕊1, 𝕊2, …, 𝕊𝒩) is a finite population consisting of 𝒩 units. A sample 𝕊n of size n is drawn from 𝕊 using the simple random sampling without replacement (SRSWOR) method, and the values of the study and auxiliary variates for the ith unit (i = 1, 2, …, 𝒩) of the population are denoted by yi and xi respectively. Let the population be dichotomized into two non-overlapping strata of responding (𝒩1 units) and non-responding (𝒩2 units) groups, i.e. 𝕊 = 𝕊𝒩1 + 𝕊𝒩2. Further, in a sample (𝕊n) of n units, the n1 responding units (𝕊n1) and n2 non-responding units (𝕊n2) are supposed to come from the 𝕊𝒩1 responding and 𝕊𝒩2 non-responding groups of the population. Let a subsample of size r = n2/h (h > 1) units be drawn arbitrarily from 𝕊n2, where h denotes the sub-sampling fraction at the second phase sample. If w1 = n1/n and w2 = n2/n are the estimates of the unknown proportions W1 = 𝒩1/𝒩 and W2 = 𝒩2/𝒩 respectively, then on the basis of the information available on (n1 + r) units, Hansen and Hurwitz (1946) suggested an unbiased estimator of the population mean as

th=y¯h=w1y¯1+w2y¯2(r) (1)

whose variance is given by

𝒱(th) = Ȳ^2[λnC0^2 + λhC0(2)^2]  (2)

where λn = (1/n − 1/𝒩), λh = w2(h − 1)/n, and C0^2 (= Sy^2/Ȳ^2) and C0(2)^2 (= Sy(2)^2/Ȳ(2)^2) are the squared coefficients of variation of the complete and non-responding groups of the study variate, while ȳ1 and ȳ2(r) are sample means of the study variate based on the n1 and r units respectively.
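To make the sub-sampling scheme concrete, the following sketch simulates the Hansen and Hurwitz (1946) estimator of Equation (1) on a synthetic population. The population, sample sizes and 75% response rate are all hypothetical choices for illustration, not part of the data sets analysed later.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical population split into responding / non-responding strata.
N, n, h = 1000, 100, 2
y = rng.gamma(4.0, 10.0, size=N)
respond = rng.random(N) < 0.75          # illustrative 75% responding stratum

# SRSWOR sample of n units, classified by response status.
idx = rng.choice(N, size=n, replace=False)
s1 = idx[respond[idx]]                  # n1 responding units
s2 = idx[~respond[idx]]                 # n2 non-responding units

# Sub-sample r = n2/h of the non-respondents and re-contact them.
r = max(1, len(s2) // h)
s2r = rng.choice(s2, size=r, replace=False)

w1, w2 = len(s1) / n, len(s2) / n
t_h = w1 * y[s1].mean() + w2 * y[s2r].mean()   # Eq. (1)
print(t_h, y.mean())
```

Averaging t_h over repeated samples would recover Ȳ, reflecting the unbiasedness of (1).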

Following the strategies of Rao (1986, 1990), Khare and Srivastava (1993), and Khare and Srivastava (2000), the conventional ratio, product, regression, generalized and class estimators for estimating the population mean with known mean and proportion of the auxiliary variate, under unit non-response on the study as well as the auxiliary variate, may be defined as

tr1(1) = ȳh(X̄/x̄h), tr1(2) = ȳh(P̄x/p̄xh),
tp1(1) = ȳh(x̄h/X̄), tp1(2) = ȳh(p̄xh/P̄x),
treg1(1) = ȳh + b1h(X̄ − x̄h), treg1(2) = ȳh + b2h(P̄x − p̄xh),
tg1(1) = ȳh(x̄h/X̄)^α1, tg1(2) = ȳh(p̄xh/P̄x)^α2,
tc1(1) = f1(ȳh, θ1) where θ1 = x̄h/X̄, tc1(2) = f2(ȳh, θ2) where θ2 = p̄xh/P̄x.
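As a quick numerical illustration of the estimators above, the sketch below computes tr1(1), tp1(1) and treg1(1) on synthetic data with a known X̄. The population model, response rate and all constants are hypothetical, and b1h is estimated from the units on which both variates are observed.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical population in which x is known for every unit.
N, n, h = 800, 80, 2
x = rng.uniform(10, 100, size=N)
y = 5.0 + 0.4 * x + rng.normal(0, 4, size=N)
X_bar = x.mean()                        # known population mean of x

idx = rng.choice(N, size=n, replace=False)
respond = rng.random(n) < 0.7           # illustrative response indicator
s1, s2 = idx[respond], idx[~respond]
s2r = rng.choice(s2, size=max(1, len(s2) // h), replace=False)

w1, w2 = len(s1) / n, len(s2) / n
y_h = w1 * y[s1].mean() + w2 * y[s2r].mean()   # Hansen-Hurwitz mean of y
x_h = w1 * x[s1].mean() + w2 * x[s2r].mean()   # same construction for x

obs = np.concatenate([s1, s2r])                # units with y and x both observed
b1h = np.cov(y[obs], x[obs])[0, 1] / x[obs].var(ddof=1)

t_r1 = y_h * X_bar / x_h                # ratio estimator tr1(1)
t_p1 = y_h * x_h / X_bar                # product estimator tp1(1)
t_reg1 = y_h + b1h * (X_bar - x_h)      # regression estimator treg1(1)
print(t_r1, t_p1, t_reg1)
```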

The Bias and MSE of all the above estimators up to the order of n-1 are given as:

Bias(tr1(1)) = Ȳ[λn(C1^2 − ρ01C0C1) + λh(C1(2)^2 − ρ01(2)C0(2)C1(2))]  (3)
MSE(tr1(1)) = Ȳ^2[λn(C0^2 + C1^2 − 2ρ01C0C1) + λh(C0(2)^2 + C1(2)^2 − 2ρ01(2)C0(2)C1(2))]  (4)
Bias(tp1(1)) = Ȳ[λnρ01C0C1 + λhρ01(2)C0(2)C1(2)]  (5)
MSE(tp1(1)) = Ȳ^2[λn(C0^2 + C1^2 + 2ρ01C0C1) + λh(C0(2)^2 + C1(2)^2 + 2ρ01(2)C0(2)C1(2))]  (6)
Bias(treg1(1)) = βλn(μ30/Sx^2 − μ21/Syx)  (7)

where β = Syx/Sx^2, μ30 = E(x̄h − X̄)^3 and μ21 = E[(x̄h − X̄)^2(ȳh − Ȳ)]

MSE(treg1(1)) = {λnSy^2 + λhSy(2)^2} − {λnρ01SySx + λhρ01(2)Sy(2)Sx(2)}^2/{λnSx^2 + λhSx(2)^2}  (8)

at

(b1h)opt = {λnρ01SySx + λhρ01(2)Sy(2)Sx(2)}/{λnSx^2 + λhSx(2)^2},
Bias(tg1(1)) = Ȳ[(α1(α1 − 1)/2){λnC1^2 + λhC1(2)^2} + α1{λnρ01C0C1 + λhρ01(2)C0(2)C1(2)}]  (9)
[MSE(tg1(1))]min = {λnSy^2 + λhSy(2)^2} − {λnρ01SySx + λhρ01(2)Sy(2)Sx(2)}^2/{λnSx^2 + λhSx(2)^2}  (10)

at

(α1)opt = −{λnρ01C0C1 + λhρ01(2)C0(2)C1(2)}/{λnC1^2 + λhC1(2)^2},
Bias(tc1(1)) = ({λnSy^2 + λhSy(2)^2}/2)∂^2f1/∂ȳh^2|(ȳh*, θ1*)
+ ({λnC1^2 + λhC1(2)^2}/2)∂^2f1/∂θ1^2|(ȳh*, θ1*)
+ Ȳ{λnρ01C0C1 + λhρ01(2)C0(2)C1(2)}∂^2f1/∂ȳh∂θ1|(ȳh*, θ1*)  (11)

where

ȳh* = Ȳ + ψ0(ȳh − Ȳ), θ1* = 1 + ψ1(θ1 − 1), 0 < ψ0, ψ1 < 1,
[MSE(tc1(1))]min = {λnSy^2 + λhSy(2)^2} − {λnρ01SySx + λhρ01(2)Sy(2)Sx(2)}^2/{λnSx^2 + λhSx(2)^2}  (12)

at

f1(2)(Ȳ, 1) = ∂f1/∂θ1|(Ȳ,1) = −X̄{λnρ01SySx + λhρ01(2)Sy(2)Sx(2)}/{λnSx^2 + λhSx(2)^2},
Bias(tr1(2)) = Ȳ[{λnC2^2 + λhC2(2)^2} − {λnρ02C0C2 + λhρ02(2)C0(2)C2(2)}]  (13)
MSE(tr1(2)) = Ȳ^2[λn(C0^2 + C2^2 − 2ρ02C0C2) + λh(C0(2)^2 + C2(2)^2 − 2ρ02(2)C0(2)C2(2))]  (14)
Bias(tp1(2)) = Ȳ[λnρ02C0C2 + λhρ02(2)C0(2)C2(2)]  (15)
MSE(tp1(2)) = Ȳ^2[λn(C0^2 + C2^2 + 2ρ02C0C2) + λh(C0(2)^2 + C2(2)^2 + 2ρ02(2)C0(2)C2(2))]  (16)
Bias(treg1(2)) = βλn(δ30/Sp^2 − δ21/Syp)  (17)

where β = Syp/Sp^2, δ30 = E(p̄xh − P̄x)^3 and δ21 = E[(p̄xh − P̄x)^2(ȳh − Ȳ)],

MSE(treg1(2)) = {λnSy^2 + λhSy(2)^2} − {λnρ02SySp + λhρ02(2)Sy(2)Sp(2)}^2/{λnSp^2 + λhSp(2)^2}  (18)

at

(b2h)opt = {λnρ02SySp + λhρ02(2)Sy(2)Sp(2)}/{λnSp^2 + λhSp(2)^2},
Bias(tg1(2)) = Ȳ[(α2(α2 − 1)/2){λnC2^2 + λhC2(2)^2} + α2{λnρ02C0C2 + λhρ02(2)C0(2)C2(2)}]  (19)
[MSE(tg1(2))]min = {λnSy^2 + λhSy(2)^2} − {λnρ02SySp + λhρ02(2)Sy(2)Sp(2)}^2/{λnSp^2 + λhSp(2)^2}  (20)

at

(α2)opt = −{λnρ02C0C2 + λhρ02(2)C0(2)C2(2)}/{λnC2^2 + λhC2(2)^2},
Bias(tc1(2)) = ({λnSy^2 + λhSy(2)^2}/2)∂^2f2/∂ȳh^2|(ȳh*, θ2*)
+ ({λnC2^2 + λhC2(2)^2}/2)∂^2f2/∂θ2^2|(ȳh*, θ2*)
+ Ȳ{λnρ02C0C2 + λhρ02(2)C0(2)C2(2)}∂^2f2/∂ȳh∂θ2|(ȳh*, θ2*)  (21)

where

ȳh* = Ȳ + ψ0(ȳh − Ȳ), θ2* = 1 + ψ2(θ2 − 1), 0 < ψ0, ψ2 < 1,

and

[MSE(tc1(2))]min = {λnSy^2 + λhSy(2)^2} − {λnρ02SySp + λhρ02(2)Sy(2)Sp(2)}^2/{λnSp^2 + λhSp(2)^2}  (22)

at

f2(2)(Ȳ, 1) = ∂f2/∂θ2|(Ȳ,1) = −P̄x{λnρ02SySp + λhρ02(2)Sy(2)Sp(2)}/{λnSp^2 + λhSp(2)^2}.

However, it has been observed in some cases that complete information is available on the auxiliary variate but not on the study variate [see Rao (1990)]. In this circumstance, the alternative ratio, product, regression, generalized and class estimators using the known population mean and proportion of the auxiliary variate may be defined as

tr2(1) = ȳh(X̄/x̄), tr2(2) = ȳh(P̄x/p̄x),
tp2(1) = ȳh(x̄/X̄), tp2(2) = ȳh(p̄x/P̄x),
treg2(1) = ȳh + b3h(X̄ − x̄), treg2(2) = ȳh + b4h(P̄x − p̄x),
tg2(1) = ȳh(x̄/X̄)^α3, tg2(2) = ȳh(p̄x/P̄x)^α4,
tc2(1) = f3(ȳh, θ3) where θ3 = x̄/X̄, tc2(2) = f4(ȳh, θ4) where θ4 = p̄x/P̄x.

The Bias and MSE of all the above estimators up to the order of n-1 are given as follows:

Bias(tr2(1)) = λnȲ(C1^2 − ρ01C0C1)  (23)
MSE(tr2(1)) = Ȳ^2{λnC1^2 + (λnC0^2 + λhC0(2)^2) − 2λnρ01C0C1}  (24)
Bias(tp2(1)) = λnȲρ01C0C1  (25)
MSE(tp2(1)) = Ȳ^2{λnC1^2 + (λnC0^2 + λhC0(2)^2) + 2λnρ01C0C1}  (26)
Bias(treg2(1)) = βλn(μ30/Sx^2 − μ21/Syx)  (27)

where β = Syx/Sx^2, μ30 = E(x̄ − X̄)^3 and μ21 = E[(x̄ − X̄)^2(ȳ − Ȳ)]

MSE(treg2(1)) = {λnSy^2 + λhSy(2)^2} − λnρ01^2Sy^2 at (b3h)opt = ρ01Sy/Sx,  (28)
Bias(tg2(1)) = Ȳ[(α3(α3 − 1)/2)λnC1^2 + α3λnρ01C0C1]  (29)
[MSE(tg2(1))]min = {λnSy^2 + λhSy(2)^2} − λnρ01^2Sy^2 at (α3)opt = −ρ01C0/C1  (30)
Bias(tc2(1)) = ({λnSy^2 + λhSy(2)^2}/2)∂^2f3/∂ȳh^2|(ȳh*, θ3*)
+ (λnC1^2/2)∂^2f3/∂θ3^2|(ȳh*, θ3*)
+ Ȳλnρ01C0C1 ∂^2f3/∂ȳh∂θ3|(ȳh*, θ3*)  (31)

where ȳh* = Ȳ + ψ0(ȳh − Ȳ), θ3* = 1 + ψ3(θ3 − 1), 0 < ψ0, ψ3 < 1,

[MSE(tc2(1))]min = {λnSy^2 + λhSy(2)^2} − λnρ01^2Sy^2 at f3(2)(Ȳ, 1) = ∂f3/∂θ3|(Ȳ,1) = −X̄ρ01Sy/Sx  (32)
Bias(tr2(2)) = λnȲ(C2^2 − ρ02C0C2)  (33)
MSE(tr2(2)) = Ȳ^2{λnC2^2 + (λnC0^2 + λhC0(2)^2) − 2λnρ02C0C2}  (34)
Bias(tp2(2)) = λnȲρ02C0C2  (35)
MSE(tp2(2)) = Ȳ^2{λnC2^2 + (λnC0^2 + λhC0(2)^2) + 2λnρ02C0C2}  (36)
Bias(treg2(2)) = βλn(δ30/Sp^2 − δ21/Syp)  (37)

where β = Syp/Sp^2, δ30 = E(p̄x − P̄x)^3 and δ21 = E[(p̄x − P̄x)^2(ȳ − Ȳ)]

MSE(treg2(2)) = {λnSy^2 + λhSy(2)^2} − λnρ02^2Sy^2 at (b4h)opt = ρ02Sy/Sp  (38)
Bias(tg2(2)) = Ȳ[(α4(α4 − 1)/2)λnC2^2 + α4λnρ02C0C2]  (39)
[MSE(tg2(2))]min = {λnSy^2 + λhSy(2)^2} − λnρ02^2Sy^2 at (α4)opt = −ρ02C0/C2  (40)
Bias(tc2(2)) = ({λnSy^2 + λhSy(2)^2}/2)∂^2f4/∂ȳh^2|(ȳh*, θ4*)
+ (λnC2^2/2)∂^2f4/∂θ4^2|(ȳh*, θ4*)
+ Ȳλnρ02C0C2 ∂^2f4/∂ȳh∂θ4|(ȳh*, θ4*)  (41)

where

ȳh* = Ȳ + ψ0(ȳh − Ȳ), θ4* = 1 + ψ4(θ4 − 1), 0 < ψ0, ψ4 < 1,

and

[MSE(tc2(2))]min = {λnSy^2 + λhSy(2)^2} − λnρ02^2Sy^2 at f4(2)(Ȳ, 1) = ∂f4/∂θ4|(Ȳ,1) = −P̄xρ02Sy/Sp  (42)

Further, Sinha and Kumar (2014) extended this problem to propose the classes of estimators using population mean and proportion of auxiliary variate when non-response occurs only on study variate and studied their properties.

2 Proposed Estimators

Following the approach of Sinha and Kumar (2014) and using the information of the known mean, proportion and rank of the auxiliary variate, two situations are considered to propose wider families of estimators for the population mean of the study variate.

Situation I: The families of estimators proposed for the situation when non-response takes place on both the study and auxiliary variates are

𝒯1 = ȳh g1(θ1, η1), where θ1 = x̄h/X̄ and η1 = r̄xh/R̄x  (43)

and

𝒯2 = ȳh g2(θ2, η1), where θ2 = p̄xh/P̄x and η1 = r̄xh/R̄x  (44)

such that gi(1, 1) = 1 for i = 1, 2.

Situation II: When non-response takes place only on the study variate but complete response is available on the auxiliary variate, the families of estimators proposed for this situation are

𝒯3 = ȳh g3(θ3, η2), where θ3 = x̄/X̄ and η2 = r̄x/R̄x  (45)

and

𝒯4 = ȳh g4(θ4, η2), where θ4 = p̄x/P̄x and η2 = r̄x/R̄x  (46)

such that gj(1, 1) = 1 for j = 3, 4.

The families of estimators [𝒯1, 𝒯2] and [𝒯3, 𝒯4] may be combined to study their properties, and they may be defined as

𝒯i=y¯hgi(θi,η1);i=1,2 (47)

and

𝒯j=y¯hgj(θj,η2);j=3,4 (48)

such that gi(1, 1) = gj(1, 1) = 1 for i = 1, 2 and j = 3, 4.

It is to be noted here that the continuous functions gi(θi, η1) and gj(θj, η2) assume positive values in bounded subsets 𝒟i and 𝒟j containing the point (1, 1). The first and second order partial derivatives of the functions gi(θi, η1) and gj(θj, η2) with respect to (θi, η1) and (θj, η2) are assumed to be continuous and bounded in 𝒟i and 𝒟j respectively.

To obtain the bias and MSE of 𝒯i and 𝒯j, we assume for simplicity that the population size is large compared with the sample size, so that the finite population correction (f.p.c.) terms may be ignored. Therefore, we define the following terms under the large sample approximation:

ε0 = (ȳh − Ȳ)/Ȳ, ε1 = (x̄h − X̄)/X̄, ε2 = (p̄xh − P̄x)/P̄x, ε3 = (r̄xh − R̄x)/R̄x,
ε4 = (x̄ − X̄)/X̄, ε5 = (p̄x − P̄x)/P̄x, and ε6 = (r̄x − R̄x)/R̄x

such that E(εi) = 0, i = 0, 1, 2, …, 6,

and the following terms are obtained up to the first order of approximation

E(ε0^2) = λnC0^2 + λhC0(2)^2, E(ε1^2) = λnC1^2 + λhC1(2)^2, E(ε2^2) = λnC2^2 + λhC2(2)^2, E(ε3^2) = λnC3^2 + λhC3(2)^2, E(ε4^2) = λnC1^2, E(ε5^2) = λnC2^2, E(ε6^2) = λnC3^2, E(ε0ε1) = λnC01 + λhC01(2), E(ε0ε2) = λnC02 + λhC02(2), E(ε0ε3) = λnC03 + λhC03(2), E(ε0ε4) = λnC01, E(ε0ε5) = λnC02, E(ε0ε6) = λnC03, E(ε1ε2) = λnC12 + λhC12(2), E(ε1ε3) = λnC13 + λhC13(2), E(ε2ε3) = λnC23 + λhC23(2), E(ε4ε5) = λnC12, E(ε4ε6) = λnC13, E(ε5ε6) = λnC23,

where

C0^2 = Sy^2/Ȳ^2, C1^2 = Sx^2/X̄^2, C2^2 = Sp^2/P̄x^2, C3^2 = Sr^2/R̄x^2,
C01 = Syx/(ȲX̄), C02 = Syp/(ȲP̄x), C03 = Syr/(ȲR̄x), C12 = Sxp/(X̄P̄x), C13 = Sxr/(X̄R̄x), C23 = Spr/(P̄xR̄x),
C0(2)^2 = Sy(2)^2/Ȳ^2, C1(2)^2 = Sx(2)^2/X̄^2, C2(2)^2 = Sp(2)^2/P̄x^2, C3(2)^2 = Sr(2)^2/R̄x^2,
C01(2) = Syx(2)/(ȲX̄), C02(2) = Syp(2)/(ȲP̄x), C03(2) = Syr(2)/(ȲR̄x), C12(2) = Sxp(2)/(X̄P̄x), C13(2) = Sxr(2)/(X̄R̄x), C23(2) = Spr(2)/(P̄xR̄x),
Syx = ρ01SySx, Syp = ρ02SySp, Syr = ρ03SySr, Sxp = ρ12SxSp, Sxr = ρ13SxSr, Spr = ρ23SpSr,
Syx(2) = ρ01(2)Sy(2)Sx(2), Syp(2) = ρ02(2)Sy(2)Sp(2), Syr(2) = ρ03(2)Sy(2)Sr(2), Sxp(2) = ρ12(2)Sx(2)Sp(2), Sxr(2) = ρ13(2)Sx(2)Sr(2), Spr(2) = ρ23(2)Sp(2)Sr(2).

Here (ρ01, ρ03, ρ13) and (ρ01(2), ρ03(2), ρ13(2)) denote the coefficients of correlation between (y, x), (y, r) and (x, r) respectively for the complete and non-responding groups of the population, while (ρ02, ρ12, ρ23) and (ρ02(2), ρ12(2), ρ23(2)) denote the coefficients of bi-serial correlation between (y, p), (x, p) and (p, r) for the complete and non-responding groups of the population.

Now, expanding the functions gi(θi,η1);i=1,2 and gj(θj,η2);j=3,4 about the point (θi,η1)=(θj,η2)=(1,1) by Taylor’s series up to the partial derivatives of second order and applying the condition gi(1,1)=gj(1,1)=1, we have

𝒯1 = Ȳ[1 + ε1g1(1) + ε3g1(2) + (ε1^2/2)g1(11)(θ1*, η1*) + (ε3^2/2)g1(22)(θ1*, η1*)
+ ε1ε3g1(12)(θ1*, η1*) + ε0 + ε0ε1g1(1) + ε0ε3g1(2)]  (49)
𝒯2 = Ȳ[1 + ε2g2(1) + ε3g2(2) + (ε2^2/2)g2(11)(θ2*, η1*) + (ε3^2/2)g2(22)(θ2*, η1*)
+ ε2ε3g2(12)(θ2*, η1*) + ε0 + ε0ε2g2(1) + ε0ε3g2(2)]  (50)
𝒯3 = Ȳ[1 + ε4g3(1) + ε6g3(2) + (ε4^2/2)g3(11)(θ3*, η2*) + (ε6^2/2)g3(22)(θ3*, η2*)
+ ε4ε6g3(12)(θ3*, η2*) + ε0 + ε0ε4g3(1) + ε0ε6g3(2)]  (51)
𝒯4 = Ȳ[1 + ε5g4(1) + ε6g4(2) + (ε5^2/2)g4(11)(θ4*, η2*) + (ε6^2/2)g4(22)(θ4*, η2*)
+ ε5ε6g4(12)(θ4*, η2*) + ε0 + ε0ε5g4(1) + ε0ε6g4(2)]  (52)

where

gi(1) = ∂gi/∂θi|(1,1), gi(2) = ∂gi/∂η1|(1,1), gi(11)(θi*, η1*) = ∂^2gi/∂θi^2|(θi*, η1*),
gi(22)(θi*, η1*) = ∂^2gi/∂η1^2|(θi*, η1*), gi(12)(θi*, η1*) = ∂^2gi/∂θi∂η1|(θi*, η1*),
θi* = 1 + ψi(θi − 1), η1* = 1 + φ1(η1 − 1), 0 < ψi, φ1 < 1, i = 1, 2;
gj(1) = ∂gj/∂θj|(1,1), gj(2) = ∂gj/∂η2|(1,1), gj(11)(θj*, η2*) = ∂^2gj/∂θj^2|(θj*, η2*),
gj(22)(θj*, η2*) = ∂^2gj/∂η2^2|(θj*, η2*), gj(12)(θj*, η2*) = ∂^2gj/∂θj∂η2|(θj*, η2*),
θj* = 1 + ψj(θj − 1), η2* = 1 + φ2(η2 − 1), 0 < ψj, φ2 < 1, j = 3, 4.

Since the sample size is assumed large enough to justify the first degree of approximation, and under the regularity conditions imposed on 𝒯i; i = 1, 2, …, 4, their bias and MSE will always exist. Therefore, the bias and MSE of 𝒯i; i = 1, 2, …, 4 up to the first order of approximation [O(n^(-1))] are as follows

Bias(𝒯1) = Ȳ[((λnC1^2 + λhC1(2)^2)/2)g1(11)(θ1*, η1*)
+ ((λnC3^2 + λhC3(2)^2)/2)g1(22)(θ1*, η1*)
+ {λnρ13C1C3 + λhρ13(2)C1(2)C3(2)}g1(12)(θ1*, η1*)
+ {λnρ01C0C1 + λhρ01(2)C0(2)C1(2)}g1(1)
+ {λnρ03C0C3 + λhρ03(2)C0(2)C3(2)}g1(2)]  (53)
MSE(𝒯1) = Ȳ^2[(λnC1^2 + λhC1(2)^2){g1(1)}^2
+ (λnC3^2 + λhC3(2)^2){g1(2)}^2 + (λnC0^2 + λhC0(2)^2)
+ 2{λnρ13C1C3 + λhρ13(2)C1(2)C3(2)}g1(1)g1(2)
+ 2{λnρ01C0C1 + λhρ01(2)C0(2)C1(2)}g1(1)
+ 2{λnρ03C0C3 + λhρ03(2)C0(2)C3(2)}g1(2)]  (54)
Bias(𝒯2) = Ȳ[((λnC2^2 + λhC2(2)^2)/2)g2(11)(θ2*, η1*)
+ ((λnC3^2 + λhC3(2)^2)/2)g2(22)(θ2*, η1*)
+ {λnρ23C2C3 + λhρ23(2)C2(2)C3(2)}g2(12)(θ2*, η1*)
+ {λnρ02C0C2 + λhρ02(2)C0(2)C2(2)}g2(1)
+ {λnρ03C0C3 + λhρ03(2)C0(2)C3(2)}g2(2)]  (55)
MSE(𝒯2) = Ȳ^2[(λnC2^2 + λhC2(2)^2){g2(1)}^2
+ (λnC3^2 + λhC3(2)^2){g2(2)}^2 + (λnC0^2 + λhC0(2)^2)
+ 2{λnρ23C2C3 + λhρ23(2)C2(2)C3(2)}g2(1)g2(2)
+ 2{λnρ02C0C2 + λhρ02(2)C0(2)C2(2)}g2(1)
+ 2{λnρ03C0C3 + λhρ03(2)C0(2)C3(2)}g2(2)]  (56)
Bias(𝒯3) = Ȳ[(λnC1^2/2)g3(11)(θ3*, η2*) + (λnC3^2/2)g3(22)(θ3*, η2*)
+ λnρ13C1C3 g3(12)(θ3*, η2*)
+ λnρ01C0C1 g3(1) + λnρ03C0C3 g3(2)]  (57)
MSE(𝒯3) = Ȳ^2[λnC1^2{g3(1)}^2 + λnC3^2{g3(2)}^2
+ (λnC0^2 + λhC0(2)^2) + 2λnρ13C1C3 g3(1)g3(2)
+ 2λnρ01C0C1 g3(1) + 2λnρ03C0C3 g3(2)]  (58)
Bias(𝒯4) = Ȳ[(λnC2^2/2)g4(11)(θ4*, η2*) + (λnC3^2/2)g4(22)(θ4*, η2*)
+ λnρ23C2C3 g4(12)(θ4*, η2*) + λnρ02C0C2 g4(1)
+ λnρ03C0C3 g4(2)]  (59)
MSE(𝒯4) = Ȳ^2[λnC2^2{g4(1)}^2 + λnC3^2{g4(2)}^2
+ (λnC0^2 + λhC0(2)^2) + 2λnρ23C2C3 g4(1)g4(2)
+ 2λnρ02C0C2 g4(1) + 2λnρ03C0C3 g4(2)]  (60)

Using the principle of maxima and minima and partially differentiating the MSEs of 𝒯i; i = 1, 2, …, 4 with respect to the corresponding gi(1) and gi(2), the optimum values of the functions can be obtained. Hence, the minimum MSEs of 𝒯i; i = 1, 2, …, 4, along with the optimum conditions, are given by

[MSE(𝒯1)]min = {λnSy^2 + λhSy(2)^2} − {λnρ01SySx + λhρ01(2)Sy(2)Sx(2)}^2/{λnSx^2 + λhSx(2)^2}
− [{λnSx^2 + λhSx(2)^2}{λnρ03SySr + λhρ03(2)Sy(2)Sr(2)} − {λnρ01SySx + λhρ01(2)Sy(2)Sx(2)}{λnρ13SxSr + λhρ13(2)Sx(2)Sr(2)}]^2 / ({λnSx^2 + λhSx(2)^2}[{λnSx^2 + λhSx(2)^2}{λnSr^2 + λhSr(2)^2} − {λnρ13SxSr + λhρ13(2)Sx(2)Sr(2)}^2])  (61)

if

g1(1) = [{λnρ03C0C3 + λhρ03(2)C0(2)C3(2)}{λnρ13C1C3 + λhρ13(2)C1(2)C3(2)} − {λnC3^2 + λhC3(2)^2}{λnρ01C0C1 + λhρ01(2)C0(2)C1(2)}] / [{λnC1^2 + λhC1(2)^2}{λnC3^2 + λhC3(2)^2} − {λnρ13C1C3 + λhρ13(2)C1(2)C3(2)}^2]

and

g1(2) = [{λnρ01C0C1 + λhρ01(2)C0(2)C1(2)}{λnρ13C1C3 + λhρ13(2)C1(2)C3(2)} − {λnC1^2 + λhC1(2)^2}{λnρ03C0C3 + λhρ03(2)C0(2)C3(2)}] / [{λnC1^2 + λhC1(2)^2}{λnC3^2 + λhC3(2)^2} − {λnρ13C1C3 + λhρ13(2)C1(2)C3(2)}^2],
[MSE(𝒯2)]min = {λnSy^2 + λhSy(2)^2} − {λnρ02SySp + λhρ02(2)Sy(2)Sp(2)}^2/{λnSp^2 + λhSp(2)^2}
− [{λnSp^2 + λhSp(2)^2}{λnρ03SySr + λhρ03(2)Sy(2)Sr(2)} − {λnρ02SySp + λhρ02(2)Sy(2)Sp(2)}{λnρ23SpSr + λhρ23(2)Sp(2)Sr(2)}]^2 / ({λnSp^2 + λhSp(2)^2}[{λnSp^2 + λhSp(2)^2}{λnSr^2 + λhSr(2)^2} − {λnρ23SpSr + λhρ23(2)Sp(2)Sr(2)}^2])  (62)

if

g2(1) = [{λnρ03C0C3 + λhρ03(2)C0(2)C3(2)}{λnρ23C2C3 + λhρ23(2)C2(2)C3(2)} − {λnC3^2 + λhC3(2)^2}{λnρ02C0C2 + λhρ02(2)C0(2)C2(2)}] / [{λnC2^2 + λhC2(2)^2}{λnC3^2 + λhC3(2)^2} − {λnρ23C2C3 + λhρ23(2)C2(2)C3(2)}^2]

and

g2(2) = [{λnρ02C0C2 + λhρ02(2)C0(2)C2(2)}{λnρ23C2C3 + λhρ23(2)C2(2)C3(2)} − {λnC2^2 + λhC2(2)^2}{λnρ03C0C3 + λhρ03(2)C0(2)C3(2)}] / [{λnC2^2 + λhC2(2)^2}{λnC3^2 + λhC3(2)^2} − {λnρ23C2C3 + λhρ23(2)C2(2)C3(2)}^2],
[MSE(𝒯3)]min = {λnSy^2 + λhSy(2)^2} − λnρ01^2Sy^2 − λnSy^2(ρ03 − ρ01ρ13)^2/(1 − ρ13^2)  (63)

if

g3(1) = (X̄Sy/ȲSx)(ρ03ρ13 − ρ01)/(1 − ρ13^2)

and

g3(2) = −(R̄xSy/ȲSr)(ρ03 − ρ01ρ13)/(1 − ρ13^2),
[MSE(𝒯4)]min = {λnSy^2 + λhSy(2)^2} − λnρ03^2Sy^2 − λnSy^2(ρ03ρ23 − ρ02)^2/(1 − ρ23^2)  (64)

if

g4(1) = (P̄xSy/ȲSp)(ρ03ρ23 − ρ02)/(1 − ρ23^2)

and

g4(2) = −(R̄xSy/ȲSr)(ρ03 − ρ02ρ23)/(1 − ρ23^2).

Since the values of gi(1) and gi(2), i = 1, 2, …, 4 involve unknown parameters, their values can be obtained by using prior data or by replacing the parameters with their consistent estimates. Reddy (1978), Srivastava and Jhajj (1983) and Koyuncu and Kadilar (2009) have shown that such estimates do not affect the minimum mean square error of the estimators up to the order n^(-1).
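The optimum gi(1) and gi(2) above solve a 2×2 system of normal equations obtained by equating the partial derivatives of the MSE to zero. The sketch below illustrates this with purely hypothetical moment values (A, B, C, a, b are stand-ins for the variance and covariance terms; they are not taken from the paper's data).

```python
import numpy as np

# Illustrative second-order moments: A = V(theta_i), B = V(eta_1),
# C = Cov(theta_i, eta_1), a = Cov(eps0, theta_i), b = Cov(eps0, eta_1).
A, B, C = 0.040, 0.020, 0.018
a, b = -0.015, -0.012

# MSE (up to scale) = const + A g1^2 + B g2^2 + 2C g1 g2 + 2a g1 + 2b g2;
# setting the partial derivatives to zero gives [A C; C B] g = -[a b]'.
M = np.array([[A, C], [C, B]])
g1_opt, g2_opt = np.linalg.solve(M, -np.array([a, b]))

# Reduction in MSE achieved at the optimum; non-negative when M is
# positive definite.
reduction = -(a * g1_opt + b * g2_opt)
print(g1_opt, g2_opt, reduction)
```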

3 Extension of This Problem to the Two-Phase Sampling

In this section, the problem is extended to two-phase sampling for estimating the population mean Ȳ when non-response is observed only on the study variate (y) while the mean X̄ and rank R̄x of the auxiliary variate are unknown. In this situation, a larger first-phase sample of size n′ is drawn from the population of size 𝒩 using SRSWOR to estimate the unknown population mean X̄ and rank R̄x. Let these estimates be x̄′ and r̄x′, based on the complete information available on the n′ units of the auxiliary variate. Thereafter a second-phase sample of size n is drawn from the first-phase sample, which is used to obtain the requisite information on the study variate (y). Now, the three existing families of estimators under this situation are as follows

𝒯c1 = f(ȳh, θ5), where θ5 = x̄/x̄′ [Khare and Sinha (2002)]  (65)
𝒯c2 = g(ȳh, θ6), where θ6 = p̄x/p̄x′ [Sinha and Kumar (2014)]  (66)

and

𝒯c3 = ȳh F(θ5, θ6), where θ5 = x̄/x̄′ and θ6 = p̄x/p̄x′ [Sinha and Kumar (2014)]  (67)

Following Khare and Sinha (2002) and Sinha and Kumar (2014), the proposed family of two-phase sampling estimators for estimating the population mean Ȳ using the estimates x̄′ and r̄x′ is given by

𝒯c4 = ȳh G(θ5, η3), where θ5 = x̄/x̄′ and η3 = r̄x/r̄x′  (68)

such that G(1, 1) = 1 and the function G satisfies the regularity conditions required for its expansion by Taylor's series.

Proceeding as in the previous section, some large sample approximations under SRSWOR are defined to obtain the bias and MSEs of 𝒯ci, i = 1, 2, …, 4, as

x̄ = X̄(1 + ε1), p̄x = P̄x(1 + ε2), r̄x = R̄x(1 + ε3), x̄′ = X̄(1 + ε1′), p̄x′ = P̄x(1 + ε2′) and r̄x′ = R̄x(1 + ε3′)

such that E(εi) = E(εi′) = 0; i = 1, 2, 3,

E(ε1′^2) = λ′C1^2, E(ε2′^2) = λ′C2^2, E(ε3′^2) = λ′C3^2, E(ε0ε1′) = λ′C01, E(ε0ε2′) = λ′C02, E(ε0ε3′) = λ′C03, E(ε1ε1′) = λ′C1^2, E(ε1ε2′) = λ′C12, E(ε1′ε2) = λ′C12, E(ε1′ε2′) = λ′C12, E(ε1ε3′) = λ′C13, E(ε1′ε3) = λ′C13, E(ε1′ε3′) = λ′C13, E(ε2ε2′) = λ′C2^2, E(ε2ε3′) = λ′C23, E(ε2′ε3) = λ′C23, E(ε2′ε3′) = λ′C23, E(ε3ε3′) = λ′C3^2

where λ′ = 1/n′ − 1/𝒩.

Now, expanding the function G(θ5,η3) by Taylor’s series about the point (θ5,η3)=(1,1) and using the condition G(1,1)=1, we have

𝒯c4 = ȳh[G(1, 1) + (θ5 − 1)G(1) + (η3 − 1)G(2) + (1/2)(θ5 − 1)^2G(11)(θ5*, η3*)
+ (1/2)(η3 − 1)^2G(22)(θ5*, η3*)
+ (θ5 − 1)(η3 − 1)G(12)(θ5*, η3*)]  (69)

where

G(1) = ∂G/∂θ5|(1,1), G(2) = ∂G/∂η3|(1,1), G(11)(θ5*, η3*) = ∂^2G/∂θ5^2|(θ5*, η3*),
G(22)(θ5*, η3*) = ∂^2G/∂η3^2|(θ5*, η3*), G(12)(θ5*, η3*) = ∂^2G/∂θ5∂η3|(θ5*, η3*)

and

θ5* = 1 + ψ5(θ5 − 1), η3* = 1 + ψ3(η3 − 1), 0 < ψ5, ψ3 < 1.

The bias and MSE of the proposed family of estimators 𝒯c4 up to the order n^(-1) are given as

Bias(𝒯c4) = Ȳ(λn − λ′)[ρ01C0C1G(1) + ρ03C0C3G(2) + (C1^2/2)G(11)(θ5*, η3*)
+ ρ13C1C3G(12)(θ5*, η3*) + (C3^2/2)G(22)(θ5*, η3*)]  (70)
MSE(𝒯c4) = {λnSy^2 + λhSy(2)^2} + Ȳ^2(λn − λ′)[C1^2G(1)^2 + C3^2G(2)^2
+ 2ρ01C0C1G(1) + 2ρ03C0C3G(2) + 2ρ13C1C3G(1)G(2)]  (71)

The MSE of the proposed family of estimators 𝒯c4 will attain its minimum value when

G(1) = (X̄Sy/ȲSx)(ρ03ρ13 − ρ01)/(1 − ρ13^2) and G(2) = (R̄xSy/ȲSr)(ρ01ρ13 − ρ03)/(1 − ρ13^2)

and the minimum MSE of 𝒯c4 is given by

[MSE(𝒯c4)]min = {λnSy^2 + λhSy(2)^2} − (λn − λ′)ρ03^2Sy^2
− (λn − λ′)Sy^2(ρ03ρ13 − ρ01)^2/(1 − ρ13^2).  (72)
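Equation (72) is straightforward to evaluate numerically. In the sketch below the parameter values echo data set 1 of Section 7, except the first-phase size n_prime, which is a hypothetical choice.

```python
# Evaluating [MSE(Tc4)]min of Eq. (72); n_prime is hypothetical, the rest
# echoes the data set 1 parameters of Section 7.
N, n, n_prime, h = 109, 30, 60, 2
W2 = 0.2477
Sy, Sy2r = 46.64779, 38.42857          # S_y and S_y(2)
rho01, rho03, rho13 = 0.456, -0.507, -0.899

lam_n = 1 / n - 1 / N
lam_h = W2 * (h - 1) / n
lam_p = 1 / n_prime - 1 / N            # lambda' of the first-phase sample

base = lam_n * Sy**2 + lam_h * Sy2r**2          # V(t_h)
gain = (lam_n - lam_p) * Sy**2 * (
    rho03**2 + (rho03 * rho13 - rho01) ** 2 / (1 - rho13**2))
mse_min = base - gain
print(mse_min)
```

A positive gain confirms that 𝒯c4 improves on th whenever λn > λ′, i.e. whenever the first-phase sample is larger than the second-phase sample.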

4 Calculation of n, n′ and h Under Fixed Cost 𝒞 ≤ 𝒞(0)

Let the total fixed cost of the survey, apart from the overhead charge, be 𝒞(0). The cost function is

𝒞(𝒯ci) = 𝒞(1)′n′ + n(𝒞(1) + 𝒞(2)W1 + 𝒞(3)W2/h)  (73)

Here 𝒞(1)′ is the per-unit cost of observing and identifying the auxiliary variate at the first phase,

𝒞(1) is the per-unit cost of sending a questionnaire or visiting the unit at the second phase,

𝒞(2) is the per-unit cost of collecting and processing the information on the study variate y obtained from the n1 responding units, and 𝒞(3) is the per-unit cost of collecting and processing the information on the study variate y for a sub-sampled unit on an interview basis.

The MSE(𝒯ci), i=1,2,3,4 can be expressed as given below:

MSE(𝒯ci) = Ai/n + Bi/n′ + hDi/n + terms not containing n, n′ and h,  (74)

where Ai is the coefficient of the 1/n terms, Bi is the coefficient of the 1/n′ terms, and Di is the coefficient of the h/n terms.

In order to minimize the MSE of the estimators and obtain the optimum n, n′ and h for the fixed cost 𝒞 ≤ 𝒞(0), a function can be defined as

χ = Ai/n + Bi/n′ + hDi/n + δi[𝒞(1)′n′ + n(𝒞(1) + 𝒞(2)W1 + 𝒞(3)W2/h) − 𝒞(0)], i = 1, 2, 3, 4  (75)

where δi is a Lagrange multiplier.

Differentiating the function χ with respect to n′, n and h and equating the derivatives to zero, we have

n′√δi = √(Bi/𝒞(1)′)  (76)
n√δi = √((Ai + hDi)/(𝒞(1) + 𝒞(2)W1 + 𝒞(3)W2/h))  (77)

and

(n/h)√δi = √(Di/(𝒞(3)W2))  (78)

Solving (77) and (78), we get

hopt = √(Ai𝒞(3)W2/((𝒞(1) + 𝒞(2)W1)Di))  (79)

Substituting the obtained values of n′, n and h in (73), we get

√δi = [√(Bi𝒞(1)′) + √(Ai(𝒞(1) + 𝒞(2)W1)) + √(𝒞(3)W2Di)]/𝒞(0)  (80)

Finally, the optimum value of MSE(𝒯ci) is given by:

[MSE(𝒯ci)]opt = [√(Bi𝒞(1)′) + √(Ai(𝒞(1) + 𝒞(2)W1)) + √(𝒞(3)W2Di)]^2/𝒞(0) − Sy^2/𝒩  (81)
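Equations (76)-(81) can be evaluated directly. The sketch below uses illustrative values of Ai, Bi, Di and the unit costs (none of them come from the paper's data) and checks that the resulting sample sizes exhaust the budget 𝒞(0) exactly, which is a property of the optimum design.

```python
import math

# Hypothetical coefficients of Eq. (74) and hypothetical unit costs
# C(1)', C(1), C(2), C(3).
Ai, Bi, Di = 1800.0, 400.0, 500.0
c1p, c1, c2, c3 = 0.5, 2.0, 3.0, 8.0
W1, W2 = 0.75, 0.25
C0 = 500.0                                   # total budget C(0)

h_opt = math.sqrt(Ai * c3 * W2 / ((c1 + c2 * W1) * Di))            # Eq. (79)
sqrt_delta = (math.sqrt(Bi * c1p) + math.sqrt(Ai * (c1 + c2 * W1))
              + math.sqrt(c3 * W2 * Di)) / C0                      # Eq. (80)

# Sample sizes recovered from Eqs. (76) and (77).
n_prime = math.sqrt(Bi / c1p) / sqrt_delta
n = math.sqrt((Ai + h_opt * Di)
              / (c1 + c2 * W1 + c3 * W2 / h_opt)) / sqrt_delta

# The optimum design exhausts the budget exactly.
cost = c1p * n_prime + n * (c1 + c2 * W1 + c3 * W2 / h_opt)
print(h_opt, n_prime, n, cost)
```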

For th, the expected total cost is considered as

𝒞 = n(𝒞(1) + 𝒞(2)W1 + 𝒞(3)W2/h)  (82)

and

[MSE(ȳh)]opt = [√(A0(𝒞(1) + 𝒞(2)W1)) + √(𝒞(3)W2D0)]^2/𝒞(0) − Sy^2/𝒩  (83)

5 Calculation of n, n′ and h Under Specified Variance

Suppose 𝒱0′′ is the fixed variance for the estimator 𝒯ci, i = 1, 2, 3, 4, and let

𝒱0′′ = Ai/n + Bi/n′ + hDi/n − Sy^2/𝒩  (84)

Now consider a function to minimize the average total cost 𝒞(𝒯ci) for the fixed variance of the estimator 𝒯ci:

χ = {𝒞(1)′n′ + n(𝒞(1) + 𝒞(2)W1 + 𝒞(3)W2/h)} + μi[MSE(𝒯ci) − 𝒱0′′]  (85)

where μi (i = 1, 2, 3, 4) is a Lagrange multiplier.

In order to minimize the cost function, differentiating χ with respect to n′, n and h and equating the derivatives to zero, we have

n′ = √(μiBi/𝒞(1)′),  (86)
n = √(μi(Ai + hDi)/(𝒞(1) + 𝒞(2)W1 + 𝒞(3)W2/h)),  (87)

and

n/h = √(μiDi/(𝒞(3)W2)).  (88)

Solving (87) and (88) we get,

hopt = √(Ai𝒞(3)W2/((𝒞(1) + 𝒞(2)W1)Di))  (89)

Putting the values of n′, n and h in (84), we have

√μi = [√(Bi𝒞(1)′) + √(Ai(𝒞(1) + 𝒞(2)W1)) + √(𝒞(3)W2Di)]/(𝒱0′′ + Sy^2/𝒩)  (90)

The optimum expected total cost incurred in attaining the fixed variance 𝒱0′′ for the families of estimators 𝒯ci is given by:

[𝒞(𝒯ci)]opt = [√(Bi𝒞(1)′) + √(Ai(𝒞(1) + 𝒞(2)W1)) + √(𝒞(3)W2Di)]^2/(𝒱0′′ + Sy^2/𝒩); i = 1, 2, 3, 4  (91)

For th, the optimum cost for fixed variance is given by

[𝒞(th)]opt = [√(A0(𝒞(1) + 𝒞(2)W1)) + √(𝒞(3)W2D0)]^2/(𝒱0′′ + Sy^2/𝒩)  (92)
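The fixed-variance case of Equations (86)-(91) can be checked in the same way. The sketch reuses the hypothetical coefficients and unit costs from the fixed-cost sketch and verifies that the optimum design attains the specified variance exactly.

```python
import math

# Hypothetical coefficients and unit costs (same placeholders as before).
Ai, Bi, Di = 1800.0, 400.0, 500.0
c1p, c1, c2, c3 = 0.5, 2.0, 3.0, 8.0
W1, W2 = 0.75, 0.25
N, Sy = 109, 46.64779
V0 = 60.0                                    # specified variance V0''

h_opt = math.sqrt(Ai * c3 * W2 / ((c1 + c2 * W1) * Di))            # Eq. (89)
sqrt_mu = (math.sqrt(Bi * c1p) + math.sqrt(Ai * (c1 + c2 * W1))
           + math.sqrt(c3 * W2 * Di)) / (V0 + Sy**2 / N)           # Eq. (90)

n_prime = sqrt_mu * math.sqrt(Bi / c1p)                            # Eq. (86)
n = sqrt_mu * math.sqrt((Ai + h_opt * Di)
                        / (c1 + c2 * W1 + c3 * W2 / h_opt))        # Eq. (87)

# The resulting design attains the specified variance exactly.
attained = Ai / n + Bi / n_prime + h_opt * Di / n - Sy**2 / N
cost_opt = c1p * n_prime + n * (c1 + c2 * W1 + c3 * W2 / h_opt)
print(attained, cost_opt)
```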

6 Efficiency Comparisons

(i) From Equations (2) and (61), we get

𝒱(th) − [MSE(𝒯1)]min = {Cov(ȳh, x̄h)}^2/𝒱(x̄h)
+ [𝒱(x̄h)Cov(ȳh, r̄xh) − Cov(ȳh, x̄h)Cov(x̄h, r̄xh)]^2 / (𝒱(x̄h)[𝒱(x̄h)𝒱(r̄xh) − {Cov(x̄h, r̄xh)}^2]) > 0

(ii) From Equations (12) and (61), we get

[MSE(tc1(1))]min − [MSE(𝒯1)]min
= [𝒱(x̄h)Cov(ȳh, r̄xh) − Cov(ȳh, x̄h)Cov(x̄h, r̄xh)]^2 / (𝒱(x̄h)[𝒱(x̄h)𝒱(r̄xh) − {Cov(x̄h, r̄xh)}^2]) > 0

(iii) From Equations (22) and (61), we get

[MSE(tc1(2))]min − [MSE(𝒯1)]min > 0

if

{Cov(ȳh, x̄h)}^2/𝒱(x̄h)
+ [𝒱(x̄h)Cov(ȳh, r̄xh) − Cov(ȳh, x̄h)Cov(x̄h, r̄xh)]^2 / (𝒱(x̄h)[𝒱(x̄h)𝒱(r̄xh) − {Cov(x̄h, r̄xh)}^2])
> {Cov(ȳh, p̄xh)}^2/𝒱(p̄xh)

(iv) From Equations (2) and (62), we get

𝒱(th) − [MSE(𝒯2)]min = {Cov(ȳh, p̄xh)}^2/𝒱(p̄xh)
+ [𝒱(p̄xh)Cov(ȳh, r̄xh) − Cov(ȳh, p̄xh)Cov(p̄xh, r̄xh)]^2 / (𝒱(p̄xh)[𝒱(p̄xh)𝒱(r̄xh) − {Cov(p̄xh, r̄xh)}^2]) > 0

(v) From Equations (12) and (62), we get

[MSE(tc1(1))]min − [MSE(𝒯2)]min > 0

if

{Cov(ȳh, p̄xh)}^2/𝒱(p̄xh)
+ [𝒱(p̄xh)Cov(ȳh, r̄xh) − Cov(ȳh, p̄xh)Cov(p̄xh, r̄xh)]^2 / (𝒱(p̄xh)[𝒱(p̄xh)𝒱(r̄xh) − {Cov(p̄xh, r̄xh)}^2])
> {Cov(ȳh, x̄h)}^2/𝒱(x̄h)

(vi) From Equations (22) and (62), we get

[MSE(tc1(2))]min − [MSE(𝒯2)]min
= [𝒱(p̄xh)Cov(ȳh, r̄xh) − Cov(ȳh, p̄xh)Cov(p̄xh, r̄xh)]^2 / (𝒱(p̄xh)[𝒱(p̄xh)𝒱(r̄xh) − {Cov(p̄xh, r̄xh)}^2]) > 0

(vii) From Equations (2) and (63), we get

𝒱(th) − [MSE(𝒯3)]min = λnρ01^2Sy^2 + λnSy^2(ρ03 − ρ01ρ13)^2/(1 − ρ13^2) > 0

(viii) From Equations (32) and (63), we get

[MSE(tc2(1))]min − [MSE(𝒯3)]min = λnSy^2(ρ03 − ρ01ρ13)^2/(1 − ρ13^2) > 0

(ix) From Equations (42) and (63), we get

[MSE(tc2(2))]min − [MSE(𝒯3)]min > 0

if

λnρ01^2Sy^2 + λnSy^2(ρ03 − ρ01ρ13)^2/(1 − ρ13^2) − λnρ02^2Sy^2 > 0

(x) From Equations (2) and (64), we get

𝒱(th) − [MSE(𝒯4)]min = λnρ02^2Sy^2 + λnSy^2(ρ03 − ρ02ρ23)^2/(1 − ρ23^2) > 0

(xi) From Equations (32) and (64), we get

[MSE(tc2(1))]min − [MSE(𝒯4)]min > 0

if

λnρ02^2Sy^2 + λnSy^2(ρ03 − ρ02ρ23)^2/(1 − ρ23^2) − λnρ01^2Sy^2 > 0

(xii) From Equations (42) and (64), we get

[MSE(tc2(2))]min − [MSE(𝒯4)]min = λnSy^2(ρ03 − ρ02ρ23)^2/(1 − ρ23^2) > 0

where

𝒱(x̄h) = {λnSx^2 + λhSx(2)^2}, 𝒱(r̄xh) = {λnSr^2 + λhSr(2)^2}, 𝒱(p̄xh) = {λnSp^2 + λhSp(2)^2},
Cov(ȳh, x̄h) = {λnρ01SySx + λhρ01(2)Sy(2)Sx(2)},
Cov(ȳh, r̄xh) = {λnρ03SySr + λhρ03(2)Sy(2)Sr(2)},
Cov(ȳh, p̄xh) = {λnρ02SySp + λhρ02(2)Sy(2)Sp(2)},
Cov(x̄h, r̄xh) = {λnρ13SxSr + λhρ13(2)Sx(2)Sr(2)},
Cov(p̄xh, r̄xh) = {λnρ23SpSr + λhρ23(2)Sp(2)Sr(2)}.
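Comparison (vii) can be checked numerically using the data set 1 parameters listed in Section 7 (λn, Sy and the correlations below are taken from that list).

```python
# Numerical check of comparison (vii): V(t_h) - [MSE(T3)]_min.
lam_n = 0.02416
Sy = 46.64779
rho01, rho03, rho13 = 0.456, -0.507, -0.899

gap = (lam_n * rho01**2 * Sy**2
       + lam_n * Sy**2 * (rho03 - rho01 * rho13) ** 2 / (1 - rho13**2))
print(gap)
```

The gap is about 13.51 and does not involve λh, which is consistent with the constant difference between th and 𝒯3 across the h = 4, 3, 2 columns of Table 2 for data 1.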

For the efficiency comparisons of 𝒯c4 with respect to the relevant estimators, the minimum MSEs of the estimators 𝒯ci; i = 1, 2, 3 may be written as follows

[MSE(𝒯c1)]min = {λnSy^2 + λhSy(2)^2} − (λn − λ′)ρ01^2Sy^2

at

f(2)(Ȳ, 1) = ∂f/∂θ5|(Ȳ,1) = −X̄ρ01Sy/Sx,  (93)
[MSE(𝒯c2)]min = {λnSy^2 + λhSy(2)^2} − (λn − λ′)ρ02^2Sy^2  (94)

at

g(2)(Ȳ, 1) = ∂g/∂θ6|(Ȳ,1) = −P̄xρ02Sy/Sp,  (95)
[MSE(𝒯c3)]min = {λnSy^2 + λhSy(2)^2} − (λn − λ′)ρ02^2Sy^2
− (λn − λ′)Sy^2(ρ02ρ12 − ρ01)^2/(1 − ρ12^2)  (96)

at

F(1) = ∂F/∂θ5|(1,1) = (X̄Sy/ȲSx)(ρ02ρ12 − ρ01)/(1 − ρ12^2)

and

F(2) = ∂F/∂θ6|(1,1) = (P̄xSy/ȲSp)(ρ01ρ12 − ρ02)/(1 − ρ12^2)

(xiii) From Equations (2) and (72), we get

𝒱(th) − [MSE(𝒯c4)]min
= (λn − λ′)ρ03^2Sy^2 + (λn − λ′)Sy^2(ρ03ρ13 − ρ01)^2/(1 − ρ13^2) > 0

(xiv) From Equations (93) and (72), we get

[MSE(𝒯c1)]min − [MSE(𝒯c4)]min > 0

if

ρ03^2 + (ρ03ρ13 − ρ01)^2/(1 − ρ13^2) − ρ01^2 > 0

(xv) From Equations (94) and (72), we get

[MSE(𝒯c2)]min − [MSE(𝒯c4)]min > 0

if

ρ03^2 + (ρ03ρ13 − ρ01)^2/(1 − ρ13^2) − ρ02^2 > 0

(xvi) From Equations (96) and (72), we get

[MSE(𝒯c3)]min − [MSE(𝒯c4)]min > 0

if

ρ03^2 + (ρ03ρ13 − ρ01)^2/(1 − ρ13^2) − ρ02^2 − (ρ02ρ12 − ρ01)^2/(1 − ρ12^2) > 0

7 Empirical Study

An empirical study is carried out using a real data set on the village-wise population of 109 villages of Baria (Urban) police station, Champua Tahsil, Orissa, India, taken from the Census Handbook of Orissa, 1981, published by the Government of India. The 25% of villages (i.e. 27 villages) from the upper part of the list are treated as the non-respondents of the population, in order to show the efficiency of the suggested families of estimators.

Data 1: The study and auxiliary variates are as follows:

𝒚: Agricultural labourers, 𝒙: Occupied houses
𝒑𝒙: Proportion of villages with more than 70 occupied houses, 𝒓𝒙: Rank of x

The parameters for data 1 are:

𝒩 = 109, Ȳ = 41.2385, X̄ = 88.8624, P̄x = 0.5229, R̄x = 54.6789,
n = 30, Sy = 46.64779, Sx = 58.9933, Sp = 0.50178, Sr = 31.49570,
λn = 0.02416, Ȳ(2) = 51.7037, X̄(2) = 108.56, P̄x(2) = 0.7037, R̄x(2) = 13.8148,
W2 = 0.2477, Sy(2) = 38.42857, Sx(2) = 68.07029, Sp(2) = 0.46532, Sr(2) = 7.93259,
ρ01 = 0.456, ρ02 = 0.426, ρ12 = 0.706, ρ03 = -0.507, ρ13 = -0.899,
ρ01(2) = 0.071, ρ02(2) = 0.227, ρ12(2) = 0.535, ρ03(2) = -0.273, ρ13(2) = -0.867,
ρ23 = -0.866, ρ23(2) = -0.797.

Data 2: The study and auxiliary variates are as follows:

𝒚: Agricultural labourers, 𝒙: Total population
𝒑𝒙: Proportion of villages with population greater than 500, 𝒓𝒙: Rank of x
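The percent relative efficiencies (PRE) reported in parentheses in Tables 1-3 are each estimator's MSE compared against that of th. A small helper reproduces them; the two example values are read from Table 1 (data 1, h = 4).

```python
# PRE of an estimator with respect to t_h: 100 * MSE(t_h) / MSE(estimator).
def pre(mse_th, mse_est):
    return 100.0 * mse_th / mse_est

# Example: treg1(1) for data 1 at h = 4 (values read from Table 1).
print(round(pre(89.1518, 83.0203), 1))   # -> 107.4, as shown in Table 1
```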

Table 1 Mean square errors and PRE (shown in parenthesis) of 𝒯1 and 𝒯2 with other estimators for data 1 and 2

For Data 1
Estimators h=4 h=3 h=2
th=w1y¯1+w2y¯2(r) 89.1518 (100%)* 76.9587 (100%) 64.7656 (100%)
tr1(1)=y¯hX¯x¯h 99.5688 (89.5%) 80.5597 (95.5%) 61.5506 (105.2%)
tp1(1)=y¯hx¯hX¯ 164.3870 (54.2%) 142.5310 (53.9%) 120.6750 (53.7%)
treg1(1)=y¯h+b1h(X¯-x¯h) 83.0203 (107.4%) 70.0188 (109.9%) 56.4732 (114.7%)
tg1(1)=y¯h(x¯hX¯)α1 83.0203 (107.4%) 70.0188 (109.9%) 56.4732 (114.7%)
(α1)opt=-0.3784 (α1)opt=-0.4479 (α1)opt=-0.561013
tc1(1)=f1(y¯h,θ1); θ1=x¯hX¯ 83.0203 (107.4%) 70.0188 (109.9%) 56.4732 (114.7%)
(f1(2))opt=-15.6038 (f1(2))opt=-18.4725 (f1(2))opt=-23.1353
tr1(2)=y¯hP¯xp¯xh 106.4870 (83.7%) 88.4612 (86.9%) 70.4351 (92%)
tp1(2)=y¯hp¯xhP¯x 214.2020 (41.6%) 185.6030 (41.5%) 157.0040 (41.3%)
treg1(2)=y¯h+b2h(P¯x-p¯xh) 78.9660 (112.9%) 67.1410 (114.6%) 55.1977 (117.3%)
tg1(2)=y¯h(p¯xhP¯x)a2 78.9660 (112.9%) 67.1410 (114.6%) 55.1977 (117.3%)
(a2)opt=-0.3782 (a2)opt=-0.40426 (a2)opt=-0.442094
tc1(2)=f2(y¯h,θ2); θ2=p¯xhP¯x 78.9660 (112.9%) 67.1410 (114.6%) 55.1977 (117.3%)
(f2(2))opt=-15.5985 (f2(2))opt=-16.6712 (f2(2))opt=-18.2313
(f5(2))opt=-0.3177 (f5(2))opt=-0.3153 (f5(2))opt=-0.3021
𝒯𝟏=𝐲¯𝐡𝐠𝟏(θ𝟏,η𝟏); θ1=x̄h/X̄, η1=r̄xh/R̄x 73.0153 (121.5%) 61.7334 (124.6%) 50.4251 (128.4%)
(g1(1))opt=0.1367 (g1(1))opt=0.1223 (g1(1))opt=0.0950
(g1(2))opt=1.2131 (g1(2))opt=1.1719 (g1(2))opt=1.1184
𝒯𝟐=𝐲¯𝐡𝐠𝟐(θ𝟐,η𝟏); θ2=p̄xh/P̄x, η1=r̄xh/R̄x 73.2730 (121.7%) 61.9016 (124.3%) 50.5046 (128.2%)
(g2(1))opt=-0.0609 (g2(1))opt=-0.0459 (g2(1))opt=-0.0167
(g2(2))opt=0.9412 (g2(2))opt=0.9543 (g2(2))opt=0.9865

For Data 2
Estimators h=4 h=3 h=2

th=w1y¯1+w2y¯2(r) 89.1518 (100%) 76.9587 (100%) 64.7656 (100%)
tr1(1)=y¯hX¯x¯h 95.5120 (93.3%) 77.9377 (98.7%) 60.3634 (107.3%)
tp1(1)=y¯hx¯hX¯ 165.5510 (53.9%) 143.0470 (53.8%) 120.5440 (53.7%)
treg1(1)=y¯h+b1h(X¯-x¯h) 81.7425 (109.1%) 69.0576 (111.4%) 55.9540 (115.7%)
tg1(1)=y¯h(x¯hX¯)α1 81.7425 (109.1%) 69.0576 (111.4%) 55.9540 (115.7%)
(α1)opt=-0.4231 (α1)opt=-0.4854 (α1)opt=-0.5857
tc1(1)=f1(y¯h,θ1); θ1=x¯hX¯ 81.7425 (109.1%) 69.0576 (111.4%) 55.9540 (115.7%)
(f1(2))opt=-17.45 (f1(2))opt=-20.0173 (f1(2))opt=-24.1528
tr1(2)=y¯hP¯xp¯xh 159.7890 (55.8%) 129.9020 (59.2%) 100.0150 (64.8%)
tp1(2)=y¯hp¯xhP¯x 310.5190 (28.7%) 264.5480 (29.1%) 218.5760 (29.6%)
treg1(2)=y¯h+b2h(P¯x-p¯xh) 79.4260 (112.2%) 67.5372 (114%) 55.4718 (116.8%)
tg1(2)=y¯h(p¯xhP¯x)a2 79.4260 (112.2%) 67.5372 (114%) 55.4718 (116.8%)
(a2)opt=-0.2581 (a2)opt=-0.2799 (a2)opt=-0.3136
tc1(2)=f2(y¯h,θ2); θ2=p¯xhP¯x 79.4260 (112.2%) 67.5372 (114%) 55.4718 (116.8%)
(f2(2))opt=-10.6435 (f2(2))opt=-11.5423 (f2(2))opt=-12.9305
(f5(2))opt=-0.1922 (f5(2))opt=-0.1964 (f5(2))opt=-0.1988
𝒯𝟏=𝐲¯𝐡𝐠𝟏(θ𝟏,η𝟏); θ1=x¯hX¯, η1=r¯xhR¯x 73.9494 (120.6%) 62.5112 (123.1%) 51.0590 (126.8%)
(g1(1))opt=0.0480 (g1(1))opt=0.0375 (g1(1))opt=0.0189
(g1(2))opt=1.0835 (g1(2))opt=1.0531 (g1(2))opt=1.0146
𝒯𝟐=𝐲¯𝐡𝐠𝟐(θ𝟐,η𝟏); θ2=p¯xhP¯x, η1=r¯xhR¯x 73.7819 (120.8%) 62.4101 (123.3%) 51.0242 (126.9%)
(g2(1))opt=-0.0547 (g2(1))opt=-0.0477 (g2(1))opt=-0.0328
(g2(2))opt=0.8988 (g2(2))opt=0.9052 (g2(2))opt=0.9260

Table 2 Mean square errors and PRE (shown in parenthesis) of 𝒯3 and 𝒯4 with other estimators for data 1 and 2

For Data 1
Estimators h=4 h=3 h=2
th=w1y¯1+w2y¯2(r) 89.1518 (100%)* 76.9587 (100%) 64.7656 (100%)
tr2(1)=y¯hX¯x¯ 79.1207 (112.7%) 66.9277 (115%) 54.7346 (118.2%)
tp2(1)=y¯hx¯¯X¯ 135.3990 (65.8%) 123.2060 (62.5%) 111.0130 (58.3%)
treg2(1)=y¯h+b3h(X¯-x¯) 78.2200 (114%) 66.0270 (116.6%) 53.8339 (120.3%)
tg2(1)=y¯h(x¯X¯)α3 78.2200 (114%) 66.0270 (116.6%) 53.8339 (120.3%)
(α3)opt=-0.7770 (α3)opt=-0.7770 (α3)opt=-0.7770
tc2(1)=f3(y¯h,θ3); θ3=x¯¯X¯ 78.2200 (114%) 66.0270 (116.6%) 53.8339 (120.3%)
(f3(2))opt=-32.0414 (f3(2))opt=-32.0414 (f3(2))opt=-32.0414
tr2(2)=y¯hP¯xp¯x 88.9882 (100.2%) 76.7952 (100.2%) 64.6021 (100.2%)
tp2(2)=y¯hp¯x¯P¯x 164.9850 (54%) 152.7920 (50.4%) 140.5990 (46.1%)
treg2(2)=y¯h+b4h(P¯x-p¯x) 79.6111 (112%) 67.4180 (114.2%) 55.2250 (117.2%)
tg2(2)=y¯h(p¯xP¯x)α4 79.6111 (112%) 67.4180 (114.2%) 55.2250 (117.2%)
(α4)opt=-0.5022 (α4)opt=-0.5022 (α4)opt=-0.5022
tc2(2)=f4(y¯h,θ4); θ4=p¯xP¯x 79.6111 (112%) 67.4180 (114.2%) 55.2250 (117.2%)
(f4(2))opt=-20.7084 (f4(2))opt=-20.7084 (f4(2))opt=-20.7084
𝒯𝟑=𝐲¯𝐡𝐠𝟑(θ𝟑,η𝟐); θ3=x¯X¯, η2=r¯xR¯x 75.6380 (117.9%) 63.4450 (121.3%) 51.2519 (126.4%)
(g3(1))opt=-0.0018 (g3(1))opt=-0.0018 (g3(1))opt= -0.0018
(g3(2))opt=0.9937 (g3(2))opt=0.9937 (g3(2))opt=0.9937
𝒯𝟒=𝐲¯𝐡𝐠𝟒(θ𝟒,η𝟐); θ4=p¯xP¯x, η2=r¯xR¯x 75.6022 (117.9%) 63.4091 (121.4%) 51.2160 (126.4%)
(g4(1))opt=0.0616 (g4(1))opt=0.0616 (g4(1))opt=0.0616
(g4(2))opt=1.0845 (g4(2))opt=1.0845 (g4(2))opt=1.0845

For Data 2
Estimators   h = 4

t_h = w1 ȳ1 + w2 ȳ2(r)   89.1518 (100%)
t_r2^(1) = ȳ_h(X̄/x̄)   79.3684 (112.3%)
t_p2^(1) = ȳ_h(x̄/X̄)   134.6190 (66.2%)
t_reg2^(1) = ȳ_h + b_3h(X̄ - x̄)   78.4585 (113.7%)
t_g2^(1) = ȳ_h(x̄/X̄)^α3   78.4585 (113.7%)
(α3)_opt = -0.7742
t_c2^(1) = f3(ȳ_h, θ3); θ3 = x̄/X̄   78.4585 (113.7%)
(f3^(2))_opt = -31.9256
t_r2^(2) = ȳ_h(P̄_x/p̄_x)   103.7000 (86%)
t_p2^(2) = ȳ_h(p̄_x/P̄_x)   212.1900 (42%)
t_reg2^(2) = ȳ_h + b_4h(P̄_x - p̄_x)   78.4585 (113.6%)
t_g2^(2) = ȳ_h(p̄_x/P̄_x)^α4   78.4585 (113.6%)
(α4)_opt = -0.3943
t_c2^(2) = f4(ȳ_h, θ4); θ4 = p̄_x/P̄_x   78.4585 (113.6%)
(f4^(2))_opt = -16.2587
𝒯3 = ȳ_h g3(θ3, η2); θ3 = x̄/X̄, η2 = r̄_x/R̄_x   76.1586 (117.1%)
(g3^(1))_opt = -0.0456
(g3^(2))_opt = 0.9308
𝒯4 = ȳ_h g4(θ4, η2); θ4 = p̄_x/P̄_x, η2 = r̄_x/R̄_x   75.9604 (117.4%)
(g4^(1))_opt = -0.1004
(g4^(2))_opt = 0.7881

Table 3  Mean square errors and PRE (shown in parentheses) of 𝒯ci, i = 1, 2, 3, 4 with respect to t_h for data 1 and 2

For Data 1
Estimators   h = 4   h = 3   h = 2
t_h = w1 ȳ1 + w2 ȳ2(r)   89.1518 (100%)   76.9587 (100%)   64.7656 (100%)
𝒯c1 = f(ȳ_h, θ5); θ5 = x̄/x̄′   79.7253 (112.5%)   67.5322 (114%)   55.3391 (117%)
f^(2) = -32.0414
𝒯c2 = g(ȳ_h, θ6); θ6 = p̄_x/p̄_x′   80.9248 (110.2%)   68.7317 (112%)   56.5387 (115%)
g^(2) = -20.7084
𝒯c3 = ȳ_h F(θ5, θ6); θ5 = x̄/x̄′, θ6 = p̄_x/p̄_x′   78.7465 (113%)   66.5534 (115.6%)   54.3603 (119.1%)
F^(1) = -0.5274
F^(2) = -0.2446
𝒯c4 = ȳ_h G(θ5, η3); θ5 = x̄/x̄′, η3 = r̄_x/r̄_x′   77.4988 (115%)   65.3057 (117.8%)   53.1126 (121.9%)
G^(1) = -0.0018
G^(2) = 0.99374

For Data 2
Estimators   h = 4   h = 3   h = 2
t_h = w1 ȳ1 + w2 ȳ2(r)   89.1518 (100%)   76.9587 (100%)   64.7656 (100%)
𝒯c1 = f(ȳ_h, θ5); θ5 = x̄/x̄′   79.9309 (111.5%)   67.7378 (113.6%)   55.5447 (116.6%)
f^(2) = -31.9256
𝒯c2 = g(ȳ_h, θ6); θ6 = p̄_x/p̄_x′   80.9248 (110.2%)   68.7317 (112%)   56.5387 (114.6%)
g^(2) = -15.3574
𝒯c3 = ȳ_h F(θ5, θ6); θ5 = x̄/x̄′, θ6 = p̄_x/p̄_x′   78.8845 (113%)   66.6915 (115.4%)   54.4984 (118.8%)
F^(1) = -0.5142
F^(2) = -0.1875
𝒯c4 = ȳ_h G(θ5, η3); θ5 = x̄/x̄′, η3 = r̄_x/r̄_x′   77.9477 (114.4%)   65.7546 (117%)   53.5616 (120.9%)
G^(1) = -0.0456
G^(2) = 0.9308

Table 4  PRE and expected cost of 𝒯ci, i = 1, 2, 3, 4 with respect to t_h for data 1 and 2

For fixed cost 𝒞(0) = Rs. 3700, with 𝒞′(1) = 2.9, 𝒞(1) = 25, 𝒞(2) = 100, 𝒞(3) = 250:

Estimators   n_opt (approx.)   h_opt (approx.)   n′_opt (approx.)   MSE   PRE
For Data 1:
t_h    27   1.75   -    69.8493   100%
𝒯c1   25   1.51   83   63.4754   110%
𝒯c2   25   1.55   77   65.0922   107.3%
𝒯c3   24   1.49   88   62.0982   112.5%
𝒯c4   24   1.45   94   60.2736   115.9%
For Data 2:
t_h    27   1.75   -    69.8493   100%
𝒯c1   25   1.51   82   63.7582   110%
𝒯c2   25   1.52   82   63.7582   110%
𝒯c3   24   1.49   88   62.2119   112.3%
𝒯c4   24   1.47   92   60.9387   114.6%

For specified variance 𝒱0 = 72 (fixed), with 𝒞′(1) = 2.9, 𝒞(1) = 25, 𝒞(2) = 100, 𝒞(3) = 250:

Estimators   n_opt (approx.)   h_opt (approx.)   n′_opt (approx.)   Expected cost in Rs.
For Data 1:
t_h    27   1.75   -    3613.47
𝒯c1   22   1.36   75   3357.02
𝒯c2   23   1.49   71   3422.08
𝒯c3   22   1.27   79   3301.62
𝒯c4   21   1.18   82   3228.21
For Data 2:
t_h    27   1.75   -    3613.47
𝒯c1   22   1.38   75   3368.40
𝒯c2   22   1.38   75   3368.40
𝒯c3   22   1.28   78   3306.19
𝒯c4   21   1.21   81   3254.97

The parameters for data 2 are:

N=109 Y¯=41.2385 X¯=485.92
P̄x=0.3761 R̄x=54.9633
n=30 Sy=46.64779 Sx=320.20928
Sp=0.48666 Sr=31.60372
λn=0.02416 Y¯(2)=51.7037 X¯(2)=595.15
P¯x(2)=0.4815 R¯x(2)=14.0000
W2=0.2477 Sy(2)=38.42857 Sx(2)=363.23119
Sp(2)=0.50918 Sr(2)=7.93725
ρ01=0.451 ρ02=0.451 ρ12=0.786
ρ03=-0.497 ρ13=-0.897
ρ01(2)=0.126 ρ02(2)=0.365 ρ12(2)=0.751
ρ03(2)=-0.269 ρ13(2)=-0.886
ρ23=-0.839 ρ23(2)=-0.866
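Two of the tabulated quantities can be reproduced directly from the basic counts: λ_n = 1/n - 1/N and W2 = N2/N. A minimal sketch in Python, assuming N2 = 27 for the non-responding group (this count is inferred from W2 = 0.2477 and is not quoted in the text):

```python
# Sanity-check two derived parameters of data 2 from the basic counts.
N, n = 109, 30   # population and sample sizes for data 2
N2 = 27          # assumed size of the non-responding group (inferred from W2)

lam_n = 1 / n - 1 / N   # finite-population correction term lambda_n
W2 = N2 / N             # non-response weight W2 = N2 / N

print(round(lam_n, 5))  # 0.02416, matching the parameter list
print(round(W2, 4))     # 0.2477, matching the parameter list
```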

Two different data sets are considered to demonstrate the efficiency of the suggested families of estimators; their minimum mean square errors are calculated, along with those of the relevant estimators, at various levels of the sub-sampling fraction. The percentage relative efficiency (PRE) of 𝒯i, i = 1, 2, …, 4 with respect to the corresponding relevant estimators is calculated by

PRE = [MSE(t) / MSE(𝒯)|min.] × 100,

where t denotes the corresponding relevant estimator and MSE(𝒯)|min. is the minimum mean square error of the proposed family.
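As a quick check, this computation reproduces the tabulated PRE of t_r2^(1) against t_h for data 1 at h = 4 (a minimal sketch; the two MSE values are taken from the table above):

```python
def pre(mse_relevant, mse_proposed):
    """Percentage relative efficiency of a proposed estimator
    relative to the relevant (baseline) estimator."""
    return mse_relevant / mse_proposed * 100

mse_th = 89.1518   # MSE of t_h for data 1 at h = 4
mse_tr = 79.1207   # MSE of t_r2^(1) for data 1 at h = 4

print(round(pre(mse_th, mse_tr), 1))  # 112.7, matching the table entry
```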

The minimum mean square errors and PRE of (𝒯1, 𝒯2), (𝒯3, 𝒯4) and 𝒯ci, i = 1, 2, 3, 4 with respect to t_h for data 1 and 2 are given in Tables 1–3 respectively, while the analysis of the cost functions is given in Table 4.

8 Conclusion

Tables 1 and 2 show that the suggested families of estimators 𝒯1, 𝒯2, 𝒯3 and 𝒯4 are more efficient than the corresponding estimators at all sub-sampling fractions for both data sets. The mean square errors of the suggested families of estimators decrease as the sub-sampling fraction increases for both data sets. Similarly, in the case of two-phase sampling estimation, 𝒯c4 is more efficient than the existing estimators t_h, 𝒯c1, 𝒯c2 and 𝒯c3. From Table 4, it is observed that 𝒯c4 is more efficient than the existing estimators t_h, 𝒯c1, 𝒯c2 and 𝒯c3 for fixed cost, while the expected cost incurred for 𝒯c4 is less than that incurred for the existing estimators. Therefore, the suggested families of estimators can be recommended on account of the theoretical and empirical studies discussed in the text.

References

[1] Cochran, W.G. (1940). The estimation of the yields of the cereal experiments by sampling for the ratio of grain to total produce, The Journal of Agricultural Science, 30(2), pp. 262–275.

[2] Hansen, M.H. and Hurwitz, W.N. (1946). The problem of non-response in sample surveys, Journal of the American Statistical Association, 41, pp. 517–529.

[3] Khare, B.B. (2003). Use of auxiliary information in sample surveys up to 2000- A review, Proc. Biotechnology and Science, India: M/S Centre of Bio-Mathematical Studies, pp. 76–87.

[4] Khare, B.B. and Srivastava, S. (1993). Estimation of population mean using auxiliary character in presence of non-response, National Academy Science Letters, 16, pp. 111–114.

[5] Khare, B.B. and Srivastava, S. (1995). Study of conventional and alternative two phase sampling ratio, product and regression estimators in presence of non-response, Proceedings of the National Academy of Sciences, 65(A) II, pp. 195–203.

[6] Khare, B.B. and Srivastava, S. (1997). Transformed ratio type estimators for the population mean in the presence of non-response, Communications in Statistics – Theory and Methods, 26(7), pp. 1779–1791.

[7] Khare, B.B. and Srivastava, S. (2000). Generalized estimators for population mean in presence of non-response, International Journal of Mathematics and Statistics, 9, pp. 75–87.

[8] Khare, B.B. and Sinha, R.R. (2002). Estimation of the ratio of two population means using auxiliary character with unknown population mean in presence of non-response, Progress of Mathematics, B. H. U., 36, pp. 337–348.

[9] Khare, B.B. and Sinha, R.R. (2009). On class of estimators for population mean using multi-auxiliary characters in the presence of non-response, Statistics in Transition New Series, 10(1), pp. 3–14.

[10] Koyuncu, N. and Kadilar, C. (2009). Efficient Estimators for the Population mean, Hacettepe Journal of Mathematics and Statistics, 38(2), pp. 217–225.

[11] Rao, P.S.R.S. (1986). Ratio estimation with sub-sampling the non-respondents, Survey Methodology, 12, pp. 217–230.

[12] Rao, P.S.R.S. (1990). Regression estimators with sub-sampling of non-respondents, In-Data Quality Control, Theory and Pragmatics, (Eds.) Gunar E. Liepins and V.R.R. Uppuluri, Marcel Dekker, New York, pp. 191–208.

[13] Reddy, V.N. (1978). A study on the use of prior knowledge on certain population parameters in estimation, Sankhya C, 40, pp. 29–37.

[14] Singh, H.P. and Kumar, S. (2009). A general class of estimators of the population mean in survey sampling using auxiliary information with sub-sampling the non-respondents, The Korean Journal of Applied Statistics, 22(2), pp. 387–402.

[15] Sinha, R.R. and Kumar, V. (2011). Generalized Estimators for Population Mean with Sub Sampling the Non-Respondents, Aligarh Journal of Statistics, 31, pp. 53–62.

[16] Sinha, R.R. and Kumar, V. (2013). Improved Estimators for Population Mean using Attributes and Auxiliary Characters under Incomplete Information, International Journal of Mathematics and Statistics, 14, pp. 43–54.

[17] Sinha, R.R. and Kumar, V. (2014). Improved classes of estimators for population mean using information on auxiliary character under double sampling the non-respondents, National Academy Science Letters, 37(1), pp. 71–79.

[18] Srivastava, S.K. and Jhajj H.S. (1983). A class of estimators of population mean using multi-auxiliary information, Calcutta Statistical Association Bulletin, 32, pp. 47–56.

[19] Tripathi, T.P., Das, A.K. and Khare, B.B. (1994). Use of auxiliary information in sample surveys – A review, Aligarh Journal of Statistics, 14, pp. 79–134.

Biographies


R. R. Sinha is an Assistant Professor in the Department of Mathematics, Dr. B. R. Ambedkar National Institute of Technology, Jalandhar, India and obtained his Ph. D. Degree in “Sampling Techniques” from the Department of Statistics, Banaras Hindu University, Varanasi, India in 2001. He has guided one Ph. D. and three M. Phil. candidates. He has life membership of Indian Statistical Association and International Indian Statistical Association. Dr. Sinha has published more than 25 research papers in international/national journals and conferences and presented more than 22 research papers in international/national conferences. His area of specialization is Sampling Theory, Data Analysis and Inference. ORCID identifier number of Dr. R. R. Sinha is 0000-0001-6386-1973.


Bharti has been a Ph.D. student at Dr. B. R. Ambedkar National Institute of Technology, Jalandhar since 2018. She did her B.Sc. in Computer Science in 2015 from DAV College, Jalandhar (GNDU) and completed her M.Sc. in Mathematics in 2017 from DAV College, Jalandhar (GNDU). She has one year of teaching experience. Her doctoral research is on Estimation of Parameters using Auxiliary Character under Complete and Incomplete Information.
