Family of Estimators for Estimating Population Median using Auxiliary Information in Survey Sampling

Prayas Sharma*, Anupam Lata, Subhash Kumar Yadav and Muhammad Noor-ul-Amin

Department of Statistics, Babasaheb Bhimrao Ambedkar University, Lucknow, India
COMSATS University Islamabad-Lahore Campus, Pakistan
E-mail: prayassharma02@gmail.com; anupam.lata010@gmail.com; drskystats@gmail.com; nooramin.stats@gmail.com
*Corresponding Author

Received 07 June 2025; Accepted 08 August 2025

Abstract

In this article, we propose a new family of estimators for estimating the unknown population median of a study variable by utilizing auxiliary information under simple random sampling. The choice of the median, as opposed to the mean, is particularly advantageous in the presence of outliers or skewed distributions, where the mean may be unduly influenced. We derive the expressions for the bias and mean square error (MSE) of the proposed class of estimators up to the first order of approximation. Furthermore, we examine several notable subclasses within the proposed family and calculate their respective MSEs. To assess the efficiency and robustness of the proposed estimators, an empirical study is conducted using real-world data and benchmarked against existing estimators from the literature. The results of this empirical analysis demonstrate that the proposed estimators achieve lower MSEs, underscoring their practical relevance and effectiveness in survey sampling applications.

Keywords: Auxiliary variable, bias, mean square error, median, simple random sampling, study variable.

1 Introduction

The median is often preferred over the mean for non-normal distributions such as income, production and consumption, where outliers are common. In such cases the mean becomes less reliable because of its sensitivity to extreme values. Auxiliary information can be used to improve the precision of estimators, and several methods, such as ratio, product, and regression techniques, have been developed to incorporate it when estimating population parameters. Extensive research has been conducted on estimating parameters such as the population mean and median using such information. [4] was among the first to introduce median estimation in survey sampling. Later, [8] was the first to address the problem of estimating the median using auxiliary information in survey sampling. Since then, a wide variety of estimators have been proposed under different sampling schemes to estimate the population median. [23] proposed an estimator for the population median and demonstrated its superiority over existing methods. [20] introduced a generalized estimator using auxiliary information and derived its bias and mean square error (MSE), showing that it outperformed the competing estimators. [16] suggested an estimator for the population median under two sampling schemes (SRSWOR and stratified sampling), which performed better than the available estimators in a numerical analysis. Similarly, [17] gave a difference-type estimator of the population median under two sampling schemes (SRSWOR and two-phase sampling). [1] put forward a new estimator and showed its potential through empirical and simulation studies. [22] proposed a ratio estimator based on the known quartiles of the auxiliary variable and demonstrated its improved performance through both simulation and empirical analyses.
In [23], the Asymptotic Optimum Estimator (AOE) was also examined for each estimator class. [19] introduced a novel estimator of the population mean within a stratified sampling framework and examined several of its special cases; empirical evidence supported the estimator's efficiency. Furthermore, when the auxiliary variable x is negatively correlated with the study variable y, the product estimator is generally more appropriate than the ratio estimator. [10] gave the product method of estimation for the population mean. Similarly, [12] and [13] gave alternative estimators and showed that they attain minimum MSE. [14] developed a more effective estimator than those of [13] and [25], suitable for cases where the correlation between y and x changes. Important references related to our work include [7, 21, 27, 18, 5] and [3].

The objective of this research is to construct more efficient estimators for median estimation under the influence of extreme observations, making use of auxiliary data. The uniqueness of this study lies in the following points –

1. The flexibility of the proposed estimator allows for the formulation of multiple estimator forms.

2. Rigorous empirical analysis has been undertaken to examine its behavior in scenarios involving significant skewness.

2 Terminologies

Let the population consist of $N$ distinguishable units. Let $y_i$ be the study variable and $x_i$ the primary auxiliary variable. Suppose $z_i$ is a transformed auxiliary variable with $z_i=f(x_i)$, where $f$ is a known function. A sample of size $n$ is selected from the target population by SRSWOR, so that each unit has an equal chance of selection without replacement. Let $\hat{M}_y$ be the sample median corresponding to the population median $M_y$ with density function $f_y(M_y)$, and let $(\hat{M}_x,\hat{M}_z)$ be the sample medians corresponding to the population medians $(M_x,M_z)$ with density functions $\{f_x(M_x),f_z(M_z)\}$. Let $\rho_{yx}=\rho(\hat{M}_y,\hat{M}_x)=4P_{11}(y,x)-1$, $\rho_{yz}=\rho(\hat{M}_y,\hat{M}_z)=4P_{11}(y,z)-1$ and $\rho_{xz}=\rho(\hat{M}_x,\hat{M}_z)=4P_{11}(x,z)-1$ be the population correlation coefficients between the sample medians indicated by their subscripts, where $P_{11}(y,x)=P(y\le M_y\cap x\le M_x)$, $P_{11}(y,z)=P(y\le M_y\cap z\le M_z)$, and $P_{11}(x,z)=P(x\le M_x\cap z\le M_z)$.

To study the properties of the estimators, we define the relative error terms $e_0=(\hat{M}_y-M_y)M_y^{-1}$ and $e_1=(\hat{M}_x-M_x)M_x^{-1}$, such that $E(e_i)=0$ $(i=0,1)$, $E(e_0^2)=\lambda C_{My}^2$, $E(e_1^2)=\lambda C_{Mx}^2$ and $E(e_0e_1)=\lambda C_{Myx}$, where $C_{Myx}=\rho_{yx}C_{My}C_{Mx}$, $C_{My}=1/[M_yf_y(M_y)]$, $C_{Mx}=1/[M_xf_x(M_x)]$, $C_{Mz}=1/[M_zf_z(M_z)]$, and $\lambda=0.25\left(\frac{1}{n}-\frac{1}{N}\right)$.
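As a small illustration of the quantities just defined, the following Python sketch computes $P_{11}(y,x)$ and $\rho_{yx}=4P_{11}(y,x)-1$ for a finite population. The toy population and the helper name `median_correlation` are hypothetical and are not taken from the paper; the sketch only assumes that $P_{11}$ is the proportion of units with $y\le M_y$ and $x\le M_x$.

```python
from statistics import median


def median_correlation(y, x):
    """Return (P11, rho), where P11 is the share of units with
    y <= My and x <= Mx, and rho = 4 * P11 - 1."""
    my, mx = median(y), median(x)
    hits = sum(1 for yi, xi in zip(y, x) if yi <= my and xi <= mx)
    p11 = hits / len(y)
    return p11, 4.0 * p11 - 1.0


# Hypothetical toy population (not from the paper): y increases with x.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2 * xi + 1 for xi in x]

p11, rho = median_correlation(y, x)
print(p11, rho)
```

For a perfectly monotone relationship such as this one, half of the units fall in the lower-left quadrant, so $P_{11}=0.5$ and $\rho_{yx}=1$.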

3 Existing Estimators

In this section, we discuss some prominent estimators commonly employed for the estimation of the population median –

The estimator provided by [4] is given by:

$$\hat{M}_0=\hat{M}_y. \tag{1}$$

The variance of $\hat{M}_0$ is given by:

$$V(\hat{M}_0)=\lambda M_y^2C_{My}^2. \tag{2}$$

The estimation methods outlined in this section are based predominantly on the use of a single auxiliary variable. The ratio estimator given by [8] is as follows:

$$\hat{M}_R=\hat{M}_y\left(\frac{M_x}{\hat{M}_x}\right). \tag{3}$$

In survey sampling, the efficiency of an estimator often depends on the nature of the relationship between the study variable y and the auxiliary variable x. When y and x are linearly related, strongly correlated, and the regression line passes through the origin, ratio estimators perform optimally. Conversely, if the regression line does not pass through the origin, the regression estimator is preferable. Up to the first order of approximation, the bias and MSE of $\hat{M}_R$ are given by:

$$\mathrm{Bias}(\hat{M}_R)\cong\lambda\{M_yC_{Mx}^2-M_yC_{Myx}\},$$
$$\mathrm{MSE}(\hat{M}_R)\cong\lambda\{M_y^2C_{My}^2+M_y^2C_{Mx}^2-2M_y^2C_{Myx}\}. \tag{4}$$

The ratio estimator $\hat{M}_R$ is more efficient than $\hat{M}_0$ when $\rho_{yx}>\frac{1}{2}\frac{C_{Mx}}{C_{My}}$.

[2] presented an estimator of the exponential ratio type, which is defined by

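$$\hat{M}_E=\hat{M}_y\exp\left(\frac{M_x-\hat{M}_x}{M_x+\hat{M}_x}\right). \tag{5}$$

This approach is suitable when the regression line of y on x is linear and passes through the origin. The bias and MSE, up to the first-order approximation, are given by:

$$\mathrm{Bias}(\hat{M}_E)\cong M_y\lambda\left\{\frac{3}{8}C_{Mx}^2-\frac{1}{2}C_{Myx}\right\}$$

and

$$\mathrm{MSE}(\hat{M}_E)\cong M_y^2\lambda\left\{C_{My}^2+\frac{1}{4}C_{Mx}^2-C_{Myx}\right\}. \tag{6}$$

The estimator $\hat{M}_E$ based on the exponential ratio method outperforms the estimators $\hat{M}_0$ and $\hat{M}_R$ under the condition that $\rho_{yx}>\frac{1}{4}\frac{C_{Mx}}{C_{My}}$ and $\rho_{yx}<\frac{3}{4}\frac{C_{Mx}}{C_{My}}$.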

The idea of the difference estimator was first introduced by [6] and is given by:

$$\hat{M}_{D0}=\hat{M}_y+d_0(M_x-\hat{M}_x), \tag{7}$$

where $d_0$ is a constant.

The minimum MSE of $\hat{M}_{D0}$ is as follows:

$$\mathrm{MSE}(\hat{M}_{D0})_{\min}\cong C_{My}^2\lambda(M_y^2-M_y^2\rho_{yx}^2). \tag{8}$$

The value of $d_0$ that minimizes the mean squared error is given by $d_{0(opt)}=\frac{M_y\rho_{yx}C_{My}}{M_xC_{Mx}}$.

Since $\rho_{yx}^2\ge 0$, the minimum MSE of $\hat{M}_{D0}$ never exceeds the variance of $\hat{M}_0$, and it is likewise no larger than the MSEs of $\hat{M}_R$ and $\hat{M}_E$.
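The comparison among Equations (2), (4), (6) and (8) can be checked numerically. The sketch below plugs in the Population 1 values reported later in Table 5; the variable names are illustrative, and small differences from the tabulated MSEs are to be expected because the inputs are rounded.

```python
# Population 1 values as reported in Table 5 of this paper.
N, n = 69, 17
My = 2068.0
C_My, C_Mx, rho = 3.45399, 3.55189, 0.1505

lam = 0.25 * (1.0 / n - 1.0 / N)
C_Myx = rho * C_My * C_Mx

var_m0 = lam * My**2 * C_My**2                               # Eq. (2)
mse_ratio = lam * My**2 * (C_My**2 + C_Mx**2 - 2.0 * C_Myx)  # Eq. (4)
mse_exp = lam * My**2 * (C_My**2 + 0.25 * C_Mx**2 - C_Myx)   # Eq. (6)
mse_diff = lam * My**2 * C_My**2 * (1.0 - rho**2)            # Eq. (8)

# Condition under which the ratio estimator beats the sample median.
ratio_beats_m0 = rho > 0.5 * C_Mx / C_My

print(round(var_m0, 2), round(mse_ratio, 2), round(mse_exp, 2), round(mse_diff, 2))
print("ratio estimator more efficient than sample median:", ratio_beats_m0)
```

For this population $\rho_{yx}$ is below $\frac{1}{2}C_{Mx}/C_{My}$, so the ratio estimator is less efficient than the sample median, while the difference estimator at its optimum is at least as efficient as all three.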

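[5, 11] and [15] gave a few more estimators, which are as follows:

$$\hat{M}_{D1}=d_1\hat{M}_y+d_2(M_x-\hat{M}_x), \tag{9}$$

where $d_1$ and $d_2$ are constants.

The minimum MSE of $\hat{M}_{D1}$, at the optimum values of its constants, is given by:

$$\mathrm{MSE}(\hat{M}_{D1})_{\min}\cong M_y^2\left\{1-\frac{B_0}{A_0B_0-C_0^2+B_0}\right\}. \tag{10}$$

The optimum values of $d_1$ and $d_2$ are as follows:

$$d_{1(opt)}=\frac{B_0}{A_0B_0-C_0^2+B_0},\qquad d_{2(opt)}=\frac{M_y}{M_x}\,\frac{C_0}{A_0B_0-C_0^2+B_0},$$

where $A_0=\lambda C_{My}^2$, $B_0=\lambda C_{Mx}^2$, $C_0=\lambda C_{Myx}$.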

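$$\hat{M}_{D2}=\{d_3\hat{M}_y+(d_4M_x-d_4\hat{M}_x)\}\left(M_x\hat{M}_x^{-1}\right), \tag{11}$$

where $d_3$ and $d_4$ are constants.

The minimum MSE of $\hat{M}_{D2}$, at the optimum values of its constants, is given by:

$$\mathrm{MSE}(\hat{M}_{D2})_{\min}\cong M_y^2\left\{1-\frac{A_1B_1^2+B_1C_1^2-2B_1C_1D_1+2B_1C_1+B_1^2-2B_1D_1+B_1}{A_1B_1-D_1^2+B_1}\right\}. \tag{12}$$

The optimum values of $d_3$ and $d_4$ are as follows:

$$d_{3(opt)}=\frac{B_1(C_1-D_1+1)}{A_1B_1-D_1^2+B_1},\qquad d_{4(opt)}=\frac{M_y}{M_x}\,\frac{(A_1B_1-C_1D_1+B_1-D_1)}{(A_1B_1-D_1^2+B_1)},$$

where $A_1=\lambda(C_{My}^2+3C_{Mx}^2-4C_{Myx})$, $B_1=\lambda C_{Mx}^2$, $C_1=\lambda(C_{Mx}^2-C_{Myx})$, $D_1=\lambda(2C_{Mx}^2-C_{Myx})$.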

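$$\hat{M}_{D3}=\{d_5\hat{M}_y+(d_6M_x-d_6\hat{M}_x)\}\exp\left(\frac{M_x-\hat{M}_x}{M_x+\hat{M}_x}\right), \tag{13}$$

where $d_5$ and $d_6$ are constants.

The minimum MSE of $\hat{M}_{D3}$, at the optimum values of its constants, is given by:

$$\mathrm{MSE}(\hat{M}_{D3})_{\min}\cong M_y^2\left\{1-\frac{A_2D_2^2+B_2C_2^2-2C_2D_2E_2+2B_2C_2+D_2^2-2D_2E_2+B_2}{A_2B_2-E_2^2+B_2}\right\}. \tag{14}$$

The optimum values of $d_5$ and $d_6$ are as follows:

$$d_{5(opt)}=\frac{(B_2C_2-D_2E_2+B_2)}{(A_2B_2-E_2^2+B_2)},$$
$$d_{6(opt)}=\frac{M_y}{M_x}\,\frac{(A_2D_2-C_2E_2+D_2-E_2)}{(A_2B_2-E_2^2+B_2)},$$

where $A_2=\lambda(C_{My}^2+C_{Mx}^2-2C_{Myx})$, $B_2=\lambda C_{Mx}^2$, $C_2=\lambda\left(\frac{3C_{Mx}^2}{8}-\frac{1}{2}C_{Myx}\right)$, $D_2=\lambda C_{Mx}^2/2$, $E_2=\lambda(C_{Mx}^2-C_{Myx})$.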

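$$\hat{M}_{D4}=\{d_7\hat{M}_y+(d_8M_x-d_8\hat{M}_x)\}\exp\left(M_x\hat{M}_x^{-1}-1\right), \tag{15}$$

where $d_7$ and $d_8$ are constants.

The minimum MSE of $\hat{M}_{D4}$, at the optimum values of its constants, is given by:

$$\mathrm{MSE}(\hat{M}_{D4})_{\min}\cong M_y^2\left\{1-\frac{A_3B_3^2+B_3C_3^2-2B_3C_3D_3+B_3^2+2B_3C_3-2B_3D_3-B_3}{A_3B_3-D_3^2+B_3}\right\}. \tag{16}$$

The optimum values of $d_7$ and $d_8$ are as follows:

$$d_{7(opt)}=\frac{B_3(C_3-D_3+1)}{A_3B_3-D_3^2+B_3},\qquad d_{8(opt)}=\frac{M_y}{M_x}\,\frac{(A_3B_3-C_3D_3+B_3-D_3)}{(A_3B_3-D_3^2+B_3)},$$

where $A_3=\lambda(C_{My}^2+4C_{Mx}^2-4C_{Myx})$, $B_3=\lambda C_{Mx}^2$, $C_3=\lambda\left(\frac{3C_{Mx}^2}{2}-C_{Myx}\right)$, $D_3=\lambda(2C_{Mx}^2-C_{Myx})$.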

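[16] introduced the following estimator for the population median:

$$\hat{M}_{ppG}=\left[m_1\hat{M}_y+(m_2M_x-m_2\hat{M}_x)\right]\times\left[\left(\frac{aM_x+b}{a\hat{M}_x+b}\right)^{\alpha_1}\exp\left\{\frac{\alpha_2(aM_x-a\hat{M}_x)}{\{a(\gamma-1)M_x+a\hat{M}_x\}+2b}\right\}\right]. \tag{17}$$

Here $a$ and $b$ are known population parameters, $\alpha_1$, $\alpha_2$ and $\gamma$ are scalars that can take values such as 0 and 1, and $m_1$ and $m_2$ are constants.

$$\mathrm{Bias}(\hat{M}_{ppG})\cong(m_1-1)M_y+m_2M_y\lambda\left\{\frac{3}{2}C_{Mx}^2-C_{My}\right\}+m_2M_x\lambda C_{Mx}^2.$$

At the optimum values, the MSE of $\hat{M}_{ppG}$ is given by:

$$\mathrm{MSE}(\hat{M}_{ppG})_{\min}\cong\frac{M_y^2}{1+\lambda C_{My}^2(1-\rho_{yx}^2)}\times\left[\lambda C_{My}^2(1-\rho_{yx}^2)-\frac{\lambda^2C_{Mx}^4}{4}-\lambda^2C_{My}^2C_{Mx}^2(1-\rho_{yx}^2)\right]. \tag{18}$$

The optimum values of $m_1$ and $m_2$ are:

$$m_{1(opt)}=\frac{1-0.5\lambda C_{Mx}^2}{1+\lambda C_{My}^2(1-\rho_{yx}^2)},$$
$$m_{2(opt)}=\frac{M_y}{M_x}\left[1+m_{1(opt)}\left\{\rho_{yx}\frac{C_{My}}{C_{Mx}}-2\right\}\right].$$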

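The median estimator of [9] is given by:

$$\hat{M}_{sm}=\hat{M}_y\left\{m_{12}\left(\frac{M_x}{\hat{M}_x}\right)+m_{13}\left(\frac{\hat{M}_x}{M_x}\right)\right\}\times\left[\alpha\exp\left(\frac{M_x-\hat{M}_x}{M_x+\hat{M}_x}\right)+(1-\alpha)\exp\left(\frac{\hat{M}_x-M_x}{M_x+\hat{M}_x}\right)\right], \tag{19}$$

where $m_{12}$ and $m_{13}$ are constants.

$$\mathrm{Bias}(\hat{M}_{sm})=M_y\left[(m_{12}+m_{13}-1)+m_{12}\lambda\left\{\left(\frac{3}{8}+\frac{3\alpha}{2}\right)C_{Mx}^2-\left(\alpha+\frac{1}{2}\right)C_{Myx}\right\}+m_{13}\lambda\left\{\left(\frac{3}{2}-\alpha\right)C_{Myx}+\left(\frac{3}{8}-\frac{\alpha}{2}\right)C_{Mx}^2\right\}\right]. \tag{20}$$

At the optimum values, the MSE of $\hat{M}_{sm}$ is:

$$\mathrm{MSE}(\hat{M}_{sm})_{\min}=M_y^2\left[1-\frac{A_1A_5^2-2A_3A_4A_5+A_2A_4^2}{A_1A_2-A_3^2}\right]. \tag{21}$$

The optimum values of $m_{12}$ and $m_{13}$ are given by:

$$m_{12(opt)}=\left[\frac{A_2A_4-A_3A_5}{A_1A_2-A_3^2}\right],\qquad m_{13(opt)}=\left[\frac{A_1A_5-A_3A_4}{A_1A_2-A_3^2}\right],$$

where

$$A_1=1+\lambda\{C_{My}^2+(\alpha^2+4\alpha+1)C_{Mx}^2-2(2\alpha+1)C_{Myx}\},$$
$$A_2=1+\lambda\{C_{My}^2+(\alpha^2-4\alpha+3)C_{Mx}^2+2(3-2\alpha)C_{Myx}\},$$
$$A_3=1+\lambda\{C_{My}^2+2(1-2\alpha)C_{Myx}+\alpha^2C_{Mx}^2\},$$
$$A_4=1+\lambda\left\{\left(\frac{3}{8}+\frac{3\alpha}{2}\right)C_{Mx}^2-(\alpha+0.5)C_{Myx}\right\}$$

and

$$A_5=1+\lambda\left\{\left(\frac{3}{2}-\alpha\right)C_{Myx}+\left(\frac{3}{8}-\frac{\alpha}{2}\right)C_{Mx}^2\right\}.$$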

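The estimator proposed by [1] is given by:

$$\hat{M}_y^{*}=\left[\alpha_1\hat{M}_y\left\{\frac{1}{2}\left(\frac{M_x}{\hat{M}_x}+\frac{\hat{M}_x}{M_x}\right)\right\}+(\alpha_2M_x-\alpha_2\hat{M}_x)\right]\exp\left(\frac{M_x-\hat{M}_x}{M_x+\hat{M}_x}\right),$$

$$\mathrm{Bias}(\hat{M}_y^{*})=\left[\alpha_1\left(M_y+\frac{7\lambda M_yC_{Mx}^2}{8}-0.5\lambda M_yC_{Myx}\right)+\alpha_2M_x\left(\frac{\lambda C_{Mx}^2}{2}\right)-M_y\right], \tag{22}$$

where $\alpha_1$ and $\alpha_2$ are constants. At the optimum values, the MSE of $\hat{M}_y^{*}$ is:

$$\mathrm{MSE}(\hat{M}_y^{*})_{\min}=M_y^2-\frac{M_y^2\left[64+\lambda C_{Mx}^2\left(64+\lambda\{25C_{Mx}^2-16C_{My}^2(\rho_{yx}^2-1)\}\right)\right]}{64\left[1+\lambda C_{Mx}^2+\lambda C_{My}^2\{1-\rho_{yx}^2\}\right]}. \tag{23}$$

The optimum values of $\alpha_1$ and $\alpha_2$ are given by:

$$\alpha_{1(opt)}=\frac{8+3\lambda C_{Mx}^2}{8\left(1+\lambda C_{Mx}^2+C_{My}^2(\lambda-\lambda\rho_{yx}^2)\right)},$$
$$\alpha_{2(opt)}=\frac{M_y\left(\lambda C_{Mx}^3+8C_{My}\rho_{yx}+3\lambda C_{Mx}^2C_{My}\rho_{yx}-4C_{Mx}\left(1+\lambda C_{My}^2(\rho_{yx}^2-1)\right)\right)}{8C_{Mx}M_x\left(1+\lambda C_{Mx}^2+C_{My}^2(\lambda-\lambda\rho_{yx}^2)\right)}.$$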
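The closed-form minimum MSEs in Equations (10), (18) and (23) can be evaluated directly. The following sketch does so for Population 1, using the values reported later in Table 5. The shorthand `u` and `v` is introduced here only for readability and is not the paper's notation.

```python
# Population 1 values as reported in Table 5 of this paper.
N, n = 69, 17
My = 2068.0
C_My, C_Mx, rho = 3.45399, 3.55189, 0.1505
lam = 0.25 * (1.0 / n - 1.0 / N)

# Eq. (10): estimator M_D1, with A0, B0, C0 as defined in the text.
A0 = lam * C_My**2
B0 = lam * C_Mx**2
C0 = lam * rho * C_My * C_Mx
mse_d1 = My**2 * (1.0 - B0 / (A0 * B0 - C0**2 + B0))

# Shorthand (not the paper's notation):
u = lam * C_My**2 * (1.0 - rho**2)   # lambda * C_My^2 * (1 - rho^2)
v = lam * C_Mx**2                    # lambda * C_Mx^2

# Eq. (18): estimator M_ppG.
mse_ppg = My**2 / (1.0 + u) * (u - v**2 / 4.0 - u * v)

# Eq. (23): estimator M_y*.
numerator = 64.0 + v * (64.0 + lam * (25.0 * C_Mx**2
                                      - 16.0 * C_My**2 * (rho**2 - 1.0)))
mse_ystar = My**2 - My**2 * numerator / (64.0 * (1.0 + v + u))

print(round(mse_d1, 2), round(mse_ppg, 2), round(mse_ystar, 2))
```

Up to rounding of the inputs, these values agree with the Population 1 column of Table 6 and reproduce the ordering MSE($\hat{M}_y^{*}$) < MSE($\hat{M}_{ppG}$) < MSE($\hat{M}_{D1}$).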

4 The Suggested Family of Estimators

Under simple random sampling, we propose a family of estimators for the population median of the study variable Y as:

$$\hat{M}_{tm}=\psi\left[\hat{M}_y+J_1\hat{M}_y\left(\frac{M_z}{\alpha\hat{M}_z+(1-\alpha)M_z}\right)+J_2(M_z-\hat{M}_z)\right]\times\left(\frac{M_z}{\alpha\hat{M}_z+(1-\alpha)M_z}\right)^{g}, \tag{24}$$

where $M_z=aM_x+b$ and $\hat{M}_z=a\hat{M}_x+b$. Let $a\neq 0$ and $b$ be either real numbers or functions of known parameters of the auxiliary variable X, such as the median $Q_2$, quartile deviation, standard deviation $S_x$, coefficient of variation $C_x$, skewness $\beta_1(x)$, kurtosis $\beta_2(x)$, or coefficient of correlation $\rho_{yx}$. Here $(J_1,J_2,\psi,a,b,$ and $g)$ are constants, and $J_1$ and $J_2$ are determined so as to minimize the mean squared error (MSE) of the estimator $\hat{M}_{tm}$.

Expressing the generalized class of estimators (24) in terms of the $e$'s, we get

$$\hat{M}_{tm}=\psi\left\{M_y(1+e_0)+J_1M_y(1+e_0)(1+\alpha\beta e_1)^{-1}-J_2aM_xe_1\right\}\{1+\alpha\beta e_1\}^{-g}, \tag{25}$$

where $\beta=\frac{aM_x}{aM_x+b}$. Expanding Equation (25), neglecting terms in the $e$'s of power greater than two, and subtracting $M_y$ from both sides, we obtain

$$\hat{M}_{tm}-M_y=M_y\Big[\psi\{1-\alpha\beta ge_1+0.5g(g+1)\alpha^2\beta^2e_1^2+e_0-\alpha\beta ge_1e_0\}+\psi J_1\{1-(g+1)\alpha\beta e_1+(0.5g(g+1)+g+1)\alpha^2\beta^2e_1^2+e_0-(g+1)\alpha\beta e_1e_0\}-1-\psi J_2aM_x\{e_1-\alpha\beta ge_1^2\}\Big]. \tag{26}$$
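The expansion above rests on truncating $(1+\alpha\beta e_1)^{-g}$ after the squared term. The short sketch below compares the exact factor with its second-order approximation for a few small values of $x=\alpha\beta e_1$. The function names and the grid of values are illustrative choices, not part of the paper.

```python
def exact_factor(x, g):
    """Exact value of (1 + x)^(-g), with x standing for alpha*beta*e1."""
    return (1.0 + x) ** (-g)


def second_order_factor(x, g):
    """Expansion 1 - g*x + 0.5*g*(g+1)*x^2, truncated after the squared term."""
    return 1.0 - g * x + 0.5 * g * (g + 1.0) * x ** 2


for x in (0.01, 0.05, 0.10):
    for g in (1, 2):
        err = abs(exact_factor(x, g) - second_order_factor(x, g))
        print(f"x={x:.2f} g={g} truncation error={err:.2e}")
```

Replacing `g` by `g + 1` gives the squared-term coefficient $0.5(g+1)(g+2)=0.5g(g+1)+g+1$, which is the coefficient appearing in the $J_1$ term of Equation (26).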

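Taking the expectation of both sides of Equation (26), the first-order approximation of the bias of $\hat{M}_{tm}$ is:

$$\mathrm{Bias}(\hat{M}_{tm})=M_y\Big[\psi\{1+0.5g(g+1)\alpha^2\beta^2\lambda C_{Mx}^2-\alpha\beta g\lambda C_{Myx}\}+\psi J_1\{1+0.5g(g+1)\alpha^2\beta^2\lambda C_{Mx}^2+(g+1)\alpha^2\beta^2\lambda C_{Mx}^2-(g+1)\alpha\beta\lambda C_{Myx}\}-1+\psi J_2aM_x\alpha\beta g\lambda C_{Mx}^2\Big]. \tag{27}$$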

To derive the MSE of M^tm, we square both sides of Equation (26) and take expectation, neglecting higher-order terms of e’s (i.e., terms with powers greater than two).

The MSE of M^tm to the first order of approximation is given by:

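$$\mathrm{MSE}(\hat{M}_{tm})=\psi^2\{J_1^2A-2J_1J_2B+2J_1C+J_2^2D-2J_2E+F\}-2\psi\{J_1G+J_2H+I\}+M_y^2, \tag{28}$$

where

$$A=M_y^2-4\alpha(g+1)\Lambda k_3+k_2+\{(g+1)^2\Lambda^2\alpha^2k_1+2(g+1)\Lambda^2\alpha^2k_1+g(g+1)\Lambda^2\alpha^2k_1\},$$
$$B=ak_3-(2ga\alpha\Lambda k_1+a\alpha\Lambda k_1),$$
$$C=M_y^2-2\alpha\Lambda k_3(2g+1)+k_2+\alpha^2\Lambda^2k_1(3g+1)(g+1),$$
$$D=a^2k_1,$$
$$E=ak_3-2a\alpha g\Lambda k_1,$$
$$F=M_y^2+k_2+(g^2\alpha^2\Lambda^2k_1+g(g+1)\alpha^2\Lambda^2k_1)-4\alpha g\Lambda k_3,$$
$$G=M_y^2-\alpha(g+1)\Lambda k_3+\alpha^2\Lambda^2k_1\{(g+1)+0.5g(g+1)\},$$
$$H=a\alpha g\Lambda k_1,$$
$$I=M_y^2-\alpha g\Lambda k_3+0.5g(g+1)\alpha^2\Lambda^2k_1.$$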

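Differentiating the MSE of $\hat{M}_{tm}$ with respect to $J_1$ and $J_2$ and equating to zero, we obtain the values of $(J_1)_{opt}$ and $(J_2)_{opt}$ as follows:

$$(J_1)_{opt}=\frac{(DG+BH)-\psi(CD-BE)}{\psi(AD-B^2)}$$

and

$$(J_2)_{opt}=\frac{(BG+AH)-\psi(BC-AE)}{\psi(AD-B^2)}.$$

The resulting MSE of $\hat{M}_{tm}$ at $(J_1)_{opt}$ and $(J_2)_{opt}$ is:

$$\mathrm{MSE}(\hat{M}_{tm})=\frac{\{\psi^2L_1-2\psi L_2+L_3+(AD-B^2)^2M_y^2\}}{(AD-B^2)^2}. \tag{29}$$

Differentiating Equation (29) with respect to $\psi$, we obtain the value of $(\psi)_{opt}$ as follows:

$$(\psi)_{opt}=\frac{L_2}{L_1}.$$

Substituting this value of $\psi$ in Equation (29) gives the minimum MSE of $\hat{M}_{tm}$:

$$\mathrm{MSE}(\hat{M}_{tm})_{\min}=\frac{\left\{L_3-\frac{L_2^2}{L_1}+(AD-B^2)^2M_y^2\right\}}{(AD-B^2)^2}, \tag{30}$$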

where,

$$L_1=(CD-BE)^2A-2(CD-BE)(BC-AE)B-2(CD-BE)(AD-B^2)C+(BC-AE)^2D+2(BC-AE)(AD-B^2)E+(AD-B^2)^2F,$$

$$L_2=(DG+BH)(CD-BE)A-(CD-BE)(BG+AH)B-(DG+BH)(BC-AE)B-(DG+BH)(AD-B^2)C+(BG+AH)(BC-AE)D+(BG+AH)(AD-B^2)E-(CD-BE)(AD-B^2)G-(BC-AE)(AD-B^2)H+(AD-B^2)^2I,$$

$$L_3=(DG+BH)^2A-2(DG+BH)(BG+AH)B+(BG+AH)^2D-2(DG+BH)(AD-B^2)G-2(BG+AH)(AD-B^2)H.$$
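The optimisation in Equations (28) to (30) is purely algebraic in the coefficients $A,\dots,I$, so it can be checked numerically without the paper's data. In the sketch below, the coefficients are built from a hypothetical five-point distribution for three quantities $(U,V,W)$. This construction is an assumption made only to obtain a valid, non-negative quadratic form and is not the paper's definition of $A,\dots,I$. The sketch then compares a direct evaluation of Equation (28) at $(J_1)_{opt}$ and $(J_2)_{opt}$ with the closed form (29), and evaluates Equation (30).

```python
def coefficients(points, My):
    """Hypothetical A..I: second moments of T = psi*(U + J1*V - J2*W),
    taken over equally weighted points (u, v, w)."""
    m = len(points)

    def mean(f):
        return sum(f(u, v, w) for u, v, w in points) / m

    return {
        "A": mean(lambda u, v, w: v * v), "B": mean(lambda u, v, w: v * w),
        "C": mean(lambda u, v, w: u * v), "D": mean(lambda u, v, w: w * w),
        "E": mean(lambda u, v, w: u * w), "F": mean(lambda u, v, w: u * u),
        "G": My * mean(lambda u, v, w: v), "H": -My * mean(lambda u, v, w: w),
        "I": My * mean(lambda u, v, w: u), "My2": My * My,
    }


def mse(psi, J1, J2, c):
    """Direct evaluation of Eq. (28)."""
    quad = (J1**2 * c["A"] - 2 * J1 * J2 * c["B"] + 2 * J1 * c["C"]
            + J2**2 * c["D"] - 2 * J2 * c["E"] + c["F"])
    lin = J1 * c["G"] + J2 * c["H"] + c["I"]
    return psi**2 * quad - 2 * psi * lin + c["My2"]


def optimal_J(psi, c):
    """(J1)opt and (J2)opt for a fixed psi."""
    A, B, C, D, E, G, H = (c[k] for k in "ABCDEGH")
    det = A * D - B**2
    J1 = ((D * G + B * H) - psi * (C * D - B * E)) / (psi * det)
    J2 = ((B * G + A * H) - psi * (B * C - A * E)) / (psi * det)
    return J1, J2


def L_terms(c):
    """L1, L2, L3 and AD - B^2 as defined after Eq. (30)."""
    A, B, C, D, E, F, G, H, I = (c[k] for k in "ABCDEFGHI")
    det = A * D - B**2
    P, Q = D * G + B * H, C * D - B * E
    S, T = B * G + A * H, B * C - A * E
    L1 = (Q**2 * A - 2 * Q * T * B - 2 * Q * det * C
          + T**2 * D + 2 * T * det * E + det**2 * F)
    L2 = (P * Q * A - Q * S * B - P * T * B - P * det * C + S * T * D
          + S * det * E - Q * det * G - T * det * H + det**2 * I)
    L3 = P**2 * A - 2 * P * S * B + S**2 * D - 2 * P * det * G - 2 * S * det * H
    return L1, L2, L3, det


def mse_closed_form(psi, c):
    """Eq. (29): MSE at (J1)opt, (J2)opt for a given psi."""
    L1, L2, L3, det = L_terms(c)
    return (psi**2 * L1 - 2 * psi * L2 + L3 + det**2 * c["My2"]) / det**2


# Hypothetical toy values, chosen only to exercise the algebra.
points = [(1.05, 0.90, 0.10), (0.95, 1.10, -0.12), (1.10, 1.00, -0.05),
          (0.90, 1.05, 0.08), (1.00, 0.95, 0.02)]
c = coefficients(points, My=1.0)

J1, J2 = optimal_J(1.0, c)
direct = mse(1.0, J1, J2, c)
closed = mse_closed_form(1.0, c)

L1, L2, L3, det = L_terms(c)
psi_opt = L2 / L1
mse_min = (L3 - L2**2 / L1 + det**2 * c["My2"]) / det**2   # Eq. (30)
print(direct, closed, psi_opt, mse_min)
```

The direct and closed-form values should agree up to floating-point error, and the value from Equation (30) should not exceed the MSE obtained with $\psi=1$.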

Members of the generalized estimator $\hat{M}_{tm}$ are obtained by choosing suitable values of $(J_1,J_2,\psi,a,b,$ and $g)$, as follows.

Tables 1, 2, 3 and 4 display several members of the proposed estimator family, obtained by assigning different values to the constants. For all estimators from $\hat{M}_{tm1}$ to $\hat{M}_{tm36}$, the value of $\psi$ is taken as 1. In Table 1, $J_1$, $J_2$, $\alpha$ and $g$ each take the value 0 or 1. In Tables 2 and 3, at least one of $J_1$ and $J_2$ is left as a free constant while $\alpha$ and $g$ take the values 0 or 1. In Table 4, both $J_1$ and $J_2$ are free constants, whereas $\alpha$ and $g$ take the values 0 or 1. Table 1 includes some estimators that already exist in the literature: $\hat{M}_{tm1}$, $\hat{M}_{tm3}$ and $\hat{M}_{tm4}$ coincide and are the estimator given by [4], and $\hat{M}_{tm2}$ was given by [8]. In Table 2, the estimators $\hat{M}_{tm17}$, $\hat{M}_{tm19}$ and $\hat{M}_{tm20}$ were given by [6].

Table 1 Some members generated from the generalized estimator M^tm

Estimator J1 J2 ψ α g MSE
M^tm1=M^y [4] 0 0 1 0 1 MSE(M^tm1)=k2
M^tm2=M^y(MxM^x)[8] 0 0 1 1 1 MSE(M^tm2)=k2+Λ2k1-2Λk3
M^tm3=M^y[4] 0 0 1 1 0 MSE(M^tm3)=k2
M^tm4=M^y[4] 0 0 1 0 0 MSE(M^tm4)=k2
M^tm5=2M^y+(Mz-Mz^) 1 1 1 0 1 MSE(M^tm5)=4k2+My2+a2k1-4ak3
M^tm6=[M^y+M^y(MxM^x)+(Mz-Mz^)](MxM^x) 1 1 1 1 1 MSE(M^tm6)=4k2+My2+a2k1-4ak3+21Λ2k1+8aΛk1-18Λk3
M^tm7=[M^y+M^y(MzM^z)+(Mz-M^z)] 1 1 1 1 0 MSE(M^tm7)=4k2+My2+a2k1-4ak3+3Λ2k1+2aΛk1-6Λk3
M^tm8=2M^y+(Mz-Mz^) 1 1 1 0 0 MSE(M^tm8)=4k2+My2+a2k1-4ak3
M^tm9=2M^y 1 0 1 0 1 MSE(M^tm9)=4k2+My2
M^tm10=[M^y+M^y(MzM^z)](MzM^z) 1 0 1 1 1 MSE(M^tm10)=4k2+My2+21Λ2k1-18Λk3
M^tm11=M^y+M^y(MzM^z) 1 0 1 1 0 MSE(M^tm11)=4k2+My2+3Λ2k1-6Λk3
M^tm12=2M^y 1 0 1 0 0 MSE(M^tm12)=4k2+My2
M^tm13=M^y+(Mz-Mz^) 0 1 1 0 1 MSE(M^tm13)=k2+a2k1-2ak3
M^tm14=[M^y+(Mz-M^z)](MzM^z) 0 1 1 1 1 MSE(M^tm14)=k2+a2k1-2Λk3+Λ2k1+2aΛk1-2ak3
M^tm15=M^y+(Mz-M^z) 0 1 1 1 0 MSE(M^tm15)=k2+a2k1-2ak3
M^tm16=M^y+(Mz-M^z) 0 1 1 0 0 MSE(M^tm16)=k2+a2k1-2ak3

Table 2 Some members generated from the generalized estimator M^tm

Estimator J1 J2 ψ α g MSE
M^tm17=M^y+J2(Mz-Mz^) [6] 0 J2 1 0 1 MSE(M^tm17)=k2+J22a2k1-2aJ2k3
M^tm18=[M^y+J2(Mz-Mz^)](MzM^z) 0 J2 1 1 1 MSE(M^tm18)=k2+J22a2k1-2aJ2k3+Λ2k1-2Λk3+2J2aΛk1
M^tm19=M^y+J2(Mz-M^z) [6] 0 J2 1 1 0 MSE(M^tm19)=k2+J22a2k1-2aJ2k3
M^tm20=M^y+J2(Mz-M^z) [6] 0 J2 1 0 0 MSE(M^tm20)=k2+J22a2k1-2aJ2k3
M^tm21=M^y+J1M^y J1 0 1 0 1 MSE(M^tm21)=k2+J12k2+J12My2+2J1k2
M^tm22=[M^y+J1M^y(MzM^z)](MzM^z) J1 0 1 1 1 MSE(M^tm22)=k2+J12k2+J12My2+2J1k2-8ΛJ12k3+10J12Λ2k1-8ΛJ1k3-2Λk3+10J1Λ2k1+Λ2k1
M^tm23=M^y+J1M^y(MzM^z) J1 0 1 1 0 MSE(M^tm23)=k2+J12k2+J12My2+2J1k2-4ΛJ12k3+3J12Λ2k1-2ΛJ1k3
M^tm24=M^y+J1M^y J1 0 1 0 0 MSE(M^tm24)=k2+J12k2+J12My2+2J1k2
M^tm25=2M^y+J2(Mz-M^z) 1 J2 1 0 1 MSE(M^tm25)=My2+4k2+J22a2k1-4aJ2k3
M^tm26=[M^y+M^y(MzM^z)+J2(Mz-M^z)](MzM^z) 1 J2 1 1 1 MSE(M^tm26)=My2+4k2+J22a2k1-4aJ2k3-18Λk3+21Λ2k1+8J2aΛk1
M^tm27=[M^y+M^y(MzM^z)+J2(Mz-M^z)] 1 J2 1 1 0 MSE(M^tm27)=My2+4k2+J22a2k1-4aJ2k3-6Λk3+3Λ2k1+2J2aΛk1
M^tm28=2M^y+J2(Mz-M^z) 1 J2 1 0 0 MSE(M^tm28)=My2+4k2+J22a2k1-4aJ2k3

Table 3 Some members generated from the generalized estimator M^tm

Estimator J1 J2 ψ α g MSE
M^tm29=M^y+J1M^y+(Mz-M^z) J1 1 1 0 1 MSE(M^tm29)=k2+J12k2+J12My2+2J1k2-2aJ1k3+a2k1-2ak3
M^tm30=[M^y+J1M^y(MzM^z)+(Mz-M^z)](MzM^z) J1 1 1 1 1 MSE(M^tm30)=J12My2-8ΛJ12k3+J12k2+a2k1+10Λ2J12k1+6J1aΛk1-2J1ak3-5Λ2k1+k2-8J1Λk3+16J1Λ2k1+2J1k2+2aΛk1-2ak3-2Λk3
M^tm31=M^y+J1M^y(MzM^z)+(Mz-M^z) J1 1 1 1 0 MSE(M^tm31)=k2+J12k2+J12My2+2J1k2-2aJ1k3+a2k1-2ak3-4ΛJ12k3+3J12Λ2k1+2J1aΛk1-2J1Λk3
M^tm32=[M^y+J1M^y+(Mz-M^z)] J1 1 1 0 0 MSE(M^tm32)=k2+J12k2+J12My2+2J1k2-2aJ1k3+a2k1-2ak3
M^tm33=[(1+J1)M^y+J2(Mz-M^z)] J1 J2 1 0 0 MSE(M^tm33)=k2+J12k2+J12My2+2J1k2-2aJ2k3+a2J22k1-2aJ1J2k3

Table 4 Some members generated from the generalized estimator M^tm

Estimator J1 J2 ψ α g MSE
M^tm34=[M^y+J1M^y(MzM^z)+J2(Mz-M^z)] J1 J2 1 1 0 MSE(M^tm34)=J12My2-4J12Λk3+J12k2+3J12Λ2k1-2J1J2ak3+2J1J2aΛk1-2J1Λk3+2J1k2+J22a2k1-2J2ak3+k2
M^tm35=[(1+J1)M^y+J2(Mz-M^z)] J1 J2 1 0 1 MSE(M^tm35)=k2+J12k2+J12My2+2J1k2-2aJ2k3+a2J22k1-2aJ1J2k3
M^tm36=[M^y+J1M^y(MzM^z)+J2(Mz-M^z)](MzM^z) J1 J2 1 1 1 MSE(M^tm36)=J12My2-8J12Λk3+J12k2+10J12Λ2k1-2J1J2ak3+6J1J2aΛk1-8J1Λk3+2J1k2+10J1Λ2k1+J22a2k1-2J2ak3+k2+2J2aΛk1+Λ2k1-2Λk3

5 Efficiency Comparisons

This section provides a comparative analysis of the Mean Squared Error (MSE) of the proposed estimator relative to the existing estimators discussed in the paper.

From Equations (2) and (18) we get,

$$\mathrm{Var}(\hat{M}_0)-\mathrm{MSE}(\hat{M}_{ppG})_{\min}=4\lambda k_2(1-\rho_{yx}^2)(C_{My}^2+C_{Mx}^2)+4k_2\rho_{yx}^2+\lambda^2M_y^2C_{Mx}^4\ge 0. \tag{31}$$

From Equations (4) and (18) we get,

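$$\mathrm{MSE}(\hat{M}_R)-\mathrm{MSE}(\hat{M}_{ppG})_{\min}=\lambda M_y^2(C_{My}^2+C_{Mx}^2-2C_{Myx})-\frac{M_y^2}{1+\lambda C_{My}^2(1-\rho_{yx}^2)}\left[\lambda C_{My}^2(1-\rho_{yx}^2)-\frac{1}{4}\lambda^2C_{Mx}^4-\lambda^2C_{My}^2C_{Mx}^2(1-\rho_{yx}^2)\right]\ge 0.$$

This condition can be written as:

$$\lambda M_y^2\left(C_{Mx}^2+\frac{\lambda C_{Mx}^4}{4}\right)+k_2\rho_{yx}^2>2\lambda M_y^2C_{Myx}-\lambda k_2(1-\rho_{yx}^2)(2C_{Mx}^2+C_{My}^2-2C_{Myx}). \tag{32}$$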

From Equations (6) and (18) we get,

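$$\mathrm{MSE}(\hat{M}_E)-\mathrm{MSE}(\hat{M}_{ppG})_{\min}=M_y^2\lambda\left\{C_{My}^2+\frac{1}{4}C_{Mx}^2-C_{Myx}\right\}-\frac{M_y^2}{1+\lambda C_{My}^2(1-\rho_{yx}^2)}\left[\lambda C_{My}^2(1-\rho_{yx}^2)-\frac{1}{4}\lambda^2C_{Mx}^4-\lambda^2C_{My}^2C_{Mx}^2(1-\rho_{yx}^2)\right]\ge 0.$$

This condition can be written as:

$$\frac{1}{4}\lambda M_y^2(C_{Mx}^2+\lambda C_{Mx}^4)+k_2\rho_{yx}^2>\lambda M_y^2C_{Myx}-\lambda k_2(1-\rho_{yx}^2)\left(\frac{5C_{Mx}^2}{4}+C_{My}^2-C_{Myx}\right). \tag{33}$$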

From Equations (8) and (18) we get,

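$$\mathrm{MSE}(\hat{M}_{D0})_{\min}-\mathrm{MSE}(\hat{M}_{ppG})_{\min}=M_y^2C_{My}^2\lambda(1-\rho_{yx}^2)-\frac{M_y^2}{1+\lambda C_{My}^2(1-\rho_{yx}^2)}\times\left[\lambda C_{My}^2(1-\rho_{yx}^2)-\frac{1}{4}\lambda^2C_{Mx}^4-\lambda^2C_{My}^2C_{Mx}^2(1-\rho_{yx}^2)\right]\ge 0.$$

This condition can be written as:

$$\lambda k_2(1-\rho_{yx}^2)\left[C_{My}^2(1-\rho_{yx}^2)+C_{Mx}^2\right]+\frac{M_y^2\lambda^2C_{Mx}^4}{4}\ge 0. \tag{34}$$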

From Equations (18) and (23) we get,

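$$\mathrm{MSE}(\hat{M}_{ppG})_{\min}-\mathrm{MSE}(\hat{M}_y^{*})_{\min}=\frac{M_y^2}{1+\lambda C_{My}^2(1-\rho_{yx}^2)}\left[\lambda C_{My}^2(1-\rho_{yx}^2)-\frac{1}{4}\lambda^2C_{Mx}^4-\lambda^2C_{My}^2C_{Mx}^2(1-\rho_{yx}^2)\right]-M_y^2+\frac{M_y^2\left[64+\lambda C_{Mx}^2\left(64+\lambda(25C_{Mx}^2-16C_{My}^2(\rho_{yx}^2-1))\right)\right]}{64\left[1+\lambda C_{Mx}^2+\lambda C_{My}^2(1-\rho_{yx}^2)\right]}\ge 0.$$

This condition can be written as:

$$64\left[k_2(1-\rho_{yx}^2)(1-\lambda C_{Mx}^2)-\frac{\lambda^2M_y^2C_{Mx}^4}{4}\right]\left[\lambda C_{Mx}^2+k_4\right]-64\lambda M_y^2C_{Mx}^2+\lambda M_y^2C_{Mx}^2k_4\left[64+\lambda\{25C_{Mx}^2+16C_{My}^2(1-\rho_{yx}^2)\}\right]\ge 0, \tag{35}$$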

where k4=1+λCMy2(1-ρyx2).

From Equations (23) and (28) we get,

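$$\mathrm{MSE}(\hat{M}_y^{*})_{\min}-\mathrm{MSE}(\hat{M}_{tm})=M_y^2-\frac{M_y^2\left[64+\lambda C_{Mx}^2\left(64+\lambda\{25C_{Mx}^2-16C_{My}^2(\rho_{yx}^2-1)\}\right)\right]}{64\left[1+\lambda C_{Mx}^2+\lambda C_{My}^2\{1-\rho_{yx}^2\}\right]}-\psi^2\{J_1^2A-2J_1J_2B+2J_1C+J_2^2D-2J_2E+F\}+2\psi\{J_1G+J_2H+I\}-M_y^2\ge 0.$$

This condition can be written as:

$$64\left[2\psi(J_1G+J_2H+I)-\psi^2(J_1^2A-2J_1J_2B+2J_1C+J_2^2D-2J_2E+F)\right](k_4+\lambda C_{Mx}^2)-M_y^2\left[64+\lambda C_{Mx}^2\left(64+\lambda(25C_{Mx}^2-16C_{My}^2(\rho_{yx}^2-1))\right)\right]\ge 0. \tag{36}$$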

Under the above conditions, the proposed class of estimators outperforms the existing class of estimators. To validate the practical applicability of these conditions, a computational study has been carried out.
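As a quick numerical check of conditions (31) and (34), the sketch below evaluates their left-hand sides for the four populations summarised in Table 5. It assumes $k_2=\lambda M_y^2C_{My}^2$. This identification is inferred from Table 1, where MSE($\hat{M}_{tm1}$) $=k_2$ for the sample median, and is not stated explicitly in the text. Variable names are illustrative.

```python
# Values as reported in Table 5 of this paper.
populations = {
    "Population 1": dict(N=69, n=17, My=2068.0, C_My=3.45399, C_Mx=3.55189, rho=0.1505),
    "Population 2": dict(N=69, n=17, My=2068.0, C_My=3.45399, C_Mx=3.33433, rho=0.3166),
    "Population 3": dict(N=144, n=10, My=2023.0, C_My=2.05965, C_Mx=1.54658, rho=0.86110),
    "Population 4": dict(N=51, n=11, My=25.80, C_My=0.53242, C_Mx=0.36168, rho=0.9956),
}


def check(p):
    lam = 0.25 * (1.0 / p["n"] - 1.0 / p["N"])
    My, cy, cx, r = p["My"], p["C_My"], p["C_Mx"], p["rho"]
    k2 = lam * My**2 * cy**2           # assumption: k2 = lambda * My^2 * C_My^2
    k4 = 1.0 + lam * cy**2 * (1.0 - r**2)
    var_m0 = k2                                                     # Eq. (2)
    mse_d0 = k2 * (1.0 - r**2)                                      # Eq. (8)
    mse_ppg = My**2 / k4 * (lam * cy**2 * (1.0 - r**2)
                            - lam**2 * cx**4 / 4.0
                            - lam**2 * cy**2 * cx**2 * (1.0 - r**2))  # Eq. (18)
    lhs31 = (4.0 * lam * k2 * (1.0 - r**2) * (cy**2 + cx**2)
             + 4.0 * k2 * r**2 + lam**2 * My**2 * cx**4)
    lhs34 = (lam * k2 * (1.0 - r**2) * (cy**2 * (1.0 - r**2) + cx**2)
             + My**2 * lam**2 * cx**4 / 4.0)
    return dict(k4=k4, var_m0=var_m0, mse_d0=mse_d0, mse_ppg=mse_ppg,
                lhs31=lhs31, lhs34=lhs34)


results = {name: check(p) for name, p in populations.items()}
for name, r in results.items():
    print(name, r["lhs31"] >= 0.0, r["lhs34"] >= 0.0)
```

Every term in the two left-hand sides is non-negative, so both conditions hold for any population. The printed flags simply confirm this for the four datasets.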

6 Empirical Study

To assess the efficiency of the proposed family of estimators, four real-world population datasets have been considered. A comprehensive summary of these datasets is provided in the corresponding Table 5.

For Population 1: Source [24]

Let y = number of fish caught by fishermen in the year 1995 and x = number of fish caught by fishermen in the year 1964.

For Population 2: Source [24]

Let y = number of fish caught by fishermen in the year 1995 and x = number of fish caught by fishermen in the year 1993.

For Population 3: Source [26]

Let y = number of faculty members and x = number of students in four different colleges of 36 districts in Punjab.

For Population 4: Source [26]

Let y = price of oil in the current week of 2017 and x = price of oil in the previous week of 2017.

Table 5 Summary of population parameters for empirical study

Parameter Population 1 Population 2 Population 3 Population 4
N 69 69 144 51
n 17 17 10 11
λ 0.01108 0.01108 0.02327 0.01782
My 2068 2068 2023 25.80
Mx 2011 2307 64659 25.60
fy(My) 0.00014 0.00014 0.00024 0.0728
fx(Mx) 0.00014 0.00013 0.00001 0.1080
ρyx 0.1505 0.3166 0.86110 0.9956
R 0.97243 1.11557 31.96193 0.99224
CMy 3.45399 3.45399 2.05965 0.53242
CMx 3.55189 3.33433 1.54658 0.36168
CMyx 1.84636 3.6462 2.74295 0.19172

Table 5 lists all relevant parameters for each of the four population datasets. The study and auxiliary variables have been selected to be correlated, so that the proposed family of estimators can exploit the auxiliary information; Populations 3 and 4 in particular exhibit a high $\rho_{yx}$, indicating strong correlation between the study and auxiliary variables. The population parameters in Table 5 are as follows. $N$ denotes the population size and $n$ the size of the sample drawn by Simple Random Sampling Without Replacement (SRSWOR); $\lambda=0.25\left(\frac{1}{n}-\frac{1}{N}\right)$ is the finite population correction factor; $M_y$ is the population median of the study variable and $M_x$ the population median of the auxiliary variable. As $N\to\infty$ and $n\to\infty$, $nN^{-1}\to f$ (the correction factor), and we assume that as $N\to\infty$ the distribution of $(X,Y)$ approaches a continuous distribution with marginal densities $f_y(M_y)$ of $Y$ and $f_x(M_x)$ of $X$. This assumption is needed in the super-population model framework, in which the values of $Y$ and $X$ are treated as a realization of $N$ independent observations from a continuous distribution. In addition, we assume that $f_y(M_y)$ and $f_x(M_x)$ are positive. $\rho_{yx}=\rho(\hat{M}_y,\hat{M}_x)$ is the coefficient of correlation between the sample medians of the study variable $Y$ and the auxiliary variable $X$; $R=\frac{M_x}{M_y}$; $C_{My}=[M_yf_y(M_y)]^{-1}$ is the coefficient of variation of the median of the study variable; $C_{Mx}=[M_xf_x(M_x)]^{-1}$ is the coefficient of variation of the median of the auxiliary variable; and $C_{Myx}=\rho_{yx}C_{My}C_{Mx}$.
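The derived rows of Table 5 follow from the primary ones through the definitions above. The sketch below recomputes $\lambda$, $R$, $C_{My}$, $C_{Mx}$ and $C_{Myx}$ from the reported $N$, $n$, medians, densities and $\rho_{yx}$. Because the densities are printed to only a few significant digits, agreement with the table is expected to be close rather than exact. The dictionary layout and function name are illustrative.

```python
# Primary quantities as reported in Table 5 of this paper.
table5 = {
    "Population 1": dict(N=69, n=17, My=2068.0, Mx=2011.0,
                         fy=0.00014, fx=0.00014, rho=0.1505),
    "Population 2": dict(N=69, n=17, My=2068.0, Mx=2307.0,
                         fy=0.00014, fx=0.00013, rho=0.3166),
    "Population 3": dict(N=144, n=10, My=2023.0, Mx=64659.0,
                         fy=0.00024, fx=0.00001, rho=0.86110),
    "Population 4": dict(N=51, n=11, My=25.80, Mx=25.60,
                         fy=0.0728, fx=0.1080, rho=0.9956),
}


def derived(p):
    """Recompute the derived rows of Table 5 from the primary ones."""
    lam = 0.25 * (1.0 / p["n"] - 1.0 / p["N"])
    C_My = 1.0 / (p["My"] * p["fy"])
    C_Mx = 1.0 / (p["Mx"] * p["fx"])
    return {
        "lam": lam,
        "R": p["Mx"] / p["My"],
        "C_My": C_My,
        "C_Mx": C_Mx,
        "C_Myx": p["rho"] * C_My * C_Mx,
    }


results = {name: derived(p) for name, p in table5.items()}
for name, r in results.items():
    print(name, {k: round(v, 5) for k, v in r.items()})
```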

Table 6 Variances/MSEs/Minimum MSEs of different estimators

Estimators Population 1 Population 2 Population 3 Population 4
Var(M^0) 565443.60 565443.60 403886.96 3.36
MSE(M^R) 988372.80 746752.60 109312.95 0.36
MSE(M^E) 627420.20 524362.10 199667.99 1.47
MSE(M^D0)min 552636.13 508766.01 104407.52 0.02
MSE(M^D1)min 489395.24 454675.78 101810.20 0.02
MSE(M^D2)min 480458.30 447982.60 101661.14 0.02
MSE(M^D3)min 471131.80 439763.40 100200.79 0.02
MSE(M^ppG)min 402459.30 384146.80 93055.80 0.02
MSE(My*)min 394518.94 376541.31 90648.43 0.02
MSE(M^tm)min 25864.52 35256.30 65459.24 0.27
MSE(M^t1)min 565443.56 565443.56 403886.95 3.36
MSE(M^t2)min 987359.05 746139.48 109315.01 0.47
MSE(M^t3)min 565443.56 565443.56 403886.95 3.36
MSE(M^t4)min 565443.56 565443.56 403886.95 3.36
MSE(M^t5)min 6509560.65 6316763.13 47174020.10 674.96
MSE(M^t6)min 19791542.30 16604742.20 76369028.30 670.73
MSE(M^t7)min 8356199.96 7446185.01 53568701.98 667.73
MSE(M^t8)min 6509560.65 6316763.13 47174020.10 674.96
MSE(M^t9)min 6538398.27 6538398.27 5708076.84 679.09
MSE(M^t10)min 17496808.51 14477040.81 5789494.31 669.15
MSE(M^t11)min 7804144.74 7080486.03 4824361.01 670.43
MSE(M^t12)min 6538398.00 6538398.00 5708077.00 679.09
MSE(M^t13)min 621705.20 536598.58 50216719.70 1.48
MSE(M^t14)min 1624513.54 1304628.63 57200545.50 0.03
MSE(M^t15)min 621705.20 536598.58 50216719.71 1.48
MSE(M^t16)min 621705.20 536598.58 50216719.71 1.48
MSE(M^t17)min 552803.84 509406.65 104592.86 0.02
MSE(M^t18)min 1089935.86 879691.81 120502.73 0.02

Table 7 Variances/MSEs/Minimum MSEs of different estimators

Estimators Population 1 Population 2 Population 3 Population 4
MSE(M^t19)min 556518.39 510803.00 104410.65 0.02
MSE(M^t20)min 552803.84 509406.65 104592.86 0.02
MSE(M^t21)min 499439.16 499942.45 386574.59 3.36
MSE(M^t22)min 22585.47 33042.82 62870.99 0.47
MSE(M^t23)min 531203.92 541840.35 399312.41 3.36
MSE(M^t24)min 499439.16 499942.45 386574.59 3.36
MSE(M^t25)min 6503074.91 6381057.77 4824724.21 669.09
MSE(M^t26)min 17924182.87 15082738.31 6148968.81 670.92
MSE(M^t27)min 8026616.93 7301804.55 4448459.64 664.65
MSE(M^t28)min 6503075.00 6381058.00 4824724.00 669.09
MSE(M^t29)min 565439.12 491594.25 50407053.60 1.48
MSE(M^t30)min 4707854.50 4049840.40 53656773.20 8.02
MSE(M^t31)min 540612.56 476446.52 50239156.42 1.48
MSE(M^t32)min 565439.12 491594.24 50407053.58 1.48
MSE(M^t33)min 489395.24 454675.78 101810.16 0.02
MSE(M^t34)min 482783.90 447402.51 83830.11 0.02
MSE(M^t35)min 489395.24 454675.78 101810.16 0.02
MSE(M^t36)min 28013.04 18201.03 24093.33 0.01

Tables 6 and 7 record the Mean Squared Error (MSE) values of the estimators generated from the proposed family for Populations 1, 2, 3, and 4. These values were obtained by substituting the population parameters from Table 5 into the MSE expressions of the estimators given in Tables 1, 2, 3 and 4.

7 Discussion

• To mitigate the influence of outliers, we have utilized the median rather than the mean, as the mean is known to be sensitive to extreme observations in the population.

• Analysis of Tables 6 and 7 leads us to the conclusion that estimators that rely on auxiliary information are more effective. Table 6 clearly indicates that the estimators M^D1, M^D2, M^D3, M^ppG, and My* are more efficient than the estimator M^0.

• The members of the class of estimators M^ti for i=1,3,4 obtained from the family M^tm, are equally efficient as M^0, while M^t2 is equally efficient as M^R. Furthermore, the estimators M^ti for i = 17,19,20,23,31 exhibit equal efficiency among themselves, whereas the estimators M^ti for i = 21,24,33,34,35 are more efficient than the usual unbiased estimator M^y.

• In the estimation of the population median My, the sample median is known to exhibit bias, particularly in small sample sizes, often resulting in skewed estimates. To mitigate this bias and enhance the efficiency of the estimator, we incorporate a set of constants (J1,J2,ψ,a,b, and g). These constants are specifically chosen to adjust the estimator such that it more accurately approximates the true population median, while simultaneously reducing variance and improving overall precision. The estimator that incorporates all these constants proves to be the most efficient, achieving the minimum mean squared error (MSE) under optimal conditions. This performance is followed by estimators that use unit values for the constants ψ=1, α=1, g=1, which exhibit relatively higher MSEs.

• The strong correlation between the study variable and the auxiliary variable contributes significantly to the improved performance of the proposed estimator. Furthermore, for all the estimators considered, the mean squared error (MSE) consistently decreases as the sample size increases.

• The estimators presented in the tables that incorporate auxiliary information, specifically the population median Mx and the correlation coefficient ρyx, demonstrate the lowest mean squared errors (MSEs). As shown in Tables 6 and 7, the estimators M^t22 and M^t36 exhibit significantly lower MSEs than the conventional estimators M^y, M^R, and M^E.

• Among all the estimators in Tables 6 and 7, M^t36 has the smallest MSE for Populations 2, 3 and 4, while M^t22 has the smallest MSE for Population 1. Both use ψ=1, α=1 and g=1, so these are suitable values for obtaining minimum MSEs among the different choices of these constants.

• As a potential direction for future work, the proposed approach could be extended and evaluated within the context of stratified sampling.

8 Conclusion

We have proposed a family of estimators for estimating the population median of a study variable using auxiliary information under simple random sampling without replacement (SRSWOR). The proposed estimators are designed to perform effectively even in the presence of skewed distributions or datasets containing outliers. The proposed family includes several well-known estimators as special cases, for instance the conventional estimator of the population median based on the study variable [4] and the auxiliary variable-based median estimator proposed by [8]. In addition to encompassing existing estimators, several new estimators have been derived from the proposed family. For these estimators, we have computed the bias and mean squared error (MSE) up to the first order of approximation. The proposed family is advantageous in that the statistical properties, such as bias and MSE, of individual estimators within the class can be easily obtained from the general form. We conclude that estimators superior to the current ones can be generated from the suggested estimators M^ti (i=1,2,3,…,36) and M^tm. The suggested family of estimators is tested against existing estimators using four distinct populations of sizes N1=69, N2=69, N3=144, and N4=51. Using normal distributions, the population density functions for X and Y are also computed. The empirical study shows that our estimator M^t36 attains a lower MSE than all the existing estimators considered, indicating higher accuracy in real-world surveys.

References

[1] Ahmad, S., Masood, S., Alomair, A. M., and Alomair, M. A. (2024). Improved modified estimator for estimation of median using auxiliary information under simple random sampling. Scientific Reports, 14(1):16504.

[2] Bahl, S. and Tuteja, R. (1991). Ratio and product type exponential estimators. Journal of Information and Optimization Sciences, 12(1):159–164.

[3] Baig, A., Masood, S., and Ahmed Tarray, T. (2020). Improved class of difference-type estimators for population median in survey sampling. Communications in Statistics-Theory and Methods, 49(23):5778–5793.

[4] Gross, S. (1980). Median estimation in sample surveys. Proceedings of the Section on Survey Research Methods, pages 181–184.

[5] Gupta, S., Shabbir, J., and Ahmad, S. (2008). Estimation of median in two-phase sampling using two auxiliary variables. Communications in Statistics-Theory and Methods, 37(11):1815–1822.

[6] Hansen, M. H., Hurwitz, W. N., and Madow, W. G. (1953). Sample Survey Methods and Theory, Vol. I: Methods and Applications.

[7] Hussain, M. A., Javed, M., Zohaib, M., Shongwe, S. C., Awais, M., Zaagan, A. A., and Irfan, M. (2024). Estimation of population median using bivariate auxiliary information in simple random sampling. Heliyon, 10(7).

[8] Kuk, A. Y. and Mak, T. (1989). Median estimation in the presence of auxiliary information. Journal of the Royal Statistical Society: Series B (Methodological), 51(2):261–269.

[9] Muneer, S., Khalil, A., Shabbir, J., and Narjis, G. (2022). Efficient estimation of population median using supplementary variable. Scientia Iranica, 29(1):265–274.

[10] Murthy, M. (1964). Product method of estimation. Sankhyā: The Indian Journal of Statistics, Series A, pages 69–74.

[11] Rao, T. (1991). On certain methods of improving ratio and regression estimators. Communications in Statistics-Theory and Methods, 20(10):3325–3340.

[12] Reddy, V. N. (1973). On ratio and product methods of estimation. Sankhyā: The Indian Journal of Statistics, Series B, pages 307–316.

[13] Reddy, V. N. (1974). On a transformed ratio method of estimation. Sankhyā C, 36:59–70.

[14] Sahai, A. and Ray, S. (1980). An efficient estimator using auxiliary information. Metrika, 27(1):271–275.

[15] Shabbir, J. and Gupta, S. (2015). A note on generalized exponential type estimator for population variance in survey sampling. Revista Colombiana de Estadística, 38(2):385–397.

[16] Shabbir, J. and Gupta, S. (2017). A generalized class of difference type estimators for population median in survey sampling. Hacettepe Journal of Mathematics and Statistics, 46(5):1015–1028.

[17] Shabbir, J., Gupta, S., and Narjis, G. (2022). On improved class of difference type estimators for population median in survey sampling. Communications in Statistics-Theory and Methods, 51(10):3334–3354.

[18] Shahzad, U., Al-Noor, N. H., Hanif, M., and Sajjad, I. (2021). An exponential family of median-based estimators for mean estimation with simple random sampling scheme. Communications in Statistics-Theory and Methods, 50(20):4890–4899.

[19] Sharma, P. and Singh, R. (2013). Efficient estimator of population mean in stratified random sampling using auxiliary attribute. World Applied Sciences Journal, 27(12):1786–1791.

[20] Sharma, P. and Singh, R. (2015). Generalized class of estimators for population median using auxiliary information. Hacettepe Journal of Mathematics and Statistics, 44(2):443–453.

[21] Singh, G. N., Pandey, A. K., and Singh, C. (2022). Generalized estimation strategy for mean estimation on current occasion in two-occasion rotation patterns. Communications in Statistics-Simulation and Computation, 51(4):1661–1684.

[22] Singh, H. P., Sidhu, S. S., and Singh, S. (2006). Median estimation with known interquartile range of auxiliary variable. International Journal of Applied Mathematics and Statistics, 4:68–80.

[23] Singh, H. P. and Solanki, R. S. (2013). Some classes of estimators for the population median using auxiliary information. Communications in Statistics-Theory and Methods, 42(23):4222–4238.

[24] Singh, S. (2003). Advanced Sampling Theory With Applications: How Michael “Selected” Amy, volume 2. Springer Science & Business Media.

[25] Srivastava, S. K. (1967). An estimator using auxiliary information in sample surveys. Calcutta Statistical Association Bulletin, 16(2–3):121–132.

[26] Bureau of Statistics (2009). Punjab Development Statistics.

[27] Subzar, M., Lone, S. A., Ekpenyong, E. J., Salam, A., Aslam, M., Raja, T., and Almutlak, S. A. (2023). Efficient class of ratio cum median estimators for estimating the population median. PLOS One, 18(2):e0274690.

Biographies

Prayas Sharma is currently working as an Assistant Professor in the Department of Statistics, Babasaheb Bhimrao Ambedkar University, Lucknow. Dr. Sharma holds a Bachelor's degree in Computer Science & Statistics, and Master's and Doctorate degrees in Statistics from Banaras Hindu University, Varanasi, India. Dr. Sharma has good knowledge of Statistics, Artificial Intelligence and Machine Learning, Business Analytics & Research Methodology, along with strong computational & programming skills.

He has more than 11 years of academic experience in both teaching and research. His research interests include Survey Sampling, Estimation Procedures using Auxiliary Information and Measurement Errors, Predictive Modelling, Business Analytics and Operations Research. Dr. Sharma has published more than 65 research papers in reputed national and international journals, along with one book and two book chapters published internationally. He has more than 800 citations, with an h-index of 17 and an i-index of 23. Dr. Sharma has a keen interest in reading, writing and publishing. He serves 7 reputed journals as editor or associate editor and more than 30 journals as a reviewer, and has reviewed more than 150 research papers.

Anupam Lata is a research scholar in the Department of Statistics, Babasaheb Bhimrao Ambedkar University, Lucknow. She has completed a Master's degree in Statistics and is pursuing research in the area of sampling theory.

Subhash Kumar Yadav is a faculty member in the Department of Statistics at Babasaheb Bhimrao Ambedkar University, Lucknow, U.P., India. He earned his M.Sc. and Ph.D. degrees in Statistics from Lucknow University and qualified the National Eligibility Test. Dr. Yadav has published more than 120 papers in SCOPUS/WoS indexed national and international journals of repute and two books with an international publisher. He is a referee for 20 reputed international journals. He has presented papers at more than 20 national and international conferences, delivered more than 70 invited talks at several conferences, and chaired sessions at various national and international conferences. He has received the best paper award twice and the Research and Academic Excellence award from his institution four times.

Muhammad Noor-ul-Amin is an Associate Professor of Statistics at COMSATS University Islamabad, Lahore Campus. He specializes in statistical quality control, data science, and advanced applied statistics. His research interests include adaptive and robust control charts, ranked set sampling, and machine learning applications in forecasting and health data.