Inference Based on Type-II Hybrid Censored Data from a Pareto Distribution

Çağatay Çetinkaya

Bingöl University, Faculty of Economics & Administrative Sciences, Department of Business Administration, 12000, Bingöl, Turkey

E-mail: ccetinkaya@cu.edu.tr

Received 12 July 2020; Accepted 14 October 2020; Publication 26 December 2020

Abstract

The Pareto distribution is used in life-testing experiments as a finite range distribution. In this study, inference for the scale and shape parameters of the Pareto distribution under the Type-II hybrid censoring scheme is considered. The main reason for choosing this censoring scheme is that it guarantees that at least a pre-specified number of failures is observed by the end of the experiment. Maximum likelihood and Bayes estimation methods are used together with their approximate confidence intervals. The proposed estimation methods are compared numerically through simulation studies. A numerical example is also used to illustrate the theoretical outcomes.

Keywords: Bayesian inference, maximum-likelihood, Pareto distribution, hybrid censoring.

1 Introduction

In reliability theory, lifetime datasets are modelled with many different probability distributions. These distributions are mostly defined on the range $[0,\infty)$; the exponential, Weibull, lognormal and gamma distributions are among the most widely used. On the other hand, in many cases the lifetime distribution should be considered on a finite range due to specific conditions of the items or components. For instance, the strength, length, pressure, temperature, time, voltage or weight of a material may take any value on a finite range. Further, censoring or truncation restricts the lifetimes of items to a finite range. Finite range probability distributions are therefore important in many reliability problems. For instance, the Pareto distribution by Abdel-Ghaly et al. [1], the beta distribution by Gupta and Gupta [2] and the Topp-Leone distribution by Ghitany et al. [3] were considered in the context of reliability.

It is also known that the lifetimes of components or units cannot always be recorded exactly. In most cases, units are lost or removed from the experiment before they fail, so censored datasets are observed. In reliability theory, there are many different censoring schemes (CS), such as Type-I censoring, Type-II censoring and hybrid censoring. Hybrid censoring is a mixture of Type-I and Type-II censoring; it was introduced by Epstein [4] and is known in the literature as the Type-I hybrid censoring scheme. However, similar to the conventional Type-I CS, the Type-I hybrid censoring scheme has the disadvantage that only a few failures may occur up to the pre-fixed time $T$. For this reason, Childs et al. [5] proposed another hybrid censoring scheme, known as the Type-II hybrid censoring scheme, which is described as follows. The experiment starts with $n$ independent and identical units and is terminated at the random time $T^{*}=\max\{X_{R:n},T\}$, where $R<n$ and $T$ are prefixed numbers. Thus, the Type-II hybrid CS guarantees at least $R$ failures on test. The Type-II hybrid CS has been considered for many probability distributions. Among these studies, Banerjee and Kundu [6] for the Weibull distribution and Ganguly et al. [7] for the two-parameter exponential distribution motivated this study.

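As a finite range distribution, the Pareto distribution with scale parameter $k$ and shape parameter $\sigma$ is defined by the following density, distribution and survival functions

$$f(x;k,\sigma)=\sigma k^{\sigma}x^{-(\sigma+1)} \tag{1}$$

and

$$F(x;k,\sigma)=1-\left(\frac{k}{x}\right)^{\sigma},\qquad S(x;k,\sigma)=\left(\frac{k}{x}\right)^{\sigma} \tag{2}$$

where $x>k$ and $k,\sigma>0$.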
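As a minimal sketch of Equations (1) and (2), the following Python functions evaluate the density, distribution and survival functions and draw Pareto variates by inverting $F$. The function names (`pareto_pdf`, `pareto_cdf`, `pareto_sf`, `pareto_rvs`) are illustrative choices and are not part of the original study.

```python
import random


def pareto_pdf(x, k, sigma):
    """Density (1): sigma * k**sigma * x**-(sigma + 1) for x > k, else 0."""
    if x <= k:
        return 0.0
    return sigma * k ** sigma * x ** (-(sigma + 1.0))


def pareto_cdf(x, k, sigma):
    """Distribution function in (2): 1 - (k / x)**sigma for x > k, else 0."""
    if x <= k:
        return 0.0
    return 1.0 - (k / x) ** sigma


def pareto_sf(x, k, sigma):
    """Survival function in (2): (k / x)**sigma for x > k, else 1."""
    return 1.0 - pareto_cdf(x, k, sigma)


def pareto_rvs(k, sigma, size, rng):
    """Inverse-CDF sampling: solving F(x) = u gives x = k * (1 - u)**(-1/sigma)."""
    return [k * (1.0 - rng.random()) ** (-1.0 / sigma) for _ in range(size)]


# Example call with the parameter values used later in the simulation study.
sample = pareto_rvs(2.5, 1.5, 5, random.Random(0))
```

Inverse-CDF sampling is used here only because it needs nothing beyond uniform random numbers; any other Pareto generator with the same parametrization would serve equally well.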

The Pareto distribution is considered for modelling lifetime datasets in many engineering problems because its range depends on its scale parameter. Many inference studies for the Pareto distribution under censoring are available in the literature. Recently, Cheng and Zhao [8] studied estimation for the Pareto distribution under Type-I hybrid and progressive Type-II hybrid censored samples. Fu et al. [9] considered objective Bayesian inference under progressive Type-II censoring. Cheng et al. [10] studied exact inference for the distribution under doubly Type-II and progressive Type-II censored samples. The Type-II hybrid CS has been considered only by Balakrishnan and Shafay [11], for Bayesian prediction intervals.

The aim of this paper is to estimate the parameters of a Pareto model under Type-II hybrid censoring. The importance of this CS is that the test terminates only when both a pre-specified number of failures has been observed and a pre-specified time point has been reached. In contrast to the Type-I hybrid CS, both conditions are met by the end of the test. It should be noted that previous inferential studies based on progressive Type-II hybrid samples [9, 11] can be reduced to the Type-II hybrid CS by taking the number of live units removed at each failure as zero. However, these previous studies include neither Bayesian estimation nor a comparison of estimation methods. For this reason, we use the maximum likelihood (MLE) and Bayes estimation methods to estimate the shape and scale parameters of the Pareto distribution under the Type-II hybrid CS. Approximate confidence intervals for the MLEs are obtained by the bootstrap method and compared with Bayes credible intervals in terms of frequentist coverage probability.

2 Model Description and Maximum Likelihood Estimation

Suppose that $X$ is a positive random variable from the Pareto($k,\sigma$) distribution. We first describe the observed data under the Type-II hybrid censoring scheme with a known prefixed number of failures $R$ and a prefixed termination time $T$. One of the following two cases is observed.

Case I: $\{x_{1:n}<x_{2:n}<\cdots<x_{R:n}\}$ if $T<x_{R:n}$

Case II: $\{x_{1:n}<x_{2:n}<\cdots<x_{d:n}<T<x_{d+1:n}\}$ if $x_{R:n}<T$ and $R\le d<n$,

where $x_{1:n}<x_{2:n}<\cdots<x_{n:n}$ denotes the ordered random sample from the Pareto($k,\sigma$) distribution. Note that $x_{d+1:n}$ is not observed; $x_{d:n}<T<x_{d+1:n}$ means that the $d$-th failure occurs before $T$ and that there is no failure between $x_{d:n}$ and $T$. Based on these cases, the likelihood function of the observed data is

$$L(\theta\mid\boldsymbol{X})=\frac{n!}{(n-m)!}\prod_{i=1}^{m}f(x_{i:n};\theta)\,[1-F(U;\theta)]^{n-m} \tag{3}$$

where $m=R$ and $U=x_{R:n}$ for Case I ($T<x_{R:n}$), and $m=d$ and $U=T$ for Case II ($x_{R:n}<T$).
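The two cases can be made concrete with a short Python sketch that takes a complete sample and returns the observed failures together with $m$ and $U$. The helper name `type2_hybrid_censor` is hypothetical, and treating a tie $T=x_{R:n}$ as Case II is an assumption made here for definiteness.

```python
def type2_hybrid_censor(sample, R, T):
    """Apply Type-II hybrid censoring to a complete sample.

    Returns (observed, m, U), where observed holds the m smallest order
    statistics and U is the censoring point that enters the likelihood (3).
    """
    x = sorted(float(v) for v in sample)
    n = len(x)
    if not 1 <= R <= n:
        raise ValueError("R must satisfy 1 <= R <= n")
    if T < x[R - 1]:
        # Case I: the R-th failure occurs after T, so the test stops at x_{R:n}.
        m, U = R, x[R - 1]
    else:
        # Case II: the test stops at T, with d >= R failures observed by then.
        m = sum(1 for v in x if v <= T)
        U = float(T)
    return x[:m], m, U


# Toy data, not taken from the paper.
toy = [3.0, 1.0, 4.0, 1.5, 9.0, 2.6]
case_one = type2_hybrid_censor(toy, R=3, T=2.0)
case_two = type2_hybrid_censor(toy, R=3, T=3.5)
```

In the first call the third failure (2.6) occurs after $T=2$, so three failures are kept and $U=2.6$; in the second call four failures occur before $T=3.5$, so $m=4$ and $U=3.5$.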

For our problem, let $\boldsymbol{X}=(x_{1:n},x_{2:n},\ldots,x_{m:n})$ denote a hybrid censored sample from the Pareto($k,\sigma$) distribution. Then, the corresponding log-likelihood function of the observed sample, denoted by $l(k,\sigma)$, is

$$l(k,\sigma)\propto m(\ln\sigma+\sigma\ln k)-(\sigma+1)\sum_{i=1}^{m}\ln x_{i:n}+\sigma(n-m)[\ln k-\ln U] \tag{4}$$

From (4), the MLEs of $k$ and $\sigma$, denoted by $\hat{k}_{ML}$ and $\hat{\sigma}_{ML}$, are the values that maximize $l(k,\sigma)$. The log-likelihood $l(k,\sigma)$ is an increasing function of $k$, and $k$ cannot exceed the smallest observation, so $\hat{k}_{ML}$ is the first order statistic of the observed sample, that is, $\hat{k}_{ML}=x_{1:n}$. Setting $\partial l/\partial\sigma=0$ at $k=\hat{k}_{ML}$ then gives the MLE of the shape parameter $\sigma$ as

$$\hat{\sigma}_{ML}=\frac{m}{\sum_{i=1}^{m}\ln x_{i:n}-(n-m)[\ln x_{1:n}-\ln U]-m\ln x_{1:n}} \tag{5}$$
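The closed forms $\hat{k}_{ML}=x_{1:n}$ and Equation (5) translate directly into code. The sketch below assumes the observed failures and the censoring point $U$ are already available; the function name `pareto_mle_hybrid` is illustrative only.

```python
import math


def pareto_mle_hybrid(observed, n, U):
    """MLEs of (k, sigma) from a Type-II hybrid censored Pareto sample.

    observed : the m observed failure times
    n        : number of units put on test
    U        : x_{R:n} in Case I, T in Case II
    """
    x = sorted(observed)
    m = len(x)
    k_hat = x[0]  # the first order statistic maximizes the likelihood in k
    log_sum = sum(math.log(v) for v in x)
    # Denominator of Equation (5), written term by term.
    denom = (log_sum
             - (n - m) * (math.log(k_hat) - math.log(U))
             - m * math.log(k_hat))
    return k_hat, m / denom


# Toy check: logs of the observations are 0, 1, 2 and ln U = 2, so the
# denominator is 3 + 2 * 2 - 0 = 7 and sigma_hat = 3 / 7.
k_hat, sigma_hat = pareto_mle_hybrid([1.0, math.e, math.e ** 2], n=5, U=math.e ** 2)
```

The denominator is zero when only one failure is observed at $U=x_{1:n}$, so the sketch implicitly assumes $m\ge 2$.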

In addition to point estimates, confidence intervals for the parameters are needed. However, the Pareto distribution does not belong to a regular family of distributions, since its support depends on the scale parameter k, and the Fisher information matrix is not positive definite in the case of σR/n [4]. Therefore, the asymptotic variances of the estimators cannot be determined from the standard theory of the Fisher information matrix, and confidence intervals based on the asymptotic normality of the MLEs cannot be obtained for the Pareto parameters. Hence, we propose the bootstrap percentile method (boot-p) suggested by Efron and Tibshirani [12] as an alternative approximate confidence interval.

2.1 Bootstrap Confidence Interval for MLEs

In this subsection, parametric bootstrap confidence intervals are considered for the MLEs of the parameters. Since the asymptotic variances of the estimators cannot be determined from the standard theory of the Fisher information matrix, the bootstrap percentile method (boot-p) is used in place of the bootstrap-t confidence interval, which requires a regular family in its construction.

The following algorithm is proposed to generate parametric bootstrap samples, as suggested by Efron and Tibshirani [12].

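Step 1: Generate a random sample $(x_{1},x_{2},\ldots,x_{m})$ from Pareto($k,\sigma$).

Step 2: After computing the MLEs of the parameters $k$ and $\sigma$, generate an independent bootstrap sample $(x_{1}^{*},x_{2}^{*},\ldots,x_{m}^{*})$ from Pareto($\hat{k},\hat{\sigma}$). Then, compute the MLEs of the parameters based on the bootstrap sample, denoted by $\hat{k}^{*}$ and $\hat{\sigma}^{*}$.

Step 3: Repeat Step 2 $B$ times and obtain the sets of bootstrap estimates of $k$ and $\sigma$, say $\hat{k}_{i}^{*}$ and $\hat{\sigma}_{i}^{*}$, where $i=1,2,\ldots,B$.

By using these bootstrap estimates, compute $(\hat{k}^{*(\gamma/2)},\hat{k}^{*(1-\gamma/2)})$, where $\hat{k}^{*(\gamma)}$ is the $\gamma$-percentile of $\hat{k}_{i}^{*}$, $i=1,2,\ldots,B$, that is, a number such that

$$\frac{1}{B}\sum_{i=1}^{B}I\left(\hat{k}_{i}^{*}\le\hat{k}^{*(\gamma)}\right)=\gamma,\qquad 0<\gamma<1$$

where $I(\cdot)$ is the indicator function. The interval $(\hat{\sigma}^{*(\gamma/2)},\hat{\sigma}^{*(1-\gamma/2)})$ is obtained similarly.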
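The boot-p steps above can be sketched in Python as follows. Two details are assumptions of this sketch rather than statements from the text: each bootstrap replicate draws $n$ values from Pareto($\hat{k},\hat{\sigma}$) and is then censored with the same $R$ and $T$ as the data, and the empirical $\gamma$-percentile is taken as the smallest bootstrap value whose empirical distribution function is at least $\gamma$. All function names are hypothetical.

```python
import math
import random


def censor(sample, R, T):
    """Type-II hybrid censoring: return the observed failures and U."""
    x = sorted(sample)
    if T < x[R - 1]:
        return x[:R], x[R - 1]                     # Case I
    return [v for v in x if v <= T], float(T)      # Case II


def mle(obs, n, U):
    """k_hat = x_{1:n}; sigma_hat from Equation (5)."""
    m, k_hat = len(obs), obs[0]
    denom = (sum(math.log(v) for v in obs)
             - (n - m) * (math.log(k_hat) - math.log(U))
             - m * math.log(k_hat))
    return k_hat, m / denom


def percentile(values, gamma):
    """Smallest value whose empirical CDF is at least gamma."""
    v = sorted(values)
    return v[max(0, math.ceil(gamma * len(v)) - 1)]


def boot_p_intervals(obs, n, R, T, U, B=500, gamma=0.05, seed=1):
    """Percentile bootstrap intervals for k and sigma (assumes R >= 2)."""
    rng = random.Random(seed)
    k_hat, s_hat = mle(obs, n, U)
    k_star, s_star = [], []
    for _ in range(B):
        # Draw from Pareto(k_hat, s_hat) by inverse CDF, then censor again.
        xb = [k_hat * (1.0 - rng.random()) ** (-1.0 / s_hat) for _ in range(n)]
        ob, Ub = censor(xb, R, T)
        kb, sb = mle(ob, n, Ub)
        k_star.append(kb)
        s_star.append(sb)
    lo, hi = gamma / 2.0, 1.0 - gamma / 2.0
    return ((percentile(k_star, lo), percentile(k_star, hi)),
            (percentile(s_star, lo), percentile(s_star, hi)))


# Example with simulated data (k = 2.5, sigma = 1.5, n = 30, R = 20, T = 4).
rng0 = random.Random(0)
data = [2.5 * (1.0 - rng0.random()) ** (-1.0 / 1.5) for _ in range(30)]
obs, U = censor(data, 20, 4.0)
k_int, s_int = boot_p_intervals(obs, 30, 20, 4.0, U)
```

Because every bootstrap sample is drawn from a distribution supported on $(\hat{k},\infty)$, each $\hat{k}^{*}_{i}$ is at least $\hat{k}$, so the percentile interval for $k$ lies at or above $\hat{k}_{ML}$.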

3 Bayesian Estimation

In this section, we consider Bayesian estimation of the parameters. The selection of prior distributions for the parameters plays an important role in Bayesian estimation. In previous studies, various prior distributions have been proposed for the unknown parameters of a particular distribution of interest. For the Pareto distribution specifically, Fu et al. [9] compared different independent priors for the Pareto parameters and recommended $\pi(k,\sigma)=1/(k\sigma)$ as the reference prior. Here, $0<k<x_{1:n}$ and $\sigma>0$. In a similar manner to the progressive Type-II censored Pareto data considered by Fu et al. [9], the posterior distributions in the Type-II hybrid censored case can be obtained as follows.

Based on the given likelihood function of the observed sample and the reference prior π(k,σ), the joint posterior density of the scale and shape parameters can be obtained as in the following

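$$\begin{aligned}\pi(k,\sigma\mid x_{1},x_{2},\ldots,x_{m})&\propto L(k,\sigma\mid x_{1},x_{2},\ldots,x_{m})\,\pi(k,\sigma)\\&\propto\sigma^{m}k^{\sigma n}\exp\left\{-\sigma\left(\sum_{i=1}^{m}\ln x_{i:n}+(n-m)\ln U\right)\right\}\frac{1}{k\sigma}\end{aligned}$$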

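where $0<k<x_{1:n}$ and $\sigma>0$. Then, the marginal densities of the parameters are derived by integrating out the nuisance parameter in the joint posterior density. First, the marginal posterior density of the shape parameter $\sigma$ is obtained as the gamma distribution $GA\left(m-1,\sum_{i=1}^{m}\ln x_{i:n}+(n-m)\ln U-n\ln x_{1:n}\right)$, as follows:

$$\begin{aligned}\pi(\sigma\mid\boldsymbol{X})&=\int_{0}^{x_{1:n}}\pi(k,\sigma\mid x_{1},x_{2},\ldots,x_{m})\,dk\\&\propto\int_{0}^{x_{1:n}}\sigma^{m-1}k^{\sigma n-1}\exp\left\{-\sigma\left(\sum_{i=1}^{m}\ln x_{i:n}+(n-m)\ln U\right)\right\}dk\\&\propto\sigma^{m-2}\exp\left\{-\sigma\left(\sum_{i=1}^{m}\ln x_{i:n}+(n-m)\ln U-n\ln x_{1:n}\right)\right\}\end{aligned}$$

and equivalently

$$\pi(\sigma\mid\boldsymbol{X})\sim GA\left(m-1,\sum_{i=1}^{m}\ln x_{i:n}+(n-m)\ln U-n\ln x_{1:n}\right) \tag{6}$$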
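The passage from the integral over $k$ to the gamma kernel rests on one elementary integral. Written out, with $A$ used here only as shorthand (not notation from the original) for the second gamma parameter:

```latex
\int_{0}^{x_{1:n}} k^{\sigma n-1}\,dk
  = \frac{x_{1:n}^{\sigma n}}{\sigma n}
  = \frac{1}{\sigma n}\exp\{\sigma n\ln x_{1:n}\},
\qquad\text{hence}\qquad
\pi(\sigma\mid\boldsymbol{X})
  \propto \sigma^{m-1}\cdot\frac{1}{\sigma n}\,
    \exp\Big\{-\sigma\Big(\sum_{i=1}^{m}\ln x_{i:n}+(n-m)\ln U-n\ln x_{1:n}\Big)\Big\}
  \propto \sigma^{m-2}e^{-\sigma A},
\qquad
A=\sum_{i=1}^{m}\ln x_{i:n}+(n-m)\ln U-n\ln x_{1:n}.
```

The kernel $\sigma^{m-2}e^{-\sigma A}$ is that of a gamma density with shape $m-1$ and rate $A$, so the second argument of $GA(\cdot,\cdot)$ acts as a rate parameter and the posterior mean of $\sigma$ is $(m-1)/A$.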

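For the scale parameter $k$,

$$\begin{aligned}\pi(k\mid\boldsymbol{X})&\propto\int_{0}^{\infty}\sigma^{m-1}k^{\sigma n-1}\exp\left\{-\sigma\left(\sum_{i=1}^{m}\ln x_{i:n}+(n-m)\ln U\right)\right\}d\sigma\\&\propto\frac{1}{k\left(\sum_{i=1}^{m}\ln x_{i:n}+(n-m)\ln U-n\ln k\right)^{m}}\end{aligned}$$

with the normalization constant

$$C=\frac{1}{n(m-1)\left(\sum_{i=1}^{m}\ln x_{i:n}+(n-m)\ln U-n\ln x_{1:n}\right)^{m-1}}$$

and the quantile function of the posterior distribution of $k$ can be obtained as

$$k=F^{-1}(u)=\exp\left\{\frac{\sum_{i=1}^{m}\ln x_{i:n}+(n-m)\ln U-\left(uCn(m-1)\right)^{\frac{1}{1-m}}}{n}\right\} \tag{7}$$

where $u$ denotes a random variable from the uniform distribution $U(0,1)$.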
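Equations (6) and (7) make both marginal posteriors easy to simulate. The following Python sketch draws $\sigma$ from the gamma posterior and $k$ by plugging uniform variates into the quantile function. It uses the simplification $(uCn(m-1))^{1/(1-m)}=A\,u^{1/(1-m)}$, where $A$ is the second gamma parameter in (6); the names `posterior_k_quantile` and `posterior_draws` are illustrative, and the draws come from the two marginals separately, not from the joint posterior.

```python
import math
import random


def posterior_k_quantile(u, S, A, n, m):
    """Equation (7) with (u*C*n*(m-1))**(1/(1-m)) simplified to A*u**(1/(1-m))."""
    return math.exp((S - A * u ** (1.0 / (1.0 - m))) / n)


def posterior_draws(observed, n, U, size, rng):
    """Independent draws from the marginal posteriors of k and sigma."""
    x = sorted(observed)
    m = len(x)
    if m < 2:
        raise ValueError("at least two observed failures are needed")
    S = sum(math.log(v) for v in x) + (n - m) * math.log(U)
    A = S - n * math.log(x[0])  # rate of the gamma posterior in (6)
    # random.gammavariate takes (shape, scale), so the scale is 1 / A.
    sigma = [rng.gammavariate(m - 1, 1.0 / A) for _ in range(size)]
    # 1 - rng.random() lies in (0, 1], which avoids u = 0.
    k = [posterior_k_quantile(1.0 - rng.random(), S, A, n, m)
         for _ in range(size)]
    return k, sigma


# Toy censored sample (n = 10 units, m = 5 failures, U = 6.0); not from the paper.
obs = [2.6, 2.9, 3.4, 4.1, 5.0]
k_draws, sigma_draws = posterior_draws(obs, n=10, U=6.0, size=1000,
                                       rng=random.Random(0))
```

Posterior means, medians or credible limits can then be approximated from the draws; for $k$ the quantile function can also be evaluated directly at the required probability levels.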

4 Simulation Studies

In this section, we provide simulation studies to evaluate the performance of the MLE and Bayesian methods for the parameters. The biases and mean squared errors (MSE) of the estimates are used for comparison. Average interval lengths and frequentist coverage probabilities for $\alpha=0.95$ are reported for the approximate confidence intervals. An interval estimate is judged to be good or bad according to whether the frequentist coverage probability of the $\alpha$ credible interval or confidence interval is close to $\alpha$ (Fu et al. [9]). Let $k^{\pi}(\alpha;\boldsymbol{X})$ and $\sigma^{\pi}(\alpha;\boldsymbol{X})$ be the posterior $\alpha$ quantiles of $k$ and $\sigma$ for a given censored dataset. The equal-tailed credible intervals with level $\alpha$ for $k$ and $\sigma$ are then $(k^{\pi}((1-\alpha)/2;\boldsymbol{X}),\,k^{\pi}((1+\alpha)/2;\boldsymbol{X}))$ and $(\sigma^{\pi}((1-\alpha)/2;\boldsymbol{X}),\,\sigma^{\pi}((1+\alpha)/2;\boldsymbol{X}))$. The frequentist coverage probabilities of the one-sided credible intervals for $k$ and $\sigma$ are given by (see Guan et al. [14])

$$\begin{aligned}Q^{\pi}(\alpha;k)&=P_{(k,\sigma)}\left(0\le k\le k^{\pi}(\alpha;\boldsymbol{X})\right)\\Q^{\pi}(\alpha;\sigma)&=P_{(k,\sigma)}\left(0\le\sigma\le\sigma^{\pi}(\alpha;\boldsymbol{X})\right)\end{aligned}$$

where $k^{\pi}(\alpha;\boldsymbol{X})$ and $\sigma^{\pi}(\alpha;\boldsymbol{X})$ are random variables. The corresponding Bayesian estimates are calculated by using Equations (6) and (7). The frequentist coverage probabilities $Q^{\pi}(\alpha;k)$ and $Q^{\pi}(\alpha;\sigma)$ are estimated by the relative frequency

$$\#\{k<k^{\pi}(\alpha;\boldsymbol{X})\}/N$$

where $\#\{k<k^{\pi}(\alpha;\boldsymbol{X})\}$ is the number of the $N$ replications in which the true value of $k$ is less than the posterior $\alpha$ quantile of $k$ (Fu et al. [9]). $Q^{\pi}(\alpha;\sigma)$ is estimated in the same way.
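The relative-frequency estimate of $Q^{\pi}(\alpha;k)$ can be sketched as a small Monte Carlo loop in Python. The posterior $\alpha$ quantile of $k$ is taken from Equation (7) in its simplified form; the function names are hypothetical, and the sketch covers $k$ only, because the corresponding quantile of $\sigma$ needs a gamma quantile function that the Python standard library does not provide.

```python
import math
import random


def censor(sample, R, T):
    """Type-II hybrid censoring: return the observed failures and U."""
    x = sorted(sample)
    if T < x[R - 1]:
        return x[:R], x[R - 1]                     # Case I
    return [v for v in x if v <= T], float(T)      # Case II


def k_posterior_quantile(obs, n, U, alpha):
    """Posterior alpha-quantile of k from Equation (7); obs must be sorted."""
    m = len(obs)
    S = sum(math.log(v) for v in obs) + (n - m) * math.log(U)
    A = S - n * math.log(obs[0])
    return math.exp((S - A * alpha ** (1.0 / (1.0 - m))) / n)


def coverage_k(k, sigma, n, R, T, alpha, N, seed=0):
    """Estimate Q(alpha; k) by #{k < k_pi(alpha; X)} / N (assumes R >= 2)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(N):
        sample = [k * (1.0 - rng.random()) ** (-1.0 / sigma) for _ in range(n)]
        obs, U = censor(sample, R, T)
        if k < k_posterior_quantile(obs, n, U, alpha):
            hits += 1
    return hits / N


# One configuration of the simulation design, with far fewer replications
# than the N used in the study so that the sketch runs quickly.
q_hat = coverage_k(k=2.5, sigma=1.5, n=30, R=20, T=4.0, alpha=0.95, N=2000)
```

With a few thousand replications the Monte Carlo error of `q_hat` is of the order of 0.005, so small departures from $\alpha$ at that scale should not be over-interpreted.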

Taking the arbitrary parameter values $(k,\sigma)=(2.5,1.5)$, the sample sizes $n=30$ and $n=40$ with $R=20,25,28$ and $R=30,35,38$, respectively, and the prefixed times $T=4$ and $T=8$, we generated $N=100000$ samples from the Pareto($k,\sigma$) distribution and computed the Bayes estimates from the posterior distributions given in Equations (6) and (7). For each replication, we used $B=500$ bootstrap samples.

All results are reported in Table 1 for $n=30$ and in Table 2 for $n=40$. As expected, the biases and MSEs decrease as the sample size increases. The Bayes estimates have smaller biases, but the MLEs have smaller MSEs. As Fu et al. [9] noted, frequentist coverage probabilities of 1.00 for $k$ seem unreasonable. However, the MLE of the scale parameter is an order statistic, so the condition $k<k^{\pi}(\alpha;\boldsymbol{X})$ is satisfied by the estimates in every iteration, which explains this result.

Table 1 Biases and MSEs (in parentheses) of the estimates, and average interval lengths and coverage probabilities (in parentheses), for n=30, T=4 and T=8 with various R

| R | T | Quantity | $\hat{k}_{ML}$ | $\hat{k}_{B}$ | $\hat{\sigma}_{ML}$ | $\hat{\sigma}_{B}$ |
|---|---|---|---|---|---|---|
| 20 | 4 | Bias (MSE) | 0.0573 (0.0034) | 0.0014 (0.0073) | 0.1587 (0.1482) | 0.0734 (0.2759) |
| 20 | 4 | Length (coverage) | 0.2044 (1.00) | 0.2198 (0.95) | 1.2163 (0.90) | 1.4087 (0.95) |
| 20 | 8 | Bias (MSE) | 0.0566 (0.0033) | 0.0020 (0.0069) | 0.1063 (0.1094) | 0.0404 (0.2104) |
| 20 | 8 | Length (coverage) | 0.2091 (1.00) | 0.2187 (0.95) | 1.3277 (0.99) | 1.2288 (0.95) |
| 25 | 4 | Bias (MSE) | 0.0570 (0.0034) | 0.0007 (0.0071) | 0.1300 (0.1206) | 0.0671 (0.2243) |
| 25 | 4 | Length (coverage) | 0.2059 (1.00) | 0.2151 (0.95) | 0.9293 (0.94) | 1.2476 (0.95) |
| 25 | 8 | Bias (MSE) | 0.0570 (0.0034) | 0.0010 (0.0071) | 0.1137 (0.1067) | 0.0531 (0.2053) |
| 25 | 8 | Length (coverage) | 0.2072 (1.00) | 0.2159 (0.95) | 1.2397 (0.98) | 1.2162 (0.95) |
| 28 | 4 | Bias (MSE) | 0.0568 (0.0034) | 0.0005 (0.0069) | 0.1147 (0.1050) | 0.0588 (0.1961) |
| 28 | 4 | Length (coverage) | 0.2068 (1.00) | 0.2134 (0.95) | 1.0669 (0.97) | 1.1708 (0.95) |
| 28 | 8 | Bias (MSE) | 0.0570 (0.0035) | 0.0005 (0.0069) | 0.1136 (0.1034) | 0.0578 (0.1942) |
| 28 | 8 | Length (coverage) | 0.2069 (1.00) | 0.2134 (0.95) | 1.2177 (0.97) | 1.1692 (0.95) |

Table 2 Biases and MSEs (in parentheses) of the estimates, and average interval lengths and coverage probabilities (in parentheses), for n=40, T=4 and T=8 with various R

| R | T | Quantity | $\hat{k}_{ML}$ | $\hat{k}_{B}$ | $\hat{\sigma}_{ML}$ | $\hat{\sigma}_{B}$ |
|---|---|---|---|---|---|---|
| 30 | 4 | Bias (MSE) | 0.0431 (0.0020) | 0.0003 (0.0039) | 0.1102 (0.0948) | 0.0637 (0.1800) |
| 30 | 4 | Length (coverage) | 0.1530 (1.00) | 0.1598 (0.95) | 0.7534 (0.91) | 1.1296 (0.95) |
| 30 | 8 | Bias (MSE) | 0.0431 (0.0019) | 0.0005 (0.0039) | 0.0819 (0.0786) | 0.0420 (0.1520) |
| 30 | 8 | Length (coverage) | 0.1552 (1.00) | 0.1606 (0.95) | 1.0576 (0.98) | 1.0543 (0.95) |
| 35 | 4 | Bias (MSE) | 0.0428 (0.0019) | 0.0005 (0.0037) | 0.0926 (0.0809) | 0.0502 (0.1484) |
| 35 | 4 | Length (coverage) | 0.1538 (1.00) | 0.1586 (0.95) | 0.8051 (0.95) | 1.0374 (0.95) |
| 35 | 8 | Bias (MSE) | 0.0428 (0.0019) | 0.0004 (0.0037) | 0.0876 (0.0764) | 0.0457 (0.1426) |
| 35 | 8 | Length (coverage) | 0.1541 (1.00) | 0.1588 (0.95) | 1.0159 (0.96) | 1.0299 (0.95) |
| 38 | 4 | Bias (MSE) | 0.0427 (0.0019) | 0.0004 (0.0037) | 0.0858 (0.0731) | 0.0480 (0.1352) |
| 38 | 4 | Length (coverage) | 0.1540 (1.00) | 0.1578 (0.95) | 0.9189 (0.97) | 0.9927 (0.95) |
| 38 | 8 | Bias (MSE) | 0.0427 (0.0019) | 0.0004 (0.0037) | 0.0857 (0.0728) | 0.0478 (0.1349) |
| 38 | 8 | Length (coverage) | 0.1540 (1.00) | 0.1578 (0.95) | 0.9796 (0.97) | 0.9925 (0.95) |

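Further, the bootstrap confidence intervals for $k$ are shorter than the Bayesian credible intervals in all cases. For $\sigma$, the bootstrap intervals are shorter in every setting with $T=4$, but longer in most settings with $T=8$. The coverage probabilities of the Bayesian credible intervals are 0.95 in all settings, while those of the bootstrap intervals for $\sigma$ range from 0.90 to 0.99.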

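Table 3 Estimates and lengths of their corresponding credible intervals for the data of lifetimes of the steel specimens for different R and T

| R | T | Quantity | $\hat{k}_{ML}$ | $\hat{k}_{B}$ | $\hat{\sigma}_{ML}$ | $\hat{\sigma}_{B}$ |
|---|---|---|---|---|---|---|
| 16 | 80 | Estimate | 51.0000 | 50.6380 | 1.3325 | 1.0391 |
| 16 | 80 | Length | 6.9561 | 7.8087 | 1.0413 | 1.2571 |
| 16 | 100 | Estimate | 51.0000 | 48.5424 | 1.3325 | 1.1821 |
| 16 | 100 | Length | 7.5527 | 7.8087 | 1.1821 | 1.2571 |
| 16 | 120 | Estimate | 51.0000 | 50.8495 | 1.3325 | 1.1361 |
| 16 | 120 | Length | 7.4172 | 7.8087 | 1.4854 | 1.2571 |
| 18 | 80 | Estimate | 51.0000 | 49.1921 | 1.7060 | 1.3722 |
| 18 | 80 | Length | 5.9629 | 6.0803 | 1.9067 | 1.5240 |
| 18 | 100 | Estimate | 51.0000 | 49.5787 | 1.7872 | 1.4933 |
| 18 | 100 | Length | 5.3507 | 5.7696 | 2.0093 | 1.5568 |
| 18 | 120 | Estimate | 51.0000 | 49.3915 | 1.7571 | 1.1414 |
| 18 | 120 | Length | 5.4763 | 5.8625 | 1.7361 | 1.5306 |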

5 Numerical Example

In this section, a real dataset is analyzed to illustrate the use of the proposed estimation methods. We use the data from Crowder [13], which are the lifetimes of steel specimens tested at the 38.5 stress level. The observations are 60, 51, 83, 140, 109, 106, 119, 76, 68, 67, 111, 57, 69, 75, 122, 128, 95, 87, 82, 132. A truncated form of this dataset was also used by Juvairiyya and Anilkumar [15] and fitted by the Pareto distribution. We first fit the Pareto distribution to the uncensored data. The MLEs of $k$ and $\sigma$ are obtained as 51 and 1.8334, respectively. The corresponding Kolmogorov–Smirnov test statistic and associated p-value are 0.35 and 0.1745, respectively. Thus, one cannot reject the null hypothesis that the dataset comes from the Pareto distribution. We obtained the proposed estimates and their credible intervals for the time points $T=80,100,120$ and $R=16,18$. The results are presented in Table 3.
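The uncensored fit reported above can be reproduced with a few lines of Python. For a complete sample, $m=n$ and the censoring term in Equation (5) vanishes, so $\hat{\sigma}_{ML}=n/\sum_{i=1}^{n}\ln(x_{i}/x_{1:n})$. The variable and function names below are illustrative only.

```python
import math

# Lifetimes of the steel specimens as listed in the text.
steel = [60, 51, 83, 140, 109, 106, 119, 76, 68, 67,
         111, 57, 69, 75, 122, 128, 95, 87, 82, 132]


def pareto_mle_complete(data):
    """Complete-sample MLEs: k_hat = min(x), sigma_hat = n / sum(ln(x_i / k_hat))."""
    k_hat = min(data)
    sigma_hat = len(data) / sum(math.log(v / k_hat) for v in data)
    return k_hat, sigma_hat


k_hat, sigma_hat = pareto_mle_complete(steel)
print(k_hat, round(sigma_hat, 4))
```

The call returns 51 for the scale and a shape estimate of about 1.83, in line with the values quoted above.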

For the scale parameter $k$, the interval lengths of both methods are shorter for $R=18$ than for $R=16$, and the ML method gives shorter intervals than the Bayesian method in all cases. For the shape parameter $\sigma$, the Bayesian credible intervals are shorter than the bootstrap intervals for $R=18$ at every time point $T$.

6 Conclusions

In this paper, maximum likelihood and Bayesian estimates of the parameters of the Pareto distribution based on the Type-II hybrid censoring scheme are obtained. Since the distribution does not belong to a regular family of distributions, a bootstrap confidence interval is proposed for the maximum likelihood estimates, and the objective Bayesian analysis proposed by Fu et al. [9] is used under this censoring scheme. The simulation results are consistent, and the intervals attain coverage probabilities close to the target level.

Acknowledgement

The author would like to thank the anonymous reviewers for their valuable comments and suggestions which were helpful in improving the paper.

References

[1] Abdel-Ghaly, A. A., Attia, A. F., & Aly, H. M. (1998). Estimation of the parameters of Pareto distribution and the reliability function using accelerated life testing with censoring. Communications in Statistics-Simulation and Computation, 27(2), 469–484.

[2] Gupta, P. L., & Gupta, R. C. (2000). The monotonicity of the reliability measures of the beta distribution. Applied Mathematics Letters, 13(5), 5–9.

[3] Ghitany, M. E., Kotz, S., & Xie, M. (2005). On some reliability measures and their stochastic orderings for the Topp–Leone distribution. Journal of Applied Statistics, 32(7), 715–722.

[4] Epstein, B. (1954). Truncated life tests in the exponential case. The Annals of Mathematical Statistics, 555–564.

[5] Childs, A., Chandrasekar, B., Balakrishnan, N., & Kundu, D. (2003). Exact likelihood inference based on type-I and type-II hybrid censored samples from the exponential distribution. Annals of the Institute of Statistical Mathematics, 55(2), 319–330.

[6] Banerjee, A., & Kundu, D. (2008). Inference based on type-II hybrid censored data from a Weibull distribution. IEEE Transactions on reliability, 57(2), 369–378.

[7] Ganguly, A., Mitra, S., Samanta, D., & Kundu, D. (2012). Exact inference for the two-parameter exponential distribution under Type-II hybrid censoring. Journal of Statistical Planning and Inference, 142(3), 613–625.

[8] Cheng, C., & Zhao, H. (2016). Exact inferences of the two-parameter exponential distribution and Pareto distribution with hybrid censored data. Pacific Journal of Applied Mathematics, 8(1), 65.

[9] Fu, J., Xu, A., & Tang, Y. (2012). Objective Bayesian analysis of Pareto distribution under progressive Type-II censoring. Statistics and Probability Letters, 82(10), 1829–1836.

[10] Cheng, C., Chen, J., & Bai, J. (2013). Exact inferences of the two-parameter exponential distribution and Pareto distribution with censored data. Journal of Applied Statistics, 40(7), 1464–1479.

[11] Balakrishnan, N., & Shafay, A. R. (2012). One- and two-sample Bayesian prediction intervals based on Type-II hybrid censored data. Communications in Statistics-Theory and Methods, 41(9), 1511–1531.

[12] Efron, B., & Tibshirani, R. J. (1994). An introduction to the bootstrap. CRC Press.

[13] Crowder, M. (2000). Tests for a family of survival models based on extremes. In Recent Advances in Reliability Theory (pp. 307–321). Birkhäuser, Boston, MA.

[14] Guan, Q., Tang, Y., & Xu, A. (2013). Objective Bayesian analysis for bivariate Marshall–Olkin exponential distribution. Computational Statistics & Data Analysis, 64, 299–313.

[15] Juvairiyya, R. M., & Anilkumar, P. (2019). Estimation of Stress-Strength Reliability for the Pareto Distribution Based on Upper Record Values. Statistica, 78(4), 397–409.

Biography


Çağatay Çetinkaya received his B.Sc. in Statistics from Anadolu University, Turkey, and his M.Sc. and Ph.D. degrees in Statistics from Çukurova University, Turkey. His research areas are distribution theory, censored data, reliability theory, order statistics and statistical computing. Dr. Çetinkaya is currently working as a research assistant in the Department of Business Administration of Bingöl University, Turkey.
