EXACT DISTRIBUTION OF HAT VALUES AND IDENTIFICATION OF LEVERAGE POINTS
Keywords:
Centered Hat Values, Hat Matrix, Beta-Distribution, Moments, Leverage Points, Outliers, X-Space.
Abstract
This paper derives the exact distribution of the centered hat values of the hat matrix of the predictors in multiple linear regression analysis. Using the relationship between the centered hat values and the F-ratio given by Belsley et al. (1980), the authors show that the density of the centered hat values follows the Beta distribution B(p-1, n-p), with the hat value h satisfying 1/n <= h <= 1. The first two moments of the distribution are derived, and upper and lower limits of the centered hat values are established. The shape of the density function is visualized, and percentage points of the centered hat values are computed at the 5% and 1% significance levels for different sample sizes and numbers of predictors. Finally, two approaches are proposed: the first identifies leverage points in the X-space based on a test of significance, and the second scrutinizes leverage points as well as outliers. Both approaches are illustrated numerically, and the results are compared with the traditional approach.
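The significance-test idea behind the first approach can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: it assumes the F-ratio relationship in its commonly quoted form, F_i = (n-p)(h_i - 1/n) / ((p-1)(1-h_i)), referred to F(p-1, n-p), where p counts the intercept and the rows of X are treated as multivariate normal. The function names, the simulated data, the planted point, and the tabulated critical value are all hypothetical choices made for the example.

```python
import numpy as np

def hat_values(X):
    """Diagonal of the hat matrix H = Z (Z'Z)^{-1} Z', with Z = [1, X]."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Z = np.column_stack([np.ones(n), X])   # add the intercept column
    Q, _ = np.linalg.qr(Z)                 # reduced QR, so H = Q Q'
    return np.sum(Q**2, axis=1)

def leverage_f_ratio(X):
    """Hat values and the assumed F-ratio built from the centered hat values."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    p = k + 1                              # number of coefficients incl. intercept
    h = hat_values(X)
    # Assumed relationship; undefined when some h_i equals exactly 1.
    f = (n - p) * (h - 1.0 / n) / ((p - 1) * (1.0 - h))
    return h, f

# Hypothetical data: 30 observations, 2 predictors, one planted extreme row.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
X[0] = [8.0, -8.0]

h, f = leverage_f_ratio(X)

# Assumed 5% critical value of F(2, 27), read from a standard F table.
F_CRIT = 3.35
flagged = np.flatnonzero(f > F_CRIT)       # indices treated as leverage points
```

Because the F-ratio is a monotone function of h, comparing it with a critical value is equivalent to comparing h itself with a percentage point of its own distribution, which is the form the abstract describes.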
References
Belsley, D. A., Kuh, E. and Welsch, R. E. (2005). Regression Diagnostics:
Identifying Influential Data and Sources of Collinearity, Wiley-Interscience,
Vol. 571.
Chatterjee, S. and Hadi, A. S. (2009). Sensitivity Analysis in Linear
Regression, Wiley, Vol. 327.
Chave, A. D. and Thomson, D. J. (2003). A bounded influence regression
estimator based on the statistics of the hat matrix, J. Roy. Statist. Soc. C,
(3), p. 307-322.
Diaz-Garcia, J. A. and Gonzalez-Farias, G. (2004). A note on the Cook's
distance, J. Statist. Plan. Inference, 120, p. 119-136.
Dodge, Y. and Hadi, A. S. (1999). Simple graphs and bounds for the elements
of the hat matrix, J. Appl. Statist., 26(7), p. 817-823.
Handschin, E., Schweppe, F. C., Kohlas, J. and Fiechter, A. (1975). Bad data
analysis for power system state estimation. Power Apparatus and Systems,
IEEE Transact, 94(2), p. 329-337.
Hoaglin, D. C. and Welsch, R. E. (1978). The hat matrix in regression and
ANOVA, The Amer. Statist., 32(1), p. 17-22.
Huang, Y., Kuo, M. and Wang, T. (2007). Pair-perturbation influence
functions and local influence in PCA, Comp. Statist. and Data Analysis, 51, p.
-5899
Krasker, W. S. and Welsch, R. E. (1982). Efficient bounded-influence
regression estimation, J. Amer. Statist. Assoc., 77, p. 595-604.
Mallows, Colin L. (1975). On some topics in robustness, Unpublished
memorandum, Bell Telephone Laboratories, Murray Hill, NJ.
Prendergast, L.A. (2005). Influence functions for sliced inverse regression.
Scandinavian J. Statist., 32, p. 385-404.
Prendergast, L.A. (2006). Detecting influential observations in Sliced Inverse
Regression analysis, Austral. and New Zealand J. Statist., 48, p. 285-304.
Pynnonen, Seppo (2010). Joint distribution of a linear transformation of OLS
regression residuals with general spherical error distribution. Working Paper,
Department of Mathematics and Statistics, University of Vaasa.
Ullah, M. A. and Pasha, G. R. (2009). The origin and development of
influence measures in regression, Pak. J. Statist., 25(3), p. 295-307.