Predictive Modelling: An Assessment Through Validation Techniques
DOI:
https://doi.org/10.13052/jrss0974-8024.1513
Keywords:
Cross validation, prediction error rate, linear and non-linear model
Abstract
In this investigation, various statistical models were fitted to simulated symmetric and asymmetric data. The models were fitted using various libraries in RStudio, and several selection criteria were applied during fitting. To evaluate different validation techniques, the simulated data were divided into training and testing sets, and functions were developed in R to carry out the validation. The coefficient summaries showed that all models were statistically significant for both the symmetric and the asymmetric distributions. In the preliminary analysis, TFEM (Type First Exponential Model) was found to be the best linear model for both symmetric and asymmetric distributions, with lower values of RMSE, MAE, BIAS, AIC and BIC. Among the non-linear models, the Haung model was found to be the best for both distributions, as it had lower values of RMSE, MAE and the other criteria. Several validation techniques were compared in the present study. Of these, 5-fold cross validation performed best across all the statistical models, with lower prediction error rates than its counterparts.
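The validation procedure summarised in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the study used R, whereas this sketch uses plain Python, a simple straight-line model stands in for the fitted models, and the function names (`fit_line`, `error_metrics`, `k_fold_cv`), the simulated data and the definition of bias as the mean of predicted minus actual values are all assumptions made for illustration.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def error_metrics(actual, predicted):
    """RMSE, MAE and bias (assumed here to be mean of predicted - actual)."""
    resid = [p - a for a, p in zip(actual, predicted)]
    n = len(resid)
    return {
        "rmse": math.sqrt(sum(r * r for r in resid) / n),
        "mae": sum(abs(r) for r in resid) / n,
        "bias": sum(resid) / n,
    }


def k_fold_cv(xs, ys, k=5, seed=0):
    """Shuffle the indices, split them into k folds, and predict each fold
    from a model fitted on the other k-1 folds. Metrics are computed on the
    pooled held-out predictions."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    actual, predicted = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            actual.append(ys[i])
            predicted.append(a + b * xs[i])
    return error_metrics(actual, predicted)


# Hypothetical simulated data with symmetric (Gaussian) noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

cv_scores = k_fold_cv(xs, ys, k=5)
print(cv_scores)
```

Comparing validation techniques then amounts to computing the same metrics under each scheme (for example a single training/testing split versus 5-fold cross validation) and preferring the scheme with the lower prediction error.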