Research on Outlier Detection for High-Dimensional Data Based on PPCLOF
DOI:
https://doi.org/10.13052/jwe1540-9589.2038
Keywords:
Outlier detection, high-dimensional data, PPC, LOF
Abstract
To address the “curse of dimensionality” encountered in outlier detection for high-dimensional data, this paper uses the projection pursuit algorithm to perform non-linear dimensionality reduction on the data by calculating the phase relationship between dimensions. The LOF (Local Outlier Factor) algorithm is then applied to the sample points obtained from the dimensionality reduction to compute their outlier factors and identify the outliers. To improve the accuracy and efficiency of the LOF algorithm, a clustering method is used to prune the data passed to the outlier calculation, which reduces the amount of computation. Experiments on real-world and artificial datasets, compared against existing algorithms, demonstrate the effectiveness and efficiency of the proposed algorithm.
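To make the LOF step concrete, the following is a minimal sketch of the standard Local Outlier Factor computation as defined by Breunig et al. (2000), written in plain Python. It is not the PPCLOF implementation: the projection pursuit reduction and the clustering-based pruning are omitted, and the input is assumed to be points that have already been reduced to a low dimension. The function name `lof_scores`, the parameter `k`, the use of exactly `k` neighbours (ties broken by index), and the toy data are all illustrative assumptions.

```python
import math


def lof_scores(points, k):
    """Return the Local Outlier Factor of every point for neighbourhood size k."""
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    # Pairwise Euclidean distances (O(n^2); fine for a small illustration).
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Indices of the k nearest neighbours of each point.
    # Assumption: exactly k neighbours are kept, ties broken by index.
    neighbours = [
        sorted((j for j in range(n) if j != i), key=lambda j: (dist[i][j], j))[:k]
        for i in range(n)
    ]

    # k-distance: distance from each point to its k-th nearest neighbour.
    k_dist = [dist[i][neighbours[i][-1]] for i in range(n)]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach-dist(p, o) = max(k-distance(o), d(p, o)).
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        mean_reach = sum(reach) / k
        # Guard against division by zero when a point has k exact duplicates.
        lrd.append(1.0 / max(mean_reach, 1e-12))

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] / lrd[i] for j in neighbours[i]) / k for i in range(n)]


# Toy data: four corners of a unit square plus one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
scores = lof_scores(points, k=2)
outlier_index = max(range(len(points)), key=lambda i: scores[i])
```

In this toy example the four square corners receive a score of 1.0 (their density matches that of their neighbours), while the distant point receives a score of about 6 and is selected as `outlier_index`. Scores well above 1 indicate a point that is much sparser than its neighbourhood.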