Research on Outlier Detection for High-Dimensional Data Based on PPCLOF

Authors

  • Chen Chen, Department of Military logistics, Army Logistics University of PLA, Chongqing, China
  • Kaiwen Luo, Department of Military logistics, Army Logistics University of PLA, Chongqing, China
  • Lan Min, College of Management Science, Chengdu University of Technology, Chengdu, China
  • Shenglin Li, College of Artificial Intelligence, Southwest University, Chongqing 400715, China

DOI:

https://doi.org/10.13052/jwe1540-9589.2038

Keywords:

Outlier detection, high-dimensional data, PPC, LOF

Abstract

To address the “curse of dimensionality” problem encountered in outlier detection on high-dimensional data, this paper uses the projection pursuit algorithm to perform non-linear dimensionality reduction by calculating the phase relationship between dimensions. The LOF (Local Outlier Factor) algorithm is then applied to the sample points obtained by dimensionality reduction to calculate each point's outlier factor and identify the outliers. To improve the accuracy and efficiency of the LOF algorithm, a clustering method is used to prune the data passed to the outlier-factor calculation, reducing the amount of computation. Experiments on real-world and artificial datasets, compared against existing algorithms, demonstrate the effectiveness and efficiency of the proposed algorithm.
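
The two main stages named in the abstract (projection pursuit, then LOF on the projected points) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not give the projection index, the optimizer, or the parameters, so the following are all assumptions: a Friedman-Tukey-style index (spread times local density), a window radius of 0.1 times the spread, a crude random search over directions, a projection to one dimension, and k = 5 neighbours. The function names are hypothetical, and the clustering-based pruning step is omitted because the abstract does not describe how it works.

```python
import math
import random

def project(X, a):
    # 1-D projection value of every sample onto direction a
    return [sum(xi * ai for xi, ai in zip(x, a)) for x in X]

def projection_index(z):
    # Assumed Friedman-Tukey-style index: spread S_z times local density D_z
    n = len(z)
    mean = sum(z) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in z) / (n - 1))
    R = 0.1 * s  # assumed window radius
    d = sum(R - abs(zi - zj) for zi in z for zj in z if abs(zi - zj) < R)
    return s * d

def best_direction(X, trials=200, seed=0):
    # Crude random search over unit vectors (a stand-in for a real optimizer)
    rng = random.Random(seed)
    dim = len(X[0])
    best, best_q = None, -math.inf
    for _ in range(trials):
        a = [rng.gauss(0, 1) for _ in range(dim)]
        norm = math.sqrt(sum(v * v for v in a))
        a = [v / norm for v in a]
        q = projection_index(project(X, a))
        if q > best_q:
            best, best_q = a, q
    return best

def lof_scores(z, k=5):
    # Local Outlier Factor on 1-D values, using exactly k neighbours per point
    # (ties at the k-distance are not expanded, a simplification)
    n = len(z)
    neigh = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: abs(z[i] - z[j]))
        neigh.append(order[:k])
    kdist = [abs(z[i] - z[neigh[i][-1]]) for i in range(n)]

    def lrd(i):
        # local reachability density: inverse of the mean reachability distance
        reach = [max(kdist[j], abs(z[i] - z[j])) for j in neigh[i]]
        return len(reach) / max(sum(reach), 1e-12)

    dens = [lrd(i) for i in range(n)]
    return [sum(dens[j] for j in neigh[i]) / (k * dens[i]) for i in range(n)]

# Synthetic demo data: 40 inliers in 5 dimensions plus one planted outlier
rng = random.Random(1)
X = [[rng.gauss(0, 0.5) for _ in range(5)] for _ in range(40)]
X.append([10.0] * 5)  # planted outlier at index 40

a = best_direction(X)
scores = lof_scores(project(X, a))
top = max(range(len(X)), key=scores.__getitem__)
print("highest LOF at index", top)
```

Scores close to 1 indicate points whose local density matches that of their neighbours, while markedly larger scores flag candidate outliers. A pruning step like the one mentioned in the abstract would restrict which points are scored; how the paper selects them is not stated here.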

Author Biographies

Chen Chen, Department of Military logistics, Army Logistics University of PLA, Chongqing, China

Chen Chen has been a Ph.D. student at the Army Logistics University of PLA since autumn 2017. She attended Chongqing University, majoring in Software Engineering, where she received her B.Sc. in 2014. Chen then went on to pursue an M.Sc. in Computer Science and Technology at Logistical Engineering University, Chongqing, China, in 2017. Chen now focuses mainly on logistics informatization, information management and information systems.

Kaiwen Luo, Department of Military logistics, Army Logistics University of PLA, Chongqing, China

Kaiwen Luo is an associate professor at the Army Logistics University, Chongqing, China. He received his Ph.D. degree in Logistics Informatization from Logistical Engineering University, Chongqing, China, in 2016. His research focuses on logistics informatization and intelligent logistics equipment.

Lan Min, College of Management Science, Chengdu University of Technology, Chengdu, China

Lan Min is a professor at Chengdu University of Technology, Chengdu, China. Her research focuses on Mathematics and Applied Mathematics (Advanced Mathematics Education and Research).

Shenglin Li, College of Artificial Intelligence, Southwest University, Chongqing 400715, China

Shenglin Li received his B.Sc. degree in Mathematics and his M.Sc. degree in Computer Science and Technology from Southwest China Normal University, Chongqing, and his Ph.D. degree in Logistics Informatization from Logistical Engineering University, Chongqing, China. He is a professor at Southwest University. His research focuses on Intelligent Science and Technology, Data Science and Big Data Technology.

Published

2021-06-09

Section

Articles