Comparison of Machine Learning Based on Category Theory
DOI: https://doi.org/10.13052/jwe1540-9589.2213

Keywords: Web engineering, big data, machine learning, preprocessing, category theory, accuracy

Abstract
In recent years, machine learning has been widely used in data analysis for web engineering. The growing variety of models and data types increases the complexity of machine learning. In this paper, we propose a mathematical structure based on category theory that unifies multiple theories of data mining within machine learning. We aim to study machine learning from the perspective of category theory, which uses mathematical language to connect the various structures of machine learning. We implement a representation of machine learning in terms of category theory. In the experimental section, slice categories and functors are introduced in detail to model data preprocessing. We use functors to preprocess a benchmark dataset and evaluate the accuracy of nine machine learning models. A key contribution is the representation of slice categories. This study provides a structural perspective on machine learning and a general method for combining category theory with machine learning.
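The categorical view of preprocessing described in the abstract can be sketched in code: datasets are treated as objects and preprocessing steps as morphisms between them, so that a pipeline is just a composition of morphisms. The following is a minimal illustrative sketch, not the paper's implementation; the `Preprocess` class and the example steps (`scale`, `drop_last_feature`) are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List

# Objects: datasets, represented here as lists of feature vectors.
Dataset = List[List[float]]

@dataclass(frozen=True)
class Preprocess:
    """A morphism between datasets; `then` gives associative composition."""
    f: Callable[[Dataset], Dataset]

    def __call__(self, X: Dataset) -> Dataset:
        return self.f(X)

    def then(self, other: "Preprocess") -> "Preprocess":
        # Composition of morphisms: apply self first, then other.
        return Preprocess(lambda X: other.f(self.f(X)))

# Identity morphism: leaves every dataset unchanged.
identity = Preprocess(lambda X: X)

def scale(factor: float) -> Preprocess:
    """Hypothetical step: multiply every feature by a constant."""
    return Preprocess(lambda X: [[factor * v for v in row] for row in X])

def drop_last_feature() -> Preprocess:
    """Hypothetical step: remove the last feature of each sample."""
    return Preprocess(lambda X: [row[:-1] for row in X])

# A pipeline is a composite morphism; a functor-style mapping would send
# each such morphism to a transformation on model inputs.
pipeline = scale(0.5).then(drop_last_feature())
X = [[2.0, 4.0, 6.0], [8.0, 10.0, 12.0]]
print(pipeline(X))  # [[1.0, 2.0], [4.0, 5.0]]
```

Because composition is associative and has an identity, chains of such steps satisfy the basic category laws, which is what lets a functor map a whole preprocessing pipeline onto model evaluation in one structure-preserving step.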