Instantaneous Approach for Evaluating the Initial Centers in the Agricultural Databases Using K-Means Clustering Algorithm
DOI:
https://doi.org/10.13052/jmm1550-4646.1813
Keywords:
Data segmentation, clustering, agricultural databases, K-means, random selection of cluster centres, frequency of the attribute values
Abstract
Clustering algorithms are among the most widely used analysis methods for grouping agricultural data with high similarity. One of the most widely used approaches in previous studies is K-means, which is simple, versatile, and easy to understand and formulate. Its main disadvantage is that the set of cluster centres must be determined ahead of time and supplied as input. This paper addresses the problem of estimating random cluster centres for data segmentation and proposes a new method for locating appropriate centres based on the frequency of attribute values. Because the centres are calculated rather than supplied, the number of iterations K-means requires to reach optimum clusters is reduced, as is the time required to form the final clusters. The experimental findings show that the approach is efficient at estimating suitable cluster centres that indicate a fair separation of objects in the given database. Observation of the technique and comparative test results show that the new strategy does not rely on preset, manually chosen cluster centres, is more efficient in determining the initial cluster centres, and therefore converges to the final clusters in less time, especially in agricultural databases.
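The abstract does not spell out the procedure, so the following is only a minimal sketch of one way initial centres could be derived from attribute-value frequencies, not the authors' algorithm. It assumes numeric records of equal length; the function name `frequency_based_centers`, the scoring rule (sum of per-attribute value counts), and the evenly spaced selection along the ranking are all illustrative assumptions.

```python
from collections import Counter


def frequency_based_centers(records, k):
    """Pick k initial centres from `records` using attribute-value frequencies.

    Illustrative heuristic only (an assumption, not the paper's method):
    each record is scored by how common its attribute values are, the
    records are ranked by that score, and k records are taken at evenly
    spaced positions along the ranking.
    """
    n = len(records)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of records")
    n_attrs = len(records[0])

    # Count how often each value occurs within each attribute (column).
    counts = [Counter(row[j] for row in records) for j in range(n_attrs)]

    # Score a record by the summed frequency of its attribute values.
    scores = [sum(counts[j][row[j]] for j in range(n_attrs)) for row in records]

    # Rank record indices by score; ties are broken by original position.
    order = sorted(range(n), key=lambda i: (scores[i], i))

    # Take the record at the midpoint of each of k equal slices of the ranking.
    picks = [order[(2 * c + 1) * n // (2 * k)] for c in range(k)]
    return [records[i] for i in picks]


if __name__ == "__main__":
    # Hypothetical toy data: (rainfall class, soil class) codes.
    data = [(1, 1), (1, 1), (1, 2), (5, 5), (5, 5), (9, 9)]
    print(frequency_based_centers(data, 3))
```

The returned records could then be passed as the starting centroids to any K-means implementation in place of manually supplied or randomly drawn centres. Duplicate records in the data can still yield identical centres, so a real implementation would likely need a de-duplication step.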