Instantaneous Approach for Evaluating the Initial Centers in the Agricultural Databases Using K-Means Clustering Algorithm

LNC. Prakash K; G. Surya Narayana; Mohd Dilshad Ansari; Vinit Kumar Gunjan

doi:10.13052/jmm1550-4646.1813

Authors

LNC. Prakash K, CVR College of Engineering, Hyderabad, India
G. Surya Narayana, Vardhaman College of Engineering, Shamshabad, Hyderabad, India, https://orcid.org/0000-0002-5552-3971
Mohd Dilshad Ansari, CMR College of Engineering & Technology, Hyderabad, India, https://orcid.org/0000-0002-2637-2975
Vinit Kumar Gunjan, CMR Institute of Technology, Hyderabad, India, https://orcid.org/0000-0002-3222-4186


Keywords:

Data segmentation, clustering, agricultural databases, K-means, random selection of cluster centres, frequency of the attribute values

Abstract

Clustering algorithms are among the most widely used analysis methods for grouping agricultural data with high similarity. K-means, one of the most widely used approaches in previous studies, is simple, versatile, and easy to understand and formulate. Its main disadvantage, however, has always been that the set of initial cluster centres must be prepared ahead of time and supplied as input. This paper addresses the problem of estimating initial cluster centres for data segmentation and proposes a new method for locating appropriate initial centres based on the frequency of attribute values. Computing the initial centres in this way reduces the number of iterations K-means needs to reach optimal clusters, and with it the time required to form the final clusters. The experimental findings show that the approach efficiently estimates initial cluster centres that yield a fair separation of the objects in the given database. Observation of the technique and comparative tests show that the new strategy does not rely on manually preset cluster centres, is more efficient at determining the initial cluster centres, and therefore converges to the final clusters in less time, especially on agricultural databases.
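The abstract does not spell out the frequency rule itself, so the Python sketch below only illustrates the general idea under stated assumptions. It pairs standard Lloyd's K-means, which takes the initial centres as input and counts its assignment passes, with a hypothetical initializer, `frequency_initial_centres`, that scores each record by how common its attribute values are. The scoring formula, the score × distance selection rule, and the toy records are all assumptions made for illustration; they are not the authors' published algorithm or data.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def frequency_initial_centres(data: List[Point], k: int) -> List[Point]:
    """HYPOTHETICAL frequency-based initializer (not the paper's published rule).

    score(p) = sum over attributes a of the count of value p[a] in column a.
    The highest-scoring record becomes the first centre; each further centre
    maximises score(p) * (squared distance to the nearest chosen centre).
    """
    counts = [Counter(column) for column in zip(*data)]
    score = [sum(counts[a][v] for a, v in enumerate(p)) for p in data]
    chosen = [max(range(len(data)), key=lambda i: score[i])]
    while len(chosen) < k:
        gain = [score[i] * min(sq_dist(data[i], data[c]) for c in chosen)
                for i in range(len(data))]
        best = max(range(len(data)), key=lambda i: gain[i])
        if gain[best] == 0:  # every record coincides with a chosen centre
            raise ValueError("k exceeds the number of distinct records")
        chosen.append(best)
    return [data[i] for i in chosen]


def kmeans(data: List[Point], centres: List[Point], max_iter: int = 100):
    """Lloyd's K-means starting from the given initial centres.

    Returns (centres, labels, passes), where passes counts assignment steps
    up to and including the first one in which no label changed.
    """
    centres = [tuple(c) for c in centres]
    labels: List[int] = []
    for passes in range(1, max_iter + 1):
        # Assignment step: nearest centre for every record.
        new_labels = [min(range(len(centres)), key=lambda j: sq_dist(p, centres[j]))
                      for p in data]
        if new_labels == labels:
            return centres, labels, passes
        labels = new_labels
        # Update step: mean of the members; an empty cluster keeps its old centre.
        updated = []
        for j, old in enumerate(centres):
            members = [p for p, lab in zip(data, labels) if lab == j]
            updated.append(tuple(sum(col) / len(members) for col in zip(*members))
                           if members else old)
        centres = updated
    return centres, labels, max_iter


if __name__ == "__main__":
    # Made-up discretized records, e.g. (soil-pH class, rainfall class).
    data = [(1, 1), (1, 1), (1, 2), (2, 1), (8, 8), (8, 8), (8, 9), (9, 8)]
    init = frequency_initial_centres(data, 2)
    print(init)                               # [(1, 1), (8, 8)]
    print(kmeans(data, init)[2])              # 2
    print(kmeans(data, [(1, 1), (1, 2)])[2])  # 3
```

On these toy records the frequency-seeded run stops after 2 assignment passes, while the poorly chosen start `[(1, 1), (1, 2)]` needs 3. This illustrates, rather than reproduces, the reduction in iterations the abstract describes. Multiplying the frequency score by the distance to the nearest chosen centre is one way to keep the second centre from being drawn from the same dense group as the first.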


Author Biographies

LNC. Prakash K, CVR College of Engineering, Hyderabad, India

LNC Prakash K. received his doctorate in Computer Science & Engineering from JNTU Hyderabad, a state government university in Hyderabad, India. He has more than 21 years of teaching and 10 years of research experience, and has 10 research publications in reputed journals indexed by SCI, SCOPUS and UGC. He has guided 13 UG projects and 8 PG projects, filed 5 Indian patents and 1 international patent, and written 1 book. He is a professional member of IE. He is currently working as an Associate Professor in the Department of Computer Science and Engineering, CVR College of Engineering, Hyderabad, India.

G. Surya Narayana, Vardhaman College of Engineering, Shamshabad, Hyderabad, India

G. Suryanarayana received his doctorate in Computer Science & Engineering from JNTUH, Hyderabad, India. He has more than 12 years of teaching and 7 years of research experience, and has 22 research publications in reputed journals indexed by SCIE, SCOPUS and UGC. He has guided 20 UG projects and 10 PG projects, filed 6 Indian patents, and written 2 books. His research interests are data mining, artificial intelligence, and machine learning. He is currently working as an Associate Professor in the Department of CSE, Vardhaman College of Engineering, Hyderabad, India.

Mohd Dilshad Ansari, CMR College of Engineering & Technology, Hyderabad, India

Mohd Dilshad Ansari is currently working as an Assistant Professor in the Department of Computer Science & Engineering at CMR College of Engineering & Technology, Hyderabad, India. He received his M.Tech and Ph.D. in Computer Science & Engineering from Jaypee University of Information Technology, Waknaghat, Solan, HP, India, in 2011 and 2018 respectively. His research interests include digital and fuzzy image processing, artificial intelligence and machine learning, IoT, and cloud computing.


