Classifying Agricultural Crop Pest Data Using Hadoop MapReduce Based C5.0 Algorithm


  • R. Revathy Department of Computer Applications, Kalasalingam University, Krishnankoil-626126, Tamil Nadu, India
  • S. Balamurali Department of Computer Applications, Kalasalingam University, Krishnankoil-626126, Tamil Nadu, India
  • R. Lawrance Department of Computer Applications, Ayya Nadar Janaki Ammal College, Sivakasi-626124, Tamil Nadu, India



Data Mining, Data cleaning, Relief feature selector, MapReduce based C5.0 Classification


Data mining is the process of exploring and processing large existing databases to uncover hidden information. In the agriculture sector, data mining can help farmers improve crop yield. Efficient data mining methods can help protect crops from vertebrate pests and diseases by supporting prediction and better crop cultivation. The main aim of this research is to classify agricultural crop pests, which are categorized by different colours. The work comprises data cleaning, feature selection, and execution of the C5.0 algorithm using MapReduce. Data cleaning removes noisy records from the crop pest data, which improves accuracy. For feature selection, the Relief filter is applied to select a subset of attributes of the crop pest data set instead of using the full attribute set; it chooses attributes by calculating attribute weights based on distances. Because the pest data set has reached the terabyte range, typical data mining techniques cannot process it in a reasonable time. The Hadoop MapReduce programming model, a software framework for distributed processing of large amounts of data, is therefore used to cope with the huge data set. This research proposes a MapReduce implementation of the C5.0 decision tree algorithm that produces more accurate results quickly while using less memory on the huge crop pest data set.
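The distance-based attribute weighting mentioned above can be illustrated with a minimal sketch of the basic two-class Relief idea. This is not the authors' implementation: the function name relief_weights, the use of Manhattan distance, the assumption that features are numeric and scaled to [0, 1], and the choice to visit every instance once (instead of drawing a random sample, as the original Relief does) are all assumptions made for illustration.

```python
def relief_weights(rows, labels):
    """Sketch of basic Relief for numeric features scaled to [0, 1].

    Hypothetical helper, not taken from the paper. Every instance is
    visited once (a deterministic variant of the usual random sampling).
    """
    n, d = len(rows), len(rows[0])
    weights = [0.0] * d

    def distance(a, b):
        # Manhattan distance between two feature vectors (an assumption).
        return sum(abs(x - y) for x, y in zip(a, b))

    used = 0
    for i in range(n):
        hit = miss = None  # (distance, index) of nearest same/other-class row
        for j in range(n):
            if j == i:
                continue
            dj = distance(rows[i], rows[j])
            if labels[j] == labels[i]:
                if hit is None or dj < hit[0]:
                    hit = (dj, j)
            elif miss is None or dj < miss[0]:
                miss = (dj, j)
        if hit is None or miss is None:
            continue  # no same-class or other-class neighbour available
        used += 1
        for f in range(d):
            # Reward features that differ on the nearest miss,
            # penalise features that differ on the nearest hit.
            weights[f] += abs(rows[i][f] - rows[miss[1]][f])
            weights[f] -= abs(rows[i][f] - rows[hit[1]][f])

    # Average over the instances that contributed an update.
    return [w / used for w in weights] if used else weights


# Toy data (made up): feature 0 separates the classes, feature 1 does not.
demo_rows = [[0.0, 0.2], [0.1, 0.9], [0.9, 0.1], [1.0, 0.8]]
demo_labels = [0, 0, 1, 1]
demo_weights = relief_weights(demo_rows, demo_labels)
print(demo_weights)
```

Attributes with higher weights would be kept and low-weight attributes dropped before classification; the cut-off is a tuning choice that the abstract does not specify.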




Author Biographies

R. Revathy, Department of Computer Applications, Kalasalingam University, Krishnankoil-626126, Tamil Nadu, India

R. Revathy received B.Sc., M.Sc. and M.Phil. degrees in computer science from Madurai Kamaraj University, Tamil Nadu. She has been pursuing a Ph.D. in the Department of Computer Applications at Kalasalingam University, Krishnankoil, Tamil Nadu, since 2017. Her current research interests include data mining and machine learning algorithms.

S. Balamurali, Department of Computer Applications, Kalasalingam University, Krishnankoil-626126, Tamil Nadu, India

S. Balamurali is a Professor of Statistics and Director of Computer Applications at the Kalasalingam Academy of Research and Education. He received his undergraduate, postgraduate and doctoral degrees in Statistics from Bharathiar University, India. His research interests include applied statistics, data mining, network security and bioinformatics.

R. Lawrance, Department of Computer Applications, Ayya Nadar Janaki Ammal College, Sivakasi-626124, Tamil Nadu, India

R. Lawrance received B.Sc. and M.Sc. degrees in Computer Science from St. Joseph’s College, Trichy, in 1993 and 1998, an M.Phil. in Computer Science from M.S. University in 2003, and a Ph.D. degree from Vinayaka Missions University in 2011. He joined Ayya Nadar Janaki Ammal College in 1998 as an Assistant Professor. Since 2011, he has been working as a Director in the Department of Computer Applications. His current research interests lie in data mining and machine learning algorithms. He has supervised 24 M.Phil. scholars and one Ph.D. scholar and is guiding 7 Ph.D. scholars. He has published papers in 25 national-level conferences, 36 international-level conferences and 8 international journals.

