Classifying Agricultural Crop Pest Data Using Hadoop MapReduce Based C5.0 Algorithm
Keywords: Data Mining, Data Cleaning, Relief Feature Selector, MapReduce-Based C5.0 Classification
Data mining is a methodology for exploring and processing large existing databases in order to uncover hidden information. In the agriculture sector, data mining can help farmers improve yield. Crops can be protected from vertebrate pests and diseases by predicting and enhancing crop cultivation through efficient data mining methods. The main aim of this research is to classify agricultural crop pests, which are categorized by colour. The work comprises data cleaning, feature selection, and execution of the C5.0 algorithm using MapReduce. Data cleaning removes noisy records from the crop pest data, which improves accuracy. For feature selection, the Relief filter is applied to select particular attributes of the crop pest data set instead of using the full attribute set. It chooses attributes by calculating attribute weights based on distances between instances. Because the pest data set has reached the terabyte range, typical data mining techniques cannot process it in a reasonable time. The Hadoop MapReduce programming model, a software framework for distributed processing of large amounts of data, is therefore used to cope with the huge data set. This research proposes a MapReduce implementation of the C5.0 decision tree algorithm that produces more accurate results more quickly and uses less memory on the huge crop pest data set.
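The distance-based weighting performed by the Relief filter can be illustrated with a minimal sketch. This is the basic two-class form of Relief, not necessarily the exact variant used in this work. It assumes numeric attributes already scaled to [0, 1] and Manhattan distance; the function name `relief_weights`, the toy rows, and the colour labels are hypothetical and chosen only for illustration.

```python
import random


def relief_weights(rows, labels, n_samples=None, seed=0):
    """Basic two-class Relief (illustrative sketch).

    Assumes every attribute is numeric and scaled to [0, 1].
    Returns one weight per attribute; higher means more relevant.
    """
    rng = random.Random(seed)
    n_attrs = len(rows[0])
    indices = list(range(len(rows)))
    m = n_samples or len(rows)  # number of sampled instances
    weights = [0.0] * n_attrs

    def distance(a, b):
        # Manhattan distance between two instances (an assumption of this sketch)
        return sum(abs(x - y) for x, y in zip(a, b))

    for _ in range(m):
        i = rng.choice(indices)
        hits = [j for j in indices if j != i and labels[j] == labels[i]]
        misses = [j for j in indices if labels[j] != labels[i]]
        if not hits or not misses:
            continue  # cannot score an instance without both neighbours
        near_hit = min(hits, key=lambda j: distance(rows[i], rows[j]))
        near_miss = min(misses, key=lambda j: distance(rows[i], rows[j]))
        for a in range(n_attrs):
            # Penalise attributes that differ from the nearest same-class instance,
            # reward attributes that differ from the nearest other-class instance.
            weights[a] -= abs(rows[i][a] - rows[near_hit][a]) / m
            weights[a] += abs(rows[i][a] - rows[near_miss][a]) / m
    return weights


# Hypothetical toy data: attribute 0 separates the classes, attribute 1 is noise.
rows = [[0.0, 0.2], [0.1, 0.9], [0.9, 0.1], [1.0, 0.8]]
labels = ["green", "green", "brown", "brown"]
w = relief_weights(rows, labels)
```

Attributes whose weight exceeds a chosen threshold would be kept and the rest discarded before the classifier is trained. In this toy data the class-separating attribute receives the larger weight.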