Prediction of Wheat Yield Through Soil Nutrient: Machine Learning and Feature Selection Approaches

Authors

  • Afshan Tabassum, Department of Statistics and Computer Science, Faculty of Basic Sciences, Sher-e-Kashmir University of Agricultural Sciences & Technology of Jammu, Main Campus, Chatha, Jammu – 180009, India
  • Manish Sharma, Department of Statistics and Computer Science, Faculty of Basic Sciences, Sher-e-Kashmir University of Agricultural Sciences & Technology of Jammu, Main Campus, Chatha, Jammu – 180009, India
  • Bupesh Kumar, Department of Plant Breeding and Genetics, Faculty of Agriculture, Sher-e-Kashmir University of Agricultural Sciences & Technology of Jammu, Main Campus, Chatha, Jammu – 180009, India
  • Sudhakar Dwivedi, Department of Agricultural Economics and ABM, Faculty of Agriculture, Sher-e-Kashmir University of Agricultural Sciences & Technology of Jammu, Main Campus, Chatha, Jammu – 180009, India
  • Lalit Mohan Gupta, Department of Forest Products and Utilization, Faculty of Horticulture and Forestry, Sher-e-Kashmir University of Agricultural Sciences & Technology of Jammu, Main Campus, Chatha, Jammu – 180009, India
  • Sanjay Guleria, Department of Biochemistry, Faculty of Basic Sciences, Sher-e-Kashmir University of Agricultural Sciences & Technology of Jammu, Main Campus, Chatha, Jammu – 180009, India

DOI:

https://doi.org/10.13052/jrss0974-8024.19110

Keywords:

Wheat yield, soil parameters, machine learning, feature selection, random forest

Abstract

Wheat productivity is strongly influenced by soil nutrient variability, especially in agroclimatically diverse areas such as Jammu. This study analyzed a dataset of 5,196 soil samples and corresponding wheat yield records from the Jammu, Rajouri, and Kathua districts of the Jammu region, and used machine learning (ML) models to predict wheat yield from soil factors. The models were Random Forest, Gradient Boosting, Support Vector Regression, and Decision Trees, combined with embedded and wrapper-based feature selection methods. The soil variables were pH, electrical conductivity (EC), organic carbon (OC), N, P, K, S, Cu, Zn, Mn, and Fe. Among the tested models, Random Forest gave the highest predictive accuracy, with RMSE = 2.6570, MAE = 2.1578, and MAPE = 44.87%. Recursive Feature Elimination identified an optimal subset of 10 soil predictors, with sulfur (S), manganese (Mn), zinc (Zn), and EC emerging as the most influential variables for wheat yield estimation; the same four variables were consistently the most important predictors in all models. A stability analysis using k-fold cross-validation indicated that Random Forest and Support Vector Regression produced more reliable and more broadly applicable predictions than the other models. The study highlights the effectiveness of machine learning techniques, particularly Random Forest, in predicting wheat yield from soil parameters. The consistent importance of sulfur and the micronutrients Mn and Zn underscores the need for micronutrient-focused soil management strategies. These findings demonstrate the usefulness of data-driven approaches under heterogeneous soil and climatic conditions.
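
The three accuracy measures reported in the abstract (RMSE, MAE, and MAPE) have standard definitions. The following is a minimal sketch of those definitions, not the authors' code: the function names and the sample yield values are illustrative assumptions and are not taken from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared residual."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error: mean of the absolute residuals."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Each residual is divided by the observed value, so the measure is
    undefined when any observed value is zero.
    """
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


# Hypothetical observed and predicted yields (illustrative only).
actual = [10.0, 20.0, 40.0, 50.0]
predicted = [12.0, 18.0, 44.0, 45.0]

print(f"RMSE = {rmse(actual, predicted):.2f}")   # RMSE = 3.50
print(f"MAE  = {mae(actual, predicted):.2f}")    # MAE  = 3.25
print(f"MAPE = {mape(actual, predicted):.2f}%")  # MAPE = 12.50%
```

RMSE and MAE are expressed in the units of the yield variable, whereas MAPE is relative to each observed value. MAPE therefore grows quickly when observed yields are small, so it can be large even when RMSE and MAE look modest.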

Author Biographies

Afshan Tabassum, Department of Statistics and Computer Science, Faculty of Basic Sciences, Sher-e-Kashmir University of Agricultural Sciences & Technology of Jammu, Main Campus, Chatha, Jammu – 180009, India

Afshan Tabassum is pursuing a Ph.D. in Statistics at the Division of Statistics and Computer Sciences, Faculty of Basic Sciences, Sher-e-Kashmir University of Agricultural Sciences and Technology of Jammu (SKUAST-Jammu), J&K, India. She completed her B.Sc. at Sher-e-Kashmir University of Agricultural Sciences and Technology of Kashmir (SKUAST-Kashmir) and obtained her M.Sc. in Statistics from SKUAST-Jammu. Her research interests include statistical modeling, machine learning, and data analysis in agriculture. Her doctoral research, titled “Machine Learning Methods and Feature Selection Techniques for Estimation of Wheat Production of Jammu Region,” applies advanced statistical and machine learning techniques to improve prediction accuracy and identify key factors influencing wheat production.

Manish Sharma, Department of Statistics and Computer Science, Faculty of Basic Sciences, Sher-e-Kashmir University of Agricultural Sciences & Technology of Jammu, Main Campus, Chatha, Jammu – 180009, India

Manish Sharma is Professor and Head of the Division of Statistics and Computer Sciences, Faculty of Basic Sciences, Sher-e-Kashmir University of Agricultural Sciences and Technology of Jammu (SKUAST-Jammu), J&K, India. He obtained his M.Sc. degree as a Gold Medalist and received fellowships during his M.Sc. program, including the IDP-NAHEP Fellowship and the Eurostat Fellowship. Dr. Sharma has more than 25 years of experience in teaching, research, and academic administration. His area of specialization is Sampling and Statistical Modeling. He has presented several research papers at national and international conferences, delivered numerous lectures and invited talks, and organized training programs and conferences. He has guided several M.Sc. and Ph.D. students as Major Advisor and has held various administrative responsibilities at SKUAST-Jammu. Dr. Sharma also serves as a reviewer and executive member of statistics journals and has undertaken international academic visits to the University of Dodoma, Tanzania, and the University of Calabria, Italy.

Bupesh Kumar, Department of Plant Breeding and Genetics, Faculty of Agriculture, Sher-e-Kashmir University of Agricultural Sciences & Technology of Jammu, Main Campus, Chatha, Jammu – 180009, India

Bupesh Kumar is Professor in the Division of Plant Breeding and Genetics, Faculty of Agriculture, Sher-e-Kashmir University of Agricultural Sciences and Technology of Jammu (SKUAST-Jammu), Chatha, J&K, India. He has more than 16 years of professional experience in teaching, research, and extension since June 2007. Dr. Kumar specializes in Cereal Breeding, and his research interests focus on the genetic enhancement of cereal crops through conventional and biotechnological approaches. He has published more than 55 research papers in refereed journals and has been actively involved in several research projects as Co-Principal Investigator, including projects on molecular analysis of Basmati rice germplasm purity and evaluation of high-yielding farmer varieties in the Jammu region. He has guided postgraduate and doctoral students and has taught several core courses in genetics and plant breeding. His research contributions include association with the development of cereal crop varieties, evaluation of rice hybrids, breeder seed production, and participation in national varietal trials. Dr. Kumar has also been actively involved in extension and capacity-building activities, promoting awareness of Intellectual Property Rights in Agriculture and disseminating improved agricultural technologies to farmers and field functionaries. In recognition of his academic and research contributions, he received the Excellence in Research/Teaching Award from the Indian Society of Genetics, Biotechnology Research and Development, Agra, in 2020.

Sudhakar Dwivedi, Department of Agricultural Economics and ABM, Faculty of Agriculture, Sher-e-Kashmir University of Agricultural Sciences & Technology of Jammu, Main Campus, Chatha, Jammu – 180009, India

Sudhakar Dwivedi is Associate Professor in the Division of Agricultural Economics and Agribusiness Management, Faculty of Agriculture, Sher-e-Kashmir University of Agricultural Sciences and Technology of Jammu (SKUAST-Jammu), Chatha, J&K, India, and is presently serving as Director, Student Welfare at the university. He completed his B.Sc. (Ag.) and M.Sc. (Ag.) in Agricultural Economics at Agra University, Uttar Pradesh, securing first division, and obtained his Ph.D. in Agricultural Economics from Dr. B. R. Ambedkar University, Agra. Dr. Dwivedi has extensive experience in research, academic administration, and postgraduate and undergraduate teaching. He has guided several postgraduate and doctoral students and has served as a member of various research advisory committees. His research interests focus on agricultural economics, resource use efficiency, crop economics, and agricultural marketing. He has worked as Principal Investigator and Co-Principal Investigator on several externally funded and university research projects related to agricultural development, cropping patterns, and marketing management. He has also delivered radio talks on agricultural economics and farmer-related issues and has actively participated in extension activities. Dr. Dwivedi contributes to academic publishing as Editor-in-Chief of Agro-Economist: An International Journal and as a member of the editorial boards of reputed journals. In recognition of his academic and research contributions, he has received several honours, including the Eminent Scientist Award, the Scientist of the Year Award, and fellowships from ICSSR and Dr. B. R. Ambedkar University.

Lalit Mohan Gupta, Department of Forest Products and Utilization, Faculty of Horticulture and Forestry, Sher-e-Kashmir University of Agricultural Sciences & Technology of Jammu, Main Campus, Chatha, Jammu – 180009, India

Lalit Mohan Gupta is working as Professor and Head in the Division of Forest Products and Utilization at Sher-e-Kashmir University of Agricultural Sciences and Technology of Jammu (SKUAST-Jammu), J&K, India. He holds a Ph.D. and has around 20 years of teaching and research experience. His areas of specialization include medicinal and aromatic plants and forest products utilization. Dr. Gupta has been actively involved in teaching several undergraduate and postgraduate courses related to forestry, plantation crops, medicinal and aromatic plants, seed technology, and post-harvest technology. He has contributed significantly to research with around 35 publications in national and international journals, along with book chapters, manuals, and extension publications. He has also handled several research projects as Principal Investigator and Co-Principal Investigator, funded by agencies such as the National Medicinal Plants Board, New Delhi. Dr. Gupta is actively associated with various professional bodies and is a life member of several scientific societies, including the Indian Society of Agroforestry, Medicinal and Aromatic Plant Association of India, Indian Ecological Society, SIDAVES, and the Society of Community Mobilization for Sustainable Development.

Sanjay Guleria, Department of Biochemistry, Faculty of Basic Sciences, Sher-e-Kashmir University of Agricultural Sciences & Technology of Jammu, Main Campus, Chatha, Jammu – 180009, India

Sanjay Guleria is Professor and Former Head of the Division of Biochemistry and Dean, Faculty of Basic Sciences, at the Sher-e-Kashmir University of Agricultural Sciences and Technology of Jammu (SKUAST-Jammu), Chatha, J&K, India. He completed his B.Sc. (Chemistry Honours) at H.P. University, Shimla, securing third rank, and obtained his M.Sc. degree as a Gold Medalist with merit scholarships. He also qualified in the National Eligibility Test (NET) conducted by ICAR. Dr. Guleria has more than 24 years of teaching and research experience. His research specialization includes isolation and characterization of bioactive molecules from medicinal plants, green synthesis of nanoparticles, nano-encapsulation, biotransformation of plant extracts, and metabolic engineering of microbes for useful compounds. He has received several prestigious recognitions, including the DST Young Scientist Award and the CREST Award of the Department of Biotechnology, Government of India. He has authored 10 books/book chapters/manuals, handled research projects as Principal Investigator, and serves as a reviewer for several reputed international journals. He has also worked as a Visiting Research Scientist at Rensselaer Polytechnic Institute, USA, and as a Visiting Professor at the University of Melbourne, Australia. In addition, he has held administrative responsibilities, including In-charge of the Counselling and Placement Cell and HR Executive Officer at the Career Development Center, SKUAST-Jammu.

Published

2026-04-05

How to Cite

Tabassum, A., Sharma, M., Kumar, B., Dwivedi, S., Gupta, L. M., & Guleria, S. (2026). Prediction of Wheat Yield Through Soil Nutrient: Machine Learning and Feature Selection Approaches. Journal of Reliability and Statistical Studies, 19(01), 215–240. https://doi.org/10.13052/jrss0974-8024.19110

Section

Articles