Prediction of Wheat Yield Through Soil Nutrient: Machine Learning and Feature Selection Approaches
DOI:
https://doi.org/10.13052/jrss0974-8024.19110
Keywords:
Wheat yield, soil parameters, machine learning, feature selection, random forest
Abstract
Wheat productivity is strongly influenced by soil nutrient variability, especially in agroclimatically diverse areas such as Jammu. This study analyzed a dataset of 5,196 soil samples and corresponding wheat yield records from the districts of Jammu, Rajouri, and Kathua in the Jammu region, and used machine learning (ML) models to predict wheat yield from soil factors. The models were Random Forest, Gradient Boosting, Support Vector Regression, and Decision Trees, combined with embedded and wrapper-based feature selection methods. The soil variables were pH, electrical conductivity (EC), organic carbon (OC), N, P, K, S, Cu, Zn, Mn, and Fe. Among the tested models, Random Forest achieved the highest predictive accuracy, with RMSE = 2.6570, MAE = 2.1578, and MAPE = 44.87%. Recursive Feature Elimination identified an optimal subset of 10 soil predictors. Across all models, sulfur (S), manganese (Mn), electrical conductivity (EC), and zinc (Zn) were consistently the most influential predictors of wheat yield. A stability analysis using k-fold cross-validation indicated that Random Forest and Support Vector Regression produced more reliable and more generalizable predictions than the other models. The study highlights the effectiveness of machine learning techniques, particularly Random Forest, in predicting wheat yield from soil parameters. The consistent importance of sulfur and of micronutrients such as Mn and Zn underscores the need for micronutrient-focused soil management strategies. These findings demonstrate the usefulness of data-driven approaches under heterogeneous soil and climatic conditions.
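The three error measures reported above (RMSE, MAE, and MAPE) can be illustrated with a minimal Python sketch. It uses the standard textbook definitions, which the abstract does not spell out, so the exact formulas used in the study are an assumption here. The function names and the yield values are hypothetical and are not data from the study.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: large misses are penalized more heavily."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mae(actual, predicted):
    """Mean absolute error, in the same units as the yield."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n

def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined if any actual value is zero, and inflated by small actual values.
    """
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n

# Hypothetical observed and predicted yields (arbitrary units), for illustration only.
observed = [10.0, 20.0, 40.0, 50.0]
estimated = [12.0, 18.0, 44.0, 45.0]

scores = {
    "RMSE": rmse(observed, estimated),
    "MAE": mae(observed, estimated),
    "MAPE": mape(observed, estimated),
}
print(scores)
```

For these hypothetical values the sketch gives an RMSE of 3.5, an MAE of 3.25, and a MAPE of about 12.5%. Because MAPE divides each error by the observed value, it can be large even when RMSE and MAE are moderate if some observed yields are small.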


