Multi-Class Classification Method with Feature Engineering for Predicting Hypertension with Diabetes

Authors

  • Mongkhon Sinsirimongkhon, 1) Computer and Communication Engineering for Capacity Building Research Center; 2) School of Information Technology, Mae Fah Luang University, Chiang Rai, Thailand
  • Sujitra Arwatchananukul, School of Information Technology, Mae Fah Luang University, Chiang Rai, Thailand
  • Punnarumol Temdee, 1) Computer and Communication Engineering for Capacity Building Research Center; 2) School of Information Technology, Mae Fah Luang University, Chiang Rai, Thailand

DOI:

https://doi.org/10.13052/jmm1550-4646.1937

Keywords:

Hypertension, Diabetes, Machine Learning, Ensemble Learning, Disease Prediction

Abstract

Machine learning–based methods are widely applied to predict noncommunicable diseases (NCDs) such as hypertension, diabetes, and cardiovascular disease. However, few models have been developed for predicting hypertension with diabetes, even though these diseases commonly co-occur and can cause devastating harm to patients. This paper proposes a multi-class classification method for predicting hypertension with diabetes. The proposed method consists of data preprocessing, model construction and validation, and model comparison. For data preprocessing, feature engineering is applied according to the data type of each feature. For model construction, several machine learning methods are applied, including Random Forest (RF), Gradient Boosting (GB), Extra Tree (ET), Decision Tree (DCT), and Support Vector Machine (SVM). The dataset used in this study consists of 17,077 records and 28 features, obtained from Phaya Mengrai Hospital, Chiang Rai, Thailand. The predictive performance of each model, with and without feature engineering, is compared in terms of accuracy and average area under the Receiver Operating Characteristic curve (AUC-ROC). In this comparison, SVM with feature engineering outperformed the other models, achieving an accuracy of 88.39% and an average AUC-ROC of 93.32%. Among the ensemble learning–based methods, RF performed best in both accuracy and average AUC-ROC, both with and without feature engineering. Overall, all the models performed better when feature engineering was applied.
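
The two comparison metrics named in the abstract can be illustrated with a minimal sketch. The abstract does not say how the AUC-ROC is averaged over classes, so the sketch assumes a macro average of one-vs-rest AUCs. The class labels (`normal`, `htn`, `htn_dm`), the toy probabilities, and all function names are hypothetical and are not taken from the paper or its dataset.

```python
from typing import Dict, List, Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of records whose predicted class equals the true class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC in its rank form: the probability that a random positive
    record is scored above a random negative one (ties count as 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence[str], proba: List[Dict[str, float]]) -> float:
    """Assumed averaging scheme: one-vs-rest AUC per class, then the
    unweighted mean over classes."""
    classes = sorted(set(y_true))
    aucs = []
    for c in classes:
        labels = [1 if t == c else 0 for t in y_true]
        scores = [row[c] for row in proba]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


# Hypothetical toy data: three classes, six records.
y_true = ["normal", "htn", "htn_dm", "normal", "htn", "htn_dm"]
proba = [
    {"normal": 0.7, "htn": 0.2, "htn_dm": 0.1},
    {"normal": 0.2, "htn": 0.5, "htn_dm": 0.3},
    {"normal": 0.1, "htn": 0.3, "htn_dm": 0.6},
    {"normal": 0.4, "htn": 0.5, "htn_dm": 0.1},  # misclassified as "htn"
    {"normal": 0.3, "htn": 0.4, "htn_dm": 0.3},
    {"normal": 0.1, "htn": 0.2, "htn_dm": 0.7},
]
# Predicted class = the class with the highest predicted probability.
y_pred = [max(row, key=row.get) for row in proba]

print(f"accuracy         = {accuracy(y_true, y_pred):.4f}")    # 0.8333
print(f"macro OvR AUC-ROC = {macro_ovr_auc(y_true, proba):.4f}")  # 0.9375
```

In this toy case the AUC is higher than the accuracy because the one misclassified record still ranks above every true negative for its own class. This is one way a model can report an average AUC-ROC above its accuracy, as in the figures quoted above.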

Author Biographies

Mongkhon Sinsirimongkhon, 1) Computer and Communication Engineering for Capacity Building Research Center; 2) School of Information Technology, Mae Fah Luang University, Chiang Rai, Thailand

Mongkhon Sinsirimongkhon received a bachelor’s degree in software engineering from Mae Fah Luang University, Thailand, in 2021. He currently works at Accenture as an application development associate and is studying for a master’s degree in computer engineering at Mae Fah Luang University, Thailand. His research interests include artificial intelligence, machine learning, and feature engineering.

Sujitra Arwatchananukul, School of Information Technology, Mae Fah Luang University, Chiang Rai, Thailand

Sujitra Arwatchananukul received B.S. and M.S. degrees in Computer Science from Chiang Mai University, Thailand, in 2004 and 2008, respectively, and a Ph.D. degree in Computer Engineering from Yunnan University, China.

In 2016, she joined the School of Information Technology, Mae Fah Luang University, as a lecturer. Her current research interests include data science, machine learning, data analysis, image processing, software engineering, algorithms, and database management systems.

Punnarumol Temdee, 1) Computer and Communication Engineering for Capacity Building Research Center; 2) School of Information Technology, Mae Fah Luang University, Chiang Rai, Thailand

Punnarumol Temdee received a B.Eng. in Electronic and Telecommunication Engineering, an M.Eng. in Electrical Engineering, and a Ph.D. in Electrical and Computer Engineering from King Mongkut’s University of Technology Thonburi. She is currently a lecturer at the School of Information Technology, Mae Fah Luang University, Chiang Rai, Thailand. Her research interests are social network analysis, artificial intelligence, software agents, context-aware computing, and ubiquitous computing.

Published

2023-02-15

How to Cite

Sinsirimongkhon, M., Arwatchananukul, S., & Temdee, P. (2023). Multi-Class Classification Method with Feature Engineering for Predicting Hypertension with Diabetes. Journal of Mobile Multimedia, 19(03), 799–822. https://doi.org/10.13052/jmm1550-4646.1937

Section

Articles