Distributed Machine Learning Using Data Parallelism on Mobile Platform
DOI: https://doi.org/10.13052/jmm1550-4646.1633
Keywords: machine learning, distribution, data parallelism, mobile, client-server architecture, web service
Abstract
Machine learning faces many challenges, and one of them is dealing with large datasets, whose size grows continuously year by year. One solution to this problem is data parallelism. This paper investigates extending data parallelism to mobile devices, which have become the most popular platform. A special client-server architecture was created for this purpose. The software implementation measures the training capabilities of the mobile devices and the efficiency of the whole system. The results show that distributed training on a mobile cluster is possible and safe, but its performance depends on the algorithm's implementation.
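The abstract does not specify which training algorithm the client-server system distributes, so the following is only a minimal sketch of one common form of data parallelism (synchronous gradient averaging), assuming a simple linear regression model trained with mean squared error. The server splits the dataset into shards, each client computes a gradient on its own shard, and the server averages the gradients, weighted by shard size, before updating the shared weights. The function names `partition`, `local_gradient` and `train` are hypothetical, and the mobile clients and the web service between them and the server are simulated here by ordinary function calls.

```python
from typing import List, Tuple

# Hypothetical sample type: (feature vector, target value).
Sample = Tuple[List[float], float]


def partition(data: List, n_clients: int) -> List[List]:
    """Server side: split the data into n_clients contiguous, near-equal shards."""
    k, r = divmod(len(data), n_clients)
    shards, start = [], 0
    for i in range(n_clients):
        end = start + k + (1 if i < r else 0)
        shards.append(data[start:end])
        start = end
    return shards


def local_gradient(weights: List[float], shard: List[Sample]) -> Tuple[List[float], int]:
    """Client side: mean-squared-error gradient of a linear model on one shard."""
    grad = [0.0] * len(weights)
    for x, y in shard:
        err = sum(w * xi for w, xi in zip(weights, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * err * xj
    n = len(shard)
    return [g / n for g in grad], n


def train(data: List[Sample], n_clients: int, lr: float = 0.5, epochs: int = 500) -> List[float]:
    """Synchronous data-parallel gradient descent (assumed scheme, not the paper's)."""
    shards = partition(data, n_clients)
    weights = [0.0] * len(data[0][0])
    total = len(data)
    for _ in range(epochs):
        # In a real deployment each call would run on a separate mobile client.
        results = [local_gradient(weights, s) for s in shards if s]
        # Server: weight each client's gradient by its shard size, so the
        # averaged update equals the full-batch gradient.
        avg = [0.0] * len(weights)
        for g, n in results:
            for j in range(len(avg)):
                avg[j] += g[j] * n / total
        weights = [w - lr * g for w, g in zip(weights, avg)]
    return weights
```

For example, with `data = [([i / 10, 1.0], 2 * (i / 10) + 1) for i in range(10)]`, calling `train(data, n_clients=3)` should recover weights close to `[2.0, 1.0]`. Because the server weights each gradient by its shard size, training with three clients follows the same trajectory as training with one client, up to floating-point rounding. This is why synchronous data parallelism changes where the computation runs but not what the model learns.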