Data Lake Conceptualized Web Platform for Food Research Data Collection

Authors

  • Gi-taek An 1) Korea Food Research Institute, Wanju-gun 55365, Republic of Korea 2) Division of Computer Science and Artificial Intelligence, Jeonbuk National University, Jeonju 54896, Republic of Korea
  • Seyoung Oh Korea Food Research Institute, Wanju-gun 55365, Republic of Korea
  • Eunhye Kim Korea Food Research Institute, Wanju-gun 55365, Republic of Korea
  • Jung-min Park Korea Food Research Institute, Wanju-gun 55365, Republic of Korea

DOI:

https://doi.org/10.13052/jwe1540-9589.2333

Keywords:

Food research data, big data, data platform, data collection, web-based platform

Abstract

Food research is closely tied to everyday life and increasingly depends on big data. Research data in this domain take many forms and formats, including biological experiment results, chemical analysis data, nutritional information, microbiological data, sensor data, images, and videos. This diversity arises because the field integrates data from many subdomains. With recent advances in deep learning, the importance of data has grown significantly, and research has become increasingly data-driven. Although specialized platforms for sharing and utilizing data have been established at the national level, particularly in the biosciences, food research lacks dedicated infrastructure and a specialized data-sharing platform. In this study, we develop a platform that uses a Hadoop-based distributed file system to create a data lake and that supports data storage and sharing through a web-based interface. The distributed file system scales by adding data nodes, which makes capacity expansion straightforward. The web-based platform is highly accessible, allowing users to reach it from anywhere, at any time, on any device. Finally, we describe the construction of a 1.8 PB Hadoop-based physical storage system and present an approach for building a highly accessible and useful web platform.
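
As a rough illustration of the capacity figures above, the following Python sketch estimates usable space and data-node count for an HDFS cluster. It rests on assumptions that the abstract does not state: that the 1.8 PB figure refers to raw disk capacity, that blocks are stored with a replication factor of 3 (the HDFS default for `dfs.replication`), and that each data node contributes a hypothetical 100 TB of raw disk. The function names are illustrative only, and the actual deployment may differ (for example, if erasure coding is used).

```python
def usable_capacity_pb(raw_pb: float, replication: int = 3) -> float:
    """Approximate usable capacity when every block is stored `replication` times.

    Assumption: plain replication only (no erasure coding, no reserved space).
    """
    if replication < 1:
        raise ValueError("replication must be >= 1")
    return raw_pb / replication


def nodes_needed(target_raw_tb: int, node_raw_tb: int) -> int:
    """Number of data nodes required to reach a raw capacity target.

    Both arguments are whole terabytes; the result is rounded up.
    """
    if target_raw_tb < 0 or node_raw_tb <= 0:
        raise ValueError("capacities must be positive")
    return -(-target_raw_tb // node_raw_tb)  # integer ceiling division


if __name__ == "__main__":
    # 1.8 PB raw with 3x replication (assumed) leaves roughly 0.6 PB usable.
    print(round(usable_capacity_pb(1.8), 3))
    # 1800 TB raw from hypothetical 100 TB nodes.
    print(nodes_needed(1800, 100))
```

Under these assumptions, expanding capacity amounts to increasing the node count: each added data node raises the raw total by its own disk size, and usable space grows by that amount divided by the replication factor.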

Author Biographies

Gi-taek An, 1) Korea Food Research Institute, Wanju-gun 55365, Republic of Korea 2) Division of Computer Science and Artificial Intelligence, Jeonbuk National University, Jeonju 54896, Republic of Korea

Gi-taek An received his B.Sc. in Computer Science from Namseoul University in 2011 and M.Sc. in Computer Science from Jeonbuk National University. Currently, he is a Senior Engineer at the Korea Food Research Institute and a Ph.D. student at Jeonbuk National University. His research areas include information retrieval, artificial intelligence, and data platforms.

Seyoung Oh, Korea Food Research Institute, Wanju-gun 55365, Republic of Korea

Seyoung Oh received his B.Sc. in Computer Science from Hanbat University in 2010 and his M.Sc. in Industrial System Engineering from Chungnam National University in 2018. Currently, he is an Engineer at the Korea Food Research Institute. His research areas include information systems and data platforms.

Eunhye Kim, Korea Food Research Institute, Wanju-gun 55365, Republic of Korea

Eunhye Kim received her B.Sc. in Electronic Engineering from Jeonbuk National University in 2012. Currently, she is an Engineer at the Korea Food Research Institute. Her research area is information systems.

Jung-min Park, Korea Food Research Institute, Wanju-gun 55365, Republic of Korea

Jung-min Park received her B.Sc. in Biology from Ewha Womans University in 1991. She received her M.Sc. and Ph.D. degrees in Economics from Hannam University in 2001 and 2005, respectively. Currently, she is the Intelligent Policy Team Leader and a Principal Researcher at the Korea Food Research Institute. Her research areas include technology innovation and research data.

Published

2024-05-25

How to Cite

An, G.-T., Oh, S., Kim, E., & Park, J.-M. (2024). Data Lake Conceptualized Web Platform for Food Research Data Collection. Journal of Web Engineering, 23(3), 377–392. https://doi.org/10.13052/jwe1540-9589.2333

Section

ECTI