Latent Diffusion Models: A Survey on Foundations, Variants, and Web-scale Deployments

Jee-Woo Shin; Chayapol Kamyod; Chung-Pyo Hong

doi:10.13052/jwe1540-9589.2546

Authors

Jee-Woo Shin, Division of Computer Engineering, Hoseo University, Republic of Korea
Chayapol Kamyod, Computer and Communication Engineering for Capacity Building Research Center, Mae Fah Luang University, Thailand
Chung-Pyo Hong, Division of Computer Engineering, Hoseo University, Republic of Korea

DOI:

https://doi.org/10.13052/jwe1540-9589.2546

Keywords:

Latent diffusion models, diffusion models, web engineering, Stable Diffusion, text-to-image, video diffusion, medical imaging, watermarking, content moderation, web services, generative AI

Abstract

Latent diffusion models (LDMs) have rapidly become the de facto backbone of web-scale generative systems, powering text-to-image platforms such as Stable Diffusion and their video, 3D, and domain-specific extensions. By performing the diffusion process in a compressed latent space rather than directly in pixel space, LDMs achieve a favorable trade-off between computational efficiency and generative fidelity, enabling deployment in interactive web applications and large-scale content pipelines. This paper presents a comprehensive survey of LDMs from the perspective of both foundational modeling and web engineering. We first review the background of diffusion models and latent representations, contrasting LDMs with classical VAEs, GANs, and pixel-space diffusion models. We then dissect the architectural design of LDMs, including autoencoder backbones, latent-space U-Nets and diffusion transformers, conditioning mechanisms, training objectives, and sampling accelerations. Building on recent general surveys of diffusion models in vision, temporal data, and inverse problems, we propose a taxonomy of LDM variants, covering 2D image models, video and 4D models, and domain-specific LDMs in medical imaging, watermarking, time series, and text. From a web engineering viewpoint, we analyze LDM-based services exposed via web APIs, hosted user interfaces, and developer platforms, and discuss system-level concerns such as scalability, latency, cost, safety, and governance. We review current evaluation methodologies (quality, diversity, downstream task performance, robustness, watermarking) and highlight open challenges in controllability, interpretability, resource efficiency, and regulatory compliance, especially in light of recent legal and societal developments around generative deepfakes and copyright. This survey aims to provide both a conceptual map of LDM research and practical guidance for designing, deploying, and governing LDM-driven web systems.

Author Biographies

Jee-Woo Shin, Division of Computer Engineering, Hoseo University, Republic of Korea

Jee-Woo Shin is currently an undergraduate student in the Department of Computer Engineering at Hoseo University, Asan, Republic of Korea, where he enrolled in 2023. His research interests include artificial intelligence, machine learning, and related computational technologies.

Chayapol Kamyod, Computer and Communication Engineering for Capacity Building Research Center, Mae Fah Luang University, Thailand

Chayapol Kamyod received his Ph.D. in wireless communication from the Center of TeleInFrastruktur at Aalborg University, Denmark. He previously earned a master’s degree in electrical engineering from The City College of New York and, earlier, bachelor’s and master’s degrees in telecommunication engineering and laser technology and photonics from Suranaree University of Technology, Thailand. He is currently a lecturer in the Computer Engineering program at Mae Fah Luang University, Thailand, where his research focuses on the resilience and reliability of computer networks, wireless sensor networks, and the potential of IoT applications.

Chung-Pyo Hong, Division of Computer Engineering, Hoseo University, Republic of Korea

Chung-Pyo Hong received his B.Sc. and M.Sc. degrees in computer science from Yonsei University, Seoul, Korea, in 2004 and 2006, respectively. In 2012, he received his Ph.D. degree in computer science from Yonsei University, Seoul, Korea. He is currently an associate professor of Computer Engineering at Hoseo University, Asan, Korea. His research interests include machine learning, explainable AI, and data science.

