Latent Diffusion Models: A Survey on Foundations, Variants, and Web-scale Deployments
DOI:
https://doi.org/10.13052/jwe1540-9589.2546
Keywords:
Latent diffusion models, diffusion models, web engineering, Stable Diffusion, text-to-image, video diffusion, medical imaging, watermarking, content moderation, web services, generative AI
Abstract
Latent diffusion models (LDMs) have rapidly become the de facto backbone of web-scale generative systems, powering text-to-image platforms such as Stable Diffusion and their video, 3D, and domain-specific extensions. By performing the diffusion process in a compressed latent space rather than directly in pixel space, LDMs achieve a favorable trade-off between computational efficiency and generative fidelity, enabling deployment in interactive web applications and large-scale content pipelines. This paper presents a comprehensive survey of LDMs from the perspective of both foundational modeling and web engineering. We first review the background of diffusion models and latent representations, contrasting LDMs with classical VAEs, GANs, and pixel-space diffusion models. We then dissect the architectural design of LDMs, including autoencoder backbones, latent-space U-Nets and diffusion transformers, conditioning mechanisms, training objectives, and sampling accelerations. Building on recent general surveys of diffusion models in vision, temporal data, and inverse problems, we propose a taxonomy of LDM variants, covering 2D image models, video and 4D models, and domain-specific LDMs in medical imaging, watermarking, time series, and text. From a web engineering viewpoint, we analyze LDM-based services exposed via web APIs, hosted user interfaces, and developer platforms, and discuss system-level concerns such as scalability, latency, cost, safety, and governance. We review current evaluation methodologies (quality, diversity, downstream task performance, robustness, watermarking) and highlight open challenges in controllability, interpretability, resource efficiency, and regulatory compliance, especially in light of recent legal and societal developments around generative deepfakes and copyright. This survey aims to provide both a conceptual map of LDM research and practical guidance for designing, deploying, and governing LDM-driven web systems.

