Production Image/Video Serving with SGLang Diffusion
Date: May 6 | Time: 14:50 - 15:10 | Location: Central Room
Diffusion models have become the backbone of modern image and video generation, but serving them efficiently remains challenging. In this talk, we introduce SGLang-Diffusion, a high-performance inference framework designed for scalable diffusion generation. We present its system architecture and the key optimizations (advanced parallelism, distributed VAE, kernel fusion, and serving improvements) that enable efficient, production-ready deployment of diffusion models. We also demonstrate how SGLang-Diffusion accelerates popular open-source models and supports large-scale multimodal generation workloads.