SGLang Workshop

[Hands-on Lab] SGLang: High-Performance LLM Serving Framework — Run an Open Model Live

Date: May 6
Time: 14:05 - 14:50
Location: Central Room

SGLang is an open-source, high-performance serving framework for LLMs and multimodal models, with over 24,000 GitHub stars. In this talk, we will present the key design principles behind SGLang's performance.

We will discuss recent advances, including native multimodal model support, speculative decoding (Eagle3/MTP), and FP8/NVFP4 quantization on Hopper and Blackwell GPUs. We will also share lessons from maintaining a fast-moving open-source project with thousands of contributors worldwide.

Attendees will gain a practical understanding of how SGLang achieves state-of-the-art throughput and latency, and how to deploy it for production LLM serving.

Speakers

Xinyuan Tong, SGLang Maintainer