Open Source Models

Building Scalable LLM Inference Infrastructure

Date: May 6
Time: 16:40 - 17:10
Location: Open Stage

LLM serving infrastructure has become a key pillar of modern society, but building it to scale remains challenging because of system issues such as load imbalance, stragglers, and a lack of elasticity. In this talk, I will present our recent work on scalable and efficient LLM infrastructure, including a simple but efficient multiplication-based global LLM router and ultra-fast autoscaling mechanisms. Some of this work has been, or is being, deployed at the world's largest LLM service providers.

Speakers

Xingda Wei, Associate Professor, Shanghai Jiao Tong University