LLM serving infrastructure has become a key pillar of modern society, but building it to scale remains challenging due to system issues such as load imbalance, stragglers, and a lack of elasticity. In this talk, I will present our recent work on scalable and efficient LLM infrastructure, including a simple but efficient multiplication-based global LLM router and ultra-fast autoscaling mechanisms. Some of this work has been, or is being, deployed at the world's largest LLM service providers.