Pie: A Programmable Serving System for Agentic Applications
Date: May 6 · Time: 14:00–14:20 · Location: Master Stage
Emerging large language model (LLM) applications involve diverse reasoning strategies and agentic workflows, straining the capabilities of existing serving systems built on a monolithic token generation loop. This talk presents Pie, a programmable LLM serving system designed for flexibility and efficiency.
Pie decomposes the traditional generation loop into fine-grained service handlers exposed via an API and delegates control of the generation process to user-provided programs called inferlets. This enables applications to implement new KV cache strategies and bespoke generation logic, and to seamlessly integrate computation and I/O, all within the application and without any modification to the serving system.
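To illustrate the idea of an application-owned generation loop, here is a minimal, hypothetical sketch. The names below (`prefill`, `decode_step`, `KVCache`) are illustrative stand-ins, not Pie's actual API; the stub handlers use a deterministic toy rule in place of a real model.

```python
class KVCache:
    """Toy KV cache that the inferlet manages explicitly."""
    def __init__(self):
        self.tokens = []

    def append(self, token):
        self.tokens.append(token)


def prefill(cache, prompt_tokens):
    # Stub service handler: ingest the prompt into the cache.
    for t in prompt_tokens:
        cache.append(t)


def decode_step(cache):
    # Stub service handler: "generate" the next token
    # (a deterministic toy rule standing in for the model).
    return cache.tokens[-1] + 1


def inferlet(prompt_tokens, max_new_tokens, stop_token):
    """Application-owned generation loop: the inferlet, not the
    server, decides when to decode, what to cache, and when to stop."""
    cache = KVCache()
    prefill(cache, prompt_tokens)
    out = []
    for _ in range(max_new_tokens):
        tok = decode_step(cache)
        if tok == stop_token:
            break
        cache.append(tok)  # bespoke cache policy lives in the application
        out.append(tok)
    return out


print(inferlet([1, 2, 3], max_new_tokens=4, stop_token=6))  # → [4, 5]
```

The point of the sketch is the inversion of control: the serving system only supplies the handlers, while stopping criteria, cache policy, and any interleaved computation or I/O live in application code.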
Pie executes inferlets using WebAssembly, benefiting from its lightweight sandboxing. Evaluation shows Pie improves latency and throughput by 1.3×–3.4× on agentic workflows. Pie is open-source at pie-project.org.