GOSIM Paris 2026 Has Concluded
Thank you to all attendees, speakers, and sponsors for an incredible event!
Agentic AI on Edge

KTransformers: Full-Precision Inference for 600B+ MoE Models on Consumer Hardware

Date: May 6 | Time: 10:20 - 10:45 | Location: Central Room
KTransformers is an open-source CPU-GPU heterogeneous inference framework that
runs frontier MoE models like DeepSeek-V3 and Qwen3.5-397B at FP8 precision
on consumer GPUs. By offloading expert computations to the CPU and coordinating
them in a way that CUDA Graphs can capture, it achieves a decode speed of 35+
tokens/sec, making 600B+ models accessible without datacenter infrastructure.
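
To make the idea of expert offloading concrete, here is a minimal toy sketch in plain Python. It is not the KTransformers API and says nothing about its actual implementation: the names `route_top_k`, `make_expert`, and `moe_layer` are hypothetical, the "experts" are trivial scaling functions, and a thread pool merely stands in for CPU-side expert workers. CUDA Graph capture is not modeled at all. The sketch only shows the general shape of a top-k MoE layer whose expert computations are dispatched to a separate worker pool and then combined.

```python
from concurrent.futures import ThreadPoolExecutor
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    total = sum(probs[i] for i in top)
    return [(i, probs[i] / total) for i in top]


def make_expert(scale):
    """Toy expert (assumption): scales the hidden vector.
    A real MoE expert is a feed-forward block with its own weights."""
    def expert(hidden):
        return [scale * h for h in hidden]
    return expert


def moe_layer(hidden, router_logits, experts, k, cpu_pool):
    """Route one token, run the selected experts on the worker pool,
    and return the weighted sum of their outputs."""
    selected = route_top_k(router_logits, k)
    # Dispatch expert work to the pool (stand-in for CPU-offloaded experts).
    futures = [(w, cpu_pool.submit(experts[i], hidden)) for i, w in selected]
    out = [0.0] * len(hidden)
    for weight, fut in futures:
        for j, value in enumerate(fut.result()):
            out[j] += weight * value
    return out


# Four toy experts; the router strongly prefers experts 2 and 3.
experts = [make_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
with ThreadPoolExecutor(max_workers=2) as pool:
    result = moe_layer([1.0, -1.0], [0.0, 0.0, 5.0, 5.0], experts, k=2, cpu_pool=pool)
print(result)  # [3.5, -3.5]
```

Only the selected experts run for a given token, which is why keeping the large pool of expert weights in CPU memory while the rest of the model stays on the GPU can be a workable split; how KTransformers actually schedules and synchronizes this work is the subject of the talk.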