Open Source Models

Linear Attention: Past, Present and Future

Date: May 6 | Time: 10:20 - 10:50 | Location: Open Stage

The quadratic cost of standard self-attention is a central bottleneck for long-context AI agents and edge-deployed models. Linear attention mechanisms, which reduce the attention computation from O(n²) to O(n) in sequence length n, have emerged as one of the most significant algorithmic advances in efficient sequence modeling. This talk traces the technical evolution of linear attention, from its theoretical foundations (Performer, Linear Transformer, RNN reformulations) to current state-of-the-art architectures (Mamba, RetNet, GLA). It closes with three open research frontiers: hybrid attention strategies for agentic workflows, hardware-software co-design for edge deployment, and the convergence of linear attention with state-space models. Drawing on production experience training large-scale foundation models, the session connects algorithmic innovation with infrastructure reality.
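As a rough illustration of the O(n²) to O(n) reduction mentioned in the abstract, the sketch below compares a quadratic causal attention loop with its recurrent (RNN-style) reformulation. It is a minimal sketch and not the speaker's implementation: the feature map elu(x) + 1 follows the Linear Transformer formulation, and the function names (`phi`, `quadratic_causal_attention`, `linear_causal_attention`) are hypothetical, chosen only for this example. Pure Python lists are used so that the block runs without third-party packages.

```python
import math


def phi(x):
    # Assumed feature map: elu(x) + 1, which keeps every feature positive.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def quadratic_causal_attention(Q, K, V):
    # Reference form: every query rescans all earlier keys, O(n^2) in length n.
    d_v = len(V[0])
    out = []
    for i, q in enumerate(Q):
        fq = phi(q)
        weights = [dot(fq, phi(K[j])) for j in range(i + 1)]
        total = sum(weights)
        out.append([
            sum(w * V[j][c] for j, w in enumerate(weights)) / total
            for c in range(d_v)
        ])
    return out


def linear_causal_attention(Q, K, V):
    # Recurrent form: carry a d_k x d_v state S and a d_k normalizer z,
    # so each step costs O(d_k * d_v) regardless of how many tokens came before.
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    z = [0.0] * d_k
    out = []
    for q, k, v in zip(Q, K, V):
        fq, fk = phi(q), phi(k)
        for a in range(d_k):
            z[a] += fk[a]
            for c in range(d_v):
                S[a][c] += fk[a] * v[c]
        denom = dot(fq, z)
        out.append([
            sum(fq[a] * S[a][c] for a in range(d_k)) / denom
            for c in range(d_v)
        ])
    return out


if __name__ == "__main__":
    # Toy inputs (made up for illustration): 3 tokens, d_k = d_v = 2.
    Q = [[0.1, -0.2], [0.5, 0.3], [-0.7, 0.9]]
    K = [[0.4, 0.0], [-0.3, 0.8], [0.2, -0.5]]
    V = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
    slow = quadratic_causal_attention(Q, K, V)
    fast = linear_causal_attention(Q, K, V)
    gap = max(abs(x - y) for ra, rb in zip(slow, fast) for x, y in zip(ra, rb))
    print("max difference between the two forms:", gap)
```

Both functions compute the same outputs up to floating-point rounding. The difference is that the recurrent form keeps a fixed-size state instead of revisiting every earlier token, and this is the property that makes this family of methods attractive for long contexts and memory-constrained devices.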

Speakers

Minako Kojima, Head of Developer Relations, Moonshot AI