Open Source Robotics

AI Agents Learn Invisible Shortcuts

Date: May 6
Time: 14:25-14:50
Location: Founders Cafe

Classic reinforcement learning (RL) agents often achieve "superhuman" performance not through genuine understanding, but by exploiting hidden shortcuts in their environments. Like "Clever Hans", the horse that appeared to do arithmetic but was actually reading cues from the humans around him, our models project an illusion of competence. Because these shortcuts are concealed within opaque neural networks, agents can fail silently or collapse entirely when faced with trivial modifications to their tasks. This talk explores the pervasive danger of shortcut learning and proposes a path forward: prioritizing interpretability. By using LLMs and neuro-symbolic approaches to distill black-box policies into transparent, human-readable programs, we can unmask these hidden flaws, audit agent behavior, and build genuinely robust AI systems.

Speakers

Quentin Delfosse, Agentic AI Researcher, Google Intrinsic