Open Source Models

World Model for Universal Generation and Control

Date: May 6. Time: 14:00 - 14:30. Location: Open Stage.

In AI and cognitive science, world models are key for planning, reasoning, and learning from experience. An effective world model needs to sense and learn real-world knowledge, predict and generate real-world scenes, reason and control according to physical laws, and act robustly with a human in the loop. Prior work on world models has limited capability in representation, generation, and physical awareness. We address these limitations through two innovations, working towards the first open-source, physically grounded world model from academia. First, we develop a flow matching and DPO reinforcement learning framework to improve continuity and physical awareness in world model representation and generation, achieving the best results in physical awareness and state-of-the-art results in open-source video generation. Second, we develop a comprehensive physical awareness benchmarking and arena system. We extract a comprehensive set of 50 to 60 metrics that capture the physical-law awareness of generated videos and trajectories. The complete benchmark covers video quality, common sense, Newtonian mechanics, optics, energy, chemistry, materials, and more. Such a benchmark is missing from the literature. We build an agent and a 27B language model for evaluating physical awareness against these benchmarks. Finally, we describe our efforts towards the concept of "world model for all", which uses a single world model for robotic control and navigation; for task management, planning, and task decomposition in high-level management and control; and for automatic SLAM and 3D reconstruction in environment sensing.
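For readers unfamiliar with the two training objectives named in the abstract, the following is a minimal sketch of what a flow matching loss and a DPO loss commonly look like. The abstract does not specify the speakers' formulation, so everything here is an assumption: the straight-line (rectified-flow style) interpolation path, the standard DPO objective, and the names flow_matching_loss, dpo_loss, velocity_fn, and beta are all hypothetical and chosen for illustration. In a physical-awareness setting, the "preferred" sample would presumably be the more physically plausible generation, but that pairing is also an assumption.

```python
import math

def flow_matching_loss(x0, x1, t, velocity_fn):
    """Mean squared error between predicted and target velocity.

    Assumes a straight-line path from a noise sample x0 to a data sample x1;
    the talk's actual path and parameterization are not stated.
    """
    # Point on the linear path at time t in [0, 1].
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    # For a straight-line path the target velocity is constant: x1 - x0.
    target = [b - a for a, b in zip(x0, x1)]
    pred = velocity_fn(xt, t)  # hypothetical model call
    return sum((p - g) ** 2 for p, g in zip(pred, target)) / len(target)

def dpo_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """Standard DPO objective for one preference pair: -log(sigmoid(margin)).

    logp_* are log-likelihoods under the model being trained; ref_logp_* are
    the same quantities under a frozen reference model.
    """
    margin = beta * ((logp_win - ref_logp_win) - (logp_lose - ref_logp_lose))
    # Numerically stable softplus(-margin), equal to -log(sigmoid(margin)).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
```

With a velocity predictor that returns exactly x1 - x0, the flow matching loss is zero; with no preference margin between the two samples, the DPO loss equals log 2 and decreases as the model favors the preferred sample more strongly than the reference does.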

Speakers

Yanzhi Wang, Professor, Northeastern University