EvoPhys Team - Peking University

EvoPhys-World

Human-Centric Scene-Level Controllable 5D World Model

Worlds are no longer only seen. They can be twinned, interacted with, controlled, and evolved.

Core Model: 5D World Model
Mode 1: World Engine
Mode 2: World Policy
Learning Loop: Self-Evolution

From watching worlds to moving worlds.

EvoPhys-World moves beyond visual generation and camera navigation, enabling action-conditioned physical interaction, long-horizon prediction, and human-centered policy learning.

01

World Twin

Scene-level memory builds dynamic twins that preserve spatial structure and temporal continuity.

02

Physical Interaction

Actions drive object responses, contact changes, and future state generation.

03

World Policy

A human-centered action space connects egocentric perception, hand motion, and embodied control.

Space, time, action, causality, and value.

The model does not merely predict how the world looks; it imagines how different actions push the same world state into different futures.

Stage 01

3D

Spatial scene structure, geometry, objects, and layout.

Stage 02

4D

Space plus time: motion, continuity, and future-state prediction.

Stage 03

5D

Action-conditioned worldlines with memory, causality, feedback, and policy value.

One base model, two forms.

A unified state-action world model integrates 4D spatiotemporal memory, next-state prediction, next-action prediction, and a self-evolving interaction loop.
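As a rough illustration of the state-action interface described above, the sketch below wires a next-action head, a next-state head, and a short transition memory into one step loop. Everything in it is hypothetical: the class `ToyStateActionModel`, its additive dynamics, and its goal-seeking policy are placeholders chosen for readability, not EvoPhys-World's architecture, which would use learned 4D spatiotemporal predictors.

```python
from collections import deque

State = tuple[float, ...]
Action = tuple[float, ...]


class ToyStateActionModel:
    """Hypothetical stand-in for a unified state-action world model.

    Both heads are hand-written toys; a real model would learn them.
    """

    def __init__(self, goal: State, gain: float = 0.5, memory_size: int = 8):
        self.goal = goal
        self.gain = gain
        # Rolling memory of recent (state, action, next_state) transitions.
        self.memory: deque = deque(maxlen=memory_size)

    def next_action(self, state: State) -> Action:
        # Action head (toy): move a fixed fraction of the way toward the goal.
        return tuple(self.gain * (g - s) for g, s in zip(self.goal, state))

    def next_state(self, state: State, action: Action) -> State:
        # State head (toy): additive dynamics in place of a learned predictor.
        return tuple(s + a for s, a in zip(state, action))

    def step(self, state: State) -> State:
        action = self.next_action(state)
        nxt = self.next_state(state, action)
        self.memory.append((state, action, nxt))
        return nxt


def rollout(model: ToyStateActionModel, state: State, horizon: int) -> list[State]:
    """Alternate next-action and next-state prediction for `horizon` steps."""
    states = [state]
    for _ in range(horizon):
        states.append(model.step(states[-1]))
    return states


if __name__ == "__main__":
    model = ToyStateActionModel(goal=(4.0, 2.0))
    print(rollout(model, (0.0, 0.0), horizon=3))
```

With the goal at (4.0, 2.0), three steps move the state from (0.0, 0.0) through (2.0, 1.0) and (3.0, 1.5) to (3.5, 1.75). The bounded `deque` keeps only the most recent transitions, a simple stand-in for the scene-level memory the page describes.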

Self-Evolving Strategy Framework: World Engine and World Policy form a closed interaction loop.

Model as World Engine.

World Engine builds dynamic scene-level twins that can be navigated, manipulated, and physically interacted with.

Model as World Policy.

World Policy learns in a human-centered action space, then maps imagined interactions to embodied control.

Scene-level controllability in action.


01 Scene Navigation: world-twin roaming
02 Object Interaction: physical response
03 Local Manipulation: fine-grained control
04 Scene Manipulation: long-horizon rollout
05 Human Action Prediction: action-chunk decoding
06 Dexterous Retargeting: human-to-robot transfer
07 Worldline Rollout: multiple futures
08 Real Robot Control: embodied execution

Data, model, interaction, and policy evolve together.

Virtual interaction generates experience, policy learning turns experience into action, and real-world feedback closes the loop.

Step 01

Data

Human-centered egocentric observations and interaction traces.

Step 02

World Twin

Scene-level memory creates dynamic worlds for interaction.

Step 03

Interaction

Actions produce future states, contact changes, and outcomes.

Step 04

Policy Evolution

Generated experience feeds embodied policy learning.

Multiple futures from one world state.

Given different actions, EvoPhys-World imagines different physically grounded futures and selects actionable paths.

S0 + A1 → F1: stable rotation
S0 + A2 → F2: object slip
S0 + A3 → F3: contact failure
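The branching above can be sketched as: imagine one future per candidate action from the same start state S0, score each outcome, and keep the best. This is a hypothetical toy. The Coulomb-friction slip rule, the `FRICTION` and `RADIUS` constants, the value table, and the example actions are illustrative assumptions, not EvoPhys-World's learned physics or value model.

```python
from dataclasses import dataclass

# Hypothetical constants for a toy contact model (not from EvoPhys-World).
FRICTION = 0.4  # static friction coefficient at the fingertip contact
RADIUS = 0.05   # lever arm from rotation axis to contact point, in metres

# Hypothetical value assigned to each imagined outcome.
VALUE = {"stable rotation": 1.0, "object slip": -0.5, "contact failure": -1.0}


@dataclass(frozen=True)
class Action:
    name: str
    grip_force: float  # normal force at the contact, in newtons
    torque: float      # requested twisting torque, in newton-metres


def imagine_future(action: Action) -> str:
    """Toy stand-in for one rollout branch starting from S0."""
    if action.grip_force <= 0.0:
        return "contact failure"  # no normal force, so no contact
    # Coulomb friction: slip once the tangential load exceeds mu * N.
    if action.torque / RADIUS > FRICTION * action.grip_force:
        return "object slip"
    return "stable rotation"


def select_action(candidates: list[Action]) -> tuple[str, dict[str, str]]:
    """Imagine every branch, then keep the action with the highest value."""
    futures = {a.name: imagine_future(a) for a in candidates}
    best = max(candidates, key=lambda a: VALUE[futures[a.name]])
    return best.name, futures


if __name__ == "__main__":
    candidates = [
        Action("A1", grip_force=5.0, torque=0.05),
        Action("A2", grip_force=2.0, torque=0.10),
        Action("A3", grip_force=0.0, torque=0.05),
    ]
    print(select_action(candidates))
```

With these example numbers the three branches reproduce the three outcomes listed above and A1 is selected. In the real system, both the imagined futures and their values would come from the model rather than a hand-written rule.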

EvoPhys Team, Peking University

Led by Prof. Shanghang Zhang.
