Key Facts
- ✓ Fei-Fei Li co-founded World Labs in 2024 with $230 million initial backing
- ✓ Yann LeCun is launching Advanced Machine Intelligence (AMI Labs) after leaving Meta
- ✓ World models mimic human mental constructs to anticipate outcomes
- ✓ Moonvalley unveiled Marey, its first video-generation model, in March
- ✓ World models require understanding of 3D environments and physical reality
Quick Summary
Leading AI researchers are developing world models as an alternative to large language models. Computer scientists like Fei-Fei Li and Yann LeCun are building systems that mimic human mental constructs to anticipate outcomes.
Unlike LLMs that determine outputs based on statistical relationships between words, world models aim to understand and predict physical reality. These systems face data challenges but offer applications in robotics, healthcare, and creative fields.
What Are World Models?
World models represent a fundamental shift in artificial intelligence research. Unlike large language models that process text through statistical patterns, these systems attempt to mimic the mental constructs humans create to understand their environment.
As OpenAI, Anthropic, and major technology companies invest billions in language models, a smaller group of elite researchers is pursuing what they consider the next breakthrough. The core concept involves creating AI systems that anticipate what will happen next, similar to how humans use intuition based on experience.
MIT professor Jay Wright Forrester explained this concept in his 1971 paper, noting that humans constantly use mental models for decision-making. These models represent selected concepts and relationships rather than containing actual reality. If AI is to surpass human intelligence, researchers believe it must develop similar modeling capabilities.
"Humans not only do we survive, live, and work, but we build civilization beyond language."
— Fei-Fei Li
Fei-Fei Li's World Labs
Fei-Fei Li, the Stanford professor best known for creating ImageNet, co-founded World Labs in 2024 with initial backing of $230 million from venture firms including Andreessen Horowitz, New Enterprise Associates, and Radical Ventures.
The company's stated mission is to lift AI models from the 2D plane of pixels to full 3D worlds, both virtual and real, endowing them with spatial intelligence as rich as our own. Li defines spatial intelligence as the ability to understand, reason about, interact with, and generate 3D worlds.
Li sees applications for world models in several areas:
- Creative fields requiring infinite universes
- Robotics and physical interaction
- Any domain needing complex 3D reasoning
The primary challenge is data scarcity. Unlike language, which humans have recorded and refined over centuries, spatial understanding has left behind far less usable data. Li notes that creating a detailed 3D model of one's immediate environment is surprisingly difficult without training, and that gathering sufficient data demands increasingly sophisticated data engineering, acquisition, processing, and synthesis.
Yann LeCun's Advanced Machine Intelligence
Yann LeCun, Meta's outgoing chief AI scientist, is launching Advanced Machine Intelligence (AMI Labs) to build world models he considers more capable than LLMs. LeCun argues such systems would have common sense, reasoning capacity, planning abilities, and persistent memory.
In a November LinkedIn post, LeCun stated AMI Labs aims to bring about the next big revolution in AI: systems that understand the physical world, have persistent memory, can reason, and can plan complex action sequences.
On December 19, LeCun announced that he had recruited Alex LeBrun, co-founder and CEO of Nabla, as CEO of AMI Labs. LeBrun said healthcare AI is entering an era where reliability, determinism, and simulation matter as much as linguistic intelligence. He added that access to world model technology will complement today's LLMs and help unlock safe, autonomous systems for clinicians.
Prior to launching AMI Labs, LeCun pursued similar research at Meta, training models on video data. Rather than predicting future frames at the pixel level, the approach trains a system on an abstract representation of the video: the representation discards details that cannot be predicted, and the model makes its predictions within that abstract space, as sketched below.
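A minimal sketch of this latent-space prediction idea is shown here. It is illustrative only, not Meta's or AMI Labs' actual implementation: random linear maps stand in for the learned encoder and predictor, and all names, shapes, and parameters are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for learned networks: an encoder that maps raw
# pixels to a compact embedding, and a predictor that advances that
# embedding one step in time. In a real system both would be trained.
FRAME_DIM = 64 * 64   # flattened "video frame"
EMBED_DIM = 32        # size of the abstract representation

W_enc = rng.normal(scale=0.01, size=(EMBED_DIM, FRAME_DIM))   # encoder weights
W_pred = rng.normal(scale=0.1, size=(EMBED_DIM, EMBED_DIM))   # predictor weights

def encode(frame: np.ndarray) -> np.ndarray:
    """Map a raw frame to its abstract (latent) representation."""
    return np.tanh(W_enc @ frame.ravel())

def predict_next(embedding: np.ndarray) -> np.ndarray:
    """Predict the next abstract state from the current one."""
    return np.tanh(W_pred @ embedding)

# Two consecutive frames from a synthetic "video clip".
frame_t = rng.random((64, 64))
frame_t1 = rng.random((64, 64))

# Pixel-level prediction would compare full frames; here the error is
# measured in the abstract space instead, so unpredictable pixel detail
# (noise, texture) simply never enters the comparison.
z_t, z_t1 = encode(frame_t), encode(frame_t1)
z_t1_hat = predict_next(z_t)

latent_error = np.mean((z_t1_hat - z_t1) ** 2)
print(f"prediction error in abstract space: {latent_error:.4f}")
```

The design point the sketch illustrates is that the loss lives in embedding space rather than pixel space, which is what lets the model ignore details that cannot be forecast.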
Moonvalley and Industry Applications
Moonvalley, founded by former DeepMind researchers, is quietly developing world models for generative AI video. In March, the company unveiled Marey, its first video-generation model.
Mateusz Malinowski, Moonvalley's chief scientific officer, explained that the company is thinking about world models and visual multimodal intelligence. The goal is to move beyond purely visual systems into models that understand not just what they see, but how the world works.
Applications for world models include:
- Humanoid robotics
- Real-world planning
- Filmmaking with motion modeling
- Soft-body modeling
Malinowski noted that while companies pursuing world models share long-term goals, their approaches differ. Moonvalley treats video models as first-class citizens, with spatial intelligence left more implicit. In the short term, this approach appears better suited to filmmaking and robotics because of its motion and soft-body modeling capabilities.
"We aim to lift AI models from the 2D plane of pixels to full 3D worlds — both virtual and real — endowing them with spatial intelligence as rich as our own."
— World Labs
"If I ask you to close your eyes right now and draw out or build a 3D model of the environment around you, it's not that easy."
— Fei-Fei Li
"We require more and more sophisticated data engineering, data acquisition, data processing, and data synthesis."
— Fei-Fei Li
"Bring about the next big revolution in AI: systems that understand the physical world, have persistent memory, can reason, and can plan complex action sequences."
— Yann LeCun
"Healthcare AI is entering a new era, one where reliability, determinism, and simulation matter as much as linguistic intelligence."
— Alex LeBrun
"The basic idea is that you don't predict at the pixel level. You train a system to run an abstract representation of the video so that you can make predictions in that abstract representation, and hopefully this representation will eliminate all the details that cannot be predicted."
— Yann LeCun
"We're thinking about world models and visual multimodal intelligence. We want to move beyond purely visual systems into something broader — models that understand not just what they see, but how the world works."
— Mateusz Malinowski
