Embodied AI: The Next Intelligence Paradigm

I believe the next paradigm, after training models for thinking and reasoning, is solving embodied human tasks on the way to AGI. This includes both digital and physical tasks that are economically valuable.

In the digital world, using a computer is arguably the most general form of digital intelligence: it requires a strong world model (VLMs) as well as an intuition for choosing and sequencing actions to solve tasks.

My expectation is that we will solve the computer use problem before the physical intelligence problem, because data is the bigger bottleneck in the physical world. But there’s a fundamental overlap between the two dimensions of intelligence, which will make it valuable to transfer progress from the digital world into the physical one.

The reason I think embodied AI is the best next paradigm to focus research and engineering on is that it directs training effort toward both System 1 and System 2 thinking. Moreover, I’m a strong believer in the Fiverr eval, rather than “university tests”, as a measure of model capabilities: that is, real-world activities that humans actually find valuable.

Solving problems on computers requires reasoning, long-term planning with short-term sub-goals, verification of task completion, and the ability to carry economically valuable tasks through to the end.
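
To make that concrete, here is a minimal sketch of a plan, act, and verify loop over short-term sub-goals. It is only an illustration under my own assumptions: `SubGoal` and `run_task` are hypothetical names, the "computer" is a toy dictionary, and in a real system a model would produce the plan and the actions rather than hand-written lambdas.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class SubGoal:
    """One short-term step of a longer plan (hypothetical structure)."""
    description: str
    act: Callable[[Dict], None]      # takes an action that mutates the toy state
    verify: Callable[[Dict], bool]   # checks whether the sub-goal was reached


def run_task(sub_goals: List[SubGoal], state: Dict,
             max_attempts: int = 2) -> Tuple[bool, List[str]]:
    """Work through sub-goals in order, verifying each one before moving on."""
    log: List[str] = []
    for goal in sub_goals:
        for _ in range(max_attempts):
            goal.act(state)
            if goal.verify(state):
                log.append(f"done: {goal.description}")
                break
            log.append(f"retry: {goal.description}")
        else:
            # Every attempt failed verification, so stop the whole task.
            log.append(f"failed: {goal.description}")
            return False, log
    return True, log


# Toy environment standing in for a real desktop.
state = {"editor_open": False, "text": ""}
plan = [
    SubGoal("open editor",
            lambda s: s.update(editor_open=True),
            lambda s: s["editor_open"]),
    SubGoal("type greeting",
            lambda s: s.update(text="hello"),
            lambda s: s["text"] == "hello"),
]
ok, log = run_task(plan, state)
print(ok, log)
```

The point of the sketch is the shape of the loop, not the details: each sub-goal carries its own check, and the task only counts as complete when every check passes.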

Another reason I’m super excited to work on computer use models is that they’re the closest thing to creating a digital twin of ourselves. Instead of creating agents with their own tools and environments, what if an agent could use my computer and my entire personal workspace to do my work for me, under my supervision? This is what I imagine an LLM OS to look like.

It’s essentially the superset of all the agents that could help with our work. Instead of building more vertical AI tools, why not build a digital agent (or clone) of ourselves that can use our IDEs and browsers, is familiar with our lives, and can both guide us and do the work for us?

One of the less discussed aspects of Cursor’s success, in my opinion, is a product design that puts software engineers in the driver’s seat for completing actions, which I think explains its wide adoption compared to AI software engineers like Devin. And I think that is the future we should be designing our AI agents for.

Instead of building replacements, design tools that empower humans and make them the orchestrators!