Embodied intelligence for the home

A world model for home robotics.

Arkona builds the predictive core that lets a robot understand a physical space, imagine the consequences of its actions, and act with the dexterity a home demands. We are teaching machines to plan in the real world before they touch it.


Flagship challenge: LEGO assembly

10k+ simulated build episodes

Berlin: where we build

Perception · Prediction · Planning · Dexterous manipulation — unified in one learned model.

The core idea

We learn how the physical world responds — then we act.

A world model is a learned, predictive simulator of the robot's environment. Instead of reacting frame by frame, Arkona's robot imagines the outcome of each candidate action and chooses the one most likely to succeed. This is what makes reliable behaviour possible in the messy, unscripted setting of a real home.
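To make "imagine the outcome of each candidate action, then choose" concrete, here is a minimal sketch under loudly stated assumptions. It is not Arkona's model: the one-axis `State`, the hypothetical `toy_world_model` (a hand-written stand-in that moves a brick 90% of the commanded distance, mimicking learned slip) and the distance-based `score` are all illustrative. The planner only ever queries the model; it never touches the "real" world until it has picked an action.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class State:
    """Toy state (hypothetical): position of a brick along one axis, in mm."""
    position_mm: float


def toy_world_model(state: State, action_mm: float) -> State:
    """Hypothetical stand-in for a learned model: predicts the next state.

    The brick is assumed to move 90% of the commanded distance, mimicking
    slip that a learned model would have picked up from experience.
    """
    return State(state.position_mm + 0.9 * action_mm)


def score(predicted: State, target_mm: float) -> float:
    """Higher is better: negative distance between predicted and goal position.

    A higher score stands in for 'more likely to succeed'.
    """
    return -abs(predicted.position_mm - target_mm)


def choose_action(
    state: State,
    candidates: Sequence[float],
    model: Callable[[State, float], State],
    target_mm: float,
) -> float:
    """Imagine each candidate's outcome with the model and return the best one."""
    return max(candidates, key=lambda a: score(model(state, a), target_mm))


if __name__ == "__main__":
    best = choose_action(State(0.0), [0.0, 5.0, 10.0, 15.0], toy_world_model, 9.0)
    print(best)  # 10.0: the model predicts it lands the brick exactly on target
```

Note that a purely reactive controller would command the 9 mm move it "sees"; planning through the model picks 10 mm because the model anticipates the slip.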

Figure 1 — Arkona's perception-to-action loop, built around a learned world model

Predict before acting

Every motion is rehearsed inside the model first. The robot commits only to plans it expects to succeed, then corrects in real time as reality diverges.

Generalises across tasks

Because the model captures how the physical world behaves (contact, weight, friction, occlusion), skills transfer between builds, objects and rooms instead of being hand-scripted one by one.

Recovers from error

When a brick slips or a step fails, the loop detects the mismatch and re-plans — the same way a person notices a mistake and tries again.

First challenge

A robot that builds LEGO sets from the instructions.

We chose LEGO assembly as Arkona's flagship benchmark because it compresses the hardest problems in home robotics into one tractable task: reading a visual instruction, finding the right part, grasping it precisely, and placing it with sub-millimetre alignment — step after step, recovering when something goes wrong.

Master a LEGO manual and you have mastered the core of loading a dishwasher, tidying a shelf, or assembling flat-pack furniture.

Instruction following · Precise manipulation · Multi-step planning · Error recovery · Visual grounding


Figure 2 — From printed step to verified placement

The assembly pipeline, end to end

Parse the instruction page

A vision model reads each printed step, identifying the parts called out and the target sub-assembly.

Decompose into actions

The step is broken into an ordered sequence of pick-and-place sub-goals with explicit success criteria.

Locate & grasp the brick

Perception finds the correct part in the bin; the world model selects a stable grasp and approach.

Predict the placement

The model imagines candidate placements and picks the one that aligns the studs without causing collisions.

Place & seat the brick

Force-aware control presses the brick home, sensing the click of a successful connection.

Verify, then continue or retry

The result is checked against the instruction. A mismatch triggers re-planning before the next step.
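The control flow of the pipeline above (decompose each step into sub-goals, attempt a placement, verify, and retry on a mismatch before moving on) can be sketched as follows. This is an illustration, not Arkona's implementation: the instruction pages are assumed to be already parsed into lists of `(part, stud coordinate)` sub-goals, and `flaky_place` is a hypothetical stand-in for perception, grasping and force-aware seating that slips by one stud on its first try for one part.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# A sub-goal / placement: (part id, (x, y) stud coordinate). Hypothetical format.
Placement = Tuple[str, Tuple[int, int]]


@dataclass
class BuildLog:
    placed: List[Placement] = field(default_factory=list)
    retries: int = 0  # number of failed attempts that triggered a retry


def run_build(
    steps: List[List[Placement]],
    place: Callable[[Placement, int], Placement],
    max_retries: int = 2,
) -> BuildLog:
    """Execute parsed instruction steps, verifying every placement."""
    log = BuildLog()
    for step in steps:
        for subgoal in step:  # each part in a step is one pick-and-place sub-goal
            for attempt in range(max_retries + 1):
                result = place(subgoal, attempt)
                if result == subgoal:  # verify against the instruction
                    log.placed.append(result)
                    break
                log.retries += 1  # mismatch: re-plan and try again
            else:  # runs only if no attempt succeeded
                raise RuntimeError(f"could not place {subgoal} after {max_retries} retries")
    return log


def flaky_place(subgoal: Placement, attempt: int) -> Placement:
    """Hypothetical placer: the red 2x4 slips one stud on its first attempt."""
    part, (x, y) = subgoal
    if part == "2x4-red" and attempt == 0:
        return (part, (x + 1, y))
    return subgoal


if __name__ == "__main__":
    steps = [[("2x4-red", (0, 0))], [("2x2-blue", (1, 0)), ("1x2-white", (3, 0))]]
    log = run_build(steps, flaky_place)
    print(len(log.placed), log.retries)  # 3 1
```

Verifying each sub-goal before the next one starts keeps an early slip from being built over, which is the point of checking "before the next step".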

Technology pillars

Four capabilities, one model.

Arkona's stack is built so that perception, prediction, instruction grounding and control share a single learned representation of the world, rather than four brittle systems stitched together.

Multimodal perception

Fuses colour, depth and touch into a coherent 3-D understanding of the scene and the objects in it.

Predictive world model

Rolls out imagined futures so the robot can weigh actions against their likely physical outcomes.

Instruction grounding

Connects human instructions — printed steps, language, diagrams — to concrete actions in the world.

Dexterous control

Force-aware manipulation that grasps, aligns and seats parts with the precision a home demands.

Prototype

Meet the testbed: Arkona P-1.

Our first-generation research cell pairs a 6-axis manipulator with overhead and wrist-mounted cameras, arranged around an instrumented build surface: the platform where the world model meets real bricks.

Figure 3 — Arkona P-1 research cell (schematic)

Figure 4 — Safe operation alongside people and pets at home

The same world model that masters a LEGO build is what lets the robot work calmly and predictably around people — sensing its surroundings continuously and stopping the instant something unexpected enters its space.

Built for the home

AI-first, and safe, secure and reliable by design.

Modern AI is the foundation everything else stands on. On top of that base, three principles govern how the robot behaves in your home — not bolted on afterwards, but part of how the system perceives, predicts and acts.

AI foundation

Large multimodal models give the robot broad commonsense knowledge and language understanding. They are the bedrock on which its perception, world model and control are built, and the means by which those components keep improving.

Safety

Force-limited, collision-aware motion with hardware e-stops. The world model predicts contact before it happens, so the robot slows and stops around people and pets.

Security

Perception runs on-device — your home isn't streamed to the cloud. What does leave is minimal and end-to-end encrypted, with privacy built into the data model from day one.

Reliability

Redundant sensing and continuous self-checks mean the robot knows when it's unsure. It cross-validates before acting, recovers from errors, and behaves predictably every time.

Where we are headed

From a single brick to the whole home.

NOW

Single-step placement

Reliable grasp-and-place of individual bricks under the world model.

Full set assembly

Complete a small LEGO set end to end from its printed manual.

LATER

Unseen sets

Generalise to manuals and parts the robot has never encountered before.

VISION

Everyday home tasks

Carry the same world model over to tidying shelves, loading the dishwasher and assembling furniture around the house.

Want to help build the future of home robotics?

We are talking to researchers, hardware partners and early collaborators who want to help teach robots to understand the physical world.

Get in touch