# The Robotics AI Stack: Putting It All Together

These three model types are not independent; they work together in a cohesive robotics AI stack to create intelligent behavior.

Here’s a simplified workflow:

1. **Perception (The Eyes 👀):** The robot's camera captures an image. The Foundation Vision Model, pre-trained on a massive dataset like the one from OVER, processes this image to perform 3D reconstruction and semantic segmentation. This creates an immediate, detailed, and metrically accurate understanding of the surrounding scene.
2. **Prediction & Planning (The Imagination 🧠):** This rich, real-time perception data is fed into the World Model. The world model, whose physics and environmental rules were also learned from realistic data, updates its internal representation and simulates future possibilities to plan the best course of action.
3. **Action (The Ears & Hands 👂✋):** A human gives a command, such as "bring me the apple from the kitchen." The VLA (Vision-Language-Action) Model interprets this command in the context of the world model's understanding of the environment. It works with the world model to devise a safe and efficient plan, which is then translated into low-level motor commands for the robot's actuators to execute.
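
The data flow between the three stages above can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not an implementation of any real model: the class names (`VisionModel`, `WorldModel`, `VLAModel`), their methods, and the `run_stack` helper are all hypothetical, the "image" is already a list of labelled 3D points, the "simulation" is a straight-line rollout, and language grounding is a simple keyword match.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]  # metric position in metres


@dataclass
class SceneObject:
    label: str      # semantic class (stand-in for segmentation output)
    position: Vec3  # 3D position (stand-in for reconstruction output)


class VisionModel:
    """Hypothetical stand-in for a foundation vision model."""

    def perceive(self, image: List[Tuple[str, Vec3]]) -> List[SceneObject]:
        # Assumption: the "image" is pre-labelled; a real model would infer this.
        return [SceneObject(label, pos) for label, pos in image]


class WorldModel:
    """Hypothetical stand-in for a predictive world model."""

    def __init__(self) -> None:
        self.objects: Dict[str, Vec3] = {}

    def update(self, scene: List[SceneObject]) -> None:
        # Fold the latest perception result into the internal representation.
        for obj in scene:
            self.objects[obj.label] = obj.position

    def rollout(self, start: Vec3, goal: Vec3, step: float = 0.5) -> List[Vec3]:
        # Toy "simulation": evenly spaced waypoints on a straight line to the goal.
        n = max(1, math.ceil(math.dist(start, goal) / step))
        return [
            tuple(s + (g - s) * i / n for s, g in zip(start, goal))
            for i in range(1, n + 1)
        ]


class VLAModel:
    """Hypothetical stand-in for a vision-language-action model."""

    def ground(self, command: str, world: WorldModel) -> Optional[str]:
        # Toy grounding: first known object label that appears in the command.
        words = command.lower().replace(",", " ").replace(".", " ").split()
        return next((label for label in world.objects if label in words), None)

    def act(self, command: str, world: WorldModel, robot_pos: Vec3) -> List[Vec3]:
        target = self.ground(command, world)
        if target is None:
            return []  # nothing in the scene matches the command
        # Translate planned waypoints into per-step displacement commands.
        commands, prev = [], robot_pos
        for waypoint in world.rollout(robot_pos, world.objects[target]):
            commands.append(tuple(w - p for w, p in zip(waypoint, prev)))
            prev = waypoint
        return commands


def run_stack(image: List[Tuple[str, Vec3]], command: str, robot_pos: Vec3) -> List[Vec3]:
    """Perception -> world-model update -> command-conditioned action."""
    world = WorldModel()
    world.update(VisionModel().perceive(image))
    return VLAModel().act(command, world, robot_pos)
```

Calling `run_stack([("apple", (2.0, 0.0, 0.0))], "Bring me the apple from the kitchen.", (0.0, 0.0, 0.0))` returns a list of small displacement vectors that step the robot toward the apple. The point of the sketch is the interface boundaries: perception produces a scene, the world model owns state and prediction, and the action model consumes both the command and the world model.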

In this stack, a dataset from OVER provides the essential, high-quality pre-training foundation for the perception models and the ground truth for building the predictive world models, enabling the entire system to function with a high degree of real-world understanding.

