RoboPoint Model

RoboPoint is a specialized multimodal model for robotic perception and understanding. With InferX, you can run RoboPoint on any device using the same API, which makes it well suited to robotics applications across different hardware platforms.

Features

  • Multimodal Understanding: Combines vision and language for comprehensive scene understanding
  • Keypoint Detection: Identifies manipulation points for robotic grasping
  • Cross-Platform: Same code works on Jetson, GPU, or CPU
  • Real-time Performance: Optimized for robotics applications
  • Natural Language: Understands human instructions in natural language

Installation

RoboPoint is included with InferX:

pip install git+https://github.com/exla-ai/InferX.git

Basic Usage

from inferx.models.robopoint import robopoint
import cv2

# Initialize the model
model = robopoint()

# Load an image
image = cv2.imread("path/to/your/image.jpg")

# Run inference with text instruction
results = model.inference(
    image=image,
    text_instruction="Find grasping points on the cup"
)

# Process results
print(f"Detected keypoints: {len(results['keypoints'])}")
for i, keypoint in enumerate(results['keypoints']):
    print(f"Keypoint {i}: ({keypoint['x']}, {keypoint['y']}) - confidence: {keypoint['confidence']}")

Advanced Usage

Interactive Robot Control

from inferx.models.robopoint import robopoint
import cv2

# Initialize model
model = robopoint()

# Capture from camera
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        break
    
    # Get user instruction
    instruction = input("Enter instruction (or 'quit'): ")
    if instruction.lower() == 'quit':
        break
    
    # Run inference
    results = model.inference(
        image=frame,
        text_instruction=instruction
    )
    
    # Visualize keypoints
    for keypoint in results['keypoints']:
        cv2.circle(frame, 
                  (int(keypoint['x']), int(keypoint['y'])), 
                  5, (0, 255, 0), -1)
        cv2.putText(frame, f"{keypoint['confidence']:.2f}", 
                   (int(keypoint['x']), int(keypoint['y']-10)),
                   cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    
    cv2.imshow('RoboPoint', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
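
In practice you may only want to draw confident keypoints. The fragment below is a drop-in replacement for the visualization loop in the script above; the 0.5 threshold is an arbitrary example value, not an InferX default.

    # Arbitrary example threshold; tune for your setup
    CONF_THRESHOLD = 0.5

    for keypoint in results['keypoints']:
        if keypoint['confidence'] < CONF_THRESHOLD:
            continue  # skip low-confidence detections
        cv2.circle(frame,
                  (int(keypoint['x']), int(keypoint['y'])),
                  5, (0, 255, 0), -1)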

Batch Processing for Dataset Creation

from inferx.models.robopoint import robopoint
import cv2
import os
import json

model = robopoint()

# Process multiple images with different tasks
tasks = [
    "Find grasping points on objects",
    "Identify manipulation areas",
    "Locate safe handling points"
]

results_dataset = []

for filename in os.listdir("robot_images/"):
    if filename.endswith(('.jpg', '.png')):
        image_path = f"robot_images/{filename}"
        image = cv2.imread(image_path)
        
        for task in tasks:
            results = model.inference(
                image=image,
                text_instruction=task
            )
            
            # Store results
            results_dataset.append({
                'image': filename,
                'task': task,
                'keypoints': results['keypoints']
            })

# Save dataset
with open('robot_keypoints_dataset.json', 'w') as f:
    json.dump(results_dataset, f, indent=2)
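
Once saved, the JSON dataset can be reloaded for inspection. The sketch below is a simple sanity check (counting entries per task); it is not part of InferX.

import json
from collections import Counter

with open('robot_keypoints_dataset.json') as f:
    dataset = json.load(f)

# Count how many image/task entries were annotated per instruction
counts = Counter(entry['task'] for entry in dataset)
for task, n in counts.items():
    print(f"{task}: {n} entries")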

Example Instructions

RoboPoint understands various natural language instructions:

  • Grasping: “Find the best grasping points on the bottle”
  • Manipulation: “Identify areas where I can safely manipulate this object”
  • Spatial Understanding: “Show me empty spaces on the table”
  • Object-specific: “Find spots on the rightmost plate where I can place items”

Performance

InferX optimizes RoboPoint for your hardware:

Hardware           Inference Time    Memory Usage
Jetson AGX Orin    ~150ms            ~3GB
RTX 4090           ~60ms             ~4GB
Intel i7 CPU       ~800ms            ~2GB
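
These figures are approximate. To measure latency on your own hardware, a minimal timing sketch like the one below can be used; the image path and instruction are placeholders.

import time
import cv2
from inferx.models.robopoint import robopoint

model = robopoint()
image = cv2.imread("path/to/your/image.jpg")  # replace with a real test image

# Warm-up call so one-time initialization cost is not counted
model.inference(image=image, text_instruction="Find grasping points on objects")

start = time.perf_counter()
results = model.inference(image=image, text_instruction="Find grasping points on objects")
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"Inference time: {elapsed_ms:.1f} ms, keypoints: {len(results['keypoints'])}")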

Response Format

{
    'keypoints': [
        {
            'x': float,           # X coordinate
            'y': float,           # Y coordinate
            'confidence': float,  # Confidence score (0-1)
            'type': str          # Keypoint type (grasp, manipulation, etc.)
        }
    ],
    'regions': [
        {
            'bbox': [x1, y1, x2, y2],  # Bounding box
            'confidence': float,
            'description': str
        }
    ]
}
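
The earlier examples only read the keypoints entry. The sketch below also visualizes the regions entries, using the field names from the format above; the image path and instruction are placeholders.

import cv2
from inferx.models.robopoint import robopoint

model = robopoint()
image = cv2.imread("path/to/your/image.jpg")
results = model.inference(image=image, text_instruction="Identify manipulation areas")

# Draw each region's bounding box and description
for region in results['regions']:
    x1, y1, x2, y2 = map(int, region['bbox'])
    cv2.rectangle(image, (x1, y1), (x2, y2), (255, 0, 0), 2)
    label = f"{region['description']} ({region['confidence']:.2f})"
    cv2.putText(image, label, (x1, max(y1 - 5, 0)),
                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (255, 0, 0), 1)

cv2.imwrite("regions_visualization.jpg", image)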

Integration with Robot Frameworks

ROS Integration

#!/usr/bin/env python3
import rospy
from sensor_msgs.msg import Image
from cv_bridge import CvBridge
from inferx.models.robopoint import robopoint

class RoboPointNode:
    def __init__(self):
        rospy.init_node('robopoint_node')
        self.bridge = CvBridge()
        self.model = robopoint()
        
        # Subscribe to camera topic
        self.image_sub = rospy.Subscriber('/camera/image_raw', Image, self.image_callback)
        
        # Publisher for keypoints. KeyPointArray is a placeholder for a custom
        # message type defined in your own package; see the PoseArray sketch
        # below for an alternative that uses only standard messages.
        self.keypoint_pub = rospy.Publisher('/robopoint/keypoints', KeyPointArray, queue_size=1)
    
    def image_callback(self, msg):
        # Convert ROS image to OpenCV
        cv_image = self.bridge.imgmsg_to_cv2(msg, "bgr8")
        
        # Get instruction from parameter server
        instruction = rospy.get_param('~instruction', 'Find grasping points')
        
        # Run inference
        results = self.model.inference(
            image=cv_image,
            text_instruction=instruction
        )
        
        # Publish results
        self.publish_keypoints(results['keypoints'])
    
    def publish_keypoints(self, keypoints):
        # Convert to ROS message and publish
        # Implementation depends on your message type
        pass

if __name__ == '__main__':
    node = RoboPointNode()
    rospy.spin()
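
The KeyPointArray message above is a placeholder. If you do not want to define a custom message, a minimal sketch using the standard geometry_msgs/PoseArray type (an assumption, not part of InferX) could replace the publisher and the stub method:

from geometry_msgs.msg import Pose, PoseArray

# In __init__, publish a PoseArray instead of the custom message:
#   self.keypoint_pub = rospy.Publisher('/robopoint/keypoints', PoseArray, queue_size=1)

def publish_keypoints(self, keypoints):
    msg = PoseArray()
    msg.header.stamp = rospy.Time.now()
    msg.header.frame_id = 'camera'  # assumed frame id; set to your camera frame
    for kp in keypoints:
        pose = Pose()
        pose.position.x = float(kp['x'])  # image-plane coordinates (pixels)
        pose.position.y = float(kp['y'])
        pose.orientation.w = 1.0
        msg.poses.append(pose)
    self.keypoint_pub.publish(msg)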

Hardware Detection

When the model is initialized, InferX detects the available hardware automatically and prints a status banner, for example:

✨ InferX - RoboPoint Model ✨
🔍 Device Detected: AGX_ORIN
⠏ [0.8s] Loading RoboPoint model
✓ [1.0s] Ready for robotic perception

Example Applications

  • Pick and Place: Identify optimal grasping points for robotic arms (see the selection sketch after this list)
  • Object Manipulation: Find safe areas to push, pull, or rotate objects
  • Scene Understanding: Understand spatial relationships for navigation
  • Human-Robot Interaction: Interpret natural language commands
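
For pick and place, a common pattern is to take the highest-confidence grasp keypoint and hand its pixel coordinates to your motion stack. The sketch below only makes that selection; converting pixels to robot coordinates requires your own camera calibration and is not shown. The image path and instruction are placeholders.

import cv2
from inferx.models.robopoint import robopoint

model = robopoint()
image = cv2.imread("path/to/your/image.jpg")
results = model.inference(image=image, text_instruction="Find grasping points on the bottle")

# Prefer keypoints labelled as grasps; fall back to all keypoints otherwise
grasp_points = [kp for kp in results['keypoints'] if kp.get('type') == 'grasp']
candidates = grasp_points or results['keypoints']

if candidates:
    best = max(candidates, key=lambda kp: kp['confidence'])
    print(f"Best grasp target (pixels): ({best['x']:.1f}, {best['y']:.1f}), "
          f"confidence {best['confidence']:.2f}")
else:
    print("No keypoints returned for this instruction")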

Next Steps
