RoboPoint

RoboPoint is a powerful multimodal model optimized for robotic perception and understanding. It combines visual and language capabilities to enable robots to understand their environment and follow instructions.

Environment Setup

Before using RoboPoint, you need to set up your environment:

  1. First, install the Exla SDK by following the installation instructions.

  2. Clone the examples repository:

git clone https://github.com/exla-ai/exla-sdk-examples.git

  3. Navigate to the RoboPoint example directory:

cd exla-sdk-examples/robopoint

Running RoboPoint

You can run the RoboPoint example with a single command:

python example_robopoint.py

This will:

  1. Load the optimized RoboPoint model
  2. Process sample images
  3. Demonstrate multimodal capabilities

Example Code

Here’s a simplified version of how to use RoboPoint in your own projects:

from exla.models.robopoint import robopoint

# Load the optimized RoboPoint model
model = robopoint()

# Point out candidate locations that match the instruction and
# write an annotated copy of the image to the output path
model.inference(
    image_path="data/sink.jpg",
    text_instruction="Find a few spots within the vacant area on the rightmost white plate.",
    output="data/sink_output.png"
)
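
After inference completes, the annotated result is an ordinary image file, so you can inspect it with any image library. A minimal sketch using Pillow (assuming Pillow is installed in your environment):

from PIL import Image

# Open the annotated output written by model.inference and display it
Image.open("data/sink_output.png").show()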

Key Features

  • Multimodal Understanding: Combines vision and language for comprehensive scene understanding
  • Object Detection: Identifies and localizes objects in the environment
  • Instruction Following: Interprets natural language instructions in the context of visual scenes (see the sketch after this list)
  • Spatial Reasoning: Understands spatial relationships between objects
  • Hardware Optimized: Runs efficiently on NVIDIA GPUs
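
Instruction following and spatial reasoning come down to how you phrase text_instruction. As a sketch, the same inference call shown earlier accepts free-form spatial language; the image file and instruction below are hypothetical:

from exla.models.robopoint import robopoint

model = robopoint()

# Hypothetical example: the instruction expresses a spatial relation
# ("to the left of") between two objects in the scene
model.inference(
    image_path="data/table.jpg",
    text_instruction="Find an empty spot to the left of the mug.",
    output="data/table_output.png"
)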

Use Cases

  • Robotic Manipulation: Identify objects for grasping and manipulation
  • Navigation: Understand the environment for safe navigation
  • Human-Robot Interaction: Follow natural language instructions
  • Scene Understanding: Analyze complex scenes and relationships between objects

Next Steps