RoboPoint Model
RoboPoint is a specialized multimodal model for robotic perception and understanding. With InferX, you can run RoboPoint on any supported device using the same API, which makes it well suited to robotics applications that span different hardware platforms.
Features
- Multimodal Understanding: Combines vision and language for comprehensive scene understanding
- Keypoint Detection: Identifies manipulation points for robotic grasping
- Cross-Platform: Same code works on Jetson, GPU, or CPU
- Low Latency: Optimized for robotics applications, with inference times from roughly 60 ms to 800 ms depending on hardware
- Natural Language: Understands human instructions in natural language
Installation
RoboPoint is included with InferX, so no separate installation is needed.
Basic Usage
Advanced Usage
Interactive Robot Control
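The control code for this section is not reproduced here. The following is a minimal sketch of the loop structure. It assumes RoboPoint inference is wrapped in a function that maps an instruction to a list of normalized (x, y) points. `control_loop`, `predict_fn`, and `act_fn` are hypothetical names, not InferX API.

```python
from typing import Callable, Iterable, List, Tuple

Point = Tuple[float, float]


def control_loop(
    instructions: Iterable[str],
    predict_fn: Callable[[str], List[Point]],  # hypothetical: wraps the InferX RoboPoint call
    act_fn: Callable[[Point], None],           # hypothetical: sends one target to the robot
) -> int:
    """Process instructions until 'quit'; return how many actions were executed."""
    executed = 0
    for raw in instructions:
        instruction = raw.strip()
        if not instruction:
            continue  # skip blank lines
        if instruction.lower() == "quit":
            break
        points = predict_fn(instruction)
        if not points:
            print(f"No points found for: {instruction!r}")
            continue
        act_fn(points[0])  # act on the first predicted point only
        executed += 1
    return executed
```

For a live session, pass `sys.stdin` as `instructions` so each typed line becomes one command. Acting only on the first point keeps the sketch simple. A real controller would rank the candidates or check reachability first.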
Batch Processing for Dataset Creation
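Below is a minimal sketch of dataset creation. It assumes a hypothetical `predict_fn(image_path, instruction)` that wraps the InferX RoboPoint call and returns (x, y) points. Records are written as JSON Lines (one JSON object per line). That output format is an assumption for illustration, not something InferX prescribes.

```python
import json
from pathlib import Path
from typing import Callable, Iterable, List, Tuple

Point = Tuple[float, float]


def build_dataset(
    jobs: Iterable[Tuple[str, str]],                # (image_path, instruction) pairs
    predict_fn: Callable[[str, str], List[Point]],  # hypothetical wrapper around RoboPoint
    out_path: str,
) -> int:
    """Write one JSON record per job to out_path; return the number of records."""
    count = 0
    with Path(out_path).open("w", encoding="utf-8") as out:
        for image_path, instruction in jobs:
            points = predict_fn(image_path, instruction)
            record = {
                "image": image_path,
                "instruction": instruction,
                "points": [list(p) for p in points],  # tuples become JSON arrays
            }
            out.write(json.dumps(record) + "\n")
            count += 1
    return count
```

JSON Lines keeps each record independent. A long batch run can therefore be streamed, appended to, or resumed without re-parsing the whole file.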
Example Instructions
RoboPoint understands various natural language instructions:
- Grasping: “Find the best grasping points on the bottle”
- Manipulation: “Identify areas where I can safely manipulate this object”
- Spatial Understanding: “Show me empty spaces on the table”
- Object-specific: “Find spots on the rightmost plate where I can place items”
Performance
InferX optimizes RoboPoint for your hardware. Approximate latency and memory usage by device:
| Hardware | Inference Time | Memory Usage |
|---|---|---|
| Jetson AGX Orin | ~150 ms | ~3 GB |
| RTX 4090 | ~60 ms | ~4 GB |
| Intel i7 CPU | ~800 ms | ~2 GB |
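The latency column sets an upper bound on how often a perception loop can run when each cycle waits for one inference. The rate is 1000 divided by the latency in ms. That gives roughly 6.7 Hz on the Jetson AGX Orin, 16.7 Hz on the RTX 4090, and 1.25 Hz on the CPU. This sketch ignores capture, preprocessing, and motion-planning time, so real loop rates will be lower.

```python
def max_rate_hz(latency_ms: float) -> float:
    """Upper bound on loop frequency (Hz) when each cycle waits for one inference."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / latency_ms


# Approximate latencies taken from the Performance table
for device, ms in [("Jetson AGX Orin", 150), ("RTX 4090", 60), ("Intel i7 CPU", 800)]:
    print(f"{device}: at most {max_rate_hz(ms):.2f} Hz")
```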
Response Format
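This section does not show the response schema itself. The upstream RoboPoint model is prompted to answer with a list of tuples of the form `[(x1, y1), (x2, y2), ...]`, with each coordinate normalized to the range 0 to 1. Assuming the InferX response exposes that text, a minimal parser that converts it to pixel coordinates could look like this. `parse_points` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Matches "(x, y)" where x and y are non-negative decimals such as 0.5, 1.0, or 0
_TUPLE = re.compile(r"\(\s*(\d*\.?\d+)\s*,\s*(\d*\.?\d+)\s*\)")


def parse_points(text: str, width: int, height: int) -> List[Tuple[int, int]]:
    """Extract normalized (x, y) tuples from model text and convert them to pixels."""
    pixels = []
    for xs, ys in _TUPLE.findall(text):
        x, y = float(xs), float(ys)
        # Clamp so a coordinate of exactly 1.0 still lands inside the image.
        px = min(int(x * width), width - 1)
        py = min(int(y * height), height - 1)
        pixels.append((px, py))
    return pixels
```

For example, `parse_points("[(0.5, 0.25)]", 640, 480)` returns `[(320, 120)]`.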
Integration with Robot Frameworks
ROS Integration
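This section does not include ROS-specific code. Most ROS pipelines still need one step: lifting a predicted 2D point to 3D using an aligned depth image and the camera intrinsics. The intrinsics are the fx, fy, cx, cy entries of the K matrix in a `sensor_msgs/CameraInfo` message. Below is a minimal pinhole-model sketch. It assumes depth in meters and no lens distortion, and `deproject` is a hypothetical helper.

```python
from typing import Tuple


def deproject(u: float, v: float, depth_m: float,
              fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """Back-project pixel (u, v) at depth Z into camera coordinates (pinhole model)."""
    if depth_m <= 0:
        # Many depth cameras report 0 where no valid measurement exists.
        raise ValueError("depth must be positive")
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)
```

The result is in the camera's optical frame (x right, y down, z forward, per ROS REP 103). In a ROS node you would typically transform it into the robot base frame with tf2 before planning a grasp.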
Hardware Detection
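This section does not document how InferX detects hardware, so the following is not its implementation. It is a rough, standard-library-only heuristic that assumes a Linux host. It shows how a script could choose between the three hardware classes in the Performance table. `detect_platform` is a hypothetical helper.

```python
import os
import shutil


def detect_platform() -> str:
    """Best-effort guess at the host: 'jetson', 'gpu', or 'cpu'. Heuristics only."""
    # Jetson boards running L4T/JetPack typically ship this release file.
    if os.path.exists("/etc/nv_tegra_release"):
        return "jetson"
    # On many ARM boards the device-tree model string names the board.
    try:
        with open("/proc/device-tree/model", "rb") as f:
            if b"jetson" in f.read().lower():
                return "jetson"
    except OSError:
        pass
    # nvidia-smi on PATH suggests an NVIDIA GPU driver is installed.
    if shutil.which("nvidia-smi"):
        return "gpu"
    return "cpu"
```

The Jetson checks run first because newer JetPack releases may also provide `nvidia-smi`.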
Example Applications
- Pick and Place: Identify optimal grasping points for robotic arms
- Object Manipulation: Find safe areas to push, pull, or rotate objects
- Scene Understanding: Understand spatial relationships for navigation
- Human-Robot Interaction: Interpret natural language commands
Next Steps
- Explore the CLIP model for general image understanding
- Check out practical robotics examples
- Learn about combining models for advanced workflows