CLIP Model
The CLIP (Contrastive Language-Image Pre-training) model is a powerful multimodal model that connects text and images. With InferX, you can run CLIP on any device using the same API, whether it’s a Jetson, a GPU server, or a CPU-only system.
Features
- Universal API: Same code works on Jetson, GPU, or CPU
- Hardware-Optimized: Automatically detects your hardware and uses the appropriate implementation
- Real-time Processing: Optimized for fast inference across all platforms
- Zero Configuration: No setup required - just import and run
Installation
CLIP is included with InferX; no separate installation is required beyond InferX itself:

```bash
pip install git+https://github.com/exla-ai/InferX.git
```
Basic Usage
```python
from inferx.models.clip import clip
import json

# Initialize the model (automatically detects your hardware)
model = clip()

# Run inference
results = model.inference(
    image_paths=["path/to/image1.jpg", "path/to/image2.jpg"],
    text_queries=["a photo of a dog", "a photo of a cat", "a photo of a bird"]
)

# Print results
print(json.dumps(results, indent=2))
```
Advanced Usage
Processing Multiple Images
```python
from inferx.models.clip import clip

# Process a list of images
images = [
    "path/to/image1.jpg",
    "path/to/image2.jpg",
    "path/to/image3.jpg"
]

# Or load images from a text file (one path per line)
images = "path/to/image_list.txt"

model = clip()
results = model.inference(
    image_paths=images,
    text_queries=["query1", "query2", "query3"]
)
```
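As noted in the comment above, `image_paths` can also point to a plain text file with one image path per line. Below is a minimal sketch for generating such a file from a directory; the `image_list.txt` name and the directory path are placeholders:

```python
import os

image_directory = "path/to/images/"

# Write one image path per line, the format expected for a text-file input.
with open("path/to/image_list.txt", "w") as f:
    for name in sorted(os.listdir(image_directory)):
        if name.lower().endswith((".jpg", ".jpeg", ".png")):
            f.write(os.path.join(image_directory, name) + "\n")
```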
Batch Processing
```python
from inferx.models.clip import clip
import os

# Initialize model
model = clip()

# Process directory of images
image_directory = "path/to/images/"
image_paths = [
    os.path.join(image_directory, f)
    for f in os.listdir(image_directory)
    if f.endswith(('.jpg', '.png', '.jpeg'))
]

text_queries = [
    "a photo of a dog",
    "a photo of a cat",
    "a landscape photo",
    "a person walking"
]

results = model.inference(
    image_paths=image_paths,
    text_queries=text_queries
)
```
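For very large directories you may want to bound memory by splitting the path list into chunks and calling `inference()` once per chunk. This is a sketch built only on the list-based API shown above; the chunk size of 64 is an arbitrary choice, not an InferX requirement:

```python
import os
from inferx.models.clip import clip

model = clip()

image_directory = "path/to/images/"
image_paths = [
    os.path.join(image_directory, f)
    for f in os.listdir(image_directory)
    if f.lower().endswith(('.jpg', '.png', '.jpeg'))
]
text_queries = ["a photo of a dog", "a photo of a cat"]

chunk_size = 64  # arbitrary; tune for your hardware and memory budget
all_results = []
for start in range(0, len(image_paths), chunk_size):
    chunk = image_paths[start:start + chunk_size]
    # Each call returns results for this chunk only; collect them as you go.
    all_results.append(
        model.inference(image_paths=chunk, text_queries=text_queries)
    )
```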
Performance
InferX automatically optimizes CLIP for your hardware:
| Hardware | Typical Inference Time | Memory Usage |
|---|---|---|
| Jetson AGX Orin | ~50ms | ~2GB |
| RTX 4090 | ~20ms | ~3GB |
| Intel i7 CPU | ~200ms | ~1GB |
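These figures vary with image resolution, batch size, and warm-up. To measure latency on your own hardware, you can time the `inference()` call directly; in this sketch the paths are placeholders and the first call is treated as warm-up, since it may include one-time model loading:

```python
import time
from inferx.models.clip import clip

model = clip()

image_paths = ["path/to/image1.jpg"]
text_queries = ["a photo of a dog", "a photo of a cat"]

# Warm-up: the first call may include one-time setup costs.
model.inference(image_paths=image_paths, text_queries=text_queries)

# Timed run.
start = time.perf_counter()
model.inference(image_paths=image_paths, text_queries=text_queries)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"Inference time: {elapsed_ms:.1f} ms")
```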
Example Output

```json
[
  {
    "a photo of a dog": [
      {
        "image_path": "data/dog.png",
        "score": "23.1011"
      },
      {
        "image_path": "data/cat.png",
        "score": "17.1396"
      }
    ]
  }
]
```
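Given the structure shown above, a list of per-query mappings with each image ranked by score (and scores serialized as strings), you could pick the best match for each query as follows. The sketch assumes exactly that shape and reuses the file names from the example:

```python
from inferx.models.clip import clip

model = clip()
results = model.inference(
    image_paths=["data/dog.png", "data/cat.png"],
    text_queries=["a photo of a dog", "a photo of a cat"]
)

for entry in results:
    for query, matches in entry.items():
        # Scores are serialized as strings in the example, so cast to float.
        best = max(matches, key=lambda m: float(m["score"]))
        print(f"Best match for {query!r}: {best['image_path']} (score {best['score']})")
```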
Hardware Detection
InferX automatically detects and optimizes for your hardware:
```text
✨ InferX - CLIP Model ✨
🔍 Device Detected: AGX_ORIN
⠏ [0.5s] Initializing InferX Optimized CLIP model for AGX_ORIN [GPU Mode]
✓ [0.6s] Ready for inference
```
Error Handling
```python
from inferx.models.clip import clip

try:
    model = clip()
    results = model.inference(
        image_paths=["nonexistent.jpg"],
        text_queries=["test query"]
    )
except FileNotFoundError:
    print("Image file not found")
except Exception as e:
    print(f"Error: {e}")
```
Next Steps