CLIP Model

The CLIP (Contrastive Language-Image Pre-training) model is a multimodal model that connects text and images. With InferX, you can run CLIP on any device using the same API, whether it's a Jetson, a GPU server, or a CPU-only system.

Features

  • Universal API: Same code works on Jetson, GPU, or CPU
  • Hardware-Optimized: Automatically detects your hardware and uses the appropriate implementation
  • Real-time Processing: Optimized for fast inference across all platforms
  • Zero Configuration: No setup required - just import and run

Installation

CLIP is included with InferX. No separate installation required.

pip install git+https://github.com/exla-ai/InferX.git

Basic Usage

from inferx.models.clip import clip
import json

# Initialize the model (automatically detects your hardware)
model = clip()

# Run inference
results = model.inference(
    image_paths=["path/to/image1.jpg", "path/to/image2.jpg"],
    text_queries=["a photo of a dog", "a photo of a cat", "a photo of a bird"]
)

# Print results
print(json.dumps(results, indent=2))

Advanced Usage

Processing Multiple Images

from inferx.models.clip import clip

# Process a list of images
images = [
    "path/to/image1.jpg",
    "path/to/image2.jpg",
    "path/to/image3.jpg"
]

# Alternatively, pass the path of a text file listing one image path per line
images = "path/to/image_list.txt"

model = clip()
results = model.inference(
    image_paths=images,
    text_queries=["query1", "query2", "query3"]
)

Batch Processing

from inferx.models.clip import clip
import os

# Initialize model
model = clip()

# Process directory of images
image_directory = "path/to/images/"
image_paths = [
    os.path.join(image_directory, f)
    for f in os.listdir(image_directory)
    if f.lower().endswith((".jpg", ".jpeg", ".png"))  # case-insensitive match
]

text_queries = [
    "a photo of a dog",
    "a photo of a cat", 
    "a landscape photo",
    "a person walking"
]

results = model.inference(
    image_paths=image_paths,
    text_queries=text_queries
)
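On memory-constrained devices, you may want to split a large image list into fixed-size chunks and run inference per chunk. This chunking helper is a generic sketch; the chunk size is arbitrary, and the commented usage assumes the `model` and variables defined above:

```python
def chunked(items, size):
    """Yield successive fixed-size chunks from a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Hypothetical usage with the model above:
# all_results = []
# for batch in chunked(image_paths, 32):
#     all_results.extend(model.inference(image_paths=batch,
#                                        text_queries=text_queries))

print(list(chunked(["a.jpg", "b.jpg", "c.jpg"], 2)))
# [['a.jpg', 'b.jpg'], ['c.jpg']]
```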

Performance

InferX automatically optimizes CLIP for your hardware:

| Hardware        | Typical Inference Time | Memory Usage |
|-----------------|------------------------|--------------|
| Jetson AGX Orin | ~50ms                  | ~2GB         |
| RTX 4090        | ~20ms                  | ~3GB         |
| Intel i7 CPU    | ~200ms                 | ~1GB         |
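Actual latency varies with batch size and image resolution, so it is worth measuring on your own hardware. A minimal timing helper using the standard library's `time.perf_counter` (the commented usage assumes a `model` initialized as shown earlier):

```python
import time

def time_call(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return result, elapsed_ms

# Quick self-check on an arbitrary function:
_, ms = time_call(sum, range(10_000))
print(f"took {ms:.3f} ms")

# Hypothetical usage:
# results, ms = time_call(model.inference,
#                         image_paths=image_paths,
#                         text_queries=text_queries)
# print(f"inference took {ms:.1f} ms")
```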

Response Format

Each result maps a text query to its candidate images, ordered by similarity score; note that scores are serialized as strings.

[
  {
    "a photo of a dog": [
      {
        "image_path": "data/dog.png",
        "score": "23.1011"
      },
      {
        "image_path": "data/cat.png",
        "score": "17.1396"
      }
    ]
  }
]
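Given results in the format shown above, a small helper like this (hypothetical, not part of InferX) can pick the top-scoring image for each query:

```python
# Sample results in the format shown above (scores are strings).
sample_results = [
    {
        "a photo of a dog": [
            {"image_path": "data/dog.png", "score": "23.1011"},
            {"image_path": "data/cat.png", "score": "17.1396"},
        ]
    }
]

def best_match_per_query(results):
    """Return {query: image_path} for the highest-scoring image per query.

    Scores arrive as strings, so convert to float before comparing.
    """
    best = {}
    for entry in results:
        for query, matches in entry.items():
            top = max(matches, key=lambda m: float(m["score"]))
            best[query] = top["image_path"]
    return best

print(best_match_per_query(sample_results))
# {'a photo of a dog': 'data/dog.png'}
```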

Hardware Detection

InferX automatically detects and optimizes for your hardware:

✨ InferX - CLIP Model ✨
🔍 Device Detected: AGX_ORIN
⠏ [0.5s] Initializing InferX Optimized CLIP model for AGX_ORIN [GPU Mode]
✓ [0.6s] Ready for inference

Error Handling

from inferx.models.clip import clip

try:
    model = clip()
    results = model.inference(
        image_paths=["nonexistent.jpg"],
        text_queries=["test query"]
    )
except FileNotFoundError:
    print("Image file not found")
except Exception as e:
    print(f"Error: {e}")
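Since a single bad path can abort a whole batch, it can also help to filter out missing files before calling the model. A minimal pre-flight check using only `os.path` (nothing InferX-specific):

```python
import os

def existing_images(paths):
    """Split paths into (found, missing) so bad entries can be reported early."""
    found = [p for p in paths if os.path.isfile(p)]
    missing = [p for p in paths if not os.path.isfile(p)]
    return found, missing

found, missing = existing_images(["nonexistent.jpg"])
if missing:
    print(f"Skipping {len(missing)} missing file(s): {missing}")
# found can then be passed to model.inference(image_paths=found, ...)
```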

Next Steps