InternVL2.5 multimodal vision-language model optimized for Jetson and other devices
InternVL2.5 is a powerful multimodal vision-language model that connects visual and textual understanding. It can process both images and text, enabling capabilities like visual question answering, image captioning, and multimodal reasoning.
Hardware-Optimized: Automatically detects your hardware and uses the appropriate implementation
Real-time Progress Indicators: Visual feedback with rotating spinners and timing information
Resource Monitoring: Built-in monitoring of system resources (memory usage, GPU utilization)
Multiple Tasks: Support for visual question answering, image captioning, and multimodal reasoning
The InternVL2.5 model is included with InferX:
internvl2_5()
Factory function that returns the appropriate InternVL2.5 model based on the detected hardware.
Returns:
An InternVL2.5 model instance matched to the detected hardware
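As a quick sketch of obtaining a model from the factory (the `inferx` import path is an assumption, not confirmed by this document):

```python
# Hypothetical import path for the InferX package (assumed)
from inferx import internvl2_5

# The factory inspects the host hardware (e.g. Jetson vs. desktop GPU)
# and returns the matching InternVL2.5 implementation.
model = internvl2_5()
```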
model.vqa(image_path=None, image_paths=None, question=None, timeout=300, debug=False)
Runs visual question answering on the provided image(s) with the given question.
Parameters:
image_path (str): Path to a single image
image_paths (list): List of image paths for batch processing
question (str): Question to ask about the image(s)
timeout (int): Maximum time in seconds to wait for inference (default: 300)
debug (bool): Whether to print detailed debug information (default: False)

Returns:
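A minimal usage sketch for `vqa`, assuming the `inferx` import path and example file names (both hypothetical):

```python
# Hypothetical import path and image files (assumed)
from inferx import internvl2_5

model = internvl2_5()

# Single-image question answering
answer = model.vqa(image_path="street.jpg", question="How many cars are visible?")

# Batch processing: one question asked across several images,
# with a shorter timeout than the 300-second default
answers = model.vqa(
    image_paths=["street.jpg", "park.jpg"],
    question="Is it daytime?",
    timeout=120,
)
```

Note that `image_path` and `image_paths` are mutually exclusive ways to supply input: use the former for one image and the latter for a batch.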
model.caption(image_path=None, image_paths=None, timeout=300, debug=False)
Generates captions for the provided image(s).
Parameters:
image_path (str): Path to a single image
image_paths (list): List of image paths for batch processing
timeout (int): Maximum time in seconds to wait for inference (default: 300)
debug (bool): Whether to print detailed debug information (default: False)

Returns:
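A usage sketch for `caption`, again with an assumed import path and hypothetical file names:

```python
# Hypothetical import path and image files (assumed)
from inferx import internvl2_5

model = internvl2_5()

# Caption a single image
caption = model.caption(image_path="sunset.jpg")

# Caption a batch, with detailed debug output enabled
captions = model.caption(image_paths=["a.jpg", "b.jpg"], debug=True)
```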
model.reason(image_path, prompt, timeout=300, debug=False)
Performs multimodal reasoning on the provided image with the given prompt.
Parameters:
image_path (str): Path to an image
prompt (str): Reasoning prompt or instruction
timeout (int): Maximum time in seconds to wait for inference (default: 300)
debug (bool): Whether to print detailed debug information (default: False)

Returns:
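A usage sketch for `reason`, which unlike `vqa` and `caption` takes a single image and a free-form prompt (import path and file name assumed):

```python
# Hypothetical import path and image file (assumed)
from inferx import internvl2_5

model = internvl2_5()
result = model.reason(
    image_path="chart.png",
    prompt="Describe the trend shown and suggest a likely cause.",
    debug=True,  # print detailed progress information during inference
)
```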
model.install_nvidia_pytorch()
Explicitly installs NVIDIA’s PyTorch wheel for optimal performance on Jetson devices.
Returns:
bool: True if installation was successful, False otherwise

The InternVL2.5 model provides rich visual feedback during execution:
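A sketch of calling the explicit installation hook described above on a Jetson device (import path assumed):

```python
# Hypothetical import path (assumed)
from inferx import internvl2_5

model = internvl2_5()

# install_nvidia_pytorch() returns True on success, False otherwise
if not model.install_nvidia_pytorch():
    print("NVIDIA PyTorch wheel installation failed; continuing with the stock wheel")
```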