Android SDK
Run Exla’s Optimized Llama 3.2 3B locally on your Android device
Exla AI SDK for Android
The Exla Android SDK allows you to run state-of-the-art AI text generation directly on your Android device, without requiring an internet connection for inference.
Demo App
We provide a simple Android application that demonstrates the usage of the Exla AI SDK:
Features
- Download and initialize the Llama 3.2 3B model on your device
- Enter prompts and generate AI responses locally (no internet required for inference)
- Simple and intuitive user interface
Demo
To view a demo of using the SDK, take a look at the example app: https://github.com/exla-ai/exla-android-sdk-example
In that project, replace the username placeholder in the Gradle settings with the auth token provided to you.
Requirements
- Android 7.0 (API level 24) or higher
- Approximately 2.5GB of free storage space
- At least 4GB of RAM recommended
Setup Instructions
JitPack API Token
We will provide you with a JitPack API token for accessing the SDK. When you receive the token:
- Add it to your project’s `settings.gradle.kts` file as shown in the Installation section
- The token allows you to download the Exla AI SDK from JitPack
Build and Run
- Open the project in Android Studio
- Sync project with Gradle files
- Build and run the app on your device or emulator
Usage
- Launch the app
- Press the “Download Model” button to initialize the SDK, which will automatically download the Llama 3.2 3B model
- Wait for the download and initialization to complete (~1.3GB download)
- Enter your prompt in the text field
- Press “Ask AI” to generate a response
- The AI-generated response will appear in the response section
UI Design
The app features a clean, simple UI with:
- Status display at the top
- Download button and progress bar
- Text input for questions
- Response area that scrolls for longer responses
Privacy
All text generation happens entirely on your device. The app only connects to the internet once to download the model file. No data is sent to remote servers during text generation.
SDK Features
- 🚀 Fully On-Device: All processing happens on your device, ensuring privacy and offline usage
- 📱 Optimized for Mobile: Uses quantized models specifically designed for mobile devices
- ⚡ Fast Responses: Efficient implementation for quick text generation
- 🧠 Llama 3.2 3B: Powered by Meta’s Llama 3.2 3B Instruct model
Installation
Step 1: Add JitPack Repository
Add the JitPack repository to your project’s `settings.gradle.kts` using the token we provide:
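The exact layout depends on your project, but a typical configuration looks like the sketch below. JitPack authenticates private builds by passing the auth token as the `username` credential; keep the token out of source control (e.g. read it from `gradle.properties`).

```kotlin
// settings.gradle.kts — illustrative sketch; the token value is a placeholder.
dependencyResolutionManagement {
    repositories {
        google()
        mavenCentral()
        maven {
            url = uri("https://jitpack.io")
            credentials {
                // JitPack expects the auth token in the username field.
                // Prefer: username = providers.gradleProperty("jitpackAuthToken").get()
                username = "YOUR_JITPACK_AUTH_TOKEN"
            }
        }
    }
}
```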
Step 2: Add SDK Dependency
Add the Exla AI SDK dependency to your app’s `build.gradle.kts`:
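A minimal sketch is below; the group, artifact, and version coordinates shown are placeholders — use the exact coordinates supplied with your token.

```kotlin
// app/build.gradle.kts — coordinates and version are placeholders.
dependencies {
    implementation("com.github.exla-ai:exla-android-sdk:1.0.0")
}
```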
Step 3: Add Required Permissions
Ensure your `AndroidManifest.xml` includes the necessary permissions:
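At minimum the one-time model download needs network access; `ACCESS_NETWORK_STATE` is a reasonable addition so you can verify connectivity before starting the download, as recommended below. Check the SDK’s own documentation for any further permissions it requires.

```xml
<!-- Inside the <manifest> element, before <application> -->
<!-- INTERNET: required for the one-time model download -->
<uses-permission android:name="android.permission.INTERNET" />
<!-- ACCESS_NETWORK_STATE: lets you check connectivity before downloading -->
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />
```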
Using the SDK
1. Initialize the SDK
First, get an instance of the SDK:
The SDK follows the singleton pattern, so you only need to get the instance once.
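A minimal sketch of fetching the singleton is below. The class and method names (`ExlaAI`, `getInstance`) are placeholders, not the SDK’s documented API — consult the SDK reference for the real identifiers.

```kotlin
// Hypothetical API sketch — ExlaAI.getInstance is a placeholder name.
// The SDK is a singleton, so fetch the instance once (e.g. in onCreate).
val exla = ExlaAI.getInstance(applicationContext)
```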
2. Download the Model
The SDK automatically downloads the Llama 3.2 3B model when initialized. This only needs to be done once as the model will be cached for future use:
You don’t need to specify the model or manage its download location - the SDK handles everything automatically. The download process requires an internet connection, so make sure to verify network connectivity before starting.
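Assuming a callback-based initializer, the download step might look like the sketch below. The `initialize` method and `InitCallback` interface are placeholder names for illustration only.

```kotlin
// Hypothetical sketch — initialize()/InitCallback are placeholder names.
// The SDK downloads and caches the model on first initialization.
exla.initialize(object : InitCallback {
    override fun onProgress(percent: Int) {
        // Update a progress bar; the download is ~1.3GB.
    }
    override fun onReady() {
        // Model is downloaded (or found in cache) and loaded.
    }
    override fun onError(e: Exception) {
        // Surface the failure, e.g. no network connectivity.
    }
})
```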
3. Check if the Model is Ready
After initialization, you can check if the model is ready to use:
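For example, you might gate the generation UI on a readiness check like this (the `isModelReady` name is a placeholder):

```kotlin
// Hypothetical sketch — isModelReady() is a placeholder name.
if (exla.isModelReady()) {
    // Safe to call the text-generation API.
} else {
    // Prompt the user to download/initialize the model first.
}
```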
4. Generate Text
Once the model is ready, you can generate text responses:
Note that the response callback may run on a background thread, so you should use `runOnUiThread` to update UI elements.
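A sketch of a generation call, assuming a simple prompt-plus-callback signature (the `generate` name and signature are placeholders):

```kotlin
// Hypothetical sketch — generate() is a placeholder name/signature.
exla.generate(prompt) { response ->
    // The callback may fire off the main thread; hop back before
    // touching any views.
    runOnUiThread {
        responseTextView.text = response
    }
}
```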
Model Details
The SDK automatically downloads and uses Llama 3.2 3B Instruct (Q2_K.gguf) - a 2-bit quantized version of the Llama 3.2 3B model optimized for mobile devices.
- Model Size: ~1.3GB download
- Capabilities: Text completion, question answering, simple reasoning
- Languages: Primarily English, with limited support for other languages
- Context Window: 2048 tokens
Complete Example
Here’s a basic example of how to use the Exla AI SDK:
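The sketch below ties the steps above together in a single Activity. All Exla identifiers (`ExlaAI`, `InitCallback`, `initialize`, `isModelReady`, `generate`) and the view IDs are placeholder names assumed for illustration, not the SDK’s documented API.

```kotlin
// Illustrative end-to-end usage; Exla identifiers are placeholders.
class MainActivity : AppCompatActivity() {

    private lateinit var exla: ExlaAI
    private lateinit var statusText: TextView
    private lateinit var responseText: TextView

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)

        statusText = findViewById(R.id.statusText)
        responseText = findViewById(R.id.responseText)
        val promptInput = findViewById<EditText>(R.id.promptInput)

        // The SDK is a singleton; fetch the instance once.
        exla = ExlaAI.getInstance(applicationContext)

        // Step 1: download and initialize the model (~1.3GB, cached after
        // the first run).
        findViewById<Button>(R.id.downloadButton).setOnClickListener {
            exla.initialize(object : InitCallback {
                override fun onProgress(percent: Int) = runOnUiThread {
                    statusText.text = "Downloading model: $percent%"
                }
                override fun onReady() = runOnUiThread {
                    statusText.text = "Model ready"
                }
                override fun onError(e: Exception) = runOnUiThread {
                    statusText.text = "Error: ${e.message}"
                }
            })
        }

        // Step 2: generate a response once the model is ready.
        findViewById<Button>(R.id.askButton).setOnClickListener {
            if (!exla.isModelReady()) {
                statusText.text = "Download the model first"
                return@setOnClickListener
            }
            exla.generate(promptInput.text.toString()) { response ->
                // The callback may arrive off the main thread.
                runOnUiThread { responseText.text = response }
            }
        }
    }
}
```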