Exla AI SDK for Android

The Exla Android SDK allows you to run state-of-the-art AI text generation directly on your Android device, without requiring an internet connection for inference.

Demo App

We provide a simple Android application that demonstrates how to use the Exla AI SDK.

Features

  • Download and initialize the Llama 3.2 3B model on your device
  • Enter prompts and generate AI responses locally (no internet required for inference)
  • Simple and intuitive user interface

Demo

To see the SDK in action, take a look at the example app: https://github.com/exla-ai/exla-android-sdk-example

In the example project, replace the username placeholder with the auth token provided to you.

Requirements

  • Android 7.0 (API level 24) or higher
  • Approximately 2.5GB of free storage space
  • At least 4GB of RAM recommended

Setup Instructions

JitPack API Token

We will provide you with a JitPack API token for accessing the SDK. When you receive the token:

  1. Add it to your project’s settings.gradle.kts file as shown in the Installation section
  2. The token allows you to download the Exla AI SDK from JitPack

Build and Run

  1. Open the project in Android Studio
  2. Sync project with Gradle files
  3. Build and run the app on your device or emulator

Usage

  1. Launch the app
  2. Press the “Download Model” button to initialize the SDK, which will automatically download the Llama 3.2 3B model
  3. Wait for the download and initialization to complete (~1.3GB download)
  4. Enter your prompt in the text field
  5. Press “Ask AI” to generate a response
  6. The AI-generated response will appear in the response section

UI Design

The app features a clean, simple UI with:

  • Status display at the top
  • Download button and progress bar
  • Text input for questions
  • Response area that scrolls for longer responses

Privacy

All text generation happens entirely on your device. The app only connects to the internet once to download the model file. No data is sent to remote servers during text generation.

SDK Features

  • 🚀 Fully On-Device: All processing happens on your device, ensuring privacy and offline usage
  • 📱 Optimized for Mobile: Uses quantized models specifically designed for mobile devices
  • ⚡ Fast Responses: Efficient implementation for quick text generation
  • 🧠 Llama 3.2 3B: Powered by Meta’s latest Llama 3.2 3B Instruct model

Installation

Step 1: Add JitPack Repository

Add the JitPack repository to your project’s settings.gradle.kts using the token we provide:

dependencyResolutionManagement {
    repositoriesMode.set(RepositoriesMode.FAIL_ON_PROJECT_REPOS)
    repositories {
        google()
        mavenCentral()
        maven {
            url = uri("https://jitpack.io")
            credentials {
                username = "jp_xxxxxxxxxxxxxxxxxxxxxxxx" // We will provide this JitPack API token
            }
        }
    }
}
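To keep the token out of version control, one option is to read it from gradle.properties or an environment variable instead of hardcoding it. This is a sketch: the `jitpackToken` property name and `JITPACK_TOKEN` variable are our own conventions, not required by the SDK.

```kotlin
// settings.gradle.kts — same repository block, but the token is read from
// gradle.properties (jitpackToken=jp_...) or the JITPACK_TOKEN environment variable.
maven {
    url = uri("https://jitpack.io")
    credentials {
        username = providers.gradleProperty("jitpackToken").orNull
            ?: System.getenv("JITPACK_TOKEN").orEmpty()
    }
}
```

With this in place, each developer keeps their own token in a local gradle.properties (which should be listed in .gitignore), and CI can supply it via the environment.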

Step 2: Add SDK Dependency

Add the Exla AI SDK dependency to your app’s build.gradle.kts:

dependencies {
    implementation("com.github.exla-ai:exla-android-sdk:407fc21173")
}

Step 3: Add Required Permissions

Ensure your AndroidManifest.xml includes the necessary permissions:

<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.ACCESS_NETWORK_STATE" />

Using the SDK

1. Initialize the SDK

First, get an instance of the SDK:

// Get the SDK instance
val sdk = ExlaAiSdk.getInstance(applicationContext)

The SDK follows the singleton pattern, so you only need to get the instance once.

2. Download the Model

The SDK automatically downloads the Llama 3.2 3B model when initialized. This only needs to happen once, as the model is cached for future use:

sdk.initialize(
    progressCallback = { progress -> 
        // Update UI with download progress (0-100)
        runOnUiThread {
            progressBar.progress = progress
            statusTextView.text = "Downloading model: $progress%"
        }
    },
    completionCallback = { success ->
        runOnUiThread {
            if (success) {
                // Model loaded successfully
                statusTextView.text = "Model ready!"
            } else {
                // Model download or initialization failed
                statusTextView.text = "Model initialization failed"
            }
        }
    }
)

You don’t need to specify the model or manage its download location; the SDK handles everything automatically. The download requires an internet connection, so verify network connectivity before starting.
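The initialize flow above (progress updates followed by a single success/failure completion) can be modeled as a small state tracker, which is handy for driving the UI. This is an illustrative sketch: the state names below are our own, not part of the SDK, which only exposes the callbacks and isReady().

```kotlin
// Illustrative states for the download/initialization flow described above.
// These names are our own; the SDK itself only exposes callbacks and isReady().
enum class ModelState { IDLE, DOWNLOADING, READY, FAILED }

class ModelStateTracker {
    var state: ModelState = ModelState.IDLE
        private set
    var lastProgress: Int = 0
        private set

    // Mirrors the progressCallback: any progress report means a download is in flight.
    fun onProgress(progress: Int) {
        lastProgress = progress.coerceIn(0, 100)
        state = ModelState.DOWNLOADING
    }

    // Mirrors the completionCallback: success -> READY, failure -> FAILED.
    fun onComplete(success: Boolean) {
        state = if (success) ModelState.READY else ModelState.FAILED
    }
}

fun main() {
    val tracker = ModelStateTracker()
    tracker.onProgress(42)
    println(tracker.state)   // DOWNLOADING
    tracker.onComplete(true)
    println(tracker.state)   // READY
}
```

Wiring the tracker into the two callbacks keeps all UI decisions (which button to show, whether to enable the prompt field) in one place instead of scattered across callback bodies.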

3. Check if the Model is Ready

After initialization, you can check if the model is ready to use:

if (sdk.isReady()) {
    // Model is loaded and ready for inference
} else {
    // Model is not yet loaded
}

4. Generate Text

Once the model is ready, you can generate text responses:

sdk.askAI(prompt) { response ->
    runOnUiThread {
        // Update UI with the generated text
        outputTextView.text = response
    }
}

Note that the response callback may run on a background thread, so you should use runOnUiThread to update UI elements.
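If you prefer not to juggle callbacks, the callback-based call can be wrapped in a future. The sketch below assumes only the callback shape shown above; the `AskAi` interface is a stand-in we define for illustration, not the SDK's actual type.

```kotlin
import java.util.concurrent.CompletableFuture

// Hypothetical stand-in for the callback-based askAI shown above.
fun interface AskAi {
    fun ask(prompt: String, onResponse: (String) -> Unit)
}

// Adapts the callback API into a CompletableFuture so callers can
// chain, combine, or block on the result.
fun askAiFuture(sdk: AskAi, prompt: String): CompletableFuture<String> {
    val future = CompletableFuture<String>()
    sdk.ask(prompt) { response -> future.complete(response) }
    return future
}

fun main() {
    // Fake backend that echoes the prompt, standing in for the real SDK.
    val fake = AskAi { prompt, onResponse -> onResponse("echo: $prompt") }
    println(askAiFuture(fake, "hi").get())   // echo: hi
}
```

The same adapter shape works for coroutine users via `suspendCancellableCoroutine`; the future version is shown here because it needs no extra dependencies.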

Model Details

The SDK automatically downloads and uses Llama 3.2 3B Instruct (Q2_K.gguf), a 2-bit quantized version of the Llama 3.2 3B model optimized for mobile devices.

  • Model Size: ~1.3GB download
  • Capabilities: Text completion, question answering, simple reasoning
  • Languages: Primarily English, with limited support for other languages
  • Context Window: 2048 tokens
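Because the context window is limited to 2048 tokens, it can be worth estimating prompt length before calling askAI. Exact counts depend on the model's tokenizer, which the SDK does not expose; the ~4 characters per token figure below is a common rough heuristic for English text, not part of the SDK.

```kotlin
// Rough heuristic: English text averages ~4 characters per token.
// The real count depends on the model's tokenizer; treat this as an estimate only.
const val CONTEXT_WINDOW_TOKENS = 2048

fun estimatedTokens(text: String): Int = (text.length + 3) / 4

// Leaves room in the window for the generated response.
fun fitsInContext(prompt: String, reservedForResponse: Int = 512): Boolean =
    estimatedTokens(prompt) + reservedForResponse <= CONTEXT_WINDOW_TOKENS

fun main() {
    val shortPrompt = "What is the capital of France?"
    println(estimatedTokens(shortPrompt))   // 8
    println(fitsInContext(shortPrompt))     // true

    val hugePrompt = "a".repeat(10_000)
    println(fitsInContext(hugePrompt))      // false
}
```

If a prompt fails this check, truncating or summarizing it before calling askAI avoids silently losing the start of the context.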

Complete Example

Here’s a basic example of how to use the Exla AI SDK:

class ExlaAiDemoActivity : AppCompatActivity() {
    private lateinit var sdk: ExlaAiSdk
    
    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        
        // Initialize the SDK
        sdk = ExlaAiSdk.getInstance(applicationContext)
        
        // Check if model is already downloaded
        if (sdk.isReady()) {
            // Model is ready, you can start using it
            generateAIResponse("What is the capital of France?")
        } else {
            // Download the model
            downloadModel()
        }
    }
    
    private fun downloadModel() {
        // Start model download and initialization
        sdk.initialize(
            progressCallback = { progress -> 
                // Handle download progress (0-100)
                Log.d("ExlaAI", "Download progress: $progress%")
            },
            completionCallback = { success ->
                if (success) {
                    // Model downloaded and initialized successfully
                    Log.d("ExlaAI", "Model ready")
                    
                    // Now you can start using the model
                    generateAIResponse("What is the capital of France?")
                } else {
                    // Model download or initialization failed
                    Log.e("ExlaAI", "Model initialization failed")
                }
            }
        )
    }
    
    private fun generateAIResponse(prompt: String) {
        // Generate AI response
        sdk.askAI(prompt) { response ->
            // Handle the generated response
            Log.d("ExlaAI", "AI Response: $response")
        }
    }
}