---
title: The assistant runtime
description: Build a voice assistant on smart glasses with glasses.assistant — wake/sleep, tools, vision, barge-in, memory, on the managed AI gateway. Phase-4 preview.
type: concept
platform: android
vendor: meta
related:
  - /docs/guides/voice-assistant
  - /docs/concepts/ai-gateway
  - /docs/guides/voice-triggers
  - /docs/concepts/display
  - /docs/concepts/capabilities
  - /docs/reference/errors
---

The **assistant runtime** (`glasses.assistant.*`) is the canonical way to build a voice AI on Extentos. It wraps an end-to-end speech-to-speech provider (OpenAI Realtime by default), so the model itself owns turn-taking, intent parsing, and confirmation speech. Your code shrinks to one block: declare `instructions`, register `tool(name, description) { ... }` bodies that act on your app's own state, and wire a wake trigger that calls `session.wake()`. The model decides which tool to call from each tool's natural-language description; there is no keyword routing, no `when (transcript)` ladder, no spec file.

> **Preview snapshot — not on Maven Central yet.** The assistant runtime ships in the **`1.4.0-phase4-dogfood`** preview snapshot, resolved via `mavenLocal()` — it is **not** on Maven Central (the published Android SDK is `1.3.0`, which does not include `glasses.assistant.*`). Build and publish the snapshot locally to dogfood it; see [SDK install](/docs/sdk/android/install). **iOS is pending** — the Swift port of the wake/sleep state machine is in flight and not at parity. This page documents the Android surface.

## The shape of an assistant

```kotlin
import com.extentos.glasses.core.assistant.AssistantProvider
import com.extentos.glasses.core.assistant.AssistantSession
import com.extentos.glasses.core.assistant.ToolResult
import com.extentos.glasses.core.assistant.tool
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.launch

class RunCompanion(private val glasses: ExtentosGlasses, private val scope: CoroutineScope) {
    private var session: AssistantSession? = null

    fun start() = scope.launch {
        session = glasses.assistant.start(
            // model + voice come from your dashboard Agent settings; pass them
            // here only to hard-pin in code (a code value wins over the dashboard).
            // Defaults if neither is set: gpt-realtime-2 + alloy.
            provider = AssistantProvider.OpenAi(),
        ) {
            instructions = "You are a running companion. Speak briefly — they're running."

            tool("get_pace", "The runner's current average pace in minutes per km.") {
                ToolResult.Ok("${routeTracker.avgPaceMinKm()} min per km")
            }
        }
        // Any trigger that calls session.wake() works — here, a wake phrase.
        glasses.voice.onPhrase("hey coach") { session?.wake() }
    }
}
```

Two registration forms exist and produce identical behavior:

- **Sugar** — `glasses.assistant.start(provider) { ... }`. The trailing lambda is an `AssistantConfigBuilder`; idiomatic for the common case. It creates a session and starts it for you.
- **Raw** — `glasses.assistant.createSession(AssistantConfig(...))` then `session.start()`. For programmatic construction (tools loaded from config, conditional registration). The sugar is implemented over the raw form — you can always skip the builder.

`tool(...)` is a plain Kotlin builder extension, not a runtime-interpreted tree: it appends a `ToolDefinition` to the config. You could replace every `tool(...) { ... }` line with `tools += ToolDefinition(...)` and get the same result.
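
A minimal, self-contained sketch of that equivalence follows. Every `Sketch*` name is a hypothetical stand-in invented for this page, not the SDK's real `ToolDefinition` or `AssistantConfigBuilder` (whose fields may differ), and the body is simplified to a plain lambda where the real one is a suspend lambda.

```kotlin
// Hypothetical stand-ins for illustration only; not the SDK's real types.
sealed interface SketchToolResult {
    data class Ok(val output: String) : SketchToolResult
    data class Err(val message: String) : SketchToolResult
}

data class SketchToolDefinition(
    val name: String,
    val description: String,
    val body: () -> SketchToolResult, // the real body is a suspend lambda
)

class SketchConfigBuilder {
    var instructions: String = ""
    val tools = mutableListOf<SketchToolDefinition>()
}

// The "sugar": an ordinary extension function that appends one definition.
fun SketchConfigBuilder.tool(
    name: String,
    description: String,
    body: () -> SketchToolResult,
) {
    tools += SketchToolDefinition(name, description, body)
}

// Registers a tool through the builder extension.
fun viaSugar(): SketchConfigBuilder = SketchConfigBuilder().apply {
    tool("get_pace", "The runner's current average pace.") {
        SketchToolResult.Ok("5:30 min per km")
    }
}

// Registers the same tool by appending the definition directly.
fun viaRaw(): SketchConfigBuilder = SketchConfigBuilder().apply {
    tools += SketchToolDefinition("get_pace", "The runner's current average pace.") {
        SketchToolResult.Ok("5:30 min per km")
    }
}
```

Both functions end up with the same one-element `tools` list: in this model the builder adds syntax, not behavior.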

## Tools

A tool is a `name`, a `description` the model reads verbatim, and a suspend `body` that returns a `ToolResult`. The body is ordinary app code running on `Dispatchers.IO`, so it can reach `glasses.camera`, `glasses.audio`, your repositories, your DB, and third-party SDKs. There is no sandbox; the security boundary sits at the app/OS level (Android permissions), not in the SDK.

```kotlin
tool("take_photo", "Take a photo when the user asks to capture or remember a moment.") {
    val photo = glasses.camera.capturePhoto().valueOrNull()
        ?: return@tool ToolResult.Err("camera failed")
    library.add(photo)
    ToolResult.Ok("photo saved")          // short factual strings — the model reads this aloud
}
```

Three overloads:

| Overload | Signature | Use |
|---|---|---|
| No-arg | `tool(name, description) { -> ToolResult }` | Most tools — camera, status, simple actions |
| Typed-args | `tool<Args>(name, description) { args -> ToolResult }` | `Args` is `@Serializable`; the JSON Schema is inferred from its descriptor |
| Explicit-schema | `tool(name, description, schema) { args -> ToolResult }` | Polymorphic types or format constraints (`date-time`, `minLength`) |

The typed-args overload needs the `org.jetbrains.kotlin.plugin.serialization` plugin applied in your app's `build.gradle.kts`: the serialization runtime ships transitively, but a compiler plugin cannot be inherited from a dependency, so your module has to apply it itself.
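
As a sketch of the typed-args form, the fragment below follows the `tool<Args>(name, description) { args -> ... }` signature from the table. `SplitArgs`, the `get_split` tool, and `routeTracker.splitFor` are hypothetical names for illustration; the fragment belongs inside the `glasses.assistant.start(...) { ... }` builder and is not runnable on its own.

```kotlin
import kotlinx.serialization.Serializable

// build.gradle.kts (app module) must apply the compiler plugin:
//   plugins { id("org.jetbrains.kotlin.plugin.serialization") }
// Add a version there if your build does not already pin one.

// The argument shape the model fills in; the JSON Schema is inferred from it.
@Serializable
data class SplitArgs(val km: Int)

// Inside the assistant builder block:
tool<SplitArgs>(
    "get_split",
    "The runner's time for one kilometre. Call when they ask about a specific km.",
) { args ->
    // routeTracker.splitFor is a hypothetical app-side function.
    ToolResult.Ok("km ${args.km}: ${routeTracker.splitFor(args.km)}")
}
```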

`ToolResult.Ok(output)` feeds a short string back to the model to read or weave into its reply; `ToolResult.Err(message)` surfaces a failure the model explains ("sorry, the camera failed"). For structured data, emit JSON as a string: `ToolResult.Ok("""{"distance_km": 12.4}""")`.

By default the model speaks a "let me check…" filler while a tool runs. Set `blocking = true` on a tool that returns in well under 100 ms (a no-arg "what time is it") so the model waits silently and the filler doesn't feel awkward.

**Write descriptions for the model.** Be specific about *when* to call ("Take a photo when the user asks to capture a moment"), not *what it does internally* ("Captures imagery via the camera SDK"). The same description also drives the `Mock` provider's deterministic test matcher.

## Wake and sleep

A session is created `Idle`. `start()` moves it to **`Dormant`** — set up, but **no provider connection open and $0 token spend**. You wire *any* trigger to `session.wake()`, which opens the WebSocket and transitions to `Active`. The model runs the conversation; `sleep()` returns it to `Dormant`; `stop()` is terminal.

```text
Idle ──start()──▶ Dormant ──wake()──▶ Activating ──▶ Active
                    ▲                                  │
                    └──── Sleeping ◀── sleep() / sleepAfterSilence / sleepOnPhrase / end_conversation
                                                       │
   any non-Stopped state ──stop()──▶ Stopped (terminal)
```

`Active ⇄ Reconnecting` is a transparent, library-owned reconnect (OpenAI Realtime caps sessions at ~60 min, and the SDK also reconnects proactively for stability). The conversation history replays on the new connection and, from your app's point of view, the session stays `Active` throughout; an `AssistantEvent.Reconnected` fires only for observability. Set `startActive = true` to skip `Dormant` and open the connection immediately at `start()`.

The Dormant/Active split is deliberate: it lets you pick **any** wake mechanism without the library prescribing one.
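
The lifecycle diagram can also be read as a transition table. The sketch below is a hypothetical model for reasoning and tests, not the SDK's state type: the `Sketch*` names and the internal signals (`Connected`, `ReconnectNeeded`, `Settled`) are assumptions, `startActive` is not modeled, and any transition the diagram does not show is treated as undefined rather than guessed.

```kotlin
// Hypothetical model of the lifecycle diagram; the SDK's real types may differ.
enum class SketchState { Idle, Dormant, Activating, Active, Reconnecting, Sleeping, Stopped }

// Start/Wake/Sleep/Stop mirror the public calls; the rest are assumed internal signals.
enum class SketchSignal { Start, Wake, Connected, ReconnectNeeded, Sleep, Settled, Stop }

// Returns the next state, or null when the diagram defines no such transition.
fun next(state: SketchState, signal: SketchSignal): SketchState? = when {
    state == SketchState.Stopped -> null                 // terminal
    signal == SketchSignal.Stop -> SketchState.Stopped   // from any non-Stopped state
    state == SketchState.Idle && signal == SketchSignal.Start -> SketchState.Dormant
    state == SketchState.Dormant && signal == SketchSignal.Wake -> SketchState.Activating
    state == SketchState.Activating && signal == SketchSignal.Connected -> SketchState.Active
    state == SketchState.Active && signal == SketchSignal.ReconnectNeeded -> SketchState.Reconnecting
    state == SketchState.Reconnecting && signal == SketchSignal.Connected -> SketchState.Active
    state == SketchState.Active && signal == SketchSignal.Sleep -> SketchState.Sleeping
    state == SketchState.Sleeping && signal == SketchSignal.Settled -> SketchState.Dormant
    else -> null
}

// Replays signals starting from Idle; null means an undefined transition was hit.
fun replay(signals: List<SketchSignal>): SketchState? =
    signals.fold<SketchSignal, SketchState?>(SketchState.Idle) { state, signal ->
        state?.let { next(it, signal) }
    }
```

A table like this is handy in unit tests for your own wake and sleep wiring, since it makes explicit that `Stopped` never leads anywhere.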

### Wiring the wake

The canonical wake is a voice phrase, reusing the existing voice-trigger system:

```kotlin
glasses.voice.onPhrase("hey coach") { session?.wake() }
```

`onPhrase` defaults to `VoiceScope.WhenDormant`, so the phrase won't double-fire during an active conversation. Swap that line for a button `onClick`, a gesture handler, or an MCP call — anything that calls `session.wake()`. See [voice triggers](/docs/guides/voice-triggers) for how `onPhrase` matches transcripts; this page won't re-explain it.

### Sleeping

| Mechanism | How |
|---|---|
| **Model-driven (default)** | `endOnIntent = true` registers a hidden `end_conversation` tool the model calls when it hears the user wrap up ("bye", "thanks, I'm good") — in any language, no phrase list to maintain. Its body calls `session.sleep()`. |
| **Deterministic phrase** | `sleepOnPhrase("that's all")` — case-insensitive substring on final transcripts, wired through the same voice-command system. |
| **Silence timeout** | `sleepAfterSilence(30.seconds)` auto-sleeps after contiguous user silence. While the assistant is speaking the timer is paused, so its speech never eats into the window. |
| **Explicit** | Call `session.sleep()` from your own code. |

Set `endOnIntent = false` for strict, deterministic-only sleep.
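
As a sketch, a deterministic-only configuration could combine the builder fields from the table like this. It assumes the same `glasses`, `session`, and coroutine context as the first example on this page, and is a fragment rather than a complete program.

```kotlin
import kotlin.time.Duration.Companion.seconds

// Deterministic-only sleep: the hidden end_conversation tool is not registered.
session = glasses.assistant.start(provider = AssistantProvider.OpenAi()) {
    instructions = "You are a running companion. Speak briefly."
    endOnIntent = false              // the model can no longer end the conversation itself
    sleepOnPhrase("that's all")      // case-insensitive substring on final transcripts
    sleepAfterSilence(30.seconds)    // backstop for when the user simply stops talking
}
```

With `endOnIntent = false`, only the phrase, the timeout, or your own `session.sleep()` call returns the session to `Dormant`.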

## Greeting

On every wake the SDK automatically speaks a greeting, generated **out-of-band** from the user's memory context — so it greets fresh and can never accidentally continue the prior (ended) conversation.

```kotlin
greeting = Greeting.Custom("Greet the runner warmly in one short sentence and ask how you can help.")
```

`Greeting.Default` uses the SDK's built-in directive (memory-aware); `Greeting.Custom(directive)` supplies your own; `Greeting.Off` opts out (greet manually in `onWake { say(...) }`, or not at all). This replaces hand-wiring `onWake { greet(...) }`.

## Vision

`session.includeImage(uri, prompt?)` adds an image to the conversation and auto-triggers a response — the model speaks about it in its configured voice. The image stays in context for follow-up turns. Typical use is from a tool body:

```kotlin
tool("describe_scene", "Describe what the user is looking at. Call for 'what do you see' / 'describe this'.") {
    val photo = glasses.camera.capturePhoto().valueOrNull()
        ?: return@tool ToolResult.Err("camera failed")
    val uri = photo.uri ?: return@tool ToolResult.Err("photo had no uri")
    glasses.assistant.activeSession?.includeImage(uri)   // non-null while a tool dispatches
    ToolResult.Ok("looking")
}
```

`uri` accepts a `data:` URI (what `capturePhoto().uri` returns under the sim transport), an `http(s)` URL (the provider fetches it), or a `file://` / `content://` / absolute path (the library base64-encodes it). To reach the session from a tool body, use `glasses.assistant.activeSession` or a field your handler captured from `start(...)`.

## Speaking and barge-in

- `session.say(text)` speaks fixed text **in the provider's voice**. Use it instead of `glasses.audio.speak(...)` for assistant speech, so it matches the model's own voice rather than the jarringly different platform TTS engine.
- `session.greet(prompt?)` speaks a model-generated, memory-personalized greeting (the manual primitive behind the automatic greeting above).
- `session.cancelSpeak()` cancels the model's in-flight utterance — the app/tool-driven barge-in primitive (the model also handles user-voice barge-in automatically via VAD). Use it when a tool result is ready and you want to interrupt the filler to deliver it.

Mid-session setters exist for `setReasoningEffort`, `setVoice`, `setModel`, and `updateInstructions`. Voice and model bind to the connection, so `setVoice`/`setModel` take effect on the **next** wake, not mid-conversation; `updateInstructions` and `setReasoningEffort` apply to the next response immediately.

## Provider and configuration

`AssistantProvider` is a sealed type. v1 ships:

- **`AssistantProvider.OpenAi(model = null, voice = null, turnDetection = ServerVad(), reasoningEffort = Low)`** — production. Leave `model`/`voice` `null` (the canonical `OpenAi()` form) to take the values configured for this project in the dashboard, falling back to the SDK defaults `gpt-realtime-2` / `alloy`. A value passed in code **hard-pins it and wins over the dashboard**. `reasoningEffort` defaults to `Low` (OpenAI's own recommendation for voice agents — higher settings add noticeable latency).
- **`AssistantProvider.Mock(...)`** — deterministic, in-process, $0. Substring-matches injected utterances against tool descriptions. Powers unit tests and the MCP `injectAssistantUtterance` path.

There is **no** `Claude` or `Cascaded` provider (Anthropic ships no Realtime API as of v1); **Gemini Live is future**, not present. Precedence for every model-side knob is **code > dashboard > SDK default** — leave a field `null` to defer to the dashboard.
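
The precedence rule is a null-coalescing chain. The helper below is a hypothetical illustration of that rule, not an SDK function, and the non-default voice names are placeholders.

```kotlin
// Hypothetical helper mirroring "code > dashboard > SDK default"; not part of the SDK.
fun <T : Any> resolveKnob(code: T?, dashboard: T?, sdkDefault: T): T =
    code ?: dashboard ?: sdkDefault

// Nothing pinned anywhere: the SDK default applies.
val fromDefault = resolveKnob(code = null, dashboard = null, sdkDefault = "alloy")

// Dashboard value set, code left null: the dashboard wins over the default.
val fromDashboard = resolveKnob(code = null, dashboard = "dashboard-voice", sdkDefault = "alloy")

// Value passed in code: it hard-pins and wins over the dashboard.
val fromCode = resolveKnob(code = "code-voice", dashboard = "dashboard-voice", sdkDefault = "alloy")
```

This is why leaving a field `null` is the way to defer: any non-null value in code short-circuits the chain.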

Key `AssistantConfig` / builder fields:

| Field | Default | Meaning |
|---|---|---|
| `instructions` | `""` | The full system prompt — you own all of it; the library adds nothing |
| `startActive` | `false` | `true` opens the connection at `start()` (skips Dormant) |
| `onWake {}` / `onSleep {}` | — | Hooks run as coroutines with the session as receiver (`onWake { say("…") }`) |
| `sleepAfterSilence(Duration)` | off | Auto-sleep after contiguous user silence |
| `sleepOnPhrase(phrase)` | — | Deterministic sleep phrase |
| `endOnIntent` | `true` | Register the hidden model-driven `end_conversation` tool |
| `greeting` | `Greeting.Default` | Auto-greeting policy (see [Greeting](#greeting)) |
| `historyCap` | `100` | Local replay-buffer cap, in turns |
| `historyCompaction` | `Auto` | What happens as the buffer fills (see [Memory](#memory)) |

## The managed gateway is the default

The assistant carries **no API key in your app**. The SDK opens a WebSocket to Extentos's managed gateway, which relays the realtime session to OpenAI on Extentos's key and meters usage. There is no `setOpenaiApiKey` — it was removed when the assistant moved to gateway-only. To run on your own OpenAI account, upload your key in the dashboard **Credentials** section and the gateway swaps it in server-side (BYOK); your handler code doesn't change.

All gateway, BYOK, identity/attestation, metering, and the planned credit billing live in one place — see **[the managed AI gateway](/docs/concepts/ai-gateway)**. This page won't re-derive them.

**PII note:** Phase 4 events carry **verbatim transcripts** in the dev event log (`user_spoke` / `assistant_spoke`) — fine for development, but the transcript is yours to govern in production; document retention in your app's privacy policy. (The gateway meters usage but does not persist conversation content.)

## Memory

The assistant has two independent memory layers, both configured on `AssistantConfig` — neither is a separate capability.

### Within-session history and compaction

The SDK keeps a local replay buffer of recent turns (capped at `historyCap`, default 100) and replays it to the provider on every reconnect — so a conversation survives wake/sleep and the transparent reconnects. The buffer lives in memory; it is **not persisted to disk** (persist to your own storage if you need it across launches, then restore with `replaceHistory`).

`historyCompaction` controls what happens as the buffer fills. Compaction fires in the background at roughly 80% of the cap (about 80 turns at the default `historyCap` of 100):

| Policy | Behavior |
|---|---|
| `Auto` (default) | Summarizes the oldest ~50% of turns via a cheap chat model (`compactionModel`, default `gpt-4o-mini`) into one summary turn — the conversation continues indefinitely without silent forgetting. ~$0.001 per compaction. |
| `DropOldest` | Drop the oldest turn when full. Free, lossy. |
| `Custom(compact)` | Your own `suspend (List<Turn>) -> List<Turn>` compactor — bring your own summarizer, model, or vector-DB recall. |
| `None` | No compaction; the buffer holds at `historyCap` and you manage it manually via `clearHistory` / `appendHistory` / `replaceHistory`. |

Session history methods let you snapshot, wipe, inject app-context hints, or restore persisted turns: `conversationHistory(limit)`, `clearHistory()`, `appendHistory(turn)`, `replaceHistory(turns)`. For a fresh-each-wake notetaker, for example, wipe in the wake hook with `onWake { clearHistory() }` (the hook runs with the session as receiver).
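
For the `Custom(compact)` policy, a compactor is just a function with the shape `suspend (List<Turn>) -> List<Turn>`. The sketch below is self-contained and uses a hypothetical `SketchTurn` in place of the SDK's `Turn` (whose fields are not shown on this page); the `"summary"` role name and the `summarize` parameter are likewise assumptions. It mirrors the shape of `Auto` by folding the oldest half into one turn, but with a summarizer you supply.

```kotlin
// Hypothetical stand-in for the SDK's Turn type; the real fields may differ.
data class SketchTurn(val role: String, val text: String)

// Builds a compactor with the shape suspend (List<Turn>) -> List<Turn>.
// It folds the oldest half of the buffer into a single summary turn and keeps
// the newer half verbatim. summarize is a placeholder for whatever you bring
// (a model call, a template, a vector-DB lookup).
fun halvingCompactor(
    summarize: suspend (List<SketchTurn>) -> String,
): suspend (List<SketchTurn>) -> List<SketchTurn> = { turns ->
    if (turns.size < 2) {
        turns // nothing worth compacting
    } else {
        val cut = turns.size / 2
        val summary = SketchTurn(role = "summary", text = summarize(turns.take(cut)))
        listOf(summary) + turns.drop(cut)
    }
}
```

Given ten turns, this returns six: one summary turn followed by the five most recent turns unchanged.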

### Cross-session persistent memory (v0 preview, Android-only)

`persistentMemory = true` loads this end-user's stored profile at session start and merges durable signal back at session end — so the agent remembers the user *across* sessions (the automatic greeting personalizes from it).

- **Opt-in and consent-gated.** It stores a person's data, so it requires the end-user's consent — which you can only obtain in your app. It is therefore a **code-side switch, never a dashboard toggle**.
- **Keyed per-device by default.** `memoryUserId = null` keys the profile on the SDK's attested per-device id (memory follows the device). Set it to your app's stable id for the signed-in user to make memory follow the *person* across devices and reinstalls (isolated per user on a shared device). The profile is always scoped to your project by the attestation JWT, so one app can never reach another's memory.
- **Managed-gateway only** unless you supply a `MemoryStore`. By default the profile lives on the Extentos backend behind the gateway; BYOK has no Extentos store. Provide a `MemoryStore` to keep profiles entirely in your own infrastructure — that also enables persistent memory under BYOK.
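
A minimal sketch of the consent-gated switch, assuming `persistentMemory` and `memoryUserId` are set in the builder like the other config fields. `consentStore` and `auth` are hypothetical app-side objects, and the fragment reuses `glasses` and `session` from the first example on this page.

```kotlin
session = glasses.assistant.start(provider = AssistantProvider.OpenAi()) {
    instructions = "You are a running companion. Speak briefly."

    // consentStore is hypothetical: memory stays off until the end-user has agreed in your app.
    persistentMemory = consentStore.hasMemoryConsent()

    // auth is hypothetical: a stable signed-in id makes memory follow the person.
    // Leaving this null keys the profile on the attested per-device id instead.
    memoryUserId = auth.currentUserIdOrNull()
}
```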

## Errors and events

Open-time failures throw `AssistantException` wrapping an `AssistantError` (`NoApiKey`, `AlreadyActive`, `SessionEnded`, `NotReady`, `NetworkError`, `ProviderError`) — pattern-match on `.error`. Starting a second session while one is active throws `AlreadyActive` (the runtime is **singleton-active** per `ExtentosGlasses`). Once a session is `Active`, transient errors surface as events and the session rides them out through the reconnection state machine. Full table: [error reference](/docs/reference/errors#assistant-errors-phase-4--preview).
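
A hedged sketch of matching on `.error` follows. It assumes `AssistantException` and `AssistantError` live alongside the other assistant types and that the exception surfaces from the `start(...)` call; this page does not spell out whether some variants are instead thrown from `wake()`, so wrap that call the same way if needed. `is` checks are used because they work whether the variants are objects or classes.

```kotlin
fun start() = scope.launch {
    try {
        session = glasses.assistant.start(provider = AssistantProvider.OpenAi()) {
            instructions = "You are a running companion. Speak briefly."
        }
    } catch (e: AssistantException) {
        when (e.error) {
            // A session is already active on this ExtentosGlasses instance.
            is AssistantError.AlreadyActive -> println("assistant already running")
            // Connectivity problem while opening; safe to retry later.
            is AssistantError.NetworkError -> println("offline, try again later")
            // Everything else: report it and leave the session unset.
            else -> println("assistant failed to start: ${e.error}")
        }
    }
}
```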

Lifecycle flows through the shared `glasses.runtime.events` stream as `RuntimeEvent.Assistant` wrapping an `AssistantEvent` (`SessionStarted`, `SessionEnded`, `UserSpoke`, `AssistantSpoke`, `ToolCalled`, `ToolResultEvent`, `Reconnected`, `Error`, `WentDormant`). In the simulator these land on the **`ai`** event-log chip (an `Error` climbs to the **`errors`** chip automatically). Capture transcripts off this stream:

```kotlin
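import kotlinx.coroutines.flow.filterIsInstance
import kotlinx.coroutines.flow.launchIn
import kotlinx.coroutines.flow.onEach

glasses.runtime.events
    .filterIsInstance<RuntimeEvent.Assistant>()
    .onEach { (it.event as? AssistantEvent.UserSpoke)?.let { spoke -> notes.append(spoke.transcript) } }
    .launchIn(scope)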
```

> `glasses.conversation.*` (the Phase 3 cascaded runtime) is **removed** on current Android — use `glasses.assistant.*`.

## Related

- [Build a voice assistant](/docs/guides/voice-assistant) — the task guide: wake phrase → tools acting on app state → vision → sleep
- [The managed AI gateway](/docs/concepts/ai-gateway) — gateway default, BYOK, metering, and the planned credit billing
- [Voice triggers](/docs/guides/voice-triggers) — `glasses.voice.onPhrase`, the canonical wake mechanism
- [The display capability](/docs/concepts/display) — assistant tools can render on the Ray-Ban Display via `glasses.display.*`
- [Capabilities](/docs/concepts/capabilities) — the full vendor-agnostic SDK vocabulary
- [Error reference](/docs/reference/errors) — `AssistantError` and the no-`DisplayError` model
