Architecture

How Extentos fits together — the AI agent, the MCP server, the Android and iOS native libraries, the three transport implementations (real Meta DAT, browser simulator, local Mock Device Kit), the Extentos backend, and the typed Kotlin/Swift SDK that customer handlers compose against. One diagram and the data flow of a typical development loop, end to end.

Extentos is a five-component system. An AI agent (Claude Code, Cursor, Windsurf, Cline) drives the MCP server (@extentos/mcp-server, a deterministic tool surface for discovery, scaffolding, and simulation). The MCP server scaffolds Extentos into the developer's app. The developer (or their agent) writes handler classes that subscribe to capability primitives on the Extentos native library: com.extentos:glasses on Android, or the Swift Package Extentos on iOS (products GlassesCore, GlassesUI, GlassesDebug, GlassesLifecycle, GlassesTesting). The library exposes one stable glasses.* API over three swappable transport implementations (real Meta DAT, the browser simulator at extentos.com/s, or the local Meta Mock Device Kit), and the Extentos backend brokers simulator sessions over WebSocket. The same handler code runs unchanged against all three transports, simulated or production, because the library translates capability calls into transport-specific operations underneath. This page covers the system diagram, the data flow, and the design rationale.

The three-layer model

Extentos models app development in three conceptual layers, top-down:

| Layer | What lives here | Owned by |
| --- | --- | --- |
| 1. Customer code | Handler classes that subscribe to SDK primitives and run business logic: wake-phrase matching, LLM calls, photo persistence, UI updates | Agent + developer |
| 2. Capability vocabulary | The vendor-agnostic SDK surface: audio / camera / hardware events / toggles / connection state. Identical across vendors. | Extentos library (defines the contract) |
| 3. Runtime targets | Where capability calls actually execute: browser simulator, on-device Mock simulator, real Ray-Ban Meta hardware | The library's three transports |

This model is intentional: one capability vocabulary drives everything below it. There's no "browser version" and "production version" of your handler — there's one class, and the library dispatches each capability call against whichever transport is active. Edit your handler in Kotlin/Swift, rebuild + reinstall, and the same code runs in all three runtimes; the simulator picks up the new binary automatically thanks to auto-bind.

Component diagram

              ┌─────────────────────────────────────┐
              │  AI agent                           │
              │  (Claude Code, Cursor, Windsurf,    │
              │   Cline, or any MCP-compatible host)│
              └────────────────┬────────────────────┘
                               │ MCP stdio
                               ▼
         ┌──────────────────────────────────────────┐
         │  @extentos/mcp-server (npm)              │
         │                                          │
         │   Deterministic tools:                   │
         │   • discovery: getPlatformInfo, get-     │
         │     CapabilityGuide, getCodeExample,     │
         │     searchDocs                           │
         │   • scaffold: generateConnectionModule   │
         │   • validation: inspectIntegration,      │
         │     validateIntegration                  │
         │   • simulation: createSimulatorSession,  │
         │     getSimulatorStatus, getEventLog,     │
         │     completeAuthLink                     │
         │   • production: getProductionChecklist,  │
         │     getCredentialGuide,                  │
         │     getVoiceCommandGuidance,             │
         │     getPermissions                       │
         └─────┬─────────────────────┬──────────────┘
               │                     │
               │ writes              │ HTTPS
               │ code                │
               ▼                     ▼
     ┌─────────────────────┐  ┌──────────────────────────┐
     │  Dev's app code     │  │  Extentos backend        │
     │  (Android / iOS)    │  │  api.extentos.com        │
     │                     │  │                          │
     │  - Handler classes  │  │  - Session store         │
     │  - Connection page  │  │  - WebSocket hub         │
     │  - Bootstrap (gen)  │  │  - Device-code auth      │
     │  - Extentos library │  │  - Event log retention   │
     └──────────┬──────────┘  └────────┬─────────────────┘
                │                      │
                │ links library        │ WSS sessions
                │ (3 transports)       │
                ▼                      ▼
     ┌──────────────────────┐  ┌────────────────────────┐
     │  GlassesTransport    │  │  Browser surrogate     │
     │  (internal interface)│  │  extentos.com/s/{id}   │
     │  ┌────────────────┐  │  │                        │
     │  │ RealMeta       │──┼──┼──► (none)              │
     │  │ wraps DAT SDK  │  │  │                        │
     │  ├────────────────┤  │  │  - real webcam frames  │
     │  │ BrowserSim     │──┼──┼──► WSS to backend ─────┼─┐
     │  │ WebSocket      │  │  │  - real mic audio      │ │
     │  ├────────────────┤  │  │  - hardware injection  │ │
     │  │ LocalSim       │──┼──► (none, no network)     │ │
     │  │ wraps MockKit  │  │  │  - replay              │ │
     │  └────────────────┘  │  └────────────────────────┘ │
     └──────────┬───────────┘                             │
                │                                         │
                │ Bluetooth (BLE / HFP / A2DP)            │
                ▼                                         │
     ┌──────────────────────┐                             │
     │  Real Ray-Ban Meta   │                             │
     │  (production)        │                             │
     └──────────────────────┘                             │
                                                          │
              ┌───────────────────────────────────────────┘
              │
              ▼
      same Extentos backend ─► getEventLog reads from here

The agent talks to the MCP server. The MCP server scaffolds Extentos into the developer's app. The developer (or their agent) writes handler classes against the library. The library routes the same glasses.* API through one of three transports based on config. The browser simulator and the library both connect to the backend over WebSocket for the same sessionId, putting them in the same room.
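To make the "same room" idea concrete, here is a minimal Kotlin sketch of pairing two peers by sessionId. It is an illustration only, not the backend's implementation: SessionHub, SessionRoom, and Role are hypothetical names, and plain callbacks stand in for WebSocket connections.

```kotlin
// Hypothetical sketch of session pairing; not the Extentos backend's actual code.
enum class Role { APP, BROWSER }

class SessionRoom(val sessionId: String) {
    // One listener per role; a later join for the same role replaces the earlier one.
    private val listeners = mutableMapOf<Role, (String) -> Unit>()

    fun join(role: Role, onMessage: (String) -> Unit) {
        listeners[role] = onMessage
    }

    // True once both the app and the browser have joined.
    fun isPaired(): Boolean = listeners.size == Role.values().size

    // Relay a message to the other side; returns false if the peer has not joined yet.
    fun relay(from: Role, message: String): Boolean {
        val peer = if (from == Role.APP) Role.BROWSER else Role.APP
        val deliver = listeners[peer] ?: return false
        deliver(message)
        return true
    }
}

class SessionHub {
    private val rooms = mutableMapOf<String, SessionRoom>()

    // Both sides call this with the same sessionId and land in the same room.
    fun room(sessionId: String): SessionRoom =
        rooms.getOrPut(sessionId) { SessionRoom(sessionId) }
}
```

In this sketch a capture_photo request relayed from the APP role reaches the BROWSER listener only after both halves have joined, which mirrors the order of steps 9 and 10 in the dev loop described later on this page.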

What runs where

| Component | Where it runs | Persists across |
| --- | --- | --- |
| AI agent | Developer's editor / terminal | Conversation only |
| @extentos/mcp-server | Developer's machine, spawned as MCP subprocess | Per agent session, fresh each launch |
| Handler classes + manifest | Developer's repository (committed) | Forever (source code) |
| Extentos library | Linked into developer's Android / iOS app | App's lifetime |
| RealMetaTransport (Meta DAT calls) | Developer's app process | App's lifetime |
| BrowserSimTransport (WebSocket client) | Developer's app process | App's lifetime |
| LocalSimTransport (Mock Device Kit) | Developer's app process | App's lifetime |
| Browser simulator UI | Developer's browser tab at extentos.com/s/{id} | Tab lifetime |
| Extentos backend (WebSocket hub, sessions, auth) | api.extentos.com | Long-lived, session storage = lifetime + 1h |
| Event log on-device | 512-entry ring buffer in the library | App's lifetime, no disk persistence |
| Event log on backend | Per-session storage | Session lifetime + 1 hour |
| Install ID + auth token | ~/.extentos/install_id, ~/.extentos/auth.json | Forever (per machine) |
| Real Ray-Ban Meta hardware | Meta's BLE protocol stack on the glasses | Hardware lifetime |

A typical dev loop, end to end

This is the data flow agents and developers care about — what happens when the agent says "add a voice-driven photo capture to my app":

1. Developer:   "Add a wake phrase 'describe this' that takes a photo and reads what it sees."
2. Agent:       Calls getPlatformInfo({ glasses: "meta_rayban" })
                → MCP returns capability catalog (audio, camera, ...)
3. Agent:       Calls getCodeExample({ pattern: "photo_describe_voice" })
                → MCP returns the full Kotlin + Swift composition
4. Agent:       Calls getCapabilityGuide for any primitives it needs to look up
                (capture_photo, transcription_incremental, speak)
5. Agent:       Calls generateConnectionModule({ platform: "android", ... })
                → MCP writes ExtentosBootstrap.kt, manifest, Gradle wiring
6. Agent:       Writes CoachHandler.kt (or PhotoDescribeHandler.kt) — subscribes
                to glasses.audio.transcriptions(), matches "describe this",
                calls glasses.camera.capturePhoto(), forwards to vision LLM,
                speaks the response via glasses.audio.speak(). Updates
                extentos.manifest.json's `capabilities` array.
7. Agent:       Calls validateIntegration()
                → MCP checks dependency declared, bootstrap calls
                  ExtentosGlasses.create(), permissions cover capabilities
                → Returns ✓ all good
8. Agent:       Calls createSimulatorSession()
                → MCP calls the Extentos backend over HTTPS, gets sessionId
                → Backend opens WSS endpoint
                → MCP auto-opens browser to extentos.com/s/{sessionId}
9. Library:     On app start, reads BuildConfig.EXTENTOS_SESSION_URL (Android) or
                extentos.session.plist (iOS), or auto-binds via MCP probe;
                selects BrowserSimTransport, opens WSS to api.extentos.com/
                ws/{sessionId} with role: "app". CoachHandler subscribes to
                glasses.audio.transcriptions().
10. Browser:    Opens extentos.com/s/{sessionId}, joins same session as role:
                "browser". Backend now has both halves in the same room.
11. Developer:  Speaks "describe this" into laptop mic
                Browser → transcript "describe this" relayed over WSS
                Library emits Final transcript on the transcriptions Flow
                CoachHandler matches the string, calls glasses.camera
                  .capturePhoto() and glasses.audio.speak("Let me see…")
                BrowserSimTransport sends capture_photo, gets back photo bytes
                Handler forwards the photo to the customer's vision LLM
                Handler calls glasses.audio.speak(answer) — TTS plays in browser
12. Agent:      Calls getEventLog({ filter: "audio" }) or filter:"errors" if
                something didn't work as expected
                → MCP queries backend, returns the structured trace
                → Agent confirms the flow worked end to end

Every capability call in step 11 emits structured events (audio.transcriptions_subscribed, audio.record_discrete_started, camera.capture_photo_started, camera.capture_photo_completed, speak.started, speak.completed, transport.frame.relayed) into the library's ring buffer. The events are forwarded to the backend over the WebSocket session, where the agent can query them via getEventLog. The same flow on real glasses (step 11 with RealMetaTransport) produces the same event shapes; only the transport changes.
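The handler from steps 6 and 11 can be sketched roughly as follows. This is a simplified illustration, not the SDK's real surface: the Audio, Camera, and Glasses types are stand-ins defined here, a plain callback (onTranscript) replaces the Flow returned by glasses.audio.transcriptions() so the sketch runs with only the Kotlin standard library, and the describe parameter is a hypothetical hook for the customer's vision LLM.

```kotlin
// Stand-in types for illustration only; the real SDK exposes Flow / AsyncStream APIs.
sealed class Transcript {
    data class Partial(val text: String) : Transcript()
    data class Final(val text: String) : Transcript()
}

interface Audio {
    fun onTranscript(listener: (Transcript) -> Unit) // callback substitute for transcriptions()
    fun speak(text: String)
}

interface Camera {
    fun capturePhoto(): ByteArray
}

class Glasses(val audio: Audio, val camera: Camera)

// Hypothetical handler: reacts to Final transcripts only, so partial
// hypotheses never trigger a capture.
class PhotoDescribeHandler(
    private val glasses: Glasses,
    private val describe: (ByteArray) -> String, // the customer's vision LLM call
) {
    fun start() {
        glasses.audio.onTranscript { t ->
            if (t is Transcript.Final && t.text.contains("describe this", ignoreCase = true)) {
                glasses.audio.speak("Let me see…")
                val photo = glasses.camera.capturePhoto()
                glasses.audio.speak(describe(photo))
            }
        }
    }
}
```

Because the handler depends only on the Glasses stand-in, the same class can be exercised with fakes in a unit test or with a real transport behind the interface, which is the property the dev loop relies on.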

The three transports

The library exposes one internal interface (GlassesTransport) and ships three implementations. Selection happens once at ExtentosGlasses.create() time based on config; the developer's code never branches on which transport is active.

  • RealMetaTransport — wraps the real Meta DAT SDK (mwdat-core + mwdat-camera on Android, MWDATCore + MWDATCamera on iOS). Production path. BLE link to real Ray-Ban Meta hardware.
  • BrowserSimTransport — Extentos-original. WebSocket to api.extentos.com, browser surrogate at extentos.com/s drives real webcam, microphone, and TTS playback. The headline dev-loop simulator.
  • LocalSimTransport — wraps Meta's Mock Device Kit (mwdat-mockdevice / MWDATMockDevice). On-device simulation, no network, fast inner-loop and CI.

The same glasses.camera.capturePhoto(), glasses.audio.recordDiscrete(), glasses.audio.speak("hello") calls work identically against all three. See transport vs app simulation for the deep dive on why this layering matters and what each transport simulates.
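One way to picture "selection happens once, handlers never branch" is the sketch below. The real GlassesTransport interface is internal and its shape is not documented here, so the method set, the SketchConfig fields, and the selection order are all assumptions made for illustration.

```kotlin
// Hypothetical shapes: the real GlassesTransport interface is internal.
interface GlassesTransport {
    val name: String
    fun capturePhoto(): ByteArray
}

class RealMetaTransport : GlassesTransport {
    override val name = "RealMeta"
    override fun capturePhoto(): ByteArray =
        error("requires the Meta DAT SDK and paired glasses")
}

class BrowserSimTransport(val sessionUrl: String) : GlassesTransport {
    override val name = "BrowserSim"
    override fun capturePhoto(): ByteArray =
        error("requires a live simulator session")
}

class LocalSimTransport : GlassesTransport {
    override val name = "LocalSim"
    override fun capturePhoto(): ByteArray = ByteArray(4) // placeholder frame
}

// Assumed config fields, named for illustration only.
data class SketchConfig(val sessionUrl: String? = null, val useLocalMock: Boolean = false)

// Chosen once at creation time; handler code only ever sees GlassesTransport.
fun selectTransport(config: SketchConfig): GlassesTransport {
    val url = config.sessionUrl
    return when {
        url != null -> BrowserSimTransport(url)
        config.useLocalMock -> LocalSimTransport()
        else -> RealMetaTransport()
    }
}
```

The design point is that the when expression lives in exactly one place. Adding a fourth transport would mean one new class and one new branch, with no change to handler code.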

The capability vocabulary — the universal contract

Every Extentos app composes handler code against the same capability primitives — see concepts/capabilities for the full vocabulary. The contract:

| Surface | Examples |
| --- | --- |
| Audio | glasses.audio.transcriptions(config) (continuous Partial + Final), glasses.audio.recordDiscrete(config) (silence-VAD bounded clip + auto-STT), glasses.audio.speak(text), glasses.audio.cancelSpeak(), glasses.audio.audioChunks(config), glasses.audio.earcon(sound) |
| Camera | glasses.camera.capturePhoto(config), glasses.camera.captureVideo(config), glasses.camera.videoFrames(config) (continuous stream) |
| Toggles | glasses.toggles.state (observable), glasses.toggles.update { … } |
| Connection | glasses.connection.state (observable), glasses.connection.connect(), glasses.connection.disconnect() |
| Hardware events | glasses.runtime.events (Flow / AsyncStream of typed events: thermal / hinges / audio-route / call / lifecycle / notifications / location) |

Vendor-agnostic by design — the API doesn't reference Meta DAT, BLE, or any specific hardware. Your handler says for await t in glasses.audio.transcriptions() { … }. When Extentos adds Mentra G1, Android XR, or Apple smart-glasses transports later, the same handler runs against them — only the transport translation changes.

extentos.manifest.json's top-level capabilities array (the list of SDK features your handler uses) drives getPermissions and getProductionChecklist. Each declared capability has a known set of platform-permission requirements; the toolchain writes the right Android manifest entries and iOS Info.plist keys for you.
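The capability-to-permission lookup can be pictured as a simple table-driven union. The sketch below is illustrative only: the capability names are the ones mentioned in the dev loop, the Android permission strings are real platform permissions, but which capability needs which permission is an assumption here. The authoritative answer comes from getPermissions.

```kotlin
// Assumed mapping for illustration; the authoritative list comes from getPermissions.
val androidPermissionsByCapability: Map<String, Set<String>> = mapOf(
    "capture_photo" to setOf(
        "android.permission.CAMERA",
        "android.permission.BLUETOOTH_CONNECT",
    ),
    "transcription_incremental" to setOf(
        "android.permission.RECORD_AUDIO",
        "android.permission.BLUETOOTH_CONNECT",
    ),
    "speak" to setOf("android.permission.BLUETOOTH_CONNECT"),
)

// Union of permissions for the declared capabilities, sorted for stable output.
// Unknown capability names fail loudly instead of being silently ignored.
fun requiredAndroidPermissions(capabilities: List<String>): List<String> {
    val unknown = capabilities.filter { it !in androidPermissionsByCapability }
    require(unknown.isEmpty()) { "Unknown capabilities: $unknown" }
    return capabilities
        .flatMap { androidPermissionsByCapability.getValue(it) }
        .distinct()
        .sorted()
}
```

Rejecting unknown names is a deliberate choice in this sketch: a typo in the capabilities array would otherwise produce a manifest that is missing a permission without any warning.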

State and persistence — what lives where

Extentos is intentionally lean about what persists:

  • The customer's handler code lives in the developer's repo. Source-controlled, durable, the source of truth for behavior.
  • extentos.manifest.json lives in the developer's repo. Records library version, declared capabilities, permissions, and per-platform build metadata.
  • The on-device event log is a 512-entry ring buffer in memory. ~256 KB. FIFO eviction. Does not persist across app restarts — it's debugging data, not telemetry. Configurable via ExtentosConfig.eventBufferSize.
  • The backend session log is forwarded copies of the on-device events, stored per sessionId for session lifetime + 1 hour. After that it's gone. This is what getEventLog queries.
  • ~/.extentos/install_id is a per-machine install ID. Persists forever. Identifies the install when the browser-simulator gate triggers the device-code flow (account-required for simulator session minting only).
  • ~/.extentos/auth.json is the auth token after a free-account device-code flow. Persists forever until extentos-mcp logout.
  • No end-user data is persisted by Extentos in production. RealMetaTransport doesn't touch the backend. Your shipped app's runtime emits zero traffic to Extentos servers.

This matters for the privacy and compliance story: the only thing Extentos's backend ever sees is dev-time simulator session activity. End-user runtime activity on real glasses never leaves the developer's app.
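The on-device event log described above behaves like a fixed-capacity FIFO buffer. A minimal sketch, assuming a hypothetical Event type and method names (the library's actual classes are not shown on this page):

```kotlin
// Minimal FIFO ring buffer; the event type and field names are assumptions.
data class Event(val name: String, val timestampMs: Long)

class EventRingBuffer(private val capacity: Int = 512) {
    init {
        require(capacity > 0) { "capacity must be positive" }
    }

    private val events = ArrayDeque<Event>(capacity)

    // Append an event, evicting the oldest entry once the buffer is full.
    fun record(event: Event) {
        if (events.size == capacity) events.removeFirst()
        events.addLast(event)
    }

    // Oldest-to-newest copy, optionally filtered by name prefix (for example "audio.").
    fun snapshot(prefix: String = ""): List<Event> =
        events.filter { it.name.startsWith(prefix) }
}
```

Nothing in this structure touches disk, which matches the "does not persist across app restarts" behavior: when the process exits, the buffer is gone.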

The auto-bind dev loop

When the developer's app is built with config.debug = true and the Extentos library is linked, the library opens a persistent WebSocket to the backend's /ws/pending endpoint and probes the host machine's local bridge at 127.0.0.1:31337/whoami (or 10.0.2.2:31337 from the Android emulator, or localhost on iOS Simulator). The MCP server, when running, listens on the same port.

Result: every time the agent calls createSimulatorSession, the backend pushes a session_attached message over the pending socket, and the running app instantly switches to the new session — no rebuild, no URL paste, no developer typing. The library logs which transport it picked and why; the simulator UI opens automatically in the browser.

If the bridge can't be reached (cellular phone on a different network, headless CI, cloud-hosted agent, port 31337 already in use), the developer uses the URL-bake path: paste the BuildConfig.EXTENTOS_SESSION_URL snippet (Android) or write the extentos.session.plist payload (iOS) that createSimulatorSession returns, then rebuild the app once. The happy-path experience (auto-bind on the same machine) is "agent asks for a session, app and browser are both already connected"; the URL-bake path adds one rebuild step but works on any topology.
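The choice between auto-bind and URL-bake can be sketched as a small decision function. Everything here is hypothetical: the Binding type, the http scheme, the /whoami path on the emulator address, and the precedence (a reachable bridge wins over a baked URL) are assumptions for illustration, and the probe is injected so the sketch runs without a network.

```kotlin
// Hypothetical names throughout; the probe is injected so no network is needed.
sealed class Binding {
    data class AutoBind(val bridgeUrl: String) : Binding()
    data class UrlBake(val sessionUrl: String) : Binding()
    object Unbound : Binding()
}

// Bridge addresses from the text: host loopback first, then the
// Android emulator's alias for the host machine.
val bridgeCandidates = listOf(
    "http://127.0.0.1:31337/whoami",
    "http://10.0.2.2:31337/whoami",
)

// Assumed precedence: reachable bridge, then baked session URL, else unbound.
fun resolveBinding(bakedSessionUrl: String?, probe: (String) -> Boolean): Binding {
    val bridge = bridgeCandidates.firstOrNull(probe)
    return when {
        bridge != null -> Binding.AutoBind(bridge)
        bakedSessionUrl != null -> Binding.UrlBake(bakedSessionUrl)
        else -> Binding.Unbound
    }
}
```

On a phone connected over cellular, every probe would fail and the function would fall through to the baked URL, which corresponds to the one-rebuild path described above.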

Why this architecture

A few design choices that recur across the components:

  • One AppSpec, many runtimes. The same compiled spec drives the browser simulator, the on-device Mock simulator, and real Ray-Ban Meta hardware. No "dev version" vs "prod version." Every runtime behaves identically because they're all reading the same spec.
  • Same glasses.* API in simulation and production. Developer code is portable across RealMeta, BrowserSim, and LocalSim. Failures in simulation are real failures (spec validation, handler errors, hardware-ready gating) — not transport artifacts. If your code works in the simulator, it works against hardware modulo the final-mile fidelity (camera quality, BT latency, real-world A2DP/HFP coexistence).
  • Vendor-agnostic spec. The AppSpec doesn't reference Meta or DAT. When Mentra G1, Android XR, or Apple smart glasses ship, Extentos adds a new transport implementation. Existing apps target them with a config change, not a rewrite.
  • Agent-native. The MCP server's 18 tools are designed for an AI agent to compose. There's no planning tool: agents are better planners than regex, so the tools are deterministic primitives the agent calls in sequence (generateConnectionModule → initSpec → generateConsumer → validateIntegration → createSimulatorSession).
  • Honest simulator. BrowserSimTransport doesn't mock Bluetooth — it uses a different transport (WebSocket) through the same interface. Hardware-ready gating, permission denials, coexistence warnings, and session-expired states all surface as real events. The simulator is not a fake; it's a different but truthful runtime.
  • Production runtime cost is zero. RealMetaTransport doesn't talk to Extentos's backend. Voice and audio use the platform-native STT and TTS over Bluetooth. Your shipped app pays Extentos nothing per end-user. The cost base scales with developer population, not user population.