Architecture
How Extentos fits together — the AI agent, the MCP server with 18 deterministic tools, the Android and iOS native libraries, the three transport implementations (real Meta DAT, browser simulator, local Mock Device Kit), the Extentos backend, and the AppSpec contract that ties them all together. One diagram and the data flow of a typical development loop, end to end.
Extentos is a five-component system: an AI agent (Claude Code, Cursor, Windsurf, Cline) drives the MCP server (@extentos/mcp-server, 18 deterministic tools), the MCP server generates code into the developer's app and authors a vendor-agnostic AppSpec, the developer's app links the Extentos native library (Android com.extentos:glasses or iOS Extentos SPM module) which exposes one stable glasses.* API over three swappable transport implementations (real Meta DAT, the browser simulator at extentos.com/s, or local Meta Mock Device Kit), and the Extentos backend brokers simulator sessions over WebSocket. The same compiled AppSpec runs identically against all three transports — simulator and production — because the library translates the spec into transport-specific calls underneath. This page is the system diagram, the data flow, and the design rationale.
The four-layer model
Extentos models app development in four conceptual layers, top-down:
| Layer | What lives here | Owned by |
|---|---|---|
| 1. Authoring | The user-facing intent — what the agent and developer compose. App name, screens, behaviors, prompts, rules, settings | Agent + developer |
| 2. Canonical AppSpec | The validated, vendor-agnostic spec — the single source of truth for the app's behavior | The MCP server (mutates) and the library (consumes) |
| 3. Runtime targets | Where the same AppSpec actually runs — browser simulator, on-device Mock simulator, real Ray-Ban Meta hardware | The library's three transports |
| 4. Materialized outputs | Concrete artifacts produced from a target — simulator session, on-device test session, native project export | Backend + library |
This model is intentional: one canonical AppSpec drives everything below it. There's no "browser version" and "production version" of your app — there's one compiled spec, and the library renders it into whichever runtime is active. Editing the spec instantly updates all three runtimes; export materializes a native project from the same spec; the agent sees one consistent state across the whole loop.
Component diagram
┌─────────────────────────────────────┐
│ AI agent │
│ (Claude Code, Cursor, Windsurf, │
│ Cline, or any MCP-compatible host)│
└────────────────┬────────────────────┘
│ MCP stdio
▼
┌──────────────────────────────────────────┐
│ @extentos/mcp-server (npm) │
│ │
│ 18 deterministic tools: │
│ • discovery: getPlatformInfo, search- │
│ Docs, getExampleSpec │
│ • mutation: generateConnection- │
│ Module, initSpec, updateSpec, │
│ generateConsumer │
│ • validation: validateIntegration, │
│ getProductionChecklist │
│ • simulation: createSimulatorSession, │
│ getSimulatorStatus, getEventLog │
│ • inspection: inspectIntegration, │
│ getCredentialGuide, getPermissions, │
│ getVoiceCommandGuidance │
└─────┬─────────────────────┬──────────────┘
│ │
│ writes │ HTTPS
│ code │
▼ ▼
┌─────────────────────┐ ┌──────────────────────────┐
│ Dev's app code │ │ Extentos backend │
│ (Android / iOS) │ │ api.extentos.com │
│ │ │ │
│ - AppSpec (JSON) │ │ - Session store │
│ - Connection page │ │ - WebSocket hub │
│ - Generated stubs │ │ - Device-code auth │
│ - Extentos library │ │ - Event log retention │
└──────────┬──────────┘ └────────┬─────────────────┘
│ │
│ links library │ WSS sessions
│ (3 transports) │
▼ ▼
┌──────────────────────┐ ┌────────────────────────┐
│ GlassesTransport │ │ Browser surrogate │
│ (internal interface)│ │ extentos.com/s/{id} │
│ ┌────────────────┐ │ │ │
│ │ RealMeta │──┼──┼──► (none) │
│ │ wraps DAT SDK │ │ │ │
│ ├────────────────┤ │ │ - real webcam frames │
│ │ BrowserSim │──┼──┼──► WSS to backend ─────┼─┐
│ │ WebSocket │ │ │ - real mic audio │ │
│ ├────────────────┤ │ │ - hardware injection │ │
│ │ LocalSim │──┼──► (none, no network) │ │
│ │ wraps MockKit │ │ │ - replay │ │
│ └────────────────┘ │ └────────────────────────┘ │
└──────────┬───────────┘ │
│ │
│ Bluetooth (BLE / HFP / A2DP) │
▼ │
┌──────────────────────┐ │
│ Real Ray-Ban Meta │ │
│ (production) │ │
└──────────────────────┘ │
│
┌───────────────────────────────────────────┘
│
▼
same Extentos backend ─► getEventLog reads from here
The agent talks to the MCP server. The MCP server writes code into the developer's app and creates simulator sessions on the backend. The developer's app links the library, which routes the same glasses.* API through one of three transports based on config. The browser simulator and the library both connect to the backend over WebSocket for the same sessionId, putting them in the same room.
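The "same room" pairing can be pictured with a small in-memory sketch. Everything here is hypothetical: the `SessionHub` class and its `join` method are illustration only, not Extentos APIs. Only the two role names, `app` and `browser`, come from this page. The sketch treats a session as paired once both roles have joined the same sessionId.

```typescript
// Hypothetical model of the backend's WebSocket hub. Not the real implementation.
type Role = "app" | "browser";

class SessionHub {
  // sessionId -> set of roles currently connected to that session
  private readonly rooms = new Map<string, Set<Role>>();

  // Registers a participant and reports whether both halves are now present.
  join(sessionId: string, role: Role): boolean {
    const room = this.rooms.get(sessionId) ?? new Set<Role>();
    room.add(role);
    this.rooms.set(sessionId, room);
    return room.has("app") && room.has("browser");
  }
}

const hub = new SessionHub();
hub.join("session-abc", "app"); // library connects first: not yet paired
const paired = hub.join("session-abc", "browser"); // browser tab joins: paired
console.log(paired);
```

The point of the model is that neither side addresses the other directly; each only names the session, and the hub does the matching.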
What runs where
| Component | Where it runs | Persists across |
|---|---|---|
| AI agent | Developer's editor / terminal | Conversation only |
| @extentos/mcp-server | Developer's machine, spawned as MCP subprocess | Per agent session, fresh each launch |
| AppSpec JSON | Developer's repository (committed) | Forever (it's a file) |
| Extentos library | Linked into developer's Android / iOS app | App's lifetime |
| RealMetaTransport (Meta DAT calls) | Developer's app process | App's lifetime |
| BrowserSimTransport (WebSocket client) | Developer's app process | App's lifetime |
| LocalSimTransport (Mock Device Kit) | Developer's app process | App's lifetime |
| Browser simulator UI | Developer's browser tab at extentos.com/s/{id} | Tab lifetime |
| Extentos backend (WebSocket hub, sessions, auth) | api.extentos.com | Long-lived, session storage = lifetime + 1h |
| Event log on-device | 512-entry ring buffer in the library | App's lifetime, no disk persistence |
| Event log on backend | Per-session storage | Session lifetime + 1 hour |
| Install ID + auth token | ~/.extentos/install_id, ~/.extentos/auth.json | Forever (per machine) |
| Real Ray-Ban Meta hardware | Meta's BLE protocol stack on the glasses | Hardware lifetime |
A typical dev loop, end to end
This is the data flow agents and developers care about — what happens when the agent says "add a voice trigger to my app":
1. Developer: "Add a voice trigger that captures a photo when I say 'describe this'."
2. Agent: Calls getPlatformInfo({ glasses: "meta_rayban" })
→ MCP returns capability catalog (camera, mic, voice triggers, ...)
3. Agent: Calls searchDocs({ topic: "trigger_types" })
→ MCP returns the catalog with inline examples
4. Agent: Composes the spec mentally (a voice_command trigger → capture_photo block)
5. Agent: Calls generateConnectionModule({ platform: "android" })
→ MCP writes ExtentosConnectionPage and Gradle wiring into the repo
6. Agent: Calls initSpec({ ... })
→ MCP writes the AppSpec JSON, returns the new spec fingerprint
7. Agent: Calls validateIntegration()
→ MCP checks spec correctness, capability availability, handler stubs
→ Returns ✓ all good
8. Agent: Calls createSimulatorSession()
→ MCP HTTPs the Extentos backend, gets sessionId
→ Backend opens WSS endpoint
→ MCP auto-opens browser to extentos.com/s/{sessionId}
9. Library: On app start, reads BuildConfig.EXTENTOS_SESSION_URL (Android) or
extentos.session.plist (iOS), selects BrowserSimTransport, opens
WSS to api.extentos.com/ws/{sessionId} with role: "app"
10. Browser: Opens extentos.com/s/{sessionId}, joins same session as role: "browser"
Backend now has both halves in the same room
11. Developer: Speaks "describe this" into laptop mic
Browser recognizes → emits trigger.fired over WSS
Library dispatches the spec's trigger flow
Library calls glasses.camera.capturePhoto()
BrowserSimTransport sends capture_photo command over WSS
Browser captures webcam frame, sends back as photo_result
Library invokes the handler (developer's app code)
Handler returns; library plays speak_text via TTS in browser
12. Agent: Calls getEventLog({ flowId: "flow_001" })
→ MCP queries backend, returns the structured 7-layer trace
→ Agent confirms the flow worked end to end
Every event in step 11 (transport.state_changed, trigger.fired, block.started, block.completed, callback.invoked, callback.completed, flow.completed) flows through the library's ring buffer, gets forwarded to the backend over the WebSocket session, and is queryable by the agent via getEventLog. The same flow on real glasses (step 11 with RealMetaTransport) produces the same event shapes; only the transport changes.
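As a rough illustration of how an agent might check such a trace, here is a minimal sketch. The seven event type strings come from this page; the `TraceEvent` shape, its `flowId` and `timestampMs` fields, and the `flowSucceeded` helper are assumptions made for the example, not the documented getEventLog response format.

```typescript
// Event type names are taken from this page; everything else is hypothetical.
type EventType =
  | "transport.state_changed"
  | "trigger.fired"
  | "block.started"
  | "block.completed"
  | "callback.invoked"
  | "callback.completed"
  | "flow.completed";

interface TraceEvent {
  type: EventType;
  flowId?: string; // assumed field: transport-level events may not belong to a flow
  timestampMs: number; // assumed field
}

// True when the flow fired, every started block/callback finished,
// and the last event recorded for the flow is flow.completed.
function flowSucceeded(events: TraceEvent[], flowId: string): boolean {
  const trace = events
    .filter((e) => e.flowId === flowId)
    .sort((a, b) => a.timestampMs - b.timestampMs);
  const count = (t: EventType): number => trace.filter((e) => e.type === t).length;
  return (
    count("trigger.fired") >= 1 &&
    count("block.started") === count("block.completed") &&
    count("callback.invoked") === count("callback.completed") &&
    trace[trace.length - 1]?.type === "flow.completed"
  );
}

const sampleTrace: TraceEvent[] = [
  { type: "transport.state_changed", timestampMs: 0 },
  { type: "trigger.fired", flowId: "flow_001", timestampMs: 10 },
  { type: "block.started", flowId: "flow_001", timestampMs: 20 },
  { type: "block.completed", flowId: "flow_001", timestampMs: 30 },
  { type: "callback.invoked", flowId: "flow_001", timestampMs: 40 },
  { type: "callback.completed", flowId: "flow_001", timestampMs: 50 },
  { type: "flow.completed", flowId: "flow_001", timestampMs: 60 },
];

console.log(flowSucceeded(sampleTrace, "flow_001"));
```

Because the event shapes are described as identical across transports, a check like this would not need to know whether the trace came from the simulator or from real glasses.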
The three transports
The library exposes one internal interface (GlassesTransport) and ships three implementations. Selection happens once at ExtentosGlasses.create() time based on config; the developer's code never branches on which transport is active.
- RealMetaTransport: wraps the real Meta DAT SDK (mwdat-core + mwdat-camera on Android, MWDATCore + MWDATCamera on iOS). Production path. BLE link to real Ray-Ban Meta hardware.
- BrowserSimTransport: Extentos-original. WebSocket to api.extentos.com, browser surrogate at extentos.com/s drives real webcam, microphone, and TTS playback. The headline dev-loop simulator.
- LocalSimTransport: wraps Meta's Mock Device Kit (mwdat-mockdevice / MWDATMockDevice). On-device simulation, no network, fast inner-loop and CI.
The same glasses.camera.capturePhoto(), glasses.audio.startRecording(), glasses.speak("hello") calls work identically against all three. See transport vs app simulation for the deep dive on why this layering matters and what each transport simulates.
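The pattern described here (one interface, three implementations, selection once at creation) can be sketched as follows. This is an illustration written in TypeScript for brevity, not the library's Kotlin or Swift code: the `GlassesTransport` members, the `TransportKind` config values, and the `createGlasses` helper are all assumptions, and the transports are stubs that return placeholder strings.

```typescript
// Hypothetical sketch. The real GlassesTransport interface is internal to the
// Extentos library; the member names and signatures below are assumptions.
interface GlassesTransport {
  readonly name: string;
  capturePhoto(): Promise<string>;
}

class RealMetaTransport implements GlassesTransport {
  readonly name = "RealMeta";
  async capturePhoto(): Promise<string> {
    return "photo from glasses camera (stub)";
  }
}

class BrowserSimTransport implements GlassesTransport {
  readonly name = "BrowserSim";
  async capturePhoto(): Promise<string> {
    return "photo from browser webcam (stub)";
  }
}

class LocalSimTransport implements GlassesTransport {
  readonly name = "LocalSim";
  async capturePhoto(): Promise<string> {
    return "photo from mock device (stub)";
  }
}

type TransportKind = "real_meta" | "browser_sim" | "local_sim"; // assumed config values

function createTransport(kind: TransportKind): GlassesTransport {
  switch (kind) {
    case "real_meta":
      return new RealMetaTransport();
    case "browser_sim":
      return new BrowserSimTransport();
    case "local_sim":
      return new LocalSimTransport();
  }
}

// The facade app code talks to; the transport is fixed at construction time.
class Glasses {
  readonly transportName: string;
  readonly camera: { capturePhoto: () => Promise<string> };
  constructor(transport: GlassesTransport) {
    this.transportName = transport.name;
    this.camera = { capturePhoto: () => transport.capturePhoto() };
  }
}

function createGlasses(config: { transport: TransportKind }): Glasses {
  return new Glasses(createTransport(config.transport));
}

// App code never branches on the active transport.
async function describeScene(glasses: Glasses): Promise<string> {
  return glasses.camera.capturePhoto();
}

describeScene(createGlasses({ transport: "local_sim" })).then((photo) => console.log(photo));
```

Only `createGlasses` knows which implementation is in play; `describeScene` would be unchanged when the config switches from simulator to hardware.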
The AppSpec — the universal contract
Every Extentos app has one canonical AppSpec (currently AppSpecV2) — a JSON document that describes the app's behavior in vendor-agnostic terms. The MCP server mutates it through initSpec / updateSpec. The library reads it and runs it.
The spec has four block kinds, five trigger types, four action types:
| Spec primitive | Values |
|---|---|
| Block kinds | capture_photo, capture_video, record_audio, speak_text |
| Trigger types | voice_command, manual_launch, capture_button, tap, double_tap |
| Action types | block_call, ai_call, branch, set_variable |
| Templates | {{key}} substitution across action fields |
| Derived metadata | capabilitiesUsed (abstract), requiresNetwork (true if any ai_call) |
Vendor-agnostic by design — the spec doesn't reference Meta DAT, BLE, or any specific hardware. It says "voice_command 'describe this' → capture_photo → ai_call → speak_text". When Extentos adds Mentra G1, Android XR, or Apple smart-glasses transports later, the same spec runs against them — only the transport translation changes.
Each platform derives its own permissions and capabilities from derived.capabilitiesUsed — so the same spec produces a correct Android manifest and a correct iOS Info.plist without the developer hand-maintaining either.
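To make the primitives concrete, here is a hedged sketch of the "describe this" flow and two of the derived behaviors. The block kinds, trigger types, and action types are the values from the table above; the field names (`trigger`, `actions`, `phrase`, `prompt`, `text`), the `{{ai_result}}` variable, and both helper functions are invented for illustration and should not be read as the AppSpecV2 schema.

```typescript
// Primitive values come from the table above; the object layout is hypothetical.
type BlockKind = "capture_photo" | "capture_video" | "record_audio" | "speak_text";
type TriggerType = "voice_command" | "manual_launch" | "capture_button" | "tap" | "double_tap";
type ActionType = "block_call" | "ai_call" | "branch" | "set_variable";

interface Action {
  type: ActionType;
  block?: BlockKind; // assumed field, used by block_call
  prompt?: string; // assumed field, used by ai_call
  text?: string; // assumed field, may contain {{key}} templates
}

interface Flow {
  trigger: { type: TriggerType; phrase?: string };
  actions: Action[];
}

// voice_command "describe this" -> capture_photo -> ai_call -> speak_text
const describeThis: Flow = {
  trigger: { type: "voice_command", phrase: "describe this" },
  actions: [
    { type: "block_call", block: "capture_photo" },
    { type: "ai_call", prompt: "Describe the photo." },
    { type: "block_call", block: "speak_text", text: "{{ai_result}}" },
  ],
};

// Derived metadata: true if any flow contains an ai_call.
function requiresNetwork(flows: Flow[]): boolean {
  return flows.some((flow) => flow.actions.some((action) => action.type === "ai_call"));
}

// {{key}} substitution; unknown keys are left untouched in this sketch.
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match: string, key: string) => vars[key] ?? match);
}

console.log(requiresNetwork([describeThis]));
console.log(renderTemplate("{{ai_result}}", { ai_result: "A red bicycle." }));
```

Nothing in the flow object names a vendor, a Bluetooth profile, or an SDK, which is the property the section is describing.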
Full spec format: see the spec reference (covered in mcp-server/tools and reference/).
State and persistence — what lives where
Extentos is intentionally lean about what persists:
- The AppSpec lives in the developer's repo as a JSON file. Source-controlled, durable, the source of truth.
- The on-device event log is a 512-entry ring buffer in memory. ~256 KB. FIFO eviction. Does not persist across app restarts: it's debugging data, not telemetry. Configurable via ExtentosConfig.eventBufferSize.
- The backend session log is forwarded copies of the on-device events, stored per sessionId for session lifetime + 1 hour. After that it's gone. This is what getEventLog queries.
- ~/.extentos/install_id is a per-machine install ID. Persists forever. Anchors the 1000-event meter for anonymous browser-simulator use.
- ~/.extentos/auth.json is the auth token after a free-account device-code flow. Persists forever until extentos-mcp logout.
- No end-user data is persisted by Extentos in production. RealMetaTransport doesn't touch the backend. Your shipped app's runtime emits zero traffic to Extentos servers.
This matters for the privacy and compliance story: the only thing Extentos's backend ever sees is dev-time simulator session activity. End-user runtime activity on real glasses never leaves the developer's app.
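The on-device log's behavior (fixed capacity, FIFO eviction, memory only) is that of a standard ring buffer. The sketch below is a generic implementation, not the library's code; the `EventRingBuffer` class name and its methods are hypothetical, and only the default capacity of 512 comes from this page.

```typescript
// Generic fixed-capacity ring buffer with FIFO eviction. Illustration only.
class EventRingBuffer<T> {
  private readonly slots: (T | undefined)[];
  private readonly capacity: number;
  private head = 0; // index of the oldest stored entry
  private count = 0; // number of entries currently stored

  constructor(capacity: number = 512) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
    this.slots = new Array<T | undefined>(capacity);
  }

  // Appends an entry; once full, the oldest entry is overwritten.
  push(entry: T): void {
    const writeIndex = (this.head + this.count) % this.capacity;
    this.slots[writeIndex] = entry;
    if (this.count < this.capacity) {
      this.count += 1;
    } else {
      this.head = (this.head + 1) % this.capacity;
    }
  }

  get size(): number {
    return this.count;
  }

  // Returns the stored entries, oldest first.
  toArray(): T[] {
    const out: T[] = [];
    for (let i = 0; i < this.count; i += 1) {
      out.push(this.slots[(this.head + i) % this.capacity] as T);
    }
    return out;
  }
}

const log = new EventRingBuffer<string>(3);
for (const name of ["trigger.fired", "block.started", "block.completed", "flow.completed"]) {
  log.push(name);
}
console.log(log.toArray()); // the first entry has been evicted
```

Memory use stays bounded no matter how long the app runs, at the cost of losing the oldest events, which suits debugging data that is also forwarded to the backend during simulator sessions.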
The auto-bind dev loop
When the developer's app is built with config.debug = true and the Extentos library is linked, the library opens a persistent WebSocket to the backend's /ws/pending endpoint and probes the host machine's local bridge at 127.0.0.1:31337/whoami (or 10.0.2.2:31337 from the Android emulator, or localhost on iOS Simulator). The MCP server, when running, listens on the same port.
Result: every time the agent calls createSimulatorSession, the backend pushes a session_attached message over the pending socket, and the running app instantly switches to the new session — no rebuild, no URL paste, no developer typing. The library logs which transport it picked and why; the simulator UI opens automatically in the browser.
If the bridge can't be reached (cellular phone on a different network, headless CI, cloud-hosted agent, port 31337 already in use), the developer uses the URL-bake path: paste the BuildConfig.EXTENTOS_SESSION_URL snippet (Android) or write the extentos.session.plist payload (iOS) that createSimulatorSession returns, then rebuild the app once. The happy-path experience (auto-bind on the same machine) is "agent asks for a session, app and browser are both already connected"; the URL-bake path adds one rebuild step but works on any topology.
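The choice between the two paths can be summarized in a small decision sketch. The hosts and port are the ones named above; the `http://` scheme, the `/whoami` path on the emulator and simulator hosts, the `HostEnvironment` values, and the `chooseBindPlan` function are assumptions for illustration, and the reachability check itself is passed in as a boolean rather than performed.

```typescript
// Hypothetical decision logic; the library's actual probing code is not shown on this page.
type HostEnvironment = "android_emulator" | "ios_simulator" | "other";

// Picks the address where the host machine's local bridge would be probed.
function bridgeProbeUrl(env: HostEnvironment): string {
  switch (env) {
    case "android_emulator":
      return "http://10.0.2.2:31337/whoami"; // emulator alias for the host loopback
    case "ios_simulator":
      return "http://localhost:31337/whoami";
    case "other":
      return "http://127.0.0.1:31337/whoami";
  }
}

type BindPlan =
  | { mode: "auto_bind"; probeUrl: string }
  | { mode: "url_bake"; nextStep: string };

// bridgeReachable is the outcome of a probe performed elsewhere.
function chooseBindPlan(env: HostEnvironment, bridgeReachable: boolean): BindPlan {
  if (bridgeReachable) {
    return { mode: "auto_bind", probeUrl: bridgeProbeUrl(env) };
  }
  return {
    mode: "url_bake",
    nextStep: "apply the session snippet returned by createSimulatorSession, then rebuild once",
  };
}

console.log(chooseBindPlan("android_emulator", true));
console.log(chooseBindPlan("other", false));
```

The fallback branch carries no network assumptions at all, which is why it works on any topology.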
Why this architecture
A few design choices that recur across the components:
- One AppSpec, many runtimes. The same compiled spec drives the browser simulator, the on-device Mock simulator, and real Ray-Ban Meta hardware. No "dev version" vs "prod version." Every runtime behaves identically because they're all reading the same spec.
- Same glasses.* API in simulation and production. Developer code is portable across RealMeta, BrowserSim, and LocalSim. Failures in simulation are real failures (spec validation, handler errors, hardware-ready gating), not transport artifacts. If your code works in the simulator, it works against hardware modulo the final-mile fidelity (camera quality, BT latency, real-world A2DP/HFP coexistence).
- Vendor-agnostic spec. The AppSpec doesn't reference Meta or DAT. When Mentra G1, Android XR, or Apple smart glasses ship, Extentos adds a new transport implementation. Existing apps target them with a config change, not a rewrite.
- Agent-native. The MCP server's 18 tools are designed for an AI agent to compose. There's no planning tool: agents are better planners than regex; tools are deterministic primitives the agent calls in sequence (generateConnectionModule → initSpec → generateConsumer → validateIntegration → createSimulatorSession).
- Honest simulator. BrowserSimTransport doesn't mock Bluetooth; it uses a different transport (WebSocket) through the same interface. Hardware-ready gating, permission denials, coexistence warnings, and session-expired states all surface as real events. The simulator is not a fake; it's a different but truthful runtime.
- Production runtime cost is zero. RealMetaTransport doesn't talk to Extentos's backend. Voice and audio use the platform-native STT and TTS over Bluetooth. Your shipped app pays Extentos nothing per end-user. The cost base scales with developer population, not user population.
Related concepts
- Transport vs app simulation — the deep dive on what each transport simulates and why both layers exist
- Capabilities — the vendor-agnostic capability primitives the AppSpec composes from
- Vendors: Meta Ray-Ban — the GA target, what the Meta DAT toolkit exposes, and the distribution state
- MCP server — the 18 tools and the agent-driven flow
- Quickstart with an AI agent — install the MCP server and walk through a real dev loop