Discovery and SDK reference tools

The Extentos MCP server's discovery + reference tools — getPlatformInfo (the static catalog of vendor capabilities), getCapabilityGuide (per-feature Kotlin + Swift call shape with gotchas), and getCodeExample (full canonical compositions for the common voice-glasses patterns). These are the first calls an AI agent makes in any new task — cheap, all local, no side effects, no meter cost.

Three discovery + reference tools answer "what does Extentos offer for this target?" and "how do I actually call it?" before any project mutation. All three are read-only: they have no side effects, change no state, and do not count against the simulator-event meter.

For your installed agent: these tools are invoked directly via the MCP — once the server is registered with your agent, it calls getPlatformInfo / getCapabilityGuide / getCodeExample and gets live, structured responses. This page exists so an agent evaluating Extentos pre-install can see the tool surface, and so human developers and search engines can find the documentation outside an MCP context. The live MCP response is authoritative; the parameter tables and example responses below are illustrative.

getPlatformInfo

Returns the static platform catalog — library version + the list of SDK capabilities the glasses expose (audio.transcriptions, audio.recordDiscrete, audio.speak, camera.capturePhoto, camera.videoFrames, audio.audioChunks, toggles, connection.state, …). The right first call for any new task.

When to use

  • At session start, before scaffolding or writing handler code
  • When the agent needs to know which capabilities a vendor supports
  • When the agent is building a capability declaration for the manifest

When NOT to use

  • For how to call a primitive — use getCapabilityGuide(feature) for the per-feature minimal usage + gotchas
  • For end-to-end compositional examples — use getCodeExample(pattern)
  • For project-installed state — use inspectIntegration

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| sections | array<"version" \| "capabilities"> | yes | Which catalog sections to return. At least one. |
| glasses | "meta_rayban" | conditional | Required when sections includes "capabilities". Only meta_rayban is supported in the current MVP. |
| expand | array<"schema" \| "capabilities.full" \| "capabilities.advanced"> | no | Opt-in to detail. See response-size table below. |

Response

The default response is compact (~2 KB) — names, categories, tier markers — sized to fit comfortably in the agent's context. Detail is opt-in via expand:

| Response variant | Approx size | What's included |
| --- | --- | --- |
| Default (compact capabilities) | ~2 KB | Vendor, DAT version, feature names + categories + tiers, toggle keys, global constraints |
| expand: ["capabilities.full"] | ~10 KB | All of the above plus per-feature params, payload shapes, requires lists, per-feature constraints |
| expand: ["capabilities.advanced"] | + ~1 KB | Adds droppedPrimitives and futurePrimitives informational fields |

The compact-by-default design is intentional: returning the full ~10 KB on every first call would spend the agent's context window before any work begins. The summary field hints at what's available and how to expand.

Example — discovery at session start

{
  "sections": ["version", "capabilities"],
  "glasses": "meta_rayban"
}

Response (compact):

{
  "version": { "latestStable": "1.1.27-pair", "...": "..." },
  "capabilities": {
    "vendor": "meta_rayban",
    "datVersion": { "android": "0.7.0", "ios": "0.6.0" },
    "features": [
      { "name": "capture_photo", "category": "camera", "tier": "dat_native" },
      { "name": "capture_video", "category": "camera", "tier": "phone_side_workaround" },
      { "name": "video_frames", "category": "camera", "tier": "dat_native" },
      { "name": "record_audio", "category": "audio", "tier": "phone_side_workaround" },
      { "name": "audio_chunks", "category": "audio", "tier": "dat_native" },
      { "name": "transcription_incremental", "category": "audio", "tier": "phone_side_workaround" },
      { "name": "speak", "category": "audio", "tier": "phone_side_workaround" },
      "...etc"
    ],
    "toggles": ["listening_mode", "camera_streaming_enabled", "audio_capture_enabled", "transcription_enabled", "battery_save_mode", "voice_confirmations", "audio_video_coexistence_policy", "privacy_mode"],
    "globalConstraints": { "..." : "..." }
  },
  "summary": "Returned library v1.1.27-pair, meta_rayban capabilities (compact: N audio/camera primitives, 8 toggles)."
}
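
One way an agent might use the compact response is to group feature names by tier before planning which primitives to rely on. The Python sketch below is illustrative only: it assumes the response has already been parsed into a dict, it uses a trimmed copy of the example above as data, and features_by_tier is a hypothetical helper, not part of the Extentos SDK or MCP server.

```python
from collections import defaultdict

# Trimmed copy of the compact example response above (illustrative data only).
compact_response = {
    "capabilities": {
        "vendor": "meta_rayban",
        "features": [
            {"name": "capture_photo", "category": "camera", "tier": "dat_native"},
            {"name": "capture_video", "category": "camera", "tier": "phone_side_workaround"},
            {"name": "audio_chunks", "category": "audio", "tier": "dat_native"},
            {"name": "speak", "category": "audio", "tier": "phone_side_workaround"},
        ],
    }
}


def features_by_tier(response: dict) -> dict:
    """Hypothetical helper: map each tier marker to the feature names that carry it."""
    grouped = defaultdict(list)
    for feature in response.get("capabilities", {}).get("features", []):
        # Skip anything that is not a feature object (for example placeholder strings).
        if isinstance(feature, dict):
            grouped[feature["tier"]].append(feature["name"])
    return dict(grouped)


tiers = features_by_tier(compact_response)
print(tiers["dat_native"])  # ['capture_photo', 'audio_chunks']
```

The helper reads only fields shown in the compact example (name, tier), so it should not need any expand option; the live response remains the authority on which fields are present.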

Errors

  • invalid_arguments: sections must be a non-empty array. Pass at least one of "version", "capabilities".
  • invalid_arguments: glasses is required when sections includes "capabilities". Pass glasses: "meta_rayban".
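
Both errors can be caught before the call is sent. The following Python sketch is a hypothetical client-side pre-check that mirrors the parameter table and the two invalid_arguments cases above; build_platform_info_args and the two constant sets are names invented for this illustration, and the server's own validation remains authoritative.

```python
# Allowed values copied from the parameter table above.
VALID_SECTIONS = {"version", "capabilities"}
VALID_EXPAND = {"schema", "capabilities.full", "capabilities.advanced"}


def build_platform_info_args(sections, glasses=None, expand=None):
    """Hypothetical helper: build getPlatformInfo arguments or raise ValueError."""
    if not sections:
        raise ValueError("sections must be a non-empty array")
    unknown = set(sections) - VALID_SECTIONS
    if unknown:
        raise ValueError(f"unknown sections: {sorted(unknown)}")
    if "capabilities" in sections and glasses is None:
        raise ValueError('glasses is required when sections includes "capabilities"')

    args = {"sections": list(sections)}
    if glasses is not None:
        args["glasses"] = glasses
    if expand:
        bad = set(expand) - VALID_EXPAND
        if bad:
            raise ValueError(f"unknown expand values: {sorted(bad)}")
        args["expand"] = list(expand)
    return args


args = build_platform_info_args(["version", "capabilities"], glasses="meta_rayban")
print(args)
```

A version-only request needs no glasses value, so build_platform_info_args(["version"]) passes the check as well.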

getCapabilityGuide

Per-feature SDK usage guide — minimal Kotlin + Swift snippet, config args, gotchas, and which getCodeExample patterns exercise the feature. Pairs with getPlatformInfo (which lists feature names + categories) by adding the actual idiom for using each.

When to use

  • When you know which feature you need (from getPlatformInfo) and want the canonical call shape
  • When you're hitting a confusing failure — the gotchas list typically covers it
  • When mapping out the manifest's capabilities array — confirm each entry's call shape before declaring

When NOT to use

  • For a complete compositional pattern — use getCodeExample
  • For capability discovery — use getPlatformInfo

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| feature | enum | yes | Feature name from capabilities.features[].name in the getPlatformInfo response, plus connection_state and toggles for the cross-cutting SDK surfaces. |

Accepted feature names span the capability primitives (capture_photo, capture_video, record_audio, transcription_incremental, speak, video_frames, audio_chunks, voice_command), the cross-cutting SDK surfaces (connection_state, toggles), the Phase 4 assistant runtime (assistant_runtime, assistant_start, assistant_tool, assistant_provider_openai, assistant_vision, assistant_session_runtime), and the deprecated Phase 3 conversation runtime (conversation_runtime, conversation_on_wake, conversation_listen, conversation_speak, conversation_ai_complete). The live input-schema enum is derived from the guide catalog, so the schema is the authoritative list.

Response

{
  "feature": "<name>",
  "summary": "<one-line shape>",
  "kotlin": "<minimal Kotlin call shape>",
  "swift": "<minimal Swift call shape>",
  "gotchas": ["<gotcha 1>", "<gotcha 2>", "..."],
  "relatedExamples": ["voice_qa_assistant", "barge_in_speak", "..."]
}
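
A consumer of this response might load it into a typed structure so that missing optional lists do not cause errors later. The Python sketch below assumes only the field names shown in the response shape above; the CapabilityGuide dataclass and parse_capability_guide are hypothetical names, and the sample payload reuses the placeholders rather than real snippet text.

```python
from dataclasses import dataclass, field


@dataclass
class CapabilityGuide:
    """Hypothetical typed view of a getCapabilityGuide response."""
    feature: str
    summary: str
    kotlin: str
    swift: str
    gotchas: list = field(default_factory=list)
    related_examples: list = field(default_factory=list)


def parse_capability_guide(payload: dict) -> CapabilityGuide:
    # The string fields are treated as required; the lists default to empty
    # (an assumption made for this sketch, not a documented guarantee).
    return CapabilityGuide(
        feature=payload["feature"],
        summary=payload["summary"],
        kotlin=payload["kotlin"],
        swift=payload["swift"],
        gotchas=list(payload.get("gotchas", [])),
        related_examples=list(payload.get("relatedExamples", [])),
    )


sample = {
    "feature": "speak",
    "summary": "<one-line shape>",
    "kotlin": "<minimal Kotlin call shape>",
    "swift": "<minimal Swift call shape>",
    "gotchas": ["<gotcha 1>"],
    "relatedExamples": ["voice_qa_assistant", "barge_in_speak"],
}
guide = parse_capability_guide(sample)
print(guide.feature, guide.related_examples)
```

The related_examples list is what links a guide back to getCodeExample: each entry is a pattern ID the agent can fetch next.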

getCodeExample

Full canonical SDK compositions in Kotlin + Swift for the common voice-glasses patterns. Each pattern is a complete handler implementation the agent can peel from when writing real customer code.

When to use

  • When you're about to write handler code and want the canonical shape
  • When a customer asks for a feature that maps cleanly to one of the patterns
  • When you need a paste-ready BYOK client (currently: Anthropic via byok_anthropic) — the LLM-client wrapper that the voice-Q&A and vision patterns reference as AnthropicClient

When NOT to use

  • For capability discovery — use getPlatformInfo
  • For per-feature minimal usage — use getCapabilityGuide

Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| pattern | enum (see the pattern table below; the live schema enum is derived from the served catalog) | yes | Which compositional pattern to fetch |

The patterns:

| Pattern ID | Composition |
| --- | --- |
| assistant_agent_loop | Phase 4 canonical; start here for new voice-assistant work. glasses.assistant.start(provider) { tool(name, description) { body } } plus wake wiring; the model owns wake/turn-taking/intent/confirmation, the customer writes tool bodies against app state. |
| agent_driven_e2e_full_loop | The agent-side E2E test workflow for Phase 4 assistants: explicit two-step wake, multi-tool sweep, four-channel verification (event log + adb logcat + screencap + library state). Drives the REAL OpenAI Realtime provider as well as Mock. Companion to assistant_agent_loop. |
| display_browse_detail | The glasses-display two-view navigation pattern (Ray-Ban Display): browse cards ⇄ full-screen detail, Neural-Band slide/pinch/mid-pinch and assistant tools driving one app-side state machine, with show(onBack = ...) back routing and the agent verification recipe. |
| voice_qa_assistant | Wake transcript match → speak prompt → recordDiscrete (silence-VAD) → LLM call → speak answer. The canonical multi-turn voice-Q&A flow. |
| barge_in_speak | Speak with cancellation: TaskGroup-style race between speak() completion and a fresh Final transcript that interrupts. |
| photo_describe_voice | Wake transcript match → speak prompt → capturePhoto → vision LLM → speak description. |
| live_transcription_ui | Subscribe to transcriptions(), stream partials/finals into a Compose / SwiftUI live caption UI. |
| voice_notes | Wake transcript match → recordDiscrete with longer max-duration → persist transcript + audio bytes to local storage. |
| connection_page_setup | Minimum-viable scaffold: instantiate ExtentosGlasses, mount ExtentosConnectionPage, log connection state. |
| byok_anthropic | Paste-ready Anthropic Claude API client (text + Vision, OkHttp / URLSession). Returns a sealed LlmResult with Ok / AuthFailure / NetworkError / Empty variants so the handler can give a distinct user-facing message per failure mode. Referenced by voice_qa_assistant and photo_describe_voice as AnthropicClient. |
| agent_test_loop | LEGACY (pre-Phase-4) three-surface verification recipe. For Phase 4 assistants use agent_driven_e2e_full_loop instead. |
| conversation_agent_loop | DEPRECATED Phase 3 conversation runtime (glasses.conversation.onWake { listen() / speak() / ai.complete() }); deprecated in v1.4.0, removed in v2.0.0. Migration reference only; new code uses assistant_agent_loop. |

Response

{
  "pattern": "<id>",
  "title": "<one-line title>",
  "description": "<2-3 sentence framing>",
  "code": {
    "kotlin": "<full Kotlin handler class, ~80-120 lines>",
    "swift": "<full Swift handler class, ~80-120 lines>"
  },
  "explanation": "<paragraph walking through what the code does>",
  "gotchas": ["<gotcha 1>", "<gotcha 2>", "..."],
  "relatedFeatures": ["transcription_incremental", "record_audio", "speak", "..."],
  "requiredDependencies": {
    "android": [{ "configuration": "implementation", "notation": "<maven-coord>", "appliedIn": "app/build.gradle.kts" }],
    "iosSwiftPackages": [{ "url": "<spm-url>", "from": "<version>", "products": ["<product>"] }]
  },
  "requiredPlugins": {
    "android": [{ "id": "<plugin-id>", "version": "<kotlin-version>", "appliedIn": "root build.gradle.kts", "applyFalse": true }]
  }
}

requiredDependencies and requiredPlugins are optional fields, populated only when a pattern needs Gradle / SPM dependencies beyond the SDK and standard library. Currently only byok_anthropic does: it needs OkHttp and kotlinx-serialization-json on Android, plus the Kotlin serialization plugin in both the root and app build.gradle.kts. An agent that fetches several patterns merges the union of all requiredDependencies / requiredPlugins into the host app's build files.
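
The union-merge across several pattern responses could look roughly like the Python sketch below. It is an assumption-laden illustration: merge_build_requirements is a hypothetical helper, entries are treated as duplicates only when every field matches exactly, and the Maven coordinates in the sample data are made-up placeholders rather than real artifacts.

```python
import json


def merge_build_requirements(responses):
    """Hypothetical helper: union requiredDependencies / requiredPlugins across responses."""
    merged = {"requiredDependencies": {}, "requiredPlugins": {}}
    seen = set()
    for response in responses:
        for field_name in merged:
            # Both fields are optional, so a pattern may omit them entirely.
            for platform, entries in (response.get(field_name) or {}).items():
                bucket = merged[field_name].setdefault(platform, [])
                for entry in entries:
                    # Canonical JSON makes key order irrelevant when comparing entries.
                    key = (field_name, platform, json.dumps(entry, sort_keys=True))
                    if key not in seen:
                        seen.add(key)
                        bucket.append(entry)
    return merged


# Placeholder data: the coordinates below are invented for the example.
dep_a = {"configuration": "implementation", "notation": "example.group:lib-a:1.0.0",
         "appliedIn": "app/build.gradle.kts"}
dep_b = {"configuration": "implementation", "notation": "example.group:lib-b:2.0.0",
         "appliedIn": "app/build.gradle.kts"}
responses = [
    {"pattern": "first", "requiredDependencies": {"android": [dep_a, dep_b]}},
    {"pattern": "second", "requiredDependencies": {"android": [dep_a]}},
    {"pattern": "third"},  # no extra build requirements
]
merged = merge_build_requirements(responses)
print(len(merged["requiredDependencies"]["android"]))  # 2
```

Exact-match deduplication is the conservative choice here: two entries that differ only in version are both kept, leaving the conflict for the agent or developer to resolve.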

Example — peel the voice-Q&A composition

{ "pattern": "voice_qa_assistant" }

The response returns a full Kotlin CoachHandler class (and matching Swift) that subscribes to glasses.audio.transcriptions(), matches the wake phrase, calls glasses.audio.speak("What would you like to know?"), then glasses.audio.recordDiscrete(...) with silence-VAD, calls an LLM, and speaks the answer.

Errors

  • invalid_arguments: pattern must be one of the ids in the pattern table above. The live input-schema enum is derived from the served catalog, so the schema (not this page) is the authoritative list.

Why these are the first calls

getPlatformInfo answers "what's possible on this vendor?" — without that catalog, the agent is composing blind. getCapabilityGuide answers "how do I call this specific feature?" — the agent calls it once per primitive it plans to use. getCodeExample answers "what does a complete real flow look like end-to-end?" — useful for the catalogued shapes; for ad-hoc business logic, the agent composes from primitives directly.

A typical first-time flow:

1. getPlatformInfo({ sections: ["version", "capabilities"], glasses: "meta_rayban" })
2. getCodeExample({ pattern: <matches what the user asked for> })
3. getCapabilityGuide({ feature: <each primitive the handler will use> })
4. generateConnectionModule(...)
5. <agent writes handler code, peeling from the code example>
6. validateIntegration()
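
The read-only part of this flow (steps 1 to 3) can be sketched in Python as follows. The call_tool parameter is a hypothetical stand-in for whatever function the agent's MCP client exposes for invoking a tool by name; the stub below only records calls and returns canned data. Taking the features for step 3 from the example's relatedFeatures field is an assumption made for the illustration, not a documented requirement.

```python
def discovery_steps(call_tool, pattern):
    """Run the three read-only discovery calls in the order listed above."""
    platform = call_tool(
        "getPlatformInfo",
        {"sections": ["version", "capabilities"], "glasses": "meta_rayban"},
    )
    example = call_tool("getCodeExample", {"pattern": pattern})
    # One guide per feature the example reports using (assumed source of the list).
    guides = {
        feature: call_tool("getCapabilityGuide", {"feature": feature})
        for feature in example.get("relatedFeatures", [])
    }
    return platform, example, guides


# Stub transport for the sketch: records each call and returns canned responses.
calls = []


def fake_call_tool(name, arguments):
    calls.append((name, arguments))
    if name == "getCodeExample":
        return {
            "pattern": arguments["pattern"],
            "relatedFeatures": ["transcription_incremental", "record_audio", "speak"],
        }
    return {"tool": name}


platform, example, guides = discovery_steps(fake_call_tool, "voice_qa_assistant")
print([name for name, _ in calls])
```

Steps 4 to 6 are left out on purpose: they modify or validate the project, whereas the three calls sketched here have no side effects.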