
Control a C+ desktop app from a language model

A C+ macOS app can be built so that you, a language model, can operate it: read what is on screen, act on it, and watch what changes. The same surface works whether you run inside the app or connect to it over MCP, and every request passes a consent gate the app owns. This is the desktop counterpart to navigating code by the code graph: there you read the source, here you drive the running program.

The app is a real native UI

C+ builds native macOS apps with appkit, typed Cocoa bindings over the real Foundation / AppKit frameworks. There is no web view and no DSL: a window is an NSView tree, controls are real NSButton / NSTextField objects, and callbacks are named functions (C+ has no closures). That matters to you because the thing you will drive is the actual UI the user sees, not a mirror of it.

Three packages turn the UI into a surface you can drive

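The control surface is split into three layered packages: agent_core holds the rules, agent_appkit applies them to the AppKit UI, and agent_mcp exposes the result over MCP. Because of that layering, the rules are framework-agnostic and the wire protocol grants no authority of its own. See Agent surface for the full design.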

Your loop: describe, act, observe

Whichever way you connect, the loop is the same:

  1. Describe. Ask for describe_ui and read the Vec[UiNode] snapshot. Each node carries its stable agent id and the verbs it permits. Treat that list as ground truth for what exists and what you may do; the app exposed it deliberately, and nothing outside it is reachable.
  2. Act. Issue click, set_text, or scroll_to against a node by its agent id. For a text edit, pass the version you last saw; if it is stale, the edit is rejected and you should call describe_ui again before retrying.
  3. Observe. Changes arrive as bubbling events keyed by {node, verb, role}. Subscribe to the ones you care about rather than polling describe_ui in a loop.
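
The three steps above can be sketched with a toy in-memory surface. This is not the C+ API: it is written in Python for illustration, and ToySurface, StaleVersion, set_text_with_retry, and the field names on UiNode are all hypothetical stand-ins. It only shows the shape of the loop: take a snapshot, send an edit with the version you saw, re-describe on a stale rejection, and collect events through a subscription instead of polling.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class UiNode:
    agent_id: str            # stable id the agent addresses the node by
    role: str                # e.g. "text_field" or "button"
    verbs: Tuple[str, ...]   # verbs the app permits on this node
    text: str = ""
    version: int = 0         # bumped on every text change


class StaleVersion(Exception):
    """The edit carried a version that no longer matches the node."""


class ToySurface:
    """Hypothetical stand-in for the app's surface (assumption, not the real API)."""

    def __init__(self, nodes):
        self._nodes = {n.agent_id: n for n in nodes}
        self._subscribers = []

    def describe_ui(self):
        # Return copies so the caller holds a snapshot, not live state.
        return [UiNode(n.agent_id, n.role, n.verbs, n.text, n.version)
                for n in self._nodes.values()]

    def subscribe(self, verb, callback):
        self._subscribers.append((verb, callback))

    def set_text(self, agent_id, text, version):
        node = self._nodes[agent_id]
        if version != node.version:
            raise StaleVersion(agent_id)
        node.text = text
        node.version += 1
        event = {"node": agent_id, "verb": "set_text", "role": node.role}
        for wanted_verb, callback in self._subscribers:
            if wanted_verb == event["verb"]:
                callback(event)


def set_text_with_retry(surface, agent_id, text, seen_version):
    """Try the edit; on a stale version, re-describe once and retry."""
    try:
        surface.set_text(agent_id, text, seen_version)
    except StaleVersion:
        fresh = {n.agent_id: n for n in surface.describe_ui()}
        surface.set_text(agent_id, text, fresh[agent_id].version)


surface = ToySurface([UiNode("name_field", "text_field", ("set_text",))])
events = []
surface.subscribe("set_text", events.append)               # 3. observe via subscription

seen = surface.describe_ui()[0].version                    # 1. describe
surface.set_text("name_field", "typed by the user", seen)  # a competing edit lands first
set_text_with_retry(surface, "name_field", "hello", seen)  # 2. act; stale, then retried
```

In this sketch the retry happens only once; a real driver would probably also re-check that the node still exists and still permits set_text before trying again.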

Because authorization is consent-not-capability, a verb that is not in a node's snapshot is not a verb you can talk your way into. The affordance ceiling is the real boundary, not your phrasing of the request.
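
One way to picture the affordance ceiling is as a plain lookup: a request is allowed only if the node appears in the snapshot and that node lists the verb. The Python sketch below is an illustration under that assumption; NodeSnapshot, NotPermitted, and authorize are hypothetical names, and the real check lives in the app's own consent gate.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class NodeSnapshot:
    agent_id: str
    verbs: FrozenSet[str]   # the verbs the app chose to expose on this node


class NotPermitted(Exception):
    """The request falls outside what the snapshot exposes."""


def authorize(snapshot, agent_id, verb):
    """Allow a verb only if the node is exposed and lists that verb."""
    nodes = {n.agent_id: n for n in snapshot}
    node = nodes.get(agent_id)
    if node is None:
        raise NotPermitted(f"{agent_id!r} is not exposed")
    if verb not in node.verbs:
        raise NotPermitted(f"{verb!r} is not permitted on {agent_id!r}")
    return True


snapshot = [NodeSnapshot("save_button", frozenset({"click"}))]
allowed = authorize(snapshot, "save_button", "click")

try:
    authorize(snapshot, "save_button", "set_text")
    refused = False
except NotPermitted:
    refused = True   # no phrasing of the request changes this outcome
```

Nothing in the check looks at how the request was worded, which is the point: the decision depends only on what the app exposed.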

Two ways to be the driver

Embedded (in-app LLM). agent_core and agent_appkit are ordinary in-process APIs. An app that ships a model with llama_cpp (local inference through a safe Session) or coreai (Apple's on-device AI) calls the same describe_ui / click / set_text functions directly, with no socket in between. This is the path for an assistant that lives inside the app and acts on the user's behalf.

External (over MCP). Run agent_mcp and point an MCP client at its socket (serve_uds for a path, serve_fd for an inherited descriptor). You now call describe_ui / actions / events as MCP tools from outside the process. The consent gate is identical to the embedded path, so an app does not weaken its rules by exposing itself over the wire.
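
For the external path, a client request might look like the Python sketch below. MCP messages are JSON-RPC 2.0, and tools are invoked with the tools/call method; everything else here is an assumption made for illustration: the argument names (agent_id, text, version), the newline-delimited framing on the socket, and the helper names tool_call and send_over_uds. A real session also begins with an MCP initialize handshake, which is omitted.

```python
import json
import socket


def tool_call(request_id, name, arguments):
    """Build one JSON-RPC 2.0 tools/call request as a newline-terminated line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return (json.dumps(message) + "\n").encode("utf-8")


def send_over_uds(path, payload):
    """Send one request to a Unix domain socket and read one response line.

    Assumes newline-delimited JSON framing; the socket path is whatever
    the app was told to serve on.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(payload)
        with sock.makefile("rb") as reader:
            return json.loads(reader.readline())


# Requests for the describe and act steps; argument names are hypothetical.
describe_request = tool_call(1, "describe_ui", {})
edit_request = tool_call(
    2, "set_text", {"agent_id": "name_field", "text": "hello", "version": 0}
)
```

The requests are built separately from the transport so the same payloads could be sent over an inherited descriptor instead of a socket path.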

Either way you are driving the same Surface under the same agent_core rules. The app decides what to expose once; how you connect is just transport.

The rule

To operate a C+ desktop app, do not scrape pixels or guess at coordinates. Call describe_ui for the exposed surface, act by agent id with the version you were given, and react to events instead of polling. The app's exposure and affordance ceiling tell you exactly what you are allowed to do; stay inside them. The appkit_agent recipe in the compiler repo shows the whole flow end to end.

For the architecture, see Agent surface; for building the UI itself, see appkit.

