agent_gtk

The Linux/BSD GUI backend for the agent surface: it binds the framework-neutral agent_core rules to a live GTK 4 window. It is the sibling of agent_appkit (macOS) — agent_core is shared unchanged, and only this thin bridge is GTK-specific.

  • open(window) walks the live GtkWidget tree into a Surface — the controllable model an agent sees. The walk is a DFS over the GTK 4 child chain (gtk_widget_get_first_child / gtk_widget_get_next_sibling), classifying each widget by its GObject type into an agent_core::Role.
  • Surface::describe returns a live snapshot of the exposed nodes (Vec[UiNode]). A node is exposed by tagging its widget with set_agent_id; untagged widgets are still walked for tree completeness but are not actionable.
  • Authorized actions: click, set_text, and focus run through the agent_core authorization brain. The real I/O (gtk_widget_activate, gtk_editable_set_text, gtk_widget_grab_focus) only runs on Allowed. Text edits use optimistic-concurrency versioning, so a stale write is rejected with VersionConflict.
  • Events: Surface::emit translates a widget firing (reported by an app-installed GObject signal handler) into an agent_core verb offered to a Subscriber.
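
The walk described in the first bullet can be sketched as a language-neutral model. This is an illustration in Python, not the agent_gtk implementation: Widget, Node, walk, and exposed are hypothetical names, the two link fields stand in for gtk_widget_get_first_child and gtk_widget_get_next_sibling, and the choice of -1 as the root's parent index is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Widget:
    """Hypothetical stand-in for a GtkWidget: only the links the walk uses."""
    type_name: str
    agent_id: Optional[str] = None          # set when the app tagged the widget
    first_child: Optional["Widget"] = None
    next_sibling: Optional["Widget"] = None


@dataclass
class Node:
    id: Optional[str]   # None when the widget was never tagged
    type_name: str
    parent: int         # index into the flat list; -1 for the root (assumed)


def walk(root: Widget) -> List[Node]:
    """Pre-order DFS over the first-child / next-sibling chain."""
    out: List[Node] = []
    stack = [(root, -1)]
    while stack:
        widget, parent = stack.pop()
        index = len(out)
        out.append(Node(widget.agent_id, widget.type_name, parent))
        # Gather children in sibling order, then push them reversed so the
        # first child is popped, and therefore visited, first.
        children = []
        child = widget.first_child
        while child is not None:
            children.append(child)
            child = child.next_sibling
        for c in reversed(children):
            stack.append((c, index))
    return out


def exposed(nodes: List[Node]) -> List[Node]:
    """Untagged widgets are walked but filtered out of the curated view."""
    return [n for n in nodes if n.id is not None]


# window -> box -> (entry, button)
button = Widget("GtkButton", agent_id="btn_login")
entry = Widget("GtkEntry", agent_id="user_field", next_sibling=button)
box = Widget("GtkBox", first_child=entry)
window = Widget("GtkWindow", first_child=box)

nodes = walk(window)
```

In this model every widget is visited, which keeps the parent indices consistent, while only the tagged ones survive the exposed filter.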

Curating the surface

Only widgets tagged with set_agent_id are exposed. The id is held as GObject data, and g_object_set_data stores the pointer without copying the string, so pass a NUL-terminated string that outlives the widget, such as a string literal.

import "agent_gtk/agent_gtk" as agent;

// Tag the widgets the agent may see and act on.
agent::set_agent_id(button.raw(), #str_ptr("btn_login\0"));
agent::set_agent_id(entry.raw(),  #str_ptr("user_field\0"));

Read, then act

open snapshots the window; describe reads each node's current frame, text, and enabled state, so it reflects state now — including after a set_text. Each write resolves the agent-id, asks agent_core first, and returns a surface::Outcome (Allowed / NotFound / NotExposed / NotActionable / VersionConflict).

import "agent_gtk/agent_gtk" as agent;
import "agent_core/surface" as surface;

let surf: agent::Surface = agent::open(window.raw());

// READ — the curated UiNode list.
let nodes: agent::Vec[agent::UiNode] = surf.describe();

// WRITE — authorized through agent_core.
let _ = surf.click("btn_login");

// Optimistic concurrency: read a version, then write against it.
let v = surf.text_version("user_field");
let oc = surf.set_text("user_field", "alice", v);
if surface::outcome_eq(oc, surface::Outcome::VersionConflict) {
    // a racing edit landed first — re-read and retry.
}
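
The read-a-version-then-write pattern can be modelled independently of GTK. The following Python sketch is not the agent_core code: ModelSurface, TextField, and set_text_with_retry are hypothetical, the Outcome enum mirrors only three of the documented outcomes, and representing the version as a counter that increments on each accepted write is an assumption.

```python
from enum import Enum, auto


class Outcome(Enum):
    ALLOWED = auto()
    NOT_FOUND = auto()
    VERSION_CONFLICT = auto()


class TextField:
    def __init__(self, text=""):
        self.text = text
        self.version = 0  # assumed: bumped on every accepted write


class ModelSurface:
    """Hypothetical in-memory surface holding versioned text fields."""

    def __init__(self):
        self.fields = {}

    def text_version(self, agent_id):
        field = self.fields.get(agent_id)
        return -1 if field is None else field.version

    def set_text(self, agent_id, text, expected_version):
        field = self.fields.get(agent_id)
        if field is None:
            return Outcome.NOT_FOUND
        if field.version != expected_version:
            # Someone else wrote since the caller read the version.
            return Outcome.VERSION_CONFLICT
        field.text = text
        field.version += 1
        return Outcome.ALLOWED


def set_text_with_retry(surf, agent_id, text, attempts=3):
    """Re-read the version and retry while the write keeps losing the race."""
    outcome = Outcome.VERSION_CONFLICT
    for _ in range(attempts):
        version = surf.text_version(agent_id)
        outcome = surf.set_text(agent_id, text, version)
        if outcome is not Outcome.VERSION_CONFLICT:
            break
    return outcome


surf = ModelSurface()
surf.fields["user_field"] = TextField()

v = surf.text_version("user_field")
first = surf.set_text("user_field", "bob", v)      # accepted, version advances
stale = surf.set_text("user_field", "alice", v)    # v is now out of date
retried = set_text_with_retry(surf, "user_field", "alice")
```

Bounding the retry count keeps a caller from spinning forever when another writer edits the field continuously.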

A UiNode carries { id, role, class_name, frame, is_hidden, text, actionable, parent }. frame is a Rect of f32 fields: GTK lays out in graphene floats, so the coordinates are carried over without a lossy cast. parent is an index back into the returned list, so the flat snapshot can be reconstructed into the tree.
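
To show how a flat list with parent indices reconstructs a tree, here is a small Python model. It is only a sketch: the tuples stand in for UiNode with most fields dropped, the sample ids and role names are made up, and encoding the root's parent as -1 is an assumption the source does not confirm.

```python
from collections import defaultdict

# Hypothetical snapshot: (id, role, parent index into this same list).
snapshot = [
    ("win", "Window", -1),
    ("form", "Container", 0),
    ("user_field", "Input", 1),
    ("btn_login", "Button", 1),
]


def children_by_parent(nodes):
    """Group node indices under the index of their parent."""
    kids = defaultdict(list)
    for index, (_id, _role, parent) in enumerate(nodes):
        kids[parent].append(index)
    return kids


def render(nodes, root_sentinel=-1):
    """Return the tree as indented lines, one per node, in list order."""
    kids = children_by_parent(nodes)
    lines = []

    def visit(index, depth):
        node_id, role, _parent = nodes[index]
        lines.append("  " * depth + f"{role} {node_id}")
        for child in kids[index]:
            visit(child, depth + 1)

    for root in kids[root_sentinel]:
        visit(root, 0)
    return lines


tree_lines = render(snapshot)
```

Grouping by parent once makes the reconstruction linear in the number of nodes, instead of rescanning the list for each node's children.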

Platform notes

  • GTK 4 only. Roles are decided with g_type_check_instance_is_a, which is ancestry-aware: a GtkToggleButton answers a GtkButton query, and any GtkEditable is an Input. It is the GTK analogue of AppKit's isKindOfClass:.
  • Single-threaded by contract. Unlike the AppKit backend, there is no main-thread marshaling helper: GTK is not thread-safe, so an app that drives the surface off the GTK main thread must hop threads itself — the same rule all GTK code lives under.
  • Links the GTK 4 stack (gtk-4, gobject-2.0, gio-2.0, glib-2.0). On Debian/Ubuntu install libgtk-4-dev; pango/cairo/graphene resolve transitively. See Targets for cross-build details.
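
The ancestry-aware role check from the first note can be illustrated with a tiny type table. This Python model is a sketch rather than the backend's classifier: TYPES, is_a, and classify are hypothetical, the table lists only a handful of GTK types, and the role names "Button" and "Other" are assumptions (only "Input" appears in the text above).

```python
# Hypothetical miniature type table: type -> (parent type, implemented interfaces).
TYPES = {
    "GObject": (None, ()),
    "GInitiallyUnowned": ("GObject", ()),
    "GtkWidget": ("GInitiallyUnowned", ()),
    "GtkButton": ("GtkWidget", ()),
    "GtkToggleButton": ("GtkButton", ()),
    "GtkEntry": ("GtkWidget", ("GtkEditable",)),
    "GtkLabel": ("GtkWidget", ()),
}


def is_a(type_name, query):
    """True if type_name is query, derives from it, or implements it."""
    current = type_name
    while current is not None:
        parent, interfaces = TYPES[current]
        if current == query or query in interfaces:
            return True
        current = parent
    return False


def classify(type_name):
    """Map a widget type to a role, checking the interface first."""
    if is_a(type_name, "GtkEditable"):
        return "Input"
    if is_a(type_name, "GtkButton"):
        return "Button"
    return "Other"


roles = {name: classify(name) for name in ("GtkToggleButton", "GtkEntry", "GtkLabel")}
```

Because the check walks the ancestry, a subclass added later inherits the role of its base type without a new table entry in the classifier.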

agent_gtk also exposes a backend-neutral mcp_backend() vtable, the seam an MCP bridge plugs into to serve this surface to an external agent — see agent_mcp. For the shared rules underneath both backends, see agent_core.