TestAnyware is a two-channel, platform-agnostic driver for VMs. The host talks to every VM through exactly two wires:
- VNC channel (RFB over TCP) — pixels in, keyboard/mouse out.
- Agent channel (HTTP/1.1 JSON on port 8648) — accessibility tree, semantic actions, exec, file transfer, window management.
Everything else in the repo — vision pipeline, provisioner, golden images, vendored RFB client — exists to feed or consume those two channels.
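The sketch below is minimal and assumes the host already knows the VM's address. Under that assumption, the two wires reduce to two endpoints on the same host: a TCP port for RFB and an HTTP base URL for the agent.

Several names here are hypothetical and chosen only for illustration, not taken from the driver's API: `VMEndpoints`, `ExecBody`, the `exec` path and the `command` field. The VNC port assumes the conventional RFB mapping of display N to TCP port 5900 + N, which is consistent with the `port 5900+` label in the component map.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLRequest lives here on non-Apple platforms
#endif

/// Hypothetical: how the host addresses one VM over both channels.
struct VMEndpoints {
    let host: String       // VM address, assumed already known to the host
    let vncDisplay: Int    // RFB display number (assumption: port = 5900 + display)
    let agentPort = 8648   // agent channel port stated in the overview

    /// VNC channel: plain RFB over TCP on this port.
    var vncPort: Int { 5900 + vncDisplay }

    /// Agent channel: HTTP/1.1 URL for an endpoint path such as "exec".
    func agentURL(path: String) -> URL? {
        var components = URLComponents()
        components.scheme = "http"
        components.host = host
        components.port = agentPort
        components.path = "/" + path
        return components.url
    }

    /// Builds a JSON POST for the agent channel; the body type is up to the caller.
    func agentRequest<Body: Encodable>(path: String, body: Body) throws -> URLRequest {
        guard let url = agentURL(path: path) else { throw URLError(.badURL) }
        var request = URLRequest(url: url)
        request.httpMethod = "POST"
        request.setValue("application/json", forHTTPHeaderField: "Content-Type")
        request.httpBody = try JSONEncoder().encode(body)
        return request
    }
}

/// Hypothetical exec payload; the real wire shape is defined by the agent protocol.
struct ExecBody: Encodable {
    let command: String
}

let vm = VMEndpoints(host: "192.168.64.5", vncDisplay: 0)
let request = try vm.agentRequest(path: "exec", body: ExecBody(command: "uname -a"))
print(vm.vncPort)                   // 5900
print(request.url!.absoluteString)  // http://192.168.64.5:8648/exec
```

Only the agent channel is HTTP. The VNC side is a raw TCP connection speaking RFB, so there is no request object to build for it.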
Component map
```
HOST (macOS 14+)
┌──────────────────────────────────────────────────────────────────────────────┐
│  cli/                                                                        │
│  ┌──────────────┐        ┌────────────────────┐                              │
│  │ testanyware  │───────▶│ TestAnywareDriver  │                              │
│  │ (CLI bin)    │        │ VNC + Agent +      │                              │
│  └──────────────┘        │ VM lifecycle       │                              │
│         │                └────────────────┬───┘                              │
│         │                                 ├─────────────────────┐            │
│         │                                 │                     │            │
│         │                        [RFB / port 5900+]    [HTTP / port 8648]    │
│         │                                 │                     │            │
│         ▼                                 │                     │            │
│  ┌──────────────┐                         │                     │            │
│  │ provisioner/ │  tart / QEMU+swtpm      │                     │            │
│  │ scripts      │  manage VMs below       │                     │            │
│  └──────────────┘                         │                     │            │
│                                           │                     │            │
│  vision/                                  │                     │            │
│  ┌──────────────┐                         │                     │            │
│  │ stages:      │  consumes PNGs          │                     │            │
│  │ window/icon/ │  captured via VNC       │                     │            │
│  │ drawing      │                         │                     │            │
│  └──────────────┘                         │                     │            │
└───────────────────────────────────────────┼─────────────────────┼────────────┘
                                            │                     │
                                 ┌──────────▼─────┐ ┌─────────────▼─────────────┐
                                 │ VM framebuffer │ │ agents/<platform>/        │
                                 │ (RFB server)   │ │ ┌──────────┐ ┌──────────┐ │
                                 │                │ │ │ macOS    │ │ linux    │ │
                                 │ tart: macOS    │ │ │ Swift +  │ │ Python + │ │
                                 │ tart: Linux    │ │ │ Hbird    │ │ http...  │ │
                                 │ QEMU: Windows  │ │ └──────────┘ └──────────┘ │
                                 └────────────────┘ │ ┌────────────┐            │
                                                    │ │ windows    │            │
                                                    │ │ C# +       │            │
                                                    │ │ ASP.NET 9  │            │
                                                    │ └────────────┘            │
                                                    └───────────────────────────┘
                                                            IN-VM AGENTS
```
Where each piece lives
| Component | Path | Language | Runs on |
|---|---|---|---|
| CLI binary | `cli/Sources/testanyware/` | Swift | Host |
| Driver library (VNC + agent client + VM lifecycle) | `cli/Sources/TestAnywareDriver/` | Swift | Host |
| Wire-format types (host copy) | `cli/Sources/TestAnywareAgentProtocol/` | Swift | Host |
| macOS agent | `agents/macos/` | Swift | In-VM |
| Linux agent | `agents/linux/testanyware_agent/` | Python | In-VM |
| Windows agent | `agents/windows/` | C# | In-VM |
| Vision pipeline | `vision/` | Python (uv workspace) | Host |
| Provisioner (VM lifecycle bash wrappers, autounattend XML) | `provisioner/` | Bash + XML | Host |
| Vendored RFB implementation | `vendored/RoyalVNCKit/` | Swift | Host (linked into driver) |
Isolation notes
- The macOS agent is self-contained. It vendors its own copy of the `TestAnywareAgentProtocol` source tree rather than path-depending on the host CLI package. The agent ships separately as a VM binary. Keeping its source self-contained means it can be built in any working copy without the host CLI present. Both sides must agree on the wire shape; see `docs/architecture/agent-protocol.md`.
- The Windows agent is cross-built from macOS. Running `dotnet build -r win-arm64 --no-self-contained` on the host produces the ARM64 Windows binary. That binary ships inside the golden image's autounattend payload.
- There is no `cli/linux` or `cli/windows`; the CLI package is flat. Linux host support is planned via a Rust port; see `LLM_STATE/core/decisions.md`.
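The host and the macOS agent each compile their own copy of the protocol types. Compatibility therefore rests entirely on both copies producing and accepting the same JSON.

The sketch below shows the kind of check this implies: encode with one side's type and decode with the other's. The type names (`HostWindowInfo`, `AgentWindowInfo`), the field names (`title`, `role`, `frame`) and the function `deliver` are hypothetical, not the actual protocol definitions.

```swift
import Foundation

// Host-side copy, as it might appear under cli/Sources/TestAnywareAgentProtocol/.
struct HostWindowInfo: Codable, Equatable {
    let title: String
    let role: String
    let frame: [Double]   // hypothetical layout: x, y, width, height
}

// Agent-side copy, as it might appear in the tree vendored into agents/macos/.
// Deliberately a separate type: only the JSON shape ties it to the host copy.
struct AgentWindowInfo: Codable, Equatable {
    let title: String
    let role: String
    let frame: [Double]
}

/// Encodes with the agent's type and decodes with the host's type, which is
/// the direction a response travels over the agent channel.
func deliver(_ value: AgentWindowInfo) throws -> HostWindowInfo {
    let data = try JSONEncoder().encode(value)
    return try JSONDecoder().decode(HostWindowInfo.self, from: data)
}

let sent = AgentWindowInfo(title: "Untitled", role: "AXWindow", frame: [0, 0, 800, 600])
let received = try deliver(sent)
print(received.title)  // Untitled
```

Suppose one copy renamed `role` and the other did not. `deliver` would then throw `DecodingError.keyNotFound` instead of returning a value, which is why both copies have to be kept in step with the protocol document.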
What the two channels do and don't do
The VNC channel is the only source of pixels and the only sink for raw keyboard/mouse input. It knows nothing about windows, apps, or accessibility.
The agent channel is the only source of semantic structure. It
speaks in windows, roles, and labels; it cannot capture pixels or
synthesize raw input. (It does run exec and file transfer, which
conceptually belong to neither channel but were placed with the agent
to keep the VNC channel strictly RFB.)
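This split lets the CLI degrade gracefully. Without an agent, you still
have screenshots, video, OCR (via `find-text`) and every keyboard/mouse
primitive. Adding the agent is purely additive: it brings semantic structure,
exec and file transfer, and takes nothing away from the VNC channel.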
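The split can be sketched as a routing table: every capability maps to exactly one wire, and losing the agent removes only the agent-side entries. The enum cases below are hypothetical labels for the capabilities named in this section, not the CLI's actual subcommands.

```swift
// Which wire serves each capability, following the split described above.
enum Channel { case vnc, agent }

// Hypothetical labels for the capabilities named in this section;
// they are not the CLI's actual subcommand names.
enum DriverOperation: CaseIterable {
    case screenshot, video, findText, keyboard, mouse
    case accessibilityTree, semanticAction, windowManagement, exec, fileTransfer

    var channel: Channel {
        switch self {
        case .screenshot, .video, .findText, .keyboard, .mouse:
            // Pixels and raw input; OCR (find-text) only needs captured pixels.
            return .vnc
        case .accessibilityTree, .semanticAction, .windowManagement:
            // Semantic structure exists only on the agent side.
            return .agent
        case .exec, .fileTransfer:
            // Conceptually belong to neither channel, but are placed with the agent.
            return .agent
        }
    }
}

/// Capabilities left when the in-VM agent is or is not reachable.
func availableOperations(agentReachable: Bool) -> [DriverOperation] {
    DriverOperation.allCases.filter { $0.channel == .vnc || agentReachable }
}

print(availableOperations(agentReachable: false).count)  // 5
print(availableOperations(agentReachable: true).count)   // 10
```

Every VNC entry survives when `agentReachable` is `false`, so screenshots and input never depend on the agent. Bringing the agent up only adds rows to the table.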