Troubleshooting
Known issues, VM quirks, platform-specific workarounds, and CLI edge cases. Every entry here has been observed against real hardware and software; workarounds are load-bearing.
VM state is non-deterministic
Each run generates different file explorer content, different terminal state. Cross-session metric comparisons are meaningless; re-run baselines when comparing. Use A/B evaluation on captured snapshots instead.
Icon-only buttons pollute ground truth
Toolbar buttons with AX descriptions (e.g., "back", "forward") aren't rendered text. Filter applied to button/toggle-button only; non-button roles (menu-item, image) still carry icon descriptions. Extended GT filtering needed for high-precision scenarios.
macOS OCR has sparse coverage
Apple Vision OCR skips lines in dense monospace output (e.g., terminal
ls listing). Terminal content GT helps Linux/Windows but hurts macOS
(-4.3 pp) because predictions don't exist. Disable terminal GT on
macOS.
AT-SPI coordinate bug on Linux
GTK4 returns (0,0) for all coordinate types. The TestAnyware Linux
agent detects this and computes the offset via an xdotool window
search plus CSD padding. Requires WaylandEnable=false in GDM because
xdotool can't find native Wayland windows.
Order matters in filter chains
Pipeline filters (showing check → role exclusion → text-content inclusion → label extract → icon filter) apply sequentially; later filters only see elements that passed earlier ones. Changing the order changes the result set.
IoU spatial matching is fundamentally broken
AX boxes include padding (menu bar 30 px tall vs OCR 12 px). Median IoU 0.305 even for matched text. Replaced with center-distance: predicts within GT box (±10 px margin) or GT fully contains predict. Not a threshold fix — the metric itself was wrong.
--window on input commands includes Tahoe drop-shadow inset
The --window <name> flag on testanyware input click / mouse-down
/ mouse-up / move / scroll / drag translates caller-supplied
coordinates via the AX-reported window origin. On macOS Tahoe, that
origin includes the window's drop-shadow inset, so every click lands
~40 px below the intended position. The failure is silent: the CLI
reports a successful click; the target UI control is untouched. For
precise targeting, capture a full-screen testanyware screenshot,
read screen-absolute coords off it, and pass those directly without
--window. --window is still useful for approximate targeting
where 40 px drift is tolerable. Surfaced 2026-04-18 during Mini
Browser verification.
testanyware agent set-value fails for NSTextField inside NSStackView
The macOS agent's element resolver does not reliably reach text
fields hosted inside NSStackView containers on Tahoe. Symptom:
--role textfield --window "..." returns "Multiple elements matched";
adding --index N returns "No element found matching query". AppKit
does not always propagate stack-view children through
kAXChildrenAttribute on Tahoe. Workaround until fixed: derive the
field's screen-absolute VNC coords from a full-screen screenshot,
triple-click at those coords to focus and select existing text, then
testanyware input type + testanyware input key return. Backlog
item #8 tracks the fix.
Electron apps on testanyware-golden-linux-24.04 need --disable-gpu
Launching an Electron app (Obsidian, VSCode, Slack, etc.) inside the
Linux golden image produces a completely black framebuffer — X11
reports the window as created and focused,
xdotool getactivewindow getwindowname returns the correct title,
but testanyware screenshot and in-VM scrot both capture pure
black. The virtio-gpu backend under tart on ARM64 Ubuntu 24.04
doesn't expose the GL acceleration Electron expects for compositing,
and Electron falls back to a non-rendering path instead of software
compositing by default. Workaround: launch the app with
--disable-gpu --no-sandbox. Software compositing then renders
correctly into both the VNC framebuffer and local scrot. The
--no-sandbox flag is only needed if chrome-sandbox is not
suid-root (AppImage extraction drops the suid bit; restore it with
sudo chown root:root chrome-sandbox && sudo chmod 4755 chrome-sandbox
and drop --no-sandbox). First observed during the Ravel
Obsidian symlink validation spike (2026-04-13); same symptom
expected for any Chromium-derived app.