~cytrogen/gstack (3d901066cdb34602675d41dd70b165a6f1a7347c): BROWSER.md

#Browser — technical details

This document covers the command reference and internals of gstack's headless browser.

#Command reference

Category	Commands	What for
Navigate	`goto`, `back`, `forward`, `reload`, `url`	Get to a page
Read	`text`, `html`, `links`, `forms`, `accessibility`	Extract content
Snapshot	`snapshot [-i] [-c] [-d N] [-s sel]`	Get refs for interaction
Interact	`click`, `fill`, `select`, `hover`, `type`, `press`, `scroll`, `wait`, `viewport`	Use the page
Inspect	`js`, `eval`, `css`, `attrs`, `console`, `network`, `cookies`, `storage`, `perf`	Debug and verify
Visual	`screenshot`, `pdf`, `responsive`	See what Claude sees
Compare	`diff <url1> <url2>`	Spot differences between environments
Tabs	`tabs`, `tab`, `newtab`, `closetab`	Multi-page workflows
Multi-step	`chain` (JSON from stdin)	Batch commands in one call

All selector arguments accept CSS selectors or @ref after snapshot. 40+ commands total.

#How it works

gstack's browser is a compiled CLI binary that talks to a persistent local Chromium daemon over HTTP. The CLI is a thin client — it reads a state file, sends a command, and prints the response to stdout. The server does the real work via Playwright.

┌─────────────────────────────────────────────────────────────────┐
│  Claude Code                                                    │
│                                                                 │
│  "browse goto https://staging.myapp.com"                        │
│       │                                                         │
│       ▼                                                         │
│  ┌──────────┐    HTTP POST     ┌──────────────┐                 │
│  │ browse   │ ──────────────── │ Bun HTTP     │                 │
│  │ CLI      │  localhost:9400  │ server       │                 │
│  │          │  Bearer token    │              │                 │
│  │ compiled │ ◄──────────────  │  Playwright  │──── Chromium    │
│  │ binary   │  plain text      │  API calls   │    (headless)   │
│  └──────────┘                  └──────────────┘                 │
│   ~1ms startup                  persistent daemon               │
│                                 auto-starts on first call       │
│                                 auto-stops after 30 min idle    │
└─────────────────────────────────────────────────────────────────┘

#Lifecycle

First call: CLI checks /tmp/browse-server.json for a running server. None found — it spawns bun run browse/src/server.ts in the background. The server launches headless Chromium via Playwright, picks a port (9400-9410), generates a bearer token, writes the state file, and starts accepting HTTP requests. This takes ~3 seconds.
Subsequent calls: CLI reads the state file, sends an HTTP POST with the bearer token, prints the response. ~100-200ms round trip.
Idle shutdown: After 30 minutes with no commands, the server shuts down and cleans up the state file. Next call restarts it automatically.
Crash recovery: If Chromium crashes, the server exits immediately (no self-healing — don't hide failure). The CLI detects the dead server on the next call and starts a fresh one.

#Key components

browse/
├── src/
│   ├── cli.ts              # Thin client — reads state file, sends HTTP, prints response
│   ├── server.ts           # Bun.serve HTTP server — routes commands to Playwright
│   ├── browser-manager.ts  # Chromium lifecycle — launch, tabs, ref map, crash handling
│   ├── snapshot.ts         # Accessibility tree → @ref assignment → Locator map
│   ├── read-commands.ts    # Non-mutating commands (text, html, links, js, css, etc.)
│   ├── write-commands.ts   # Mutating commands (click, fill, select, navigate, etc.)
│   ├── meta-commands.ts    # Server management (status, stop, restart)
│   └── buffers.ts          # Console + network log capture (in-memory + disk flush)
├── test/                   # Integration tests + HTML fixtures
└── dist/
    └── browse              # Compiled binary (~58MB, Bun --compile)

#The snapshot system

The browser's key innovation is ref-based element selection, built on Playwright's accessibility tree API:

page.locator(scope).ariaSnapshot() returns a YAML-like accessibility tree
The snapshot parser assigns refs (@e1, @e2, ...) to each element
For each ref, it builds a Playwright Locator (using getByRole + nth-child)
The ref-to-Locator map is stored on BrowserManager
Later commands like click @e3 look up the Locator and call locator.click()

No DOM mutation. No injected scripts. Just Playwright's native accessibility API.

#Authentication

Each server session generates a random UUID as a bearer token. The token is written to the state file (/tmp/browse-server.json) with chmod 600. Every HTTP request must include Authorization: Bearer <token>. This prevents other processes on the machine from controlling the browser.

#Console and network capture

The server hooks into Playwright's page.on('console') and page.on('response') events. All entries are kept in memory and flushed to disk every second:

Console: /tmp/browse-console.log
Network: /tmp/browse-network.log

The console and network commands read from the in-memory buffers, not disk.

#Multi-workspace support

Each workspace gets its own isolated browser instance with its own Chromium process, tabs, cookies, and logs.

If CONDUCTOR_PORT is set (e.g., by Conductor), the browse port is derived deterministically:

browse_port = CONDUCTOR_PORT - 45600

Workspace	CONDUCTOR_PORT	Browse port	State file
Workspace A	55040	9440	`/tmp/browse-server-9440.json`
Workspace B	55041	9441	`/tmp/browse-server-9441.json`
No Conductor	—	9400 (scan)	`/tmp/browse-server.json`

You can also set BROWSE_PORT directly.

#Environment variables

Variable	Default	Description
`BROWSE_PORT`	0 (auto-scan 9400-9410)	Fixed port for the HTTP server
`CONDUCTOR_PORT`	—	If set, browse port = this - 45600
`BROWSE_IDLE_TIMEOUT`	1800000 (30 min)	Idle shutdown timeout in ms
`BROWSE_STATE_FILE`	`/tmp/browse-server.json`	Path to state file
`BROWSE_SERVER_SCRIPT`	auto-detected	Path to server.ts

#Performance

Tool	First call	Subsequent calls	Context overhead per call
Chrome MCP	~5s	~2-5s	~2000 tokens (schema + protocol)
Playwright MCP	~3s	~1-3s	~1500 tokens (schema + protocol)
gstack browse	~3s	~100-200ms	0 tokens (plain text stdout)

The context overhead difference compounds fast. In a 20-command browser session, MCP tools burn 30,000-40,000 tokens on protocol framing alone. gstack burns zero.

#Why CLI over MCP?

MCP (Model Context Protocol) works well for remote services, but for local browser automation it adds pure overhead:

Context bloat: every MCP call includes full JSON schemas and protocol framing. A simple "get the page text" costs 10x more context tokens than it should.
Connection fragility: persistent WebSocket/stdio connections drop and fail to reconnect.
Unnecessary abstraction: Claude Code already has a Bash tool. A CLI that prints to stdout is the simplest possible interface.

gstack skips all of this. Compiled binary. Plain text in, plain text out. No protocol. No schema. No connection management.

#Acknowledgments

The browser automation layer is built on Playwright by Microsoft. Playwright's accessibility tree API, locator system, and headless Chromium management are what make ref-based interaction possible. The snapshot system — assigning @ref labels to accessibility tree nodes and mapping them back to Playwright Locators — is built entirely on top of Playwright's primitives. Thank you to the Playwright team for building such a solid foundation.

#Development

#Prerequisites

Bun v1.0+
Playwright's Chromium (installed automatically by bun install)

#Quick start

bun install              # install dependencies + Playwright Chromium
bun test                 # run integration tests (~3s)
bun run dev <cmd>        # run CLI from source (no compile)
bun run build            # compile to browse/dist/browse

#Dev mode vs compiled binary

During development, use bun run dev instead of the compiled binary. It runs browse/src/cli.ts directly with Bun, so you get instant feedback without a compile step:

bun run dev goto https://example.com
bun run dev text
bun run dev snapshot -i
bun run dev click @e3

The compiled binary (bun run build) is only needed for distribution. It produces a single ~58MB executable at browse/dist/browse using Bun's --compile flag.

#Running tests

bun test                         # run all tests
bun test browse/test/commands    # run command integration tests only
bun test browse/test/snapshot    # run snapshot tests only

Tests spin up a local HTTP server (browse/test/test-server.ts) serving HTML fixtures from browse/test/fixtures/, then exercise the CLI commands against those pages. Tests take ~3 seconds.

#Source map

File	Role
`browse/src/cli.ts`	Entry point. Reads `/tmp/browse-server.json`, sends HTTP to the server, prints response.
`browse/src/server.ts`	Bun HTTP server. Routes commands to the right handler. Manages idle timeout.
`browse/src/browser-manager.ts`	Chromium lifecycle — launch, tab management, ref map, crash detection.
`browse/src/snapshot.ts`	Parses Playwright's accessibility tree, assigns `@ref` labels, builds Locator map.
`browse/src/read-commands.ts`	Non-mutating commands: `text`, `html`, `links`, `js`, `css`, `forms`, etc.
`browse/src/write-commands.ts`	Mutating commands: `goto`, `click`, `fill`, `select`, `scroll`, etc.
`browse/src/meta-commands.ts`	Server management: `status`, `stop`, `restart`.
`browse/src/buffers.ts`	In-memory + disk capture for console and network logs.

#Deploying to the active skill

The active skill lives at ~/.claude/skills/gstack/. After making changes:

Push your branch
Pull in the skill directory: cd ~/.claude/skills/gstack && git pull
Rebuild: cd ~/.claude/skills/gstack && bun run build

Or copy the binary directly: cp browse/dist/browse ~/.claude/skills/gstack/browse/dist/browse

#Adding a new command

Add the handler in read-commands.ts (non-mutating) or write-commands.ts (mutating)
Register the route in server.ts
Add a test case in browse/test/commands.test.ts with an HTML fixture if needed
Run bun test to verify
Run bun run build to compile