A => .gitignore +6 -0
@@ 1,6 @@
+node_modules/
+browse/dist/
+/tmp/
+*.log
+bun.lock
+*.bun-build
A => BROWSER.md +218 -0
@@ 1,218 @@
+# Browser — technical details
+
+This document covers the command reference and internals of gstack's headless browser.
+
+## Command reference
+
+| Category | Commands | What for |
+|----------|----------|----------|
+| Navigate | `goto`, `back`, `forward`, `reload`, `url` | Get to a page |
+| Read | `text`, `html`, `links`, `forms`, `accessibility` | Extract content |
+| Snapshot | `snapshot [-i] [-c] [-d N] [-s sel]` | Get refs for interaction |
+| Interact | `click`, `fill`, `select`, `hover`, `type`, `press`, `scroll`, `wait`, `viewport` | Use the page |
+| Inspect | `js`, `eval`, `css`, `attrs`, `console`, `network`, `cookies`, `storage`, `perf` | Debug and verify |
+| Visual | `screenshot`, `pdf`, `responsive` | See what Claude sees |
+| Compare | `diff <url1> <url2>` | Spot differences between environments |
+| Tabs | `tabs`, `tab`, `newtab`, `closetab` | Multi-page workflows |
+| Multi-step | `chain` (JSON from stdin) | Batch commands in one call |
+
+All selector arguments accept CSS selectors or `@ref` after `snapshot`. 40+ commands total.
+
+## How it works
+
+gstack's browser is a compiled CLI binary that talks to a persistent local Chromium daemon over HTTP. The CLI is a thin client — it reads a state file, sends a command, and prints the response to stdout. The server does the real work via [Playwright](https://playwright.dev/).
+
+```
+┌─────────────────────────────────────────────────────────────────┐
+│ Claude Code │
+│ │
+│ "browse goto https://staging.myapp.com" │
+│ │ │
+│ ▼ │
+│ ┌──────────┐ HTTP POST ┌──────────────┐ │
+│ │ browse │ ──────────────── │ Bun HTTP │ │
+│ │ CLI │ localhost:9400 │ server │ │
+│ │ │ Bearer token │ │ │
+│ │ compiled │ ◄────────────── │ Playwright │──── Chromium │
+│ │ binary │ plain text │ API calls │ (headless) │
+│ └──────────┘ └──────────────┘ │
+│ ~1ms startup persistent daemon │
+│ auto-starts on first call │
+│ auto-stops after 30 min idle │
+└─────────────────────────────────────────────────────────────────┘
+```
+
+### Lifecycle
+
+1. **First call**: CLI checks `/tmp/browse-server.json` for a running server. None found — it spawns `bun run browse/src/server.ts` in the background. The server launches headless Chromium via Playwright, picks a port (9400-9410), generates a bearer token, writes the state file, and starts accepting HTTP requests. This takes ~3 seconds.
+
+2. **Subsequent calls**: CLI reads the state file, sends an HTTP POST with the bearer token, prints the response. ~100-200ms round trip.
+
+3. **Idle shutdown**: After 30 minutes with no commands, the server shuts down and cleans up the state file. Next call restarts it automatically.
+
+4. **Crash recovery**: If Chromium crashes, the server exits immediately (no self-healing — don't hide failure). The CLI detects the dead server on the next call and starts a fresh one.
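
In outline, the client half of this lifecycle could look like the sketch below. This is hypothetical: the `/cmd` endpoint, the state-file shape, and the helper names are assumptions for illustration, not the real `cli.ts`.

```typescript
// Hypothetical sketch of the CLI client's lifecycle handling (not the real cli.ts).
import { readFileSync } from "node:fs";

type ServerState = { port: number; token: string };

function readState(path: string): ServerState | null {
  try {
    return JSON.parse(readFileSync(path, "utf8")) as ServerState;
  } catch {
    return null; // missing or unreadable state file → treat as "no server running"
  }
}

async function send(state: ServerState, args: string[]): Promise<string | null> {
  try {
    const res = await fetch(`http://127.0.0.1:${state.port}/cmd`, {
      method: "POST",
      headers: { Authorization: `Bearer ${state.token}` },
      body: JSON.stringify(args),
    });
    return res.ok ? await res.text() : null; // plain text goes straight to stdout
  } catch {
    return null; // connection refused → server died → caller spawns a fresh one
  }
}
```

A `null` from either helper is the "spawn a new server, then retry" signal described in steps 1 and 4.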
+
+### Key components
+
+```
+browse/
+├── src/
+│ ├── cli.ts # Thin client — reads state file, sends HTTP, prints response
+│ ├── server.ts # Bun.serve HTTP server — routes commands to Playwright
+│ ├── browser-manager.ts # Chromium lifecycle — launch, tabs, ref map, crash handling
+│ ├── snapshot.ts # Accessibility tree → @ref assignment → Locator map
+│ ├── read-commands.ts # Non-mutating commands (text, html, links, js, css, etc.)
+│ ├── write-commands.ts # Mutating commands (click, fill, select, navigate, etc.)
+│ ├── meta-commands.ts # Server management (status, stop, restart)
+│ └── buffers.ts # Console + network log capture (in-memory + disk flush)
+├── test/ # Integration tests + HTML fixtures
+└── dist/
+ └── browse # Compiled binary (~58MB, Bun --compile)
+```
+
+### The snapshot system
+
+The browser's key innovation is ref-based element selection, built on Playwright's accessibility tree API:
+
+1. `page.locator(scope).ariaSnapshot()` returns a YAML-like accessibility tree
+2. The snapshot parser assigns refs (`@e1`, `@e2`, ...) to each element
+3. For each ref, it builds a Playwright `Locator` (using `getByRole` + nth-child)
+4. The ref-to-Locator map is stored on `BrowserManager`
+5. Later commands like `click @e3` look up the Locator and call `locator.click()`
+
+No DOM mutation. No injected scripts. Just Playwright's native accessibility API.
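
A simplified, self-contained sketch of step 2 follows. The line format and data shapes are illustrative only; in the real `snapshot.ts`, each ref maps to a Playwright `Locator` rather than a role/name record.

```typescript
// Illustrative sketch of ref assignment over a YAML-like aria snapshot.
// Input lines look like: `- button "Submit"` or `  - textbox "Email"`.

type RefEntry = { role: string; name?: string };

function assignRefs(snapshot: string): { annotated: string; refs: Map<string, RefEntry> } {
  const refs = new Map<string, RefEntry>();
  let n = 0;
  const annotated = snapshot
    .split("\n")
    .map((line) => {
      const m = line.match(/^(\s*)-\s+(\w+)(?:\s+"([^"]*)")?/);
      if (!m) return line; // non-element line — leave untouched
      const ref = `@e${++n}`;
      refs.set(ref, { role: m[2], name: m[3] });
      return `${line} [${ref}]`; // annotate so later commands can cite the ref
    })
    .join("\n");
  return { annotated, refs };
}
```

A later `click @e3` then resolves `@e3` through this map instead of re-querying the DOM.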
+
+### Authentication
+
+Each server session generates a random UUID as a bearer token. The token is written to the state file (`/tmp/browse-server.json`) with chmod 600. Every HTTP request must include `Authorization: Bearer <token>`. This prevents other processes on the machine from controlling the browser.
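
A minimal sketch of that check (hypothetical helper name; the constant-time comparison is a hardening detail I'm assuming, not something the source specifies):

```typescript
// Sketch of a bearer-token check; the real server handler may differ.
import { timingSafeEqual } from "node:crypto";

function isAuthorized(header: string | null, token: string): boolean {
  if (!header?.startsWith("Bearer ")) return false;
  const presented = header.slice("Bearer ".length);
  if (presented.length !== token.length) return false; // timingSafeEqual requires equal lengths
  // Constant-time comparison avoids leaking the token byte-by-byte via timing.
  return timingSafeEqual(Buffer.from(presented), Buffer.from(token));
}
```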
+
+### Console and network capture
+
+The server hooks into Playwright's `page.on('console')` and `page.on('response')` events. All entries are kept in memory and flushed to disk every second:
+
+- Console: `/tmp/browse-console.log`
+- Network: `/tmp/browse-network.log`
+
+The `console` and `network` commands read from the in-memory buffers, not disk.
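
The in-memory-plus-flush design can be sketched like this (hypothetical shape — `buffers.ts` is the real implementation; the injectable sink stands in for the filesystem append):

```typescript
// Sketch of a capture buffer: entries accumulate in memory; a periodic
// flush writes only the entries added since the previous flush.
type Sink = (chunk: string) => void;

class CaptureBuffer {
  private entries: string[] = [];
  private flushed = 0; // index of the first entry not yet written out

  constructor(private sink: Sink) {}

  push(line: string) { this.entries.push(line); }

  // `console` / `network` commands read this in-memory tail, not disk.
  recent(n: number): string[] { return this.entries.slice(-n); }

  // Runs on a 1s interval; a no-op when nothing new has arrived.
  flush() {
    const pending = this.entries.slice(this.flushed);
    if (pending.length === 0) return;
    this.sink(pending.join("\n") + "\n");
    this.flushed = this.entries.length;
  }
}
```

In the server, the sink would append to `/tmp/browse-console.log` or `/tmp/browse-network.log`.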
+
+### Multi-workspace support
+
+Each workspace gets its own isolated browser instance with its own Chromium process, tabs, cookies, and logs.
+
+If `CONDUCTOR_PORT` is set (e.g., by [Conductor](https://conductor.dev)), the browse port is derived deterministically:
+
+```
+browse_port = CONDUCTOR_PORT - 45600
+```
+
+| Workspace | CONDUCTOR_PORT | Browse port | State file |
+|-----------|---------------|-------------|------------|
+| Workspace A | 55040 | 9440 | `/tmp/browse-server-9440.json` |
+| Workspace B | 55041 | 9441 | `/tmp/browse-server-9441.json` |
+| No Conductor | — | 9400 (scan) | `/tmp/browse-server.json` |
+
+You can also set `BROWSE_PORT` directly.
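
As a sketch, the precedence implied by the table above looks like this (hypothetical helper; the real server also scans 9400-9410 for a free port, which is omitted here):

```typescript
// Port selection precedence: explicit BROWSE_PORT > CONDUCTOR_PORT derivation > default.
function browsePort(env: { BROWSE_PORT?: string; CONDUCTOR_PORT?: string }): number {
  const explicit = Number(env.BROWSE_PORT ?? 0);
  if (explicit) return explicit;                                      // explicit override wins
  if (env.CONDUCTOR_PORT) return Number(env.CONDUCTOR_PORT) - 45600;  // deterministic per workspace
  return 9400;                                                        // start of the scan range
}
```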
+
+### Environment variables
+
+| Variable | Default | Description |
+|----------|---------|-------------|
+| `BROWSE_PORT` | 0 (auto-scan 9400-9410) | Fixed port for the HTTP server |
+| `CONDUCTOR_PORT` | — | If set, browse port = this - 45600 |
+| `BROWSE_IDLE_TIMEOUT` | 1800000 (30 min) | Idle shutdown timeout in ms |
+| `BROWSE_STATE_FILE` | `/tmp/browse-server.json` | Path to state file |
+| `BROWSE_SERVER_SCRIPT` | auto-detected | Path to server.ts |
+
+### Performance
+
+| Tool | First call | Subsequent calls | Context overhead per call |
+|------|-----------|-----------------|--------------------------|
+| Chrome MCP | ~5s | ~2-5s | ~2000 tokens (schema + protocol) |
+| Playwright MCP | ~3s | ~1-3s | ~1500 tokens (schema + protocol) |
+| **gstack browse** | **~3s** | **~100-200ms** | **0 tokens** (plain text stdout) |
+
+The context overhead difference compounds fast. In a 20-command browser session, MCP tools burn 30,000-40,000 tokens on protocol framing alone. gstack burns zero.
+
+### Why CLI over MCP?
+
+MCP (Model Context Protocol) works well for remote services, but for local browser automation it adds pure overhead:
+
+- **Context bloat**: every MCP call includes full JSON schemas and protocol framing. A simple "get the page text" costs 10x more context tokens than it should.
+- **Connection fragility**: persistent WebSocket/stdio connections drop and fail to reconnect.
+- **Unnecessary abstraction**: Claude Code already has a Bash tool. A CLI that prints to stdout is the simplest possible interface.
+
+gstack skips all of this. Compiled binary. Plain text in, plain text out. No protocol. No schema. No connection management.
+
+## Acknowledgments
+
+The browser automation layer is built on [Playwright](https://playwright.dev/) by Microsoft. Playwright's accessibility tree API, locator system, and headless Chromium management are what make ref-based interaction possible. The snapshot system — assigning `@ref` labels to accessibility tree nodes and mapping them back to Playwright Locators — is built entirely on top of Playwright's primitives. Thank you to the Playwright team for building such a solid foundation.
+
+## Development
+
+### Prerequisites
+
+- [Bun](https://bun.sh/) v1.0+
+- Playwright's Chromium (installed automatically by `bun install`)
+
+### Quick start
+
+```bash
+bun install # install dependencies + Playwright Chromium
+bun test # run integration tests (~3s)
+bun run dev <cmd> # run CLI from source (no compile)
+bun run build # compile to browse/dist/browse
+```
+
+### Dev mode vs compiled binary
+
+During development, use `bun run dev` instead of the compiled binary. It runs `browse/src/cli.ts` directly with Bun, so you get instant feedback without a compile step:
+
+```bash
+bun run dev goto https://example.com
+bun run dev text
+bun run dev snapshot -i
+bun run dev click @e3
+```
+
+The compiled binary (`bun run build`) is only needed for distribution. It produces a single ~58MB executable at `browse/dist/browse` using Bun's `--compile` flag.
+
+### Running tests
+
+```bash
+bun test # run all tests
+bun test browse/test/commands # run command integration tests only
+bun test browse/test/snapshot # run snapshot tests only
+```
+
+Tests spin up a local HTTP server (`browse/test/test-server.ts`) serving HTML fixtures from `browse/test/fixtures/`, then exercise the CLI commands against those pages. Tests take ~3 seconds.
+
+### Source map
+
+| File | Role |
+|------|------|
+| `browse/src/cli.ts` | Entry point. Reads `/tmp/browse-server.json`, sends HTTP to the server, prints response. |
+| `browse/src/server.ts` | Bun HTTP server. Routes commands to the right handler. Manages idle timeout. |
+| `browse/src/browser-manager.ts` | Chromium lifecycle — launch, tab management, ref map, crash detection. |
+| `browse/src/snapshot.ts` | Parses Playwright's accessibility tree, assigns `@ref` labels, builds Locator map. |
+| `browse/src/read-commands.ts` | Non-mutating commands: `text`, `html`, `links`, `js`, `css`, `forms`, etc. |
+| `browse/src/write-commands.ts` | Mutating commands: `goto`, `click`, `fill`, `select`, `scroll`, etc. |
+| `browse/src/meta-commands.ts` | Server management: `status`, `stop`, `restart`. |
+| `browse/src/buffers.ts` | In-memory + disk capture for console and network logs. |
+
+### Deploying to the active skill
+
+The active skill lives at `~/.claude/skills/gstack/`. After making changes:
+
+1. Push your branch
+2. Fetch and reset in the skill directory: `cd ~/.claude/skills/gstack && git fetch origin && git reset --hard origin/main`
+3. Rebuild: `cd ~/.claude/skills/gstack && bun run build`
+
+Or copy the binary directly: `cp browse/dist/browse ~/.claude/skills/gstack/browse/dist/browse`
+
+### Adding a new command
+
+1. Add the handler in `read-commands.ts` (non-mutating) or `write-commands.ts` (mutating)
+2. Register the route in `server.ts`
+3. Add a test case in `browse/test/commands.test.ts` with an HTML fixture if needed
+4. Run `bun test` to verify
+5. Run `bun run build` to compile
A => CHANGELOG.md +10 -0
@@ 1,10 @@
+# Changelog
+
+## 0.0.1 — 2026-03-11
+
+Initial release.
+
+- Six skills: `/plan-ceo-review`, `/plan-eng-review`, `/review`, `/ship`, `/browse`, `/retro`
+- Headless browser CLI with 40+ commands, ref-based interaction, persistent Chromium daemon
+- One-command install as Claude Code skills (submodule or global clone)
+- `setup` script for binary compilation and skill symlinking
A => CLAUDE.md +38 -0
@@ 1,38 @@
+# gstack development
+
+## Commands
+
+```bash
+bun install # install dependencies
+bun test # run integration tests (browse + snapshot)
+bun run dev <cmd> # run CLI in dev mode, e.g. bun run dev goto https://example.com
+bun run build # compile binary to browse/dist/browse
+```
+
+## Project structure
+
+```
+gstack/
+├── browse/ # Headless browser CLI (Playwright)
+│ ├── src/ # CLI + server + commands
+│ ├── test/ # Integration tests + fixtures
+│ └── dist/ # Compiled binary
+├── ship/ # Ship workflow skill
+├── review/ # PR review skill
+├── plan-ceo-review/ # /plan-ceo-review skill
+├── plan-eng-review/ # /plan-eng-review skill
+├── retro/ # Retrospective skill
+├── setup # One-time setup: build binary + symlink skills
+├── SKILL.md # Browse skill (Claude discovers this)
+└── package.json # Build scripts for browse
+```
+
+## Deploying to the active skill
+
+The active skill lives at `~/.claude/skills/gstack/`. After making changes:
+
+1. Push your branch
+2. Fetch and reset in the skill directory: `cd ~/.claude/skills/gstack && git fetch origin && git reset --hard origin/main`
+3. Rebuild: `cd ~/.claude/skills/gstack && bun run build`
+
+Or copy the binary directly: `cp browse/dist/browse ~/.claude/skills/gstack/browse/dist/browse`
A => LICENSE +21 -0
@@ 1,21 @@
+MIT License
+
+Copyright (c) 2026 Garry Tan
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
A => README.md +420 -0
@@ 1,420 @@
+# gstack
+
+**gstack turns Claude Code from one generic assistant into a team of specialists you can summon on demand.**
+
+Six opinionated workflow skills for [Claude Code](https://docs.anthropic.com/en/docs/claude-code). Product-level and engineering-level plan review, code review, one-command shipping, browser automation, and engineering retrospectives — all as slash commands.
+
+### Without gstack
+
+- The agent takes your request literally — it never asks if you're building the right thing
+- It will implement exactly what you said, even when the real product is something bigger
+- "Review my PR" gives inconsistent depth every time
+- "Ship this" turns into a long back-and-forth about what to do
+- The agent can write code but can't see your app — it's half blind
+- You still do QA by hand: open browser, click around, check pages, squint at layouts
+
+### With gstack
+
+| Skill | Mode | What it does |
+|-------|------|--------------|
+| `/plan-ceo-review` | Founder / CEO | Rethink the problem. Find the 10-star product hiding inside the request. |
+| `/plan-eng-review` | Eng manager / tech lead | Lock in architecture, data flow, diagrams, edge cases, and tests. |
+| `/review` | Paranoid staff engineer | Find the bugs that pass CI but blow up in production. Not a style nitpick pass. |
+| `/ship` | Release engineer | Sync main, run tests, push, open PR. For a ready branch, not for deciding what to build. |
+| `/browse` | QA engineer | Give the agent eyes. It logs in, clicks through your app, takes screenshots, catches breakage. Full QA pass in 60 seconds. |
+| `/retro` | Engineering manager | Analyze commit history, work patterns, and shipping velocity for the week. |
+
+## Demo: one feature, five modes
+
+I start almost every feature in plan mode. I describe what I want to build, then use `/plan-ceo-review` to pressure-test whether I am even building the right thing. Only after the product direction is locked do I shift into engineering, review, ship, and test.
+
+```
+You: [enter plan mode, describe the feature]
+ I want to add seller photo upload to the listing app. Sellers
+ should be able to upload a photo of their item and we create
+ the listing from it.
+
+You: /plan-ceo-review
+
+Claude: "Photo upload" is not the feature. The real job is helping sellers
+ create listings that actually sell. Here's what the 10-star version
+ looks like: auto-identify the product from the photo, pull specs and
+ pricing comps from the web, draft the title and description, suggest
+ the best hero image...
+
+You: /plan-eng-review
+
+Claude: [Architecture diagram, state machine for upload → classify → enrich
+ → draft pipeline, async job boundaries, failure modes, test matrix]
+
+You: [exit plan mode, implement the plan]
+
+You: /review
+
+Claude: Race condition: two tabs can overwrite cover-photo selection.
+ Trust boundary: pulling web data into draft generation creates
+ a prompt injection vector. Missing: orphan cleanup for failed uploads.
+
+You: (fix the issues)
+
+You: /ship
+
+Claude: [Syncs main, runs tests, pushes branch, opens PR — 6 tool calls, done]
+
+You: /browse staging.myapp.com — test the listing flow
+
+Claude: [22 tool calls — navigates routes, fills the upload form, verifies
+ enrichment renders, checks console for errors, screenshots each step]
+ All pages load correctly. Listing flow works end to end on staging.
+```
+
+## Who this is for
+
+You already use Claude Code heavily and want consistent, high-rigor workflows instead of one mushy generic mode. You want to tell the model what kind of brain to use right now — founder taste, engineering rigor, paranoid review, or fast execution.
+
+This is not a prompt pack for beginners. It is an operating system for people who ship.
+
+## Install
+
+**Requirements:** [Claude Code](https://docs.anthropic.com/en/docs/claude-code), [Git](https://git-scm.com/), [Bun](https://bun.sh/) v1.0+. `/browse` compiles a native binary — works on macOS and Linux (x64 and arm64).
+
+### Step 1: Install on your machine
+
+Open Claude Code and paste this. Claude will do the rest.
+
+> Install gstack: run `git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup` then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /plan-ceo-review, /plan-eng-review, /review, /ship, /browse, /retro. Then ask the user if they also want to add gstack to the current project so teammates get it.
+
+### Step 2: Add to your repo so teammates get it (optional)
+
+> Add gstack to this project: run `cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup` then add a "gstack" section to this project's CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, lists the available skills: /plan-ceo-review, /plan-eng-review, /review, /ship, /browse, /retro, and tells Claude that if gstack skills aren't working, run `cd .claude/skills/gstack && ./setup` to build the binary and register skills.
+
+Real files get committed to your repo (not a submodule), so `git clone` just works. The binary and node\_modules are gitignored — teammates just need to run `cd .claude/skills/gstack && ./setup` once to build (or `/browse` handles it automatically on first use).
+
+### What gets installed
+
+- Skill files (Markdown prompts) in `~/.claude/skills/gstack/` (or `.claude/skills/gstack/` for project installs)
+- Symlinks at `~/.claude/skills/browse`, `~/.claude/skills/review`, etc. pointing into the gstack directory
+- Browser binary at `browse/dist/browse` (~58MB, gitignored)
+- `node_modules/` (gitignored)
+- `/retro` saves JSON snapshots to `.context/retros/` in your project for trend tracking
+
+Everything lives inside `.claude/`. Nothing touches your PATH or runs in the background.
+
+---
+
+## How I use these skills
+
+Created by [Garry Tan](https://x.com/garrytan), President & CEO of [Y Combinator](https://www.ycombinator.com/).
+
+I built gstack because I do not want AI coding tools stuck in one mushy mode.
+
+Planning is not review. Review is not shipping. Founder taste is not engineering rigor. If you blur all of that together, you usually get a mediocre blend of all four.
+
+I want explicit gears.
+
+These skills let me tell the model what kind of brain I want right now. I can switch cognitive modes on demand — founder, eng manager, paranoid reviewer, release machine. That is the unlock.
+
+---
+
+## `/plan-ceo-review`
+
+This is my **founder mode**.
+
+This is where I want the model to think with taste, ambition, user empathy, and a long time horizon. I do not want it taking the request literally. I want it asking a more important question first:
+
+**What is this product actually for?**
+
+I think of this as **Brian Chesky mode**.
+
+The point is not to implement the obvious ticket. The point is to rethink the problem from the user's point of view and find the version that feels inevitable, delightful, and maybe even a little magical.
+
+### Example
+
+Say I am building a Craigslist-style listing app and I say:
+
+> "Let sellers upload a photo for their item."
+
+A weak assistant will add a file picker and save an image.
+
+That is not the real product.
+
+In `/plan-ceo-review`, I want the model to ask whether "photo upload" is even the feature. Maybe the real feature is helping someone create a listing that actually sells.
+
+If that is the real job, the whole plan changes.
+
+Now the model should ask:
+
+* Can we identify the product from the photo?
+* Can we infer the SKU or model number?
+* Can we search the web and draft the title and description automatically?
+* Can we pull specs, category, and pricing comps?
+* Can we suggest which photo will convert best as the hero image?
+* Can we detect when the uploaded photo is ugly, dark, cluttered, or low-trust?
+* Can we make the experience feel premium instead of like a dead form from 2007?
+
+That is what `/plan-ceo-review` does for me.
+
+It does not just ask, "how do I add this feature?"
+It asks, **"what is the 10-star product hiding inside this request?"**
+
+That is a very different kind of power.
+
+---
+
+## `/plan-eng-review`
+
+This is my **eng manager mode**.
+
+Once the product direction is right, I want a different kind of intelligence entirely. I do not want more sprawling ideation. I do not want more "wouldn't it be cool if." I want the model to become my best technical lead.
+
+This mode should nail:
+
+* architecture
+* system boundaries
+* data flow
+* state transitions
+* failure modes
+* edge cases
+* trust boundaries
+* test coverage
+
+And one surprisingly big unlock for me: **diagrams**.
+
+LLMs get way more complete when you force them to draw the system. Sequence diagrams, state diagrams, component diagrams, data-flow diagrams, even test matrices. Diagrams force hidden assumptions into the open. They make hand-wavy planning much harder.
+
+So `/plan-eng-review` is where I want the model to build the technical spine that can carry the product vision.
+
+### Example
+
+Take the same listing app example.
+
+Let's say `/plan-ceo-review` already did its job. We decided the real feature is not just photo upload. It is a smart listing flow that:
+
+* uploads photos
+* identifies the product
+* enriches the listing from the web
+* drafts a strong title and description
+* suggests the best hero image
+
+Now `/plan-eng-review` takes over.
+
+Now I want the model to answer questions like:
+
+* What is the architecture for upload, classification, enrichment, and draft generation?
+* Which steps happen synchronously, and which go to background jobs?
+* Where are the boundaries between app server, object storage, vision model, search/enrichment APIs, and the listing database?
+* What happens if upload succeeds but enrichment fails?
+* What happens if product identification is low-confidence?
+* How do retries work?
+* How do we prevent duplicate jobs?
+* What gets persisted when, and what can be safely recomputed?
+
+And this is where I want the diagrams applied — architecture diagrams, state models, data-flow diagrams, test matrices — to this specific pipeline.
+
+That is `/plan-eng-review`.
+
+Not "make the idea smaller."
+**Make the idea buildable.**
+
+---
+
+## `/review`
+
+This is my **paranoid staff engineer mode**.
+
+Passing tests do not mean the branch is safe.
+
+`/review` exists because there is a whole class of bugs that can survive CI and still punch you in the face in production. This mode is not about dreaming bigger. It is not about making the plan prettier. It is about asking:
+
+**What can still break?**
+
+This is a structural audit, not a style nitpick pass. I want the model to look for things like:
+
+* N+1 queries
+* stale reads
+* race conditions
+* bad trust boundaries
+* missing indexes
+* escaping bugs
+* broken invariants
+* bad retry logic
+* tests that pass while missing the real failure mode
+
+### Example
+
+Suppose the smart listing flow is implemented and the tests are green.
+
+`/review` should still ask:
+
+* Did I introduce an N+1 query when rendering listing photos or draft suggestions?
+* Am I trusting client-provided file metadata instead of validating the actual file?
+* Can two tabs race and overwrite cover-photo selection or item details?
+* Do failed uploads leave orphaned files in storage forever?
+* Can the "exactly one hero image" rule break under concurrency?
+* If enrichment APIs partially fail, do I degrade gracefully or save garbage?
+* Did I accidentally create a prompt injection or trust-boundary problem by pulling web data into draft generation?
+
+That is the point of `/review`.
+
+I do not want flattery here.
+I want the model imagining the production incident before it happens.
+
+---
+
+## `/ship`
+
+This is my **release machine mode**.
+
+Once I have decided what to build, nailed the technical plan, and run a serious review, I do not want more talking. I want execution.
+
+`/ship` is for the final mile. It is for a ready branch, not for deciding what to build.
+
+This is where the model should stop behaving like a brainstorm partner and start behaving like a disciplined release engineer: sync with main, run the right tests, make sure the branch state is sane, update changelog or versioning if the repo expects it, push, and create or update the PR.
+
+Momentum matters here.
+
+A lot of branches die when the interesting work is done and only the boring release work is left. Humans procrastinate that part. AI should not.
+
+### Example
+
+Suppose the smart listing flow is finished.
+
+The product thinking is done.
+The architecture is done.
+The review pass is done.
+Now the branch just needs to get landed.
+
+That is what `/ship` is for.
+
+It takes care of the repetitive release hygiene so I do not bleed energy on:
+
+* syncing with main
+* rerunning tests
+* checking for weird branch state
+* updating changelog/version metadata
+* pushing the branch
+* opening or updating the PR
+
+At this point I do not want more ideation.
+I want the plane landed.
+
+---
+
+## `/browse`
+
+This is my **QA engineer mode**.
+
+`/browse` is the skill that closes the loop. Before it, the agent could think and code but was still half blind. It had to guess about UI state, auth flows, redirects, console errors, empty states, and broken layouts. Now it can just go look.
+
+That is a real step-change. The full cycle becomes: plan, code, run the app, inspect the UI, reproduce the bug, verify the fix, ship. QA stops being "go click around manually" and becomes part of the same agentic loop: the agent clicks through the app, catches breakage, confirms the fix, and keeps going. That feels like having a real operator in the machine.
+
+It is a compiled binary that talks to a persistent Chromium daemon — built on [Playwright](https://playwright.dev/) by Microsoft. First call starts the browser (~3s). Every call after that: ~100-200ms. The browser stays running between commands, so cookies, tabs, and localStorage carry over.
+
+After I push a branch, I tell Claude to go check staging. It does the entire QA pass for me:
+
+```
+You: /browse staging.myapp.com — log in, test the signup flow, and check
+ every page I changed in this branch
+
+Claude: [18 tool calls, ~60 seconds]
+
+ > browse goto https://staging.myapp.com/signup
+ > browse snapshot -i
+ > browse fill @e2 "test@example.com"
+ > browse fill @e3 "password123"
+ > browse click @e5 (Submit)
+ > browse screenshot /tmp/signup.png
+ > Read /tmp/signup.png
+
+ Signup works. Redirected to onboarding. Now checking changed pages.
+
+ > browse goto https://staging.myapp.com/dashboard
+ > browse screenshot /tmp/dashboard.png
+ > Read /tmp/dashboard.png
+ > browse console
+
+ Dashboard loads. No console errors. Charts render with sample data.
+
+ > browse goto https://staging.myapp.com/settings/billing
+ > browse screenshot /tmp/billing.png
+ > Read /tmp/billing.png
+
+ Billing page loads. Plan selector shows Free/Pro/Enterprise.
+ Stripe elements render correctly.
+
+ > browse goto https://staging.myapp.com/api/health
+ > browse text
+
+ API health endpoint returns {"status":"ok","version":"2.4.1"}.
+
+ All 4 pages load correctly. No console errors. No broken layouts.
+ Signup → onboarding → dashboard flow works end to end.
+```
+
+18 tool calls, about a minute. It signed up a test user, navigated every page I changed, took screenshots, read them, checked the console for errors, and verified the API. That is a full QA pass. I did not open a browser.
+
+It works for reading documentation, verifying deployments, filling out forms, taking screenshots, comparing pages across environments, and anything else where Claude needs eyes on a live URL.
+
+**Security note:** `/browse` runs a persistent Chromium session. Cookies, localStorage, and session state carry over between commands. Do not use it against sensitive production environments unless you intend to — it is a real browser with real state. The session auto-shuts down after 30 minutes of idle time.
+
+For the full command reference, technical internals, and architecture details, see [BROWSER.md](BROWSER.md).
+
+---
+
+## `/retro`
+
+This is my **engineering manager mode**.
+
+At the end of the week I want to know what actually happened. Not vibes — data. `/retro` analyzes commit history, work patterns, and shipping velocity and writes a candid retrospective.
+
+It computes metrics like commits, LOC, test ratio, PR sizes, and fix ratio. It detects coding sessions from commit timestamps, finds hotspot files, tracks shipping streaks, and identifies the biggest ship of the week.
+
+```
+You: /retro
+
+Claude: Week of Mar 1: 47 commits, 3.2k LOC, 38% tests, 12 PRs, peak: 10pm | Streak: 47d
+
+ [Full retro with summary table, time patterns, session analysis,
+ commit type breakdown, hotspots, focus score, top 3 wins,
+ 3 things to improve, 3 habits for next week]
+```
+
+It saves a JSON snapshot to `.context/retros/` so the next run can show trends. Run `/retro compare` to see this week vs last week side by side.
+
+---
+
+## Troubleshooting
+
+**Skill not showing up in Claude Code?**
+Run `cd ~/.claude/skills/gstack && ./setup` (or `cd .claude/skills/gstack && ./setup` for project installs). This rebuilds symlinks so Claude can discover the skills.
+
+**`/browse` fails or binary not found?**
+Run `cd ~/.claude/skills/gstack && bun install && bun run build`. This compiles the browser binary. Requires Bun v1.0+.
+
+**Project copy is stale?**
+Re-copy from global: `for s in browse plan-ceo-review plan-eng-review review ship retro; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack && cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup`
+
+**`bun` not installed?**
+Install it: `curl -fsSL https://bun.sh/install | bash`
+
+## Upgrading
+
+Paste this into Claude Code:
+
+> Update gstack: run `cd ~/.claude/skills/gstack && git fetch origin && git reset --hard origin/main && ./setup`. If this project also has gstack at .claude/skills/gstack, update it too: run `for s in browse plan-ceo-review plan-eng-review review ship retro; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack && cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup`
+
+The `setup` script rebuilds the browser binary and re-symlinks skills. It takes a few seconds.
+
+## Uninstalling
+
+Paste this into Claude Code:
+
+> Uninstall gstack: remove the skill symlinks by running `for s in browse plan-ceo-review plan-eng-review review ship retro; do rm -f ~/.claude/skills/$s; done` then run `rm -rf ~/.claude/skills/gstack` and remove the gstack section from CLAUDE.md. If this project also has gstack at .claude/skills/gstack, remove it by running `for s in browse plan-ceo-review plan-eng-review review ship retro; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack` and remove the gstack section from the project CLAUDE.md too.
+
+## Development
+
+See [BROWSER.md](BROWSER.md) for the full development guide, architecture, and command reference.
+
+## License
+
+MIT
A => SKILL.md +254 -0
@@ 1,254 @@
+---
+name: gstack
+version: 1.0.0
+description: |
+ Fast web browsing for Claude Code via persistent headless Chromium daemon. Navigate to any URL,
+ read page content, click elements, fill forms, run JavaScript, take screenshots,
+ inspect CSS/DOM, capture console/network logs, and more. ~100ms per command after
+ first call. Use when you need to check a website, verify a deployment, read docs,
+ or interact with any web page. No MCP, no Chrome extension — just a fast CLI.
+allowed-tools:
+ - Bash
+ - Read
+
+---
+
+# gstack: Persistent Browser for Claude Code
+
+Persistent headless Chromium daemon. First call auto-starts the server (~3s).
+Every subsequent call: ~100-200ms. Auto-shuts down after 30 min idle.
+
+## SETUP (run this check BEFORE any browse command)
+
+Before using any browse command, find the skill and check if the binary exists:
+
+```bash
+# Check project-level first, then user-level
+if test -x .claude/skills/gstack/browse/dist/browse; then
+ echo "READY_PROJECT"
+elif test -x ~/.claude/skills/gstack/browse/dist/browse; then
+ echo "READY_USER"
+else
+ echo "NEEDS_SETUP"
+fi
+```
+
+Set `B` to whichever path is READY and use it for all commands. Prefer project-level if both exist.
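The check and the assignment can be collapsed into one step — a minimal sketch using the same paths as the check above:

```bash
# Resolve the browse binary: prefer the project-level copy, else fall back
# to the user-level install.
if test -x .claude/skills/gstack/browse/dist/browse; then
  B=.claude/skills/gstack/browse/dist/browse
else
  B="$HOME/.claude/skills/gstack/browse/dist/browse"
fi
```

All later commands can then be run as `$B goto <url>`, `$B text`, and so on.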
+
+If `NEEDS_SETUP`:
+1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait for their response.
+2. If they approve, determine the skill directory (project-level `.claude/skills/gstack` or user-level `~/.claude/skills/gstack`) and run:
+```bash
+cd <SKILL_DIR> && ./setup
+```
+3. If `bun` is not installed, tell the user to install it: `curl -fsSL https://bun.sh/install | bash`
+4. Verify the `.gitignore` in the skill directory contains `browse/dist/` and `node_modules/`. If either line is missing, add it.
+
+Once setup is done, it never needs to run again (the compiled binary persists).
+
+## IMPORTANT
+
+- Use the compiled binary via Bash: `.claude/skills/gstack/browse/dist/browse` (project) or `~/.claude/skills/gstack/browse/dist/browse` (user).
+- NEVER use `mcp__claude-in-chrome__*` tools. They are slow and unreliable.
+- The browser persists between calls — cookies, tabs, and state carry over.
+- The server auto-starts on the first command; no manual server management is needed.
+
+## Quick Reference
+
+```bash
+B=~/.claude/skills/gstack/browse/dist/browse
+
+# Navigate to a page
+$B goto https://example.com
+
+# Read cleaned page text
+$B text
+
+# Take a screenshot (then Read the image)
+$B screenshot /tmp/page.png
+
+# Snapshot: accessibility tree with refs
+$B snapshot -i
+
+# Click by ref (after snapshot)
+$B click @e3
+
+# Fill by ref
+$B fill @e4 "test@test.com"
+
+# Run JavaScript
+$B js "document.title"
+
+# Get all links
+$B links
+
+# Click by CSS selector
+$B click "button.submit"
+
+# Fill a form by CSS selector
+$B fill "#email" "test@test.com"
+$B fill "#password" "abc123"
+$B click "button[type=submit]"
+
+# Get HTML of an element
+$B html "main"
+
+# Get computed CSS
+$B css "body" "font-family"
+
+# Get element attributes
+$B attrs "nav"
+
+# Wait for element to appear
+$B wait ".loaded"
+
+# Accessibility tree
+$B accessibility
+
+# Set viewport
+$B viewport 375x812
+
+# Set cookies / headers
+$B cookie "session=abc123"
+$B header "Authorization:Bearer token123"
+```
+
+## Command Reference
+
+### Navigation
+```
+browse goto <url> Navigate current tab
+browse back Go back
+browse forward Go forward
+browse reload Reload page
+browse url Print current URL
+```
+
+### Content extraction
+```
+browse text Cleaned page text (no scripts/styles)
+browse html [selector] innerHTML of element, or full page HTML
+browse links All links as "text → href"
+browse forms All forms + fields as JSON
+browse accessibility Accessibility tree snapshot (ARIA)
+```
+
+### Snapshot (ref-based element selection)
+```
+browse snapshot Full accessibility tree with @refs
+browse snapshot -i Interactive elements only (buttons, links, inputs)
+browse snapshot -c Compact (no empty structural elements)
+browse snapshot -d <N> Limit depth to N levels
+browse snapshot -s <sel> Scope to CSS selector
+```
+
+After snapshot, use @refs as selectors in any command:
+```
+browse click @e3 Click the element assigned ref @e3
+browse fill @e4 "value" Fill the input assigned ref @e4
+browse hover @e1 Hover the element assigned ref @e1
+browse html @e2 Get innerHTML of ref @e2
+browse css @e5 "color" Get computed CSS of ref @e5
+browse attrs @e6 Get attributes of ref @e6
+```
+
+Refs are invalidated on navigation — run `snapshot` again after `goto`.
+
+### Interaction
+```
+browse click <selector> Click element (CSS selector or @ref)
+browse fill <selector> <value> Fill input field
+browse select <selector> <val> Select dropdown value
+browse hover <selector> Hover over element
+browse type <text> Type into focused element
+browse press <key> Press key (Enter, Tab, Escape, etc.)
+browse scroll [selector] Scroll element into view, or page bottom
+browse wait <selector> Wait for element to appear (max 10s)
+browse viewport <WxH> Set viewport size (e.g. 375x812)
+```
+
+### Inspection
+```
+browse js <expression> Run JS, print result
+browse eval <js-file> Run JS file against page
+browse css <selector> <prop> Get computed CSS property
+browse attrs <selector> Get element attributes as JSON
+browse console Dump captured console messages
+browse console --clear Clear console buffer
+browse network Dump captured network requests
+browse network --clear Clear network buffer
+browse cookies Dump all cookies as JSON
+browse storage localStorage + sessionStorage as JSON
+browse storage set <key> <val> Set localStorage value
+browse perf Page load performance timings
+```
+
+### Visual
+```
+browse screenshot [path] Screenshot (default: /tmp/browse-screenshot.png)
+browse pdf [path] Save as PDF
+browse responsive [prefix] Screenshots at mobile/tablet/desktop
+```
+
+### Compare
+```
+browse diff <url1> <url2> Text diff between two pages
+```
+
+### Multi-step (chain)
+```
+echo '[["goto","https://example.com"],["snapshot","-i"],["click","@e1"],["screenshot","/tmp/result.png"]]' | browse chain
+```
+
+### Tabs
+```
+browse tabs List tabs (id, url, title)
+browse tab <id> Switch to tab
+browse newtab [url] Open new tab
+browse closetab [id] Close tab
+```
+
+### Server management
+```
+browse status Server health, uptime, tab count
+browse stop Shutdown server
+browse restart Kill + restart server
+```
+
+## Speed Rules
+
+1. **Navigate once, query many times.** `goto` loads the page; then `text`, `js`, `css`, `screenshot` all run against the loaded page instantly.
+2. **Use `snapshot -i` for interaction.** Get refs for all interactive elements, then click/fill by ref. No need to guess CSS selectors.
+3. **Use `js` for precision.** `js "document.querySelector('.price').textContent"` is faster than parsing full page text.
+4. **Use `links` to survey.** Faster than `text` when you just need navigation structure.
+5. **Use `chain` for multi-step flows.** Avoids CLI overhead per step.
+6. **Use `responsive` for layout checks.** One command = 3 viewport screenshots.
+
+## When to Use What
+
+| Task | Commands |
+|------|----------|
+| Read a page | `goto <url>` then `text` |
+| Interact with elements | `snapshot -i` then `click @e3` |
+| Check if element exists | `js "!!document.querySelector('.thing')"` |
+| Extract specific data | `js "document.querySelector('.price').textContent"` |
+| Visual check | `screenshot /tmp/x.png` then Read the image |
+| Fill and submit form | `snapshot -i` → `fill @e4 "val"` → `click @e5` → `screenshot` |
+| Check CSS | `css "selector" "property"` or `css @e3 "property"` |
+| Inspect DOM | `html "selector"` or `attrs @e3` |
+| Debug console errors | `console` |
+| Check network requests | `network` |
+| Check local dev | `goto http://127.0.0.1:3000` |
+| Compare two pages | `diff <url1> <url2>` |
+| Mobile layout check | `responsive /tmp/prefix` |
+| Multi-step flow | `echo '[...]' \| browse chain` |
+
+## Architecture
+
+- Persistent Chromium daemon on localhost (port 9400-9410)
+- Bearer token auth per session
+- State file: `/tmp/browse-server.json`
+- Console log: `/tmp/browse-console.log`
+- Network log: `/tmp/browse-network.log`
+- Auto-shutdown after 30 min idle
+- Chromium crash → server exits → auto-restarts on next command
A => TODO.md +78 -0
@@ 1,78 @@
+# TODO — gstack roadmap
+
+## Phase 1: Foundations (v0.2.0)
+ - [x] Rename to gstack
+ - [x] Restructure to monorepo layout
+ - [x] Setup script for skill symlinks
+ - [x] Snapshot command with ref-based element selection
+ - [x] Snapshot tests
+
+## Phase 2: Enhanced Browser
+ - [ ] Annotated screenshots (--annotate flag, numbered labels on elements mapped to refs)
+ - [ ] Snapshot diffing (compare before/after accessibility trees, verify actions worked)
+ - [ ] Dialog handling (dialog accept/dismiss — prevents browser lockup)
+ - [ ] File upload (upload <sel> <files>)
+ - [ ] Cursor-interactive elements (-C flag, detect divs with cursor:pointer/onclick/tabindex)
+ - [ ] Element state checks (is visible/enabled/checked <sel>)
+
+## Phase 3: QA Testing Agent (dogfood skill)
+ - [ ] SKILL.md — 6-phase workflow: Initialize → Authenticate → Orient → Explore → Document → Wrap up
+ - [ ] Issue taxonomy reference (7 categories: visual, functional, UX, content, performance, console, accessibility)
+ - [ ] Severity classification (critical/high/medium/low)
+ - [ ] Exploration checklist per page
+ - [ ] Report template (structured markdown with per-issue evidence)
+ - [ ] Repro-first philosophy: every issue gets evidence before moving on
+ - [ ] Two evidence tiers: interactive bugs (video + step-by-step screenshots), static bugs (single annotated screenshot)
+ - [ ] Video recording (record start/stop for WebM capture via Playwright)
+ - [ ] Key guidance: 5-10 well-documented issues per session, depth over breadth, write incrementally
+
+## Phase 4: Skill + Browser Integration
+ - [ ] ship + browse: post-deploy verification
+ - Browse staging/preview URL after push
+ - Screenshot key pages
+ - Check console for JS errors
+ - Compare staging vs prod via snapshot diff
+ - Include verification screenshots in PR body
+ - STOP if critical errors found
+ - [ ] review + browse: visual diff review
+ - Browse PR's preview deploy
+ - Annotated screenshots of changed pages
+ - Compare against production visually
+ - Check responsive layouts (mobile/tablet/desktop)
+ - Verify accessibility tree hasn't regressed
+ - [ ] deploy-verify skill: lightweight post-deploy smoke test
+ - Hit key URLs, verify 200s
+ - Screenshot critical pages
+ - Console error check
+ - Compare against baseline snapshots
+ - Pass/fail with evidence
+
+## Phase 5: State & Sessions
+ - [ ] Sessions (isolated browser instances with separate cookies/storage/history)
+ - [ ] State persistence (save/load cookies + localStorage to JSON files)
+ - [ ] Auth vault (encrypted credential storage, referenced by name, LLM never sees passwords)
+ - [ ] retro + browse: deployment health tracking
+ - Screenshot production state
+ - Check perf metrics (page load times)
+ - Count console errors across key pages
+ - Track trends over retro window
+
+## Phase 6: Advanced Browser
+ - [ ] Iframe support (frame <sel>, frame main)
+ - [ ] Semantic locators (find role/label/text/placeholder/testid with actions)
+ - [ ] Device emulation presets (set device "iPhone 16 Pro")
+ - [ ] Network mocking/routing (intercept, block, mock requests)
+ - [ ] Download handling (click-to-download with path control)
+ - [ ] Content safety (--max-output truncation, --allowed-domains)
+ - [ ] Streaming (WebSocket live preview for pair browsing)
+ - [ ] CDP mode (connect to already-running Chrome/Electron apps)
+
+## Ideas & Notes
+ - Browser is the nervous system — every skill should be able to see, interact with, and verify the web
+ - Skills are the product; the browser enables them
+ - One repo, one install, entire AI engineering workflow
+ - Bun compiled binary matches Rust CLI performance for this use case (bottleneck is Chromium, not CLI parsing)
+ - Accessibility tree snapshots use ~200-400 tokens vs ~3000-5000 for full DOM — critical for AI context efficiency
+ - Locator map approach for refs: store Map<string, Locator> on BrowserManager, no DOM mutation, no CSP issues
+ - Snapshot scoping (-i, -c, -d, -s flags) is critical for performance on large pages
+ - All new commands follow existing pattern: add to command set, add switch case, return string
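The locator-map note above can be sketched in a standalone form — strings stand in for Playwright `Locator` objects here, so this is an illustration of the dispatch shape, not the real implementation:

```typescript
// Refs assigned by `snapshot` live in a Map keyed by ref id. Selectors that
// start with "@e" resolve through the map; anything else passes through as a
// CSS selector. No DOM mutation, so no CSP issues.
const refMap = new Map<string, string>([["e1", "locator:button.submit"]]);

function resolveRef(selector: string): { locator: string } | { selector: string } {
  if (selector.startsWith("@e")) {
    const locator = refMap.get(selector.slice(1)); // "@e1" -> "e1"
    if (!locator) {
      throw new Error(`Ref ${selector} not found. Run 'snapshot' for fresh refs.`);
    }
    return { locator };
  }
  return { selector };
}
```

Clearing the map on navigation is what makes stale refs fail loudly instead of clicking the wrong element.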
A => VERSION +1 -0
A => browse/SKILL.md +254 -0
@@ 1,254 @@
+---
+name: browse
+version: 1.0.0
+description: |
+ Fast web browsing for Claude Code via persistent headless Chromium daemon. Navigate to any URL,
+ read page content, click elements, fill forms, run JavaScript, take screenshots,
+ inspect CSS/DOM, capture console/network logs, and more. ~100ms per command after
+ first call. Use when you need to check a website, verify a deployment, read docs,
+ or interact with any web page. No MCP, no Chrome extension — just a fast CLI.
+allowed-tools:
+ - Bash
+ - Read
+
+---
+
+# gstack: Persistent Browser for Claude Code
+
+Persistent headless Chromium daemon. First call auto-starts the server (~3s).
+Every subsequent call: ~100-200ms. Auto-shuts down after 30 min idle.
+
+## SETUP (run this check BEFORE any browse command)
+
+Before using any browse command, find the skill and check if the binary exists:
+
+```bash
+# Check project-level first, then user-level
+if test -x .claude/skills/gstack/browse/dist/browse; then
+ echo "READY_PROJECT"
+elif test -x ~/.claude/skills/gstack/browse/dist/browse; then
+ echo "READY_USER"
+else
+ echo "NEEDS_SETUP"
+fi
+```
+
+Set `B` to whichever path is READY and use it for all commands. Prefer project-level if both exist.
+
+If `NEEDS_SETUP`:
+1. Tell the user: "gstack browse needs a one-time build (~10 seconds). OK to proceed?" Then STOP and wait for their response.
+2. If they approve, determine the skill directory (project-level `.claude/skills/gstack` or user-level `~/.claude/skills/gstack`) and run:
+```bash
+cd <SKILL_DIR> && ./setup
+```
+3. If `bun` is not installed, tell the user to install it: `curl -fsSL https://bun.sh/install | bash`
+4. Verify the `.gitignore` in the skill directory contains `browse/dist/` and `node_modules/`. If either line is missing, add it.
+
+Once setup is done, it never needs to run again (the compiled binary persists).
+
+## IMPORTANT
+
+- Use the compiled binary via Bash: `.claude/skills/gstack/browse/dist/browse` (project) or `~/.claude/skills/gstack/browse/dist/browse` (user).
+- NEVER use `mcp__claude-in-chrome__*` tools. They are slow and unreliable.
+- The browser persists between calls — cookies, tabs, and state carry over.
+- The server auto-starts on the first command; no manual server management is needed.
+
+## Quick Reference
+
+```bash
+B=~/.claude/skills/gstack/browse/dist/browse
+
+# Navigate to a page
+$B goto https://example.com
+
+# Read cleaned page text
+$B text
+
+# Take a screenshot (then Read the image)
+$B screenshot /tmp/page.png
+
+# Snapshot: accessibility tree with refs
+$B snapshot -i
+
+# Click by ref (after snapshot)
+$B click @e3
+
+# Fill by ref
+$B fill @e4 "test@test.com"
+
+# Run JavaScript
+$B js "document.title"
+
+# Get all links
+$B links
+
+# Click by CSS selector
+$B click "button.submit"
+
+# Fill a form by CSS selector
+$B fill "#email" "test@test.com"
+$B fill "#password" "abc123"
+$B click "button[type=submit]"
+
+# Get HTML of an element
+$B html "main"
+
+# Get computed CSS
+$B css "body" "font-family"
+
+# Get element attributes
+$B attrs "nav"
+
+# Wait for element to appear
+$B wait ".loaded"
+
+# Accessibility tree
+$B accessibility
+
+# Set viewport
+$B viewport 375x812
+
+# Set cookies / headers
+$B cookie "session=abc123"
+$B header "Authorization:Bearer token123"
+```
+
+## Command Reference
+
+### Navigation
+```
+browse goto <url> Navigate current tab
+browse back Go back
+browse forward Go forward
+browse reload Reload page
+browse url Print current URL
+```
+
+### Content extraction
+```
+browse text Cleaned page text (no scripts/styles)
+browse html [selector] innerHTML of element, or full page HTML
+browse links All links as "text → href"
+browse forms All forms + fields as JSON
+browse accessibility Accessibility tree snapshot (ARIA)
+```
+
+### Snapshot (ref-based element selection)
+```
+browse snapshot Full accessibility tree with @refs
+browse snapshot -i Interactive elements only (buttons, links, inputs)
+browse snapshot -c Compact (no empty structural elements)
+browse snapshot -d <N> Limit depth to N levels
+browse snapshot -s <sel> Scope to CSS selector
+```
+
+After snapshot, use @refs as selectors in any command:
+```
+browse click @e3 Click the element assigned ref @e3
+browse fill @e4 "value" Fill the input assigned ref @e4
+browse hover @e1 Hover the element assigned ref @e1
+browse html @e2 Get innerHTML of ref @e2
+browse css @e5 "color" Get computed CSS of ref @e5
+browse attrs @e6 Get attributes of ref @e6
+```
+
+Refs are invalidated on navigation — run `snapshot` again after `goto`.
+
+### Interaction
+```
+browse click <selector> Click element (CSS selector or @ref)
+browse fill <selector> <value> Fill input field
+browse select <selector> <val> Select dropdown value
+browse hover <selector> Hover over element
+browse type <text> Type into focused element
+browse press <key> Press key (Enter, Tab, Escape, etc.)
+browse scroll [selector] Scroll element into view, or page bottom
+browse wait <selector> Wait for element to appear (max 10s)
+browse viewport <WxH> Set viewport size (e.g. 375x812)
+```
+
+### Inspection
+```
+browse js <expression> Run JS, print result
+browse eval <js-file> Run JS file against page
+browse css <selector> <prop> Get computed CSS property
+browse attrs <selector> Get element attributes as JSON
+browse console Dump captured console messages
+browse console --clear Clear console buffer
+browse network Dump captured network requests
+browse network --clear Clear network buffer
+browse cookies Dump all cookies as JSON
+browse storage localStorage + sessionStorage as JSON
+browse storage set <key> <val> Set localStorage value
+browse perf Page load performance timings
+```
+
+### Visual
+```
+browse screenshot [path] Screenshot (default: /tmp/browse-screenshot.png)
+browse pdf [path] Save as PDF
+browse responsive [prefix] Screenshots at mobile/tablet/desktop
+```
+
+### Compare
+```
+browse diff <url1> <url2> Text diff between two pages
+```
+
+### Multi-step (chain)
+```
+echo '[["goto","https://example.com"],["snapshot","-i"],["click","@e1"],["screenshot","/tmp/result.png"]]' | browse chain
+```
+
+### Tabs
+```
+browse tabs List tabs (id, url, title)
+browse tab <id> Switch to tab
+browse newtab [url] Open new tab
+browse closetab [id] Close tab
+```
+
+### Server management
+```
+browse status Server health, uptime, tab count
+browse stop Shutdown server
+browse restart Kill + restart server
+```
+
+## Speed Rules
+
+1. **Navigate once, query many times.** `goto` loads the page; then `text`, `js`, `css`, `screenshot` all run against the loaded page instantly.
+2. **Use `snapshot -i` for interaction.** Get refs for all interactive elements, then click/fill by ref. No need to guess CSS selectors.
+3. **Use `js` for precision.** `js "document.querySelector('.price').textContent"` is faster than parsing full page text.
+4. **Use `links` to survey.** Faster than `text` when you just need navigation structure.
+5. **Use `chain` for multi-step flows.** Avoids CLI overhead per step.
+6. **Use `responsive` for layout checks.** One command = 3 viewport screenshots.
+
+## When to Use What
+
+| Task | Commands |
+|------|----------|
+| Read a page | `goto <url>` then `text` |
+| Interact with elements | `snapshot -i` then `click @e3` |
+| Check if element exists | `js "!!document.querySelector('.thing')"` |
+| Extract specific data | `js "document.querySelector('.price').textContent"` |
+| Visual check | `screenshot /tmp/x.png` then Read the image |
+| Fill and submit form | `snapshot -i` → `fill @e4 "val"` → `click @e5` → `screenshot` |
+| Check CSS | `css "selector" "property"` or `css @e3 "property"` |
+| Inspect DOM | `html "selector"` or `attrs @e3` |
+| Debug console errors | `console` |
+| Check network requests | `network` |
+| Check local dev | `goto http://127.0.0.1:3000` |
+| Compare two pages | `diff <url1> <url2>` |
+| Mobile layout check | `responsive /tmp/prefix` |
+| Multi-step flow | `echo '[...]' \| browse chain` |
+
+## Architecture
+
+- Persistent Chromium daemon on localhost (port 9400-9410)
+- Bearer token auth per session
+- State file: `/tmp/browse-server.json`
+- Console log: `/tmp/browse-console.log`
+- Network log: `/tmp/browse-network.log`
+- Auto-shutdown after 30 min idle
+- Chromium crash → server exits → auto-restarts on next command
A => browse/src/browser-manager.ts +253 -0
@@ 1,253 @@
+/**
+ * Browser lifecycle manager
+ *
+ * Chromium crash handling:
+ * browser.on('disconnected') → log error → process.exit(1)
+ * CLI detects dead server → auto-restarts on next command
+ * We do NOT try to self-heal — don't hide failure.
+ */
+
+import { chromium, type Browser, type BrowserContext, type Page, type Locator } from 'playwright';
+import { addConsoleEntry, addNetworkEntry, networkBuffer } from './buffers';
+
+export class BrowserManager {
+ private browser: Browser | null = null;
+ private context: BrowserContext | null = null;
+ private pages: Map<number, Page> = new Map();
+ private activeTabId: number = 0;
+ private nextTabId: number = 1;
+ private extraHeaders: Record<string, string> = {};
+ private customUserAgent: string | null = null;
+
+ // ─── Ref Map (snapshot → @e1, @e2, ...) ────────────────────
+ private refMap: Map<string, Locator> = new Map();
+
+ async launch() {
+ this.browser = await chromium.launch({ headless: true });
+
+ // Chromium crash → exit with clear message
+ this.browser.on('disconnected', () => {
+ console.error('[browse] FATAL: Chromium process crashed or was killed. Server exiting.');
+ console.error('[browse] Console/network logs flushed to /tmp/browse-*.log');
+ process.exit(1);
+ });
+
+ this.context = await this.browser.newContext({
+ viewport: { width: 1280, height: 720 },
+ });
+
+ // Create first tab
+ await this.newTab();
+ }
+
+ async close() {
+ if (this.browser) {
+ // Remove disconnect handler to avoid exit during intentional close
+ this.browser.removeAllListeners('disconnected');
+ await this.browser.close();
+ this.browser = null;
+ }
+ }
+
+ isHealthy(): boolean {
+ return this.browser !== null && this.browser.isConnected();
+ }
+
+ // ─── Tab Management ────────────────────────────────────────
+ async newTab(url?: string): Promise<number> {
+ if (!this.context) throw new Error('Browser not launched');
+
+ const page = await this.context.newPage();
+ const id = this.nextTabId++;
+ this.pages.set(id, page);
+ this.activeTabId = id;
+
+ // Wire up console/network capture
+ this.wirePageEvents(page);
+
+ if (url) {
+ await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 15000 });
+ }
+
+ return id;
+ }
+
+ async closeTab(id?: number): Promise<void> {
+ const tabId = id ?? this.activeTabId;
+ const page = this.pages.get(tabId);
+ if (!page) throw new Error(`Tab ${tabId} not found`);
+
+ await page.close();
+ this.pages.delete(tabId);
+
+ // Switch to another tab if we closed the active one
+ if (tabId === this.activeTabId) {
+ const remaining = [...this.pages.keys()];
+ if (remaining.length > 0) {
+ this.activeTabId = remaining[remaining.length - 1];
+ } else {
+ // No tabs left — create a new blank one
+ await this.newTab();
+ }
+ }
+ }
+
+ switchTab(id: number): void {
+ if (!this.pages.has(id)) throw new Error(`Tab ${id} not found`);
+ this.activeTabId = id;
+ }
+
+ getTabCount(): number {
+ return this.pages.size;
+ }
+
+ getTabList(): Array<{ id: number; url: string; title: string; active: boolean }> {
+ const tabs: Array<{ id: number; url: string; title: string; active: boolean }> = [];
+ for (const [id, page] of this.pages) {
+ tabs.push({
+ id,
+ url: page.url(),
+ title: '', // title requires await, populated by caller
+ active: id === this.activeTabId,
+ });
+ }
+ return tabs;
+ }
+
+ async getTabListWithTitles(): Promise<Array<{ id: number; url: string; title: string; active: boolean }>> {
+ const tabs: Array<{ id: number; url: string; title: string; active: boolean }> = [];
+ for (const [id, page] of this.pages) {
+ tabs.push({
+ id,
+ url: page.url(),
+ title: await page.title().catch(() => ''),
+ active: id === this.activeTabId,
+ });
+ }
+ return tabs;
+ }
+
+ // ─── Page Access ───────────────────────────────────────────
+ getPage(): Page {
+ const page = this.pages.get(this.activeTabId);
+ if (!page) throw new Error('No active page. Use "browse goto <url>" first.');
+ return page;
+ }
+
+ getCurrentUrl(): string {
+ try {
+ return this.getPage().url();
+ } catch {
+ return 'about:blank';
+ }
+ }
+
+ // ─── Ref Map ──────────────────────────────────────────────
+ setRefMap(refs: Map<string, Locator>) {
+ this.refMap = refs;
+ }
+
+ clearRefs() {
+ this.refMap.clear();
+ }
+
+ /**
+ * Resolve a selector that may be a @ref (e.g., "@e3") or a CSS selector.
+ * Returns { locator } for refs or { selector } for CSS selectors.
+ */
+ resolveRef(selector: string): { locator: Locator } | { selector: string } {
+ if (selector.startsWith('@e')) {
+ const ref = selector.slice(1); // "e3"
+ const locator = this.refMap.get(ref);
+ if (!locator) {
+ throw new Error(
+ `Ref ${selector} not found. Page may have changed — run 'snapshot' to get fresh refs.`
+ );
+ }
+ return { locator };
+ }
+ return { selector };
+ }
+
+ getRefCount(): number {
+ return this.refMap.size;
+ }
+
+ // ─── Viewport ──────────────────────────────────────────────
+ async setViewport(width: number, height: number) {
+ await this.getPage().setViewportSize({ width, height });
+ }
+
+ // ─── Extra Headers ─────────────────────────────────────────
+ async setExtraHeader(name: string, value: string) {
+ this.extraHeaders[name] = value;
+ if (this.context) {
+ await this.context.setExtraHTTPHeaders(this.extraHeaders);
+ }
+ }
+
+ // ─── User Agent ────────────────────────────────────────────
+ // Note: user agent changes require a new context in Playwright
+ // For simplicity, we just store it and apply on next "restart"
+ setUserAgent(ua: string) {
+ this.customUserAgent = ua;
+ }
+
+ // ─── Console/Network/Ref Wiring ────────────────────────────
+ private wirePageEvents(page: Page) {
+ // Clear ref map on navigation — refs point to stale elements after page change
+ page.on('framenavigated', (frame) => {
+ if (frame === page.mainFrame()) {
+ this.clearRefs();
+ }
+ });
+
+ page.on('console', (msg) => {
+ addConsoleEntry({
+ timestamp: Date.now(),
+ level: msg.type(),
+ text: msg.text(),
+ });
+ });
+
+ page.on('request', (req) => {
+ addNetworkEntry({
+ timestamp: Date.now(),
+ method: req.method(),
+ url: req.url(),
+ });
+ });
+
+ page.on('response', (res) => {
+ // Find matching request entry and update it
+ const url = res.url();
+ const status = res.status();
+ for (let i = networkBuffer.length - 1; i >= 0; i--) {
+ if (networkBuffer[i].url === url && !networkBuffer[i].status) {
+ networkBuffer[i].status = status;
+ networkBuffer[i].duration = Date.now() - networkBuffer[i].timestamp;
+ break;
+ }
+ }
+ });
+
+ // Capture response sizes via response finished
+ page.on('requestfinished', async (req) => {
+ try {
+ const res = await req.response();
+ if (res) {
+ const url = req.url();
+ const body = await res.body().catch(() => null);
+ const size = body ? body.length : 0;
+ for (let i = networkBuffer.length - 1; i >= 0; i--) {
+ if (networkBuffer[i].url === url && !networkBuffer[i].size) {
+ networkBuffer[i].size = size;
+ break;
+ }
+ }
+ }
+ } catch {}
+ });
+ }
+}
+
A => browse/src/buffers.ts +44 -0
@@ 1,44 @@
+/**
+ * Shared buffers and types — extracted to break circular dependency
+ * between server.ts and browser-manager.ts
+ */
+
+export interface LogEntry {
+ timestamp: number;
+ level: string;
+ text: string;
+}
+
+export interface NetworkEntry {
+ timestamp: number;
+ method: string;
+ url: string;
+ status?: number;
+ duration?: number;
+ size?: number;
+}
+
+export const consoleBuffer: LogEntry[] = [];
+export const networkBuffer: NetworkEntry[] = [];
+const HIGH_WATER_MARK = 50_000;
+
+// Total entries ever added — used by server.ts flush logic as a cursor
+// that keeps advancing even after the buffer starts evicting old entries.
+export let consoleTotalAdded = 0;
+export let networkTotalAdded = 0;
+
+export function addConsoleEntry(entry: LogEntry) {
+ if (consoleBuffer.length >= HIGH_WATER_MARK) {
+ consoleBuffer.shift();
+ }
+ consoleBuffer.push(entry);
+ consoleTotalAdded++;
+}
+
+export function addNetworkEntry(entry: NetworkEntry) {
+ if (networkBuffer.length >= HIGH_WATER_MARK) {
+ networkBuffer.shift();
+ }
+ networkBuffer.push(entry);
+ networkTotalAdded++;
+}
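The eviction behavior is easy to demonstrate in isolation with a small cap (3 here instead of the real 50,000):

```typescript
// Self-contained sketch of the high-water-mark buffer used above: once the
// buffer reaches the cap, the oldest entry is dropped before the new one is
// pushed, while the total-added counter keeps advancing as a cursor.
interface Entry { timestamp: number; text: string }

const CAP = 3; // illustrative; the real buffers use 50_000
const buf: Entry[] = [];
let totalAdded = 0;

function add(entry: Entry) {
  if (buf.length >= CAP) buf.shift(); // drop oldest
  buf.push(entry);
  totalAdded++;
}

for (let i = 0; i < 5; i++) add({ timestamp: i, text: `msg ${i}` });
// buf now holds "msg 2".."msg 4"; totalAdded is 5
```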
A => browse/src/cli.ts +221 -0
@@ 1,221 @@
+/**
+ * gstack CLI — thin wrapper that talks to the persistent server
+ *
+ * Flow:
+ * 1. Read /tmp/browse-server.json for port + token
+ * 2. If missing or stale PID → start server in background
+ * 3. Health check
+ * 4. Send command via HTTP POST
+ * 5. Print response to stdout (or stderr for errors)
+ */
+
+import * as fs from 'fs';
+import * as path from 'path';
+
+const PORT_OFFSET = 45600;
+const BROWSE_PORT = process.env.CONDUCTOR_PORT
+ ? parseInt(process.env.CONDUCTOR_PORT, 10) - PORT_OFFSET
+ : parseInt(process.env.BROWSE_PORT || '0', 10);
+const INSTANCE_SUFFIX = BROWSE_PORT ? `-${BROWSE_PORT}` : '';
+const STATE_FILE = process.env.BROWSE_STATE_FILE || `/tmp/browse-server${INSTANCE_SUFFIX}.json`;
+// When compiled, import.meta.dir is virtual. Use env var or well-known path.
+const SERVER_SCRIPT = process.env.BROWSE_SERVER_SCRIPT
+ || (import.meta.dir.startsWith('/') && !import.meta.dir.includes('$bunfs')
+ ? path.resolve(import.meta.dir, 'server.ts')
+ : path.resolve(process.env.HOME || '/tmp', '.claude/skills/gstack/browse/src/server.ts'));
+const MAX_START_WAIT = 8000; // 8 seconds to start
+
+interface ServerState {
+ pid: number;
+ port: number;
+ token: string;
+ startedAt: string;
+ serverPath: string;
+}
+
+// ─── State File ────────────────────────────────────────────────
+function readState(): ServerState | null {
+ try {
+ const data = fs.readFileSync(STATE_FILE, 'utf-8');
+ return JSON.parse(data);
+ } catch {
+ return null;
+ }
+}
+
+function isProcessAlive(pid: number): boolean {
+ try {
+ process.kill(pid, 0);
+ return true;
+ } catch {
+ return false;
+ }
+}
+
+// ─── Server Lifecycle ──────────────────────────────────────────
+async function startServer(): Promise<ServerState> {
+ // Clean up stale state file
+ try { fs.unlinkSync(STATE_FILE); } catch {}
+
+ // Start server as detached background process
+ const proc = Bun.spawn(['bun', 'run', SERVER_SCRIPT], {
+ stdio: ['ignore', 'pipe', 'pipe'],
+ env: { ...process.env },
+ });
+
+ // Don't hold the CLI open
+ proc.unref();
+
+ // Wait for state file to appear
+ const start = Date.now();
+ while (Date.now() - start < MAX_START_WAIT) {
+ const state = readState();
+ if (state && isProcessAlive(state.pid)) {
+ return state;
+ }
+ await Bun.sleep(100);
+ }
+
+ // If we get here, server didn't start in time
+ // Try to read stderr for error message
+  const stderr = proc.stderr;
+  if (stderr) {
+    const reader = stderr.getReader();
+    // Race against a short timeout so a silent child can't hang the CLI here
+    const result = await Promise.race([
+      reader.read(),
+      Bun.sleep(1000).then(() => ({ value: undefined, done: true })),
+    ]);
+    if (result.value) {
+      const errText = new TextDecoder().decode(result.value);
+      throw new Error(`Server failed to start:\n${errText}`);
+    }
+  }
+ throw new Error(`Server failed to start within ${MAX_START_WAIT / 1000}s`);
+}
+
+async function ensureServer(): Promise<ServerState> {
+ const state = readState();
+
+ if (state && isProcessAlive(state.pid)) {
+ // Server appears alive — do a health check
+ try {
+ const resp = await fetch(`http://127.0.0.1:${state.port}/health`, {
+ signal: AbortSignal.timeout(2000),
+ });
+ if (resp.ok) {
+ const health = await resp.json() as any;
+ if (health.status === 'healthy') {
+ return state;
+ }
+ }
+ } catch {
+ // Health check failed — server is dead or unhealthy
+ }
+ }
+
+ // Need to (re)start
+ console.error('[browse] Starting server...');
+ return startServer();
+}
+
+// ─── Command Dispatch ──────────────────────────────────────────
+async function sendCommand(state: ServerState, command: string, args: string[], retries = 0): Promise<void> {
+ const body = JSON.stringify({ command, args });
+
+ try {
+ const resp = await fetch(`http://127.0.0.1:${state.port}/command`, {
+ method: 'POST',
+ headers: {
+ 'Content-Type': 'application/json',
+ 'Authorization': `Bearer ${state.token}`,
+ },
+ body,
+ signal: AbortSignal.timeout(30000),
+ });
+
+ if (resp.status === 401) {
+ // Token mismatch — server may have restarted
+ console.error('[browse] Auth failed — server may have restarted. Retrying...');
+ const newState = readState();
+ if (newState && newState.token !== state.token) {
+ return sendCommand(newState, command, args);
+ }
+ throw new Error('Authentication failed');
+ }
+
+ const text = await resp.text();
+
+ if (resp.ok) {
+ process.stdout.write(text);
+ if (!text.endsWith('\n')) process.stdout.write('\n');
+ } else {
+ // Try to parse as JSON error
+ try {
+ const err = JSON.parse(text);
+ console.error(err.error || text);
+ if (err.hint) console.error(err.hint);
+ } catch {
+ console.error(text);
+ }
+ process.exit(1);
+ }
+ } catch (err: any) {
+ if (err.name === 'AbortError') {
+ console.error('[browse] Command timed out after 30s');
+ process.exit(1);
+ }
+ // Connection error — server may have crashed
+ if (err.code === 'ECONNREFUSED' || err.code === 'ECONNRESET' || err.message?.includes('fetch failed')) {
+      if (retries >= 1) throw new Error('Server crashed twice in a row — aborting');
+ console.error('[browse] Server connection lost. Restarting...');
+ const newState = await startServer();
+ return sendCommand(newState, command, args, retries + 1);
+ }
+ throw err;
+ }
+}
+
+// ─── Main ──────────────────────────────────────────────────────
+async function main() {
+ const args = process.argv.slice(2);
+
+ if (args.length === 0 || args[0] === '--help' || args[0] === '-h') {
+ console.log(`gstack browse — Fast headless browser for AI coding agents
+
+Usage: browse <command> [args...]
+
+Navigation: goto <url> | back | forward | reload | url
+Content: text | html [sel] | links | forms | accessibility
+Interaction: click <sel> | fill <sel> <val> | select <sel> <val>
+ hover <sel> | type <text> | press <key>
+ scroll [sel] | wait <sel> | viewport <WxH>
+Inspection: js <expr> | eval <file> | css <sel> <prop> | attrs <sel>
+ console [--clear] | network [--clear]
+ cookies | storage [set <k> <v>] | perf
+Visual: screenshot [path] | pdf [path] | responsive [prefix]
+Snapshot: snapshot [-i] [-c] [-d N] [-s sel]
+Compare: diff <url1> <url2>
+Multi-step: chain (reads JSON from stdin)
+Tabs: tabs | tab <id> | newtab [url] | closetab [id]
+Server: status | cookie <n>=<v> | header <n>:<v>
+ useragent <str> | stop | restart
+
+Refs: After 'snapshot', use @e1, @e2... as selectors:
+ click @e3 | fill @e4 "value" | hover @e1`);
+ process.exit(0);
+ }
+
+ const command = args[0];
+ const commandArgs = args.slice(1);
+
+ // Special case: chain reads from stdin
+ if (command === 'chain' && commandArgs.length === 0) {
+ const stdin = await Bun.stdin.text();
+ commandArgs.push(stdin.trim());
+ }
+
+ const state = await ensureServer();
+ await sendCommand(state, command, commandArgs);
+}
+
+main().catch((err) => {
+ console.error(`[browse] ${err.message}`);
+ process.exit(1);
+});
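The CONDUCTOR_PORT arithmetic above is the part of the CLI most worth double-checking. A standalone sketch of the same derivation (PORT_OFFSET and the state-file naming mirror cli.ts; the sample port number is illustrative):

```typescript
// Mirrors the CLI's port derivation: CONDUCTOR_PORT, when set, maps
// deterministically to a browse port; otherwise BROWSE_PORT (0 = auto-scan).
const PORT_OFFSET = 45600;

function derivePort(env: Record<string, string | undefined>): number {
  return env.CONDUCTOR_PORT
    ? parseInt(env.CONDUCTOR_PORT, 10) - PORT_OFFSET
    : parseInt(env.BROWSE_PORT || '0', 10);
}

// Per-instance state files keep parallel workspaces from colliding
function stateFilePath(port: number): string {
  return `/tmp/browse-server${port ? `-${port}` : ''}.json`;
}

console.log(derivePort({ CONDUCTOR_PORT: '55040' })); // 9440
console.log(stateFilePath(9440)); // /tmp/browse-server-9440.json
console.log(stateFilePath(0));    // /tmp/browse-server.json
```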
A => browse/src/meta-commands.ts +199 -0
@@ 1,199 @@
+/**
+ * Meta commands — tabs, server control, screenshots, chain, diff, snapshot
+ */
+
+import type { BrowserManager } from './browser-manager';
+import { handleSnapshot } from './snapshot';
+import * as Diff from 'diff';
+
+export async function handleMetaCommand(
+ command: string,
+ args: string[],
+ bm: BrowserManager,
+ shutdown: () => Promise<void> | void
+): Promise<string> {
+ switch (command) {
+ // ─── Tabs ──────────────────────────────────────────
+ case 'tabs': {
+ const tabs = await bm.getTabListWithTitles();
+ return tabs.map(t =>
+ `${t.active ? '→ ' : ' '}[${t.id}] ${t.title || '(untitled)'} — ${t.url}`
+ ).join('\n');
+ }
+
+ case 'tab': {
+ const id = parseInt(args[0], 10);
+ if (isNaN(id)) throw new Error('Usage: browse tab <id>');
+ bm.switchTab(id);
+ return `Switched to tab ${id}`;
+ }
+
+ case 'newtab': {
+ const url = args[0];
+ const id = await bm.newTab(url);
+ return `Opened tab ${id}${url ? ` → ${url}` : ''}`;
+ }
+
+ case 'closetab': {
+ const id = args[0] ? parseInt(args[0], 10) : undefined;
+ await bm.closeTab(id);
+ return `Closed tab${id ? ` ${id}` : ''}`;
+ }
+
+ // ─── Server Control ────────────────────────────────
+ case 'status': {
+ const page = bm.getPage();
+ const tabs = bm.getTabCount();
+ return [
+ `Status: healthy`,
+ `URL: ${page.url()}`,
+ `Tabs: ${tabs}`,
+ `PID: ${process.pid}`,
+ ].join('\n');
+ }
+
+ case 'url': {
+ return bm.getCurrentUrl();
+ }
+
+    case 'stop': {
+      // Defer shutdown slightly so this response reaches the CLI first
+      setTimeout(() => { void shutdown(); }, 50);
+      return 'Server stopped';
+    }
+
+    case 'restart': {
+      // Exit after responding; the CLI detects the dead server and restarts it
+      console.log('[browse] Restart requested. Exiting for CLI to restart.');
+      setTimeout(() => { void shutdown(); }, 50);
+      return 'Restarting...';
+    }
+
+ // ─── Visual ────────────────────────────────────────
+ case 'screenshot': {
+ const page = bm.getPage();
+ const screenshotPath = args[0] || '/tmp/browse-screenshot.png';
+ await page.screenshot({ path: screenshotPath, fullPage: true });
+ return `Screenshot saved: ${screenshotPath}`;
+ }
+
+ case 'pdf': {
+ const page = bm.getPage();
+ const pdfPath = args[0] || '/tmp/browse-page.pdf';
+ await page.pdf({ path: pdfPath, format: 'A4' });
+ return `PDF saved: ${pdfPath}`;
+ }
+
+ case 'responsive': {
+ const page = bm.getPage();
+ const prefix = args[0] || '/tmp/browse-responsive';
+ const viewports = [
+ { name: 'mobile', width: 375, height: 812 },
+ { name: 'tablet', width: 768, height: 1024 },
+ { name: 'desktop', width: 1280, height: 720 },
+ ];
+ const originalViewport = page.viewportSize();
+ const results: string[] = [];
+
+ for (const vp of viewports) {
+ await page.setViewportSize({ width: vp.width, height: vp.height });
+ const path = `${prefix}-${vp.name}.png`;
+ await page.screenshot({ path, fullPage: true });
+ results.push(`${vp.name} (${vp.width}x${vp.height}): ${path}`);
+ }
+
+ // Restore original viewport
+ if (originalViewport) {
+ await page.setViewportSize(originalViewport);
+ }
+
+ return results.join('\n');
+ }
+
+ // ─── Chain ─────────────────────────────────────────
+ case 'chain': {
+ // Read JSON array from args[0] (if provided) or expect it was passed as body
+ const jsonStr = args[0];
+ if (!jsonStr) throw new Error('Usage: echo \'[["goto","url"],["text"]]\' | browse chain');
+
+ let commands: string[][];
+ try {
+ commands = JSON.parse(jsonStr);
+ } catch {
+ throw new Error('Invalid JSON. Expected: [["command", "arg1", "arg2"], ...]');
+ }
+
+ if (!Array.isArray(commands)) throw new Error('Expected JSON array of commands');
+
+ const results: string[] = [];
+ const { handleReadCommand } = await import('./read-commands');
+ const { handleWriteCommand } = await import('./write-commands');
+
+ const WRITE_SET = new Set(['goto','back','forward','reload','click','fill','select','hover','type','press','scroll','wait','viewport','cookie','header','useragent']);
+ const READ_SET = new Set(['text','html','links','forms','accessibility','js','eval','css','attrs','console','network','cookies','storage','perf']);
+
+ for (const cmd of commands) {
+ const [name, ...cmdArgs] = cmd;
+ try {
+ let result: string;
+ if (WRITE_SET.has(name)) result = await handleWriteCommand(name, cmdArgs, bm);
+ else if (READ_SET.has(name)) result = await handleReadCommand(name, cmdArgs, bm);
+ else result = await handleMetaCommand(name, cmdArgs, bm, shutdown);
+ results.push(`[${name}] ${result}`);
+ } catch (err: any) {
+ results.push(`[${name}] ERROR: ${err.message}`);
+ }
+ }
+
+ return results.join('\n\n');
+ }
+
+ // ─── Diff ──────────────────────────────────────────
+ case 'diff': {
+ const [url1, url2] = args;
+ if (!url1 || !url2) throw new Error('Usage: browse diff <url1> <url2>');
+
+      // Extract visible text from each URL (same logic as the `text` command)
+      const page = bm.getPage();
+      const extractText = async (url: string): Promise<string> => {
+        await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 15000 });
+        return page.evaluate(() => {
+          const body = document.body;
+          if (!body) return '';
+          const clone = body.cloneNode(true) as HTMLElement;
+          clone.querySelectorAll('script, style, noscript, svg').forEach(el => el.remove());
+          return clone.innerText.split('\n').map(l => l.trim()).filter(l => l).join('\n');
+        });
+      };
+
+      const text1 = await extractText(url1);
+      const text2 = await extractText(url2);
+
+ const changes = Diff.diffLines(text1, text2);
+ const output: string[] = [`--- ${url1}`, `+++ ${url2}`, ''];
+
+ for (const part of changes) {
+ const prefix = part.added ? '+' : part.removed ? '-' : ' ';
+ const lines = part.value.split('\n').filter(l => l.length > 0);
+ for (const line of lines) {
+ output.push(`${prefix} ${line}`);
+ }
+ }
+
+ return output.join('\n');
+ }
+
+ // ─── Snapshot ─────────────────────────────────────
+ case 'snapshot': {
+ return await handleSnapshot(args, bm);
+ }
+
+ default:
+ throw new Error(`Unknown meta command: ${command}`);
+ }
+}
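For reference, the payload the `chain` case above consumes is a JSON array of `[command, ...args]` tuples. A minimal standalone sketch of that parsing and validation (the example URL is illustrative):

```typescript
// Same shape and validation as the `chain` case: a JSON array of
// [command, arg1, arg2, ...] tuples, executed in order server-side.
type ChainStep = string[];

function parseChain(jsonStr: string): ChainStep[] {
  let commands: unknown;
  try {
    commands = JSON.parse(jsonStr);
  } catch {
    throw new Error('Invalid JSON. Expected: [["command", "arg1", "arg2"], ...]');
  }
  if (!Array.isArray(commands)) throw new Error('Expected JSON array of commands');
  return commands as ChainStep[];
}

const steps = parseChain('[["goto","https://example.com"],["snapshot","-i"],["click","@e3"]]');
console.log(steps.map(([name]) => name).join(' -> ')); // goto -> snapshot -> click
```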
A => browse/src/read-commands.ts +221 -0
@@ 1,221 @@
+/**
+ * Read commands — extract data from pages without side effects
+ *
+ * text, html, links, forms, accessibility, js, eval, css, attrs,
+ * console, network, cookies, storage, perf
+ */
+
+import type { BrowserManager } from './browser-manager';
+import { consoleBuffer, networkBuffer } from './buffers';
+import * as fs from 'fs';
+
+export async function handleReadCommand(
+ command: string,
+ args: string[],
+ bm: BrowserManager
+): Promise<string> {
+ const page = bm.getPage();
+
+ switch (command) {
+ case 'text': {
+ return await page.evaluate(() => {
+ const body = document.body;
+ if (!body) return '';
+ const clone = body.cloneNode(true) as HTMLElement;
+ clone.querySelectorAll('script, style, noscript, svg').forEach(el => el.remove());
+ return clone.innerText
+ .split('\n')
+ .map(line => line.trim())
+ .filter(line => line.length > 0)
+ .join('\n');
+ });
+ }
+
+ case 'html': {
+ const selector = args[0];
+ if (selector) {
+ const resolved = bm.resolveRef(selector);
+ if ('locator' in resolved) {
+ return await resolved.locator.innerHTML({ timeout: 5000 });
+ }
+ return await page.innerHTML(resolved.selector);
+ }
+ return await page.content();
+ }
+
+ case 'links': {
+ const links = await page.evaluate(() =>
+ [...document.querySelectorAll('a[href]')].map(a => ({
+ text: a.textContent?.trim().slice(0, 120) || '',
+ href: (a as HTMLAnchorElement).href,
+ })).filter(l => l.text && l.href)
+ );
+ return links.map(l => `${l.text} → ${l.href}`).join('\n');
+ }
+
+ case 'forms': {
+ const forms = await page.evaluate(() => {
+ return [...document.querySelectorAll('form')].map((form, i) => {
+ const fields = [...form.querySelectorAll('input, select, textarea')].map(el => {
+ const input = el as HTMLInputElement;
+ return {
+ tag: el.tagName.toLowerCase(),
+ type: input.type || undefined,
+ name: input.name || undefined,
+ id: input.id || undefined,
+ placeholder: input.placeholder || undefined,
+ required: input.required || undefined,
+ value: input.value || undefined,
+ options: el.tagName === 'SELECT'
+ ? [...(el as HTMLSelectElement).options].map(o => ({ value: o.value, text: o.text }))
+ : undefined,
+ };
+ });
+ return {
+ index: i,
+ action: form.action || undefined,
+ method: form.method || 'get',
+ id: form.id || undefined,
+ fields,
+ };
+ });
+ });
+ return JSON.stringify(forms, null, 2);
+ }
+
+ case 'accessibility': {
+ const snapshot = await page.locator("body").ariaSnapshot();
+ return snapshot;
+ }
+
+ case 'js': {
+ const expr = args[0];
+ if (!expr) throw new Error('Usage: browse js <expression>');
+ const result = await page.evaluate(expr);
+      return result !== null && typeof result === 'object' ? JSON.stringify(result, null, 2) : String(result ?? '');
+ }
+
+ case 'eval': {
+ const filePath = args[0];
+ if (!filePath) throw new Error('Usage: browse eval <js-file>');
+ if (!fs.existsSync(filePath)) throw new Error(`File not found: ${filePath}`);
+ const code = fs.readFileSync(filePath, 'utf-8');
+ const result = await page.evaluate(code);
+      return result !== null && typeof result === 'object' ? JSON.stringify(result, null, 2) : String(result ?? '');
+ }
+
+ case 'css': {
+ const [selector, property] = args;
+ if (!selector || !property) throw new Error('Usage: browse css <selector> <property>');
+ const resolved = bm.resolveRef(selector);
+ if ('locator' in resolved) {
+ const value = await resolved.locator.evaluate(
+ (el, prop) => getComputedStyle(el).getPropertyValue(prop),
+ property
+ );
+ return value;
+ }
+ const value = await page.evaluate(
+ ([sel, prop]) => {
+ const el = document.querySelector(sel);
+ if (!el) return `Element not found: ${sel}`;
+ return getComputedStyle(el).getPropertyValue(prop);
+ },
+ [resolved.selector, property]
+ );
+ return value;
+ }
+
+ case 'attrs': {
+ const selector = args[0];
+ if (!selector) throw new Error('Usage: browse attrs <selector>');
+ const resolved = bm.resolveRef(selector);
+ if ('locator' in resolved) {
+ const attrs = await resolved.locator.evaluate((el) => {
+ const result: Record<string, string> = {};
+ for (const attr of el.attributes) {
+ result[attr.name] = attr.value;
+ }
+ return result;
+ });
+ return JSON.stringify(attrs, null, 2);
+ }
+ const attrs = await page.evaluate((sel) => {
+ const el = document.querySelector(sel);
+ if (!el) return `Element not found: ${sel}`;
+ const result: Record<string, string> = {};
+ for (const attr of el.attributes) {
+ result[attr.name] = attr.value;
+ }
+ return result;
+ }, resolved.selector);
+ return typeof attrs === 'string' ? attrs : JSON.stringify(attrs, null, 2);
+ }
+
+ case 'console': {
+ if (args[0] === '--clear') {
+ consoleBuffer.length = 0;
+ return 'Console buffer cleared.';
+ }
+ if (consoleBuffer.length === 0) return '(no console messages)';
+ return consoleBuffer.map(e =>
+ `[${new Date(e.timestamp).toISOString()}] [${e.level}] ${e.text}`
+ ).join('\n');
+ }
+
+ case 'network': {
+ if (args[0] === '--clear') {
+ networkBuffer.length = 0;
+ return 'Network buffer cleared.';
+ }
+ if (networkBuffer.length === 0) return '(no network requests)';
+ return networkBuffer.map(e =>
+ `${e.method} ${e.url} → ${e.status || 'pending'} (${e.duration || '?'}ms, ${e.size || '?'}B)`
+ ).join('\n');
+ }
+
+ case 'cookies': {
+ const cookies = await page.context().cookies();
+ return JSON.stringify(cookies, null, 2);
+ }
+
+ case 'storage': {
+ if (args[0] === 'set' && args[1]) {
+ const key = args[1];
+ const value = args[2] || '';
+ await page.evaluate(([k, v]) => localStorage.setItem(k, v), [key, value]);
+ return `Set localStorage["${key}"] = "${value}"`;
+ }
+ const storage = await page.evaluate(() => ({
+ localStorage: { ...localStorage },
+ sessionStorage: { ...sessionStorage },
+ }));
+ return JSON.stringify(storage, null, 2);
+ }
+
+ case 'perf': {
+ const timings = await page.evaluate(() => {
+ const nav = performance.getEntriesByType('navigation')[0] as PerformanceNavigationTiming;
+ if (!nav) return 'No navigation timing data available.';
+ return {
+ dns: Math.round(nav.domainLookupEnd - nav.domainLookupStart),
+ tcp: Math.round(nav.connectEnd - nav.connectStart),
+ ssl: Math.round(nav.secureConnectionStart > 0 ? nav.connectEnd - nav.secureConnectionStart : 0),
+ ttfb: Math.round(nav.responseStart - nav.requestStart),
+ download: Math.round(nav.responseEnd - nav.responseStart),
+ domParse: Math.round(nav.domInteractive - nav.responseEnd),
+ domReady: Math.round(nav.domContentLoadedEventEnd - nav.startTime),
+ load: Math.round(nav.loadEventEnd - nav.startTime),
+ total: Math.round(nav.loadEventEnd - nav.startTime),
+ };
+ });
+ if (typeof timings === 'string') return timings;
+ return Object.entries(timings)
+ .map(([k, v]) => `${k.padEnd(12)} ${v}ms`)
+ .join('\n');
+ }
+
+ default:
+ throw new Error(`Unknown read command: ${command}`);
+ }
+}
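The `perf` case renders its metrics as a fixed-width two-column table. A tiny sketch of that formatting with made-up timing values:

```typescript
// Illustrative timing values; the formatting matches the `perf` case:
// metric names padded to 12 columns, values in milliseconds.
const timings: Record<string, number> = { dns: 3, tcp: 12, ttfb: 87, total: 412 };

const table = Object.entries(timings)
  .map(([k, v]) => `${k.padEnd(12)} ${v}ms`)
  .join('\n');

console.log(table);
// dns          3ms
// tcp          12ms
// ttfb         87ms
// total        412ms
```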
A => browse/src/server.ts +268 -0
@@ 1,268 @@
+/**
+ * gstack browse server — persistent Chromium daemon
+ *
+ * Architecture:
+ * Bun.serve HTTP on localhost → routes commands to Playwright
+ *   Console/network buffers: in-memory ring buffers + disk flush every 1s
+ * Chromium crash → server EXITS with clear error (CLI auto-restarts)
+ * Auto-shutdown after BROWSE_IDLE_TIMEOUT (default 30 min)
+ */
+
+import { BrowserManager } from './browser-manager';
+import { handleReadCommand } from './read-commands';
+import { handleWriteCommand } from './write-commands';
+import { handleMetaCommand } from './meta-commands';
+import * as fs from 'fs';
+import * as path from 'path';
+import * as crypto from 'crypto';
+
+// ─── Auth (inline) ─────────────────────────────────────────────
+const AUTH_TOKEN = crypto.randomUUID();
+const PORT_OFFSET = 45600;
+const BROWSE_PORT = process.env.CONDUCTOR_PORT
+ ? parseInt(process.env.CONDUCTOR_PORT, 10) - PORT_OFFSET
+ : parseInt(process.env.BROWSE_PORT || '0', 10); // 0 = auto-scan
+const INSTANCE_SUFFIX = BROWSE_PORT ? `-${BROWSE_PORT}` : '';
+const STATE_FILE = process.env.BROWSE_STATE_FILE || `/tmp/browse-server${INSTANCE_SUFFIX}.json`;
+const IDLE_TIMEOUT_MS = parseInt(process.env.BROWSE_IDLE_TIMEOUT || '1800000', 10); // 30 min
+
+function validateAuth(req: Request): boolean {
+ const header = req.headers.get('authorization');
+ return header === `Bearer ${AUTH_TOKEN}`;
+}
+
+// ─── Buffer (from buffers.ts) ────────────────────────────────────
+import { consoleBuffer, networkBuffer, addConsoleEntry, addNetworkEntry, consoleTotalAdded, networkTotalAdded, type LogEntry, type NetworkEntry } from './buffers';
+export { consoleBuffer, networkBuffer, addConsoleEntry, addNetworkEntry, type LogEntry, type NetworkEntry };
+const CONSOLE_LOG_PATH = `/tmp/browse-console${INSTANCE_SUFFIX}.log`;
+const NETWORK_LOG_PATH = `/tmp/browse-network${INSTANCE_SUFFIX}.log`;
+let lastConsoleFlushed = 0;
+let lastNetworkFlushed = 0;
+
+function flushBuffers() {
+ // Use totalAdded cursor (not buffer.length) because the ring buffer
+ // stays pinned at HIGH_WATER_MARK after wrapping.
+ const newConsoleCount = consoleTotalAdded - lastConsoleFlushed;
+ if (newConsoleCount > 0) {
+ const count = Math.min(newConsoleCount, consoleBuffer.length);
+ const newEntries = consoleBuffer.slice(-count);
+ const lines = newEntries.map(e =>
+ `[${new Date(e.timestamp).toISOString()}] [${e.level}] ${e.text}`
+ ).join('\n') + '\n';
+ fs.appendFileSync(CONSOLE_LOG_PATH, lines);
+ lastConsoleFlushed = consoleTotalAdded;
+ }
+
+ const newNetworkCount = networkTotalAdded - lastNetworkFlushed;
+ if (newNetworkCount > 0) {
+ const count = Math.min(newNetworkCount, networkBuffer.length);
+ const newEntries = networkBuffer.slice(-count);
+ const lines = newEntries.map(e =>
+ `[${new Date(e.timestamp).toISOString()}] ${e.method} ${e.url} → ${e.status || 'pending'} (${e.duration || '?'}ms, ${e.size || '?'}B)`
+ ).join('\n') + '\n';
+ fs.appendFileSync(NETWORK_LOG_PATH, lines);
+ lastNetworkFlushed = networkTotalAdded;
+ }
+}
+
+// Flush every 1 second
+const flushInterval = setInterval(flushBuffers, 1000);
+
+// ─── Idle Timer ────────────────────────────────────────────────
+let lastActivity = Date.now();
+
+function resetIdleTimer() {
+ lastActivity = Date.now();
+}
+
+const idleCheckInterval = setInterval(() => {
+ if (Date.now() - lastActivity > IDLE_TIMEOUT_MS) {
+ console.log(`[browse] Idle for ${IDLE_TIMEOUT_MS / 1000}s, shutting down`);
+ shutdown();
+ }
+}, 60_000);
+
+// ─── Server ────────────────────────────────────────────────────
+const browserManager = new BrowserManager();
+let isShuttingDown = false;
+
+// Read/write/meta command sets for routing
+const READ_COMMANDS = new Set([
+ 'text', 'html', 'links', 'forms', 'accessibility',
+ 'js', 'eval', 'css', 'attrs',
+ 'console', 'network', 'cookies', 'storage', 'perf',
+]);
+
+const WRITE_COMMANDS = new Set([
+ 'goto', 'back', 'forward', 'reload',
+ 'click', 'fill', 'select', 'hover', 'type', 'press', 'scroll', 'wait',
+ 'viewport', 'cookie', 'header', 'useragent',
+]);
+
+const META_COMMANDS = new Set([
+ 'tabs', 'tab', 'newtab', 'closetab',
+ 'status', 'stop', 'restart',
+ 'screenshot', 'pdf', 'responsive',
+ 'chain', 'diff',
+ 'url', 'snapshot',
+]);
+
+// Find port: deterministic from CONDUCTOR_PORT, or scan range
+async function findPort(): Promise<number> {
+ // Deterministic port from CONDUCTOR_PORT (e.g., 55040 - 45600 = 9440)
+ if (BROWSE_PORT) {
+ try {
+ const testServer = Bun.serve({ port: BROWSE_PORT, fetch: () => new Response('ok') });
+ testServer.stop();
+ return BROWSE_PORT;
+ } catch {
+ throw new Error(`[browse] Port ${BROWSE_PORT} (from CONDUCTOR_PORT ${process.env.CONDUCTOR_PORT}) is in use`);
+ }
+ }
+
+ // Fallback: scan range
+ const start = parseInt(process.env.BROWSE_PORT_START || '9400', 10);
+ for (let port = start; port < start + 10; port++) {
+ try {
+ const testServer = Bun.serve({ port, fetch: () => new Response('ok') });
+ testServer.stop();
+ return port;
+ } catch {
+ continue;
+ }
+ }
+ throw new Error(`[browse] No available port in range ${start}-${start + 9}`);
+}
+
+async function handleCommand(body: any): Promise<Response> {
+ const { command, args = [] } = body;
+
+ if (!command) {
+ return new Response(JSON.stringify({ error: 'Missing "command" field' }), {
+ status: 400,
+ headers: { 'Content-Type': 'application/json' },
+ });
+ }
+
+ try {
+ let result: string;
+
+ if (READ_COMMANDS.has(command)) {
+ result = await handleReadCommand(command, args, browserManager);
+ } else if (WRITE_COMMANDS.has(command)) {
+ result = await handleWriteCommand(command, args, browserManager);
+ } else if (META_COMMANDS.has(command)) {
+ result = await handleMetaCommand(command, args, browserManager, shutdown);
+ } else {
+ return new Response(JSON.stringify({
+ error: `Unknown command: ${command}`,
+ hint: `Available commands: ${[...READ_COMMANDS, ...WRITE_COMMANDS, ...META_COMMANDS].sort().join(', ')}`,
+ }), {
+ status: 400,
+ headers: { 'Content-Type': 'application/json' },
+ });
+ }
+
+ return new Response(result, {
+ status: 200,
+ headers: { 'Content-Type': 'text/plain' },
+ });
+ } catch (err: any) {
+ return new Response(JSON.stringify({ error: err.message }), {
+ status: 500,
+ headers: { 'Content-Type': 'application/json' },
+ });
+ }
+}
+
+async function shutdown() {
+ if (isShuttingDown) return;
+ isShuttingDown = true;
+
+ console.log('[browse] Shutting down...');
+ clearInterval(flushInterval);
+ clearInterval(idleCheckInterval);
+ flushBuffers(); // Final flush
+
+ await browserManager.close();
+
+ // Clean up state file
+ try { fs.unlinkSync(STATE_FILE); } catch {}
+
+ process.exit(0);
+}
+
+// Handle signals
+process.on('SIGTERM', shutdown);
+process.on('SIGINT', shutdown);
+
+// ─── Start ─────────────────────────────────────────────────────
+async function start() {
+ // Clear old log files
+ try { fs.unlinkSync(CONSOLE_LOG_PATH); } catch {}
+ try { fs.unlinkSync(NETWORK_LOG_PATH); } catch {}
+
+ const port = await findPort();
+
+ // Launch browser
+ await browserManager.launch();
+
+ const startTime = Date.now();
+ const server = Bun.serve({
+ port,
+ hostname: '127.0.0.1',
+ fetch: async (req) => {
+ resetIdleTimer();
+
+ const url = new URL(req.url);
+
+ // Health check — no auth required
+ if (url.pathname === '/health') {
+ const healthy = browserManager.isHealthy();
+ return new Response(JSON.stringify({
+ status: healthy ? 'healthy' : 'unhealthy',
+ uptime: Math.floor((Date.now() - startTime) / 1000),
+ tabs: browserManager.getTabCount(),
+ currentUrl: browserManager.getCurrentUrl(),
+ }), {
+ status: 200,
+ headers: { 'Content-Type': 'application/json' },
+ });
+ }
+
+ // All other endpoints require auth
+ if (!validateAuth(req)) {
+ return new Response(JSON.stringify({ error: 'Unauthorized' }), {
+ status: 401,
+ headers: { 'Content-Type': 'application/json' },
+ });
+ }
+
+      if (url.pathname === '/command' && req.method === 'POST') {
+        let body: any;
+        try {
+          body = await req.json();
+        } catch {
+          return new Response(JSON.stringify({ error: 'Invalid JSON body' }), {
+            status: 400,
+            headers: { 'Content-Type': 'application/json' },
+          });
+        }
+        return handleCommand(body);
+      }
+
+ return new Response('Not found', { status: 404 });
+ },
+ });
+
+ // Write state file
+ const state = {
+ pid: process.pid,
+ port,
+ token: AUTH_TOKEN,
+ startedAt: new Date().toISOString(),
+ serverPath: path.resolve(import.meta.dir, 'server.ts'),
+ };
+ fs.writeFileSync(STATE_FILE, JSON.stringify(state, null, 2), { mode: 0o600 });
+
+ console.log(`[browse] Server running on http://127.0.0.1:${port} (PID: ${process.pid})`);
+ console.log(`[browse] State file: ${STATE_FILE}`);
+ console.log(`[browse] Idle timeout: ${IDLE_TIMEOUT_MS / 1000}s`);
+}
+
+start().catch((err) => {
+ console.error(`[browse] Failed to start: ${err.message}`);
+ process.exit(1);
+});
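The totalAdded cursor in flushBuffers() deserves a closer look: once a ring buffer wraps, buffer.length stops growing, so flushing based on length alone would re-emit old lines. A self-contained sketch of the cursor logic (a HIGH_WATER_MARK of 3 is artificially small for illustration):

```typescript
// Flush-cursor logic: totalAdded counts every entry ever appended, so
// totalAdded - lastFlushed is the number of genuinely new entries even
// after the ring buffer wraps and buffer.length pins at the cap.
const HIGH_WATER_MARK = 3;
const buffer: string[] = [];
let totalAdded = 0;
let lastFlushed = 0;

function add(entry: string) {
  buffer.push(entry);
  totalAdded++;
  if (buffer.length > HIGH_WATER_MARK) buffer.shift(); // evict oldest
}

function takeNew(): string[] {
  const newCount = totalAdded - lastFlushed;
  if (newCount <= 0) return [];
  const count = Math.min(newCount, buffer.length); // some may already be evicted
  lastFlushed = totalAdded;
  return buffer.slice(-count);
}

['a', 'b', 'c', 'd'].forEach(add); // 'a' evicted before first flush
console.log(takeNew()); // ['b', 'c', 'd']
add('e');
console.log(takeNew()); // ['e']
```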
A => browse/src/snapshot.ts +212 -0
@@ 1,212 @@
+/**
+ * Snapshot command — accessibility tree with ref-based element selection
+ *
+ * Architecture (Locator map — no DOM mutation):
+ * 1. page.locator(scope).ariaSnapshot() → YAML-like accessibility tree
+ * 2. Parse tree, assign refs @e1, @e2, ...
+ * 3. Build Playwright Locator for each ref (getByRole + nth)
+ * 4. Store Map<string, Locator> on BrowserManager
+ * 5. Return compact text output with refs prepended
+ *
+ * Later: "click @e3" → look up Locator → locator.click()
+ */
+
+import type { Locator } from 'playwright';
+import type { BrowserManager } from './browser-manager';
+
+// Roles considered "interactive" for the -i flag
+const INTERACTIVE_ROLES = new Set([
+ 'button', 'link', 'textbox', 'checkbox', 'radio', 'combobox',
+ 'listbox', 'menuitem', 'menuitemcheckbox', 'menuitemradio',
+ 'option', 'searchbox', 'slider', 'spinbutton', 'switch', 'tab',
+ 'treeitem',
+]);
+
+interface SnapshotOptions {
+ interactive?: boolean; // -i: only interactive elements
+ compact?: boolean; // -c: remove empty structural elements
+ depth?: number; // -d N: limit tree depth
+ selector?: string; // -s SEL: scope to CSS selector
+}
+
+interface ParsedNode {
+ indent: number;
+ role: string;
+ name: string | null;
+ props: string; // e.g., "[level=1]"
+ children: string; // inline text content after ":"
+ rawLine: string;
+}
+
+/**
+ * Parse CLI args into SnapshotOptions
+ */
+export function parseSnapshotArgs(args: string[]): SnapshotOptions {
+ const opts: SnapshotOptions = {};
+ for (let i = 0; i < args.length; i++) {
+ switch (args[i]) {
+ case '-i':
+ case '--interactive':
+ opts.interactive = true;
+ break;
+ case '-c':
+ case '--compact':
+ opts.compact = true;
+ break;
+ case '-d':
+ case '--depth':
+ opts.depth = parseInt(args[++i], 10);
+ if (isNaN(opts.depth!)) throw new Error('Usage: snapshot -d <number>');
+ break;
+ case '-s':
+ case '--selector':
+ opts.selector = args[++i];
+ if (!opts.selector) throw new Error('Usage: snapshot -s <selector>');
+ break;
+ default:
+ throw new Error(`Unknown snapshot flag: ${args[i]}`);
+ }
+ }
+ return opts;
+}
+
+/**
+ * Parse one line of ariaSnapshot output.
+ *
+ * Format examples:
+ * - heading "Test" [level=1]
+ * - link "Link A":
+ * - /url: /a
+ * - textbox "Name"
+ * - paragraph: Some text
+ * - combobox "Role":
+ */
+function parseLine(line: string): ParsedNode | null {
+ // Match: (indent)(- )(role)( "name")?( [props])?(: inline)?
+ const match = line.match(/^(\s*)-\s+(\w+)(?:\s+"([^"]*)")?(?:\s+(\[.*?\]))?\s*(?::\s*(.*))?$/);
+ if (!match) {
+ // Skip metadata lines like "- /url: /a"
+ return null;
+ }
+ return {
+ indent: match[1].length,
+ role: match[2],
+ name: match[3] ?? null,
+ props: match[4] || '',
+ children: match[5]?.trim() || '',
+ rawLine: line,
+ };
+}
+
+/**
+ * Take an accessibility snapshot and build the ref map.
+ */
+export async function handleSnapshot(
+ args: string[],
+ bm: BrowserManager
+): Promise<string> {
+ const opts = parseSnapshotArgs(args);
+ const page = bm.getPage();
+
+ // Get accessibility tree via ariaSnapshot
+ let rootLocator: Locator;
+ if (opts.selector) {
+ rootLocator = page.locator(opts.selector);
+ const count = await rootLocator.count();
+ if (count === 0) throw new Error(`Selector not found: ${opts.selector}`);
+ } else {
+ rootLocator = page.locator('body');
+ }
+
+ const ariaText = await rootLocator.ariaSnapshot();
+ if (!ariaText || ariaText.trim().length === 0) {
+ bm.setRefMap(new Map());
+ return '(no accessible elements found)';
+ }
+
+ // Parse the ariaSnapshot output
+ const lines = ariaText.split('\n');
+ const refMap = new Map<string, Locator>();
+ const output: string[] = [];
+ let refCounter = 1;
+
+ // Track role+name occurrences for nth() disambiguation
+ const roleNameCounts = new Map<string, number>();
+ const roleNameSeen = new Map<string, number>();
+
+ // First pass: count role+name pairs for disambiguation
+ for (const line of lines) {
+ const node = parseLine(line);
+ if (!node) continue;
+ const key = `${node.role}:${node.name || ''}`;
+ roleNameCounts.set(key, (roleNameCounts.get(key) || 0) + 1);
+ }
+
+ // Second pass: assign refs and build locators
+ for (const line of lines) {
+ const node = parseLine(line);
+ if (!node) continue;
+
+ const depth = Math.floor(node.indent / 2);
+ const isInteractive = INTERACTIVE_ROLES.has(node.role);
+
+    // Any filtered-out node must still advance the nth() count, or later
+    // refs sharing its role+name would resolve to the wrong element
+    const skipNode = () => {
+      const key = `${node.role}:${node.name || ''}`;
+      roleNameSeen.set(key, (roleNameSeen.get(key) || 0) + 1);
+    };
+
+    // Depth filter
+    if (opts.depth !== undefined && depth > opts.depth) { skipNode(); continue; }
+
+    // Interactive filter: skip non-interactive elements
+    if (opts.interactive && !isInteractive) { skipNode(); continue; }
+
+    // Compact filter: skip non-interactive elements with no name and no inline content
+    if (opts.compact && !isInteractive && !node.name && !node.children) { skipNode(); continue; }
+
+ // Assign ref
+ const ref = `e${refCounter++}`;
+ const indent = ' '.repeat(depth);
+
+ // Build Playwright locator
+ const key = `${node.role}:${node.name || ''}`;
+ const seenIndex = roleNameSeen.get(key) || 0;
+ roleNameSeen.set(key, seenIndex + 1);
+ const totalCount = roleNameCounts.get(key) || 1;
+
+    // exact: true, because ariaSnapshot reports the full accessible name;
+    // Playwright's default substring match could grab extra elements and
+    // skew the nth() indices computed above
+    const roleOptions = node.name ? { name: node.name, exact: true } : {};
+    let locator: Locator = opts.selector
+      ? page.locator(opts.selector).getByRole(node.role as any, roleOptions)
+      : page.getByRole(node.role as any, roleOptions);
+
+ // Disambiguate with nth() if multiple elements share role+name
+ if (totalCount > 1) {
+ locator = locator.nth(seenIndex);
+ }
+
+ refMap.set(ref, locator);
+
+ // Format output line
+ let outputLine = `${indent}@${ref} [${node.role}]`;
+ if (node.name) outputLine += ` "${node.name}"`;
+ if (node.props) outputLine += ` ${node.props}`;
+ if (node.children) outputLine += `: ${node.children}`;
+
+ output.push(outputLine);
+ }
+
+ // Store ref map on BrowserManager
+ bm.setRefMap(refMap);
+
+  if (output.length === 0) {
+    return opts.interactive ? '(no interactive elements found)' : '(no elements found)';
+  }
+
+ return output.join('\n');
+}
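+
+// Illustrative output shape (hypothetical page; refs are assigned in document order):
+//
+//   @e1 [navigation]
+//     @e2 [link] "Home"
+//   @e3 [button] "Submit"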
A => browse/src/write-commands.ts +179 -0
@@ 1,179 @@
+/**
+ * Write commands — navigate and interact with pages (side effects)
+ *
+ * goto, back, forward, reload, click, fill, select, hover, type,
+ * press, scroll, wait, viewport, cookie, header, useragent
+ */
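+
+// Illustrative invocations (example URLs and selectors, not repo fixtures):
+//
+//   browse goto https://staging.example.com
+//   browse fill "#email" user@example.com
+//   browse click "button[type=submit]"
+//   browse viewport 375x812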
+
+import type { BrowserManager } from './browser-manager';
+
+export async function handleWriteCommand(
+ command: string,
+ args: string[],
+ bm: BrowserManager
+): Promise<string> {
+ const page = bm.getPage();
+
+ switch (command) {
+ case 'goto': {
+ const url = args[0];
+ if (!url) throw new Error('Usage: browse goto <url>');
+ const response = await page.goto(url, { waitUntil: 'domcontentloaded', timeout: 15000 });
+ const status = response?.status() || 'unknown';
+ return `Navigated to ${url} (${status})`;
+ }
+
+ case 'back': {
+ await page.goBack({ waitUntil: 'domcontentloaded', timeout: 15000 });
+ return `Back → ${page.url()}`;
+ }
+
+ case 'forward': {
+ await page.goForward({ waitUntil: 'domcontentloaded', timeout: 15000 });
+ return `Forward → ${page.url()}`;
+ }
+
+ case 'reload': {
+ await page.reload({ waitUntil: 'domcontentloaded', timeout: 15000 });
+ return `Reloaded ${page.url()}`;
+ }
+
+ case 'click': {
+ const selector = args[0];
+ if (!selector) throw new Error('Usage: browse click <selector>');
+ const resolved = bm.resolveRef(selector);
+ if ('locator' in resolved) {
+ await resolved.locator.click({ timeout: 5000 });
+ } else {
+ await page.click(resolved.selector, { timeout: 5000 });
+ }
+ // Wait briefly for any navigation/DOM update
+ await page.waitForLoadState('domcontentloaded').catch(() => {});
+ return `Clicked ${selector} → now at ${page.url()}`;
+ }
+
+ case 'fill': {
+ const [selector, ...valueParts] = args;
+ const value = valueParts.join(' ');
+ if (!selector || !value) throw new Error('Usage: browse fill <selector> <value>');
+ const resolved = bm.resolveRef(selector);
+ if ('locator' in resolved) {
+ await resolved.locator.fill(value, { timeout: 5000 });
+ } else {
+ await page.fill(resolved.selector, value, { timeout: 5000 });
+ }
+ return `Filled ${selector}`;
+ }
+
+ case 'select': {
+ const [selector, ...valueParts] = args;
+ const value = valueParts.join(' ');
+ if (!selector || !value) throw new Error('Usage: browse select <selector> <value>');
+ const resolved = bm.resolveRef(selector);
+ if ('locator' in resolved) {
+ await resolved.locator.selectOption(value, { timeout: 5000 });
+ } else {
+ await page.selectOption(resolved.selector, value, { timeout: 5000 });
+ }
+ return `Selected "${value}" in ${selector}`;
+ }
+
+ case 'hover': {
+ const selector = args[0];
+ if (!selector) throw new Error('Usage: browse hover <selector>');
+ const resolved = bm.resolveRef(selector);
+ if ('locator' in resolved) {
+ await resolved.locator.hover({ timeout: 5000 });
+ } else {
+ await page.hover(resolved.selector, { timeout: 5000 });
+ }
+ return `Hovered ${selector}`;
+ }
+
+ case 'type': {
+ const text = args.join(' ');
+ if (!text) throw new Error('Usage: browse type <text>');
+ await page.keyboard.type(text);
+ return `Typed "${text}"`;
+ }
+
+ case 'press': {
+ const key = args[0];
+ if (!key) throw new Error('Usage: browse press <key> (e.g., Enter, Tab, Escape)');
+ await page.keyboard.press(key);
+ return `Pressed ${key}`;
+ }
+
+ case 'scroll': {
+ const selector = args[0];
+ if (selector) {
+ const resolved = bm.resolveRef(selector);
+ if ('locator' in resolved) {
+ await resolved.locator.scrollIntoViewIfNeeded({ timeout: 5000 });
+ } else {
+ await page.locator(resolved.selector).scrollIntoViewIfNeeded({ timeout: 5000 });
+ }
+ return `Scrolled ${selector} into view`;
+ }
+ await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
+ return 'Scrolled to bottom';
+ }
+
+ case 'wait': {
+ const selector = args[0];
+ if (!selector) throw new Error('Usage: browse wait <selector>');
+ const timeout = args[1] ? parseInt(args[1], 10) : 15000;
+ const resolved = bm.resolveRef(selector);
+ if ('locator' in resolved) {
+ await resolved.locator.waitFor({ state: 'visible', timeout });
+ } else {
+ await page.waitForSelector(resolved.selector, { timeout });
+ }
+ return `Element ${selector} appeared`;
+ }
+
+ case 'viewport': {
+ const size = args[0];
+ if (!size || !size.includes('x')) throw new Error('Usage: browse viewport <WxH> (e.g., 375x812)');
+      const [w, h] = size.split('x').map(Number);
+      if (!Number.isFinite(w) || !Number.isFinite(h) || w <= 0 || h <= 0) {
+        throw new Error('Usage: browse viewport <WxH> (e.g., 375x812)');
+      }
+ await bm.setViewport(w, h);
+ return `Viewport set to ${w}x${h}`;
+ }
+
+ case 'cookie': {
+ const cookieStr = args[0];
+ if (!cookieStr || !cookieStr.includes('=')) throw new Error('Usage: browse cookie <name>=<value>');
+ const eq = cookieStr.indexOf('=');
+ const name = cookieStr.slice(0, eq);
+ const value = cookieStr.slice(eq + 1);
+ const url = new URL(page.url());
+ await page.context().addCookies([{
+ name,
+ value,
+ domain: url.hostname,
+ path: '/',
+ }]);
+ return `Cookie set: ${name}=${value}`;
+ }
+
+ case 'header': {
+ const headerStr = args[0];
+ if (!headerStr || !headerStr.includes(':')) throw new Error('Usage: browse header <name>:<value>');
+ const sep = headerStr.indexOf(':');
+ const name = headerStr.slice(0, sep).trim();
+ const value = headerStr.slice(sep + 1).trim();
+ await bm.setExtraHeader(name, value);
+ return `Header set: ${name}: ${value}`;
+ }
+
+ case 'useragent': {
+ const ua = args.join(' ');
+ if (!ua) throw new Error('Usage: browse useragent <string>');
+ bm.setUserAgent(ua);
+ return `User agent set (applies on next restart): ${ua}`;
+ }
+
+ default:
+ throw new Error(`Unknown write command: ${command}`);
+ }
+}
A => browse/test/commands.test.ts +490 -0
@@ 1,490 @@
+/**
+ * Integration tests for all browse commands
+ *
+ * Tests run against a local test server serving fixture HTML files.
+ * Command handlers are invoked directly against a real BrowserManager;
+ * the CLI retry guard test spawns the CLI as a subprocess.
+ */
+
+import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
+import { startTestServer } from './test-server';
+import { BrowserManager } from '../src/browser-manager';
+import { handleReadCommand } from '../src/read-commands';
+import { handleWriteCommand } from '../src/write-commands';
+import { handleMetaCommand } from '../src/meta-commands';
+import { consoleBuffer, networkBuffer, addConsoleEntry, addNetworkEntry, consoleTotalAdded, networkTotalAdded } from '../src/buffers';
+import * as fs from 'fs';
+import { spawn } from 'child_process';
+import * as path from 'path';
+
+let testServer: ReturnType<typeof startTestServer>;
+let bm: BrowserManager;
+let baseUrl: string;
+
+beforeAll(async () => {
+ testServer = startTestServer(0);
+ baseUrl = testServer.url;
+
+ bm = new BrowserManager();
+ await bm.launch();
+});
+
+afterAll(() => {
+  // Stop the test server and skip bm.close(): a graceful browser close can hang,
+  // so let process exit tear down Chromium instead.
+  try { testServer.server.stop(); } catch {}
+ setTimeout(() => process.exit(0), 500);
+});
+
+// ─── Navigation ─────────────────────────────────────────────────
+
+describe('Navigation', () => {
+ test('goto navigates to URL', async () => {
+ const result = await handleWriteCommand('goto', [baseUrl + '/basic.html'], bm);
+ expect(result).toContain('Navigated to');
+ expect(result).toContain('200');
+ });
+
+ test('url returns current URL', async () => {
+ const result = await handleMetaCommand('url', [], bm, async () => {});
+ expect(result).toContain('/basic.html');
+ });
+
+ test('back goes back', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/forms.html'], bm);
+ const result = await handleWriteCommand('back', [], bm);
+ expect(result).toContain('Back');
+ });
+
+ test('forward goes forward', async () => {
+ const result = await handleWriteCommand('forward', [], bm);
+ expect(result).toContain('Forward');
+ });
+
+ test('reload reloads page', async () => {
+ const result = await handleWriteCommand('reload', [], bm);
+ expect(result).toContain('Reloaded');
+ });
+});
+
+// ─── Content Extraction ─────────────────────────────────────────
+
+describe('Content extraction', () => {
+ beforeAll(async () => {
+ await handleWriteCommand('goto', [baseUrl + '/basic.html'], bm);
+ });
+
+ test('text returns cleaned page text', async () => {
+ const result = await handleReadCommand('text', [], bm);
+ expect(result).toContain('Hello World');
+ expect(result).toContain('Item one');
+ expect(result).not.toContain('<h1>');
+ });
+
+ test('html returns full page HTML', async () => {
+ const result = await handleReadCommand('html', [], bm);
+ expect(result).toContain('<!DOCTYPE html>');
+ expect(result).toContain('<h1 id="title">Hello World</h1>');
+ });
+
+ test('html with selector returns element innerHTML', async () => {
+ const result = await handleReadCommand('html', ['#content'], bm);
+ expect(result).toContain('Some body text here.');
+ expect(result).toContain('<li>Item one</li>');
+ });
+
+ test('links returns all links', async () => {
+ const result = await handleReadCommand('links', [], bm);
+ expect(result).toContain('Page 1');
+ expect(result).toContain('Page 2');
+ expect(result).toContain('External');
+ expect(result).toContain('→');
+ });
+
+ test('forms discovers form fields', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/forms.html'], bm);
+ const result = await handleReadCommand('forms', [], bm);
+ const forms = JSON.parse(result);
+ expect(forms.length).toBe(2);
+ expect(forms[0].id).toBe('login-form');
+ expect(forms[0].method).toBe('post');
+ expect(forms[0].fields.length).toBeGreaterThanOrEqual(2);
+ expect(forms[1].id).toBe('profile-form');
+
+ // Check field discovery
+ const emailField = forms[0].fields.find((f: any) => f.name === 'email');
+ expect(emailField).toBeDefined();
+ expect(emailField.type).toBe('email');
+ expect(emailField.required).toBe(true);
+ });
+
+ test('accessibility returns ARIA tree', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/basic.html'], bm);
+ const result = await handleReadCommand('accessibility', [], bm);
+ expect(result).toContain('Hello World');
+ });
+});
+
+// ─── JavaScript / CSS / Attrs ───────────────────────────────────
+
+describe('Inspection', () => {
+ beforeAll(async () => {
+ await handleWriteCommand('goto', [baseUrl + '/basic.html'], bm);
+ });
+
+ test('js evaluates expression', async () => {
+ const result = await handleReadCommand('js', ['document.title'], bm);
+ expect(result).toBe('Test Page - Basic');
+ });
+
+ test('js returns objects as JSON', async () => {
+ const result = await handleReadCommand('js', ['({a: 1, b: 2})'], bm);
+ const obj = JSON.parse(result);
+ expect(obj.a).toBe(1);
+ expect(obj.b).toBe(2);
+ });
+
+ test('css returns computed property', async () => {
+ const result = await handleReadCommand('css', ['h1', 'color'], bm);
+    // navy resolves to rgb(0, 0, 128)
+ expect(result).toContain('0, 0, 128');
+ });
+
+ test('css returns font-family', async () => {
+ const result = await handleReadCommand('css', ['body', 'font-family'], bm);
+ expect(result).toContain('Helvetica');
+ });
+
+ test('attrs returns element attributes', async () => {
+ const result = await handleReadCommand('attrs', ['#content'], bm);
+ const attrs = JSON.parse(result);
+ expect(attrs.id).toBe('content');
+ expect(attrs['data-testid']).toBe('main-content');
+ expect(attrs['data-version']).toBe('1.0');
+ });
+});
+
+// ─── Interaction ────────────────────────────────────────────────
+
+describe('Interaction', () => {
+ test('fill + click works on form', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/forms.html'], bm);
+
+ let result = await handleWriteCommand('fill', ['#email', 'test@example.com'], bm);
+ expect(result).toContain('Filled');
+
+ result = await handleWriteCommand('fill', ['#password', 'secret123'], bm);
+ expect(result).toContain('Filled');
+
+ // Verify values were set
+ const emailVal = await handleReadCommand('js', ['document.querySelector("#email").value'], bm);
+ expect(emailVal).toBe('test@example.com');
+
+ result = await handleWriteCommand('click', ['#login-btn'], bm);
+ expect(result).toContain('Clicked');
+ });
+
+ test('select works on dropdown', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/forms.html'], bm);
+ const result = await handleWriteCommand('select', ['#role', 'admin'], bm);
+ expect(result).toContain('Selected');
+
+ const val = await handleReadCommand('js', ['document.querySelector("#role").value'], bm);
+ expect(val).toBe('admin');
+ });
+
+ test('hover works', async () => {
+ const result = await handleWriteCommand('hover', ['h1'], bm);
+ expect(result).toContain('Hovered');
+ });
+
+ test('wait finds existing element', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/basic.html'], bm);
+ const result = await handleWriteCommand('wait', ['#title'], bm);
+ expect(result).toContain('appeared');
+ });
+
+ test('scroll works', async () => {
+ const result = await handleWriteCommand('scroll', ['footer'], bm);
+ expect(result).toContain('Scrolled');
+ });
+
+ test('viewport changes size', async () => {
+ const result = await handleWriteCommand('viewport', ['375x812'], bm);
+ expect(result).toContain('Viewport set');
+
+ const size = await handleReadCommand('js', ['`${window.innerWidth}x${window.innerHeight}`'], bm);
+ expect(size).toBe('375x812');
+
+ // Reset
+ await handleWriteCommand('viewport', ['1280x720'], bm);
+ });
+
+ test('type and press work', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/forms.html'], bm);
+ await handleWriteCommand('click', ['#name'], bm);
+
+ const result = await handleWriteCommand('type', ['John Doe'], bm);
+ expect(result).toContain('Typed');
+
+ const val = await handleReadCommand('js', ['document.querySelector("#name").value'], bm);
+ expect(val).toBe('John Doe');
+ });
+});
+
+// ─── SPA / Console / Network ───────────────────────────────────
+
+describe('SPA and buffers', () => {
+ test('wait handles delayed rendering', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/spa.html'], bm);
+ const result = await handleWriteCommand('wait', ['.loaded'], bm);
+ expect(result).toContain('appeared');
+
+ const text = await handleReadCommand('text', [], bm);
+ expect(text).toContain('SPA Content Loaded');
+ });
+
+ test('console captures messages', async () => {
+ const result = await handleReadCommand('console', [], bm);
+ expect(result).toContain('[SPA] Starting render');
+ expect(result).toContain('[SPA] Render complete');
+ });
+
+ test('console --clear clears buffer', async () => {
+ const result = await handleReadCommand('console', ['--clear'], bm);
+ expect(result).toContain('cleared');
+
+ const after = await handleReadCommand('console', [], bm);
+ expect(after).toContain('no console messages');
+ });
+
+ test('network captures requests', async () => {
+ const result = await handleReadCommand('network', [], bm);
+ expect(result).toContain('GET');
+ expect(result).toContain('/spa.html');
+ });
+
+ test('network --clear clears buffer', async () => {
+ const result = await handleReadCommand('network', ['--clear'], bm);
+ expect(result).toContain('cleared');
+ });
+});
+
+// ─── Cookies / Storage ──────────────────────────────────────────
+
+describe('Cookies and storage', () => {
+ test('cookies returns array', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/basic.html'], bm);
+ const result = await handleReadCommand('cookies', [], bm);
+ // Test server doesn't set cookies, so empty array
+ expect(result).toBe('[]');
+ });
+
+ test('storage set and get works', async () => {
+ await handleReadCommand('storage', ['set', 'testKey', 'testValue'], bm);
+ const result = await handleReadCommand('storage', [], bm);
+ const storage = JSON.parse(result);
+ expect(storage.localStorage.testKey).toBe('testValue');
+ });
+});
+
+// ─── Performance ────────────────────────────────────────────────
+
+describe('Performance', () => {
+ test('perf returns timing data', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/basic.html'], bm);
+ const result = await handleReadCommand('perf', [], bm);
+ expect(result).toContain('dns');
+ expect(result).toContain('ttfb');
+ expect(result).toContain('load');
+ expect(result).toContain('ms');
+ });
+});
+
+// ─── Visual ─────────────────────────────────────────────────────
+
+describe('Visual', () => {
+ test('screenshot saves file', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/basic.html'], bm);
+ const screenshotPath = '/tmp/browse-test-screenshot.png';
+ const result = await handleMetaCommand('screenshot', [screenshotPath], bm, async () => {});
+ expect(result).toContain('Screenshot saved');
+ expect(fs.existsSync(screenshotPath)).toBe(true);
+ const stat = fs.statSync(screenshotPath);
+ expect(stat.size).toBeGreaterThan(1000);
+ fs.unlinkSync(screenshotPath);
+ });
+
+ test('responsive saves 3 screenshots', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/responsive.html'], bm);
+ const prefix = '/tmp/browse-test-resp';
+ const result = await handleMetaCommand('responsive', [prefix], bm, async () => {});
+ expect(result).toContain('mobile');
+ expect(result).toContain('tablet');
+ expect(result).toContain('desktop');
+
+ expect(fs.existsSync(`${prefix}-mobile.png`)).toBe(true);
+ expect(fs.existsSync(`${prefix}-tablet.png`)).toBe(true);
+ expect(fs.existsSync(`${prefix}-desktop.png`)).toBe(true);
+
+ // Cleanup
+ fs.unlinkSync(`${prefix}-mobile.png`);
+ fs.unlinkSync(`${prefix}-tablet.png`);
+ fs.unlinkSync(`${prefix}-desktop.png`);
+ });
+});
+
+// ─── Tabs ───────────────────────────────────────────────────────
+
+describe('Tabs', () => {
+ test('tabs lists all tabs', async () => {
+ const result = await handleMetaCommand('tabs', [], bm, async () => {});
+ expect(result).toContain('[');
+ expect(result).toContain(']');
+ });
+
+ test('newtab opens new tab', async () => {
+ const result = await handleMetaCommand('newtab', [baseUrl + '/forms.html'], bm, async () => {});
+ expect(result).toContain('Opened tab');
+
+ const tabCount = bm.getTabCount();
+ expect(tabCount).toBeGreaterThanOrEqual(2);
+ });
+
+ test('tab switches to specific tab', async () => {
+ const result = await handleMetaCommand('tab', ['1'], bm, async () => {});
+ expect(result).toContain('Switched to tab 1');
+ });
+
+ test('closetab closes a tab', async () => {
+ const before = bm.getTabCount();
+ // Close the last opened tab
+ const tabs = await bm.getTabListWithTitles();
+ const lastTab = tabs[tabs.length - 1];
+ const result = await handleMetaCommand('closetab', [String(lastTab.id)], bm, async () => {});
+ expect(result).toContain('Closed tab');
+ expect(bm.getTabCount()).toBe(before - 1);
+ });
+});
+
+// ─── Diff ───────────────────────────────────────────────────────
+
+describe('Diff', () => {
+ test('diff shows differences between pages', async () => {
+ const result = await handleMetaCommand(
+ 'diff',
+ [baseUrl + '/basic.html', baseUrl + '/forms.html'],
+ bm,
+ async () => {}
+ );
+ expect(result).toContain('---');
+ expect(result).toContain('+++');
+ // basic.html has "Hello World", forms.html has "Form Test Page"
+ expect(result).toContain('Hello World');
+ expect(result).toContain('Form Test Page');
+ });
+});
+
+// ─── Chain ──────────────────────────────────────────────────────
+
+describe('Chain', () => {
+ test('chain executes sequence of commands', async () => {
+ const commands = JSON.stringify([
+ ['goto', baseUrl + '/basic.html'],
+ ['js', 'document.title'],
+ ['css', 'h1', 'color'],
+ ]);
+ const result = await handleMetaCommand('chain', [commands], bm, async () => {});
+ expect(result).toContain('[goto]');
+ expect(result).toContain('Test Page - Basic');
+ expect(result).toContain('[css]');
+ });
+
+ test('chain reports real error when write command fails', async () => {
+ const commands = JSON.stringify([
+ ['goto', 'http://localhost:1/unreachable'],
+ ]);
+ const result = await handleMetaCommand('chain', [commands], bm, async () => {});
+ expect(result).toContain('[goto] ERROR:');
+ expect(result).not.toContain('Unknown meta command');
+ expect(result).not.toContain('Unknown read command');
+ });
+});
+
+// ─── Status ─────────────────────────────────────────────────────
+
+describe('Status', () => {
+ test('status reports health', async () => {
+ const result = await handleMetaCommand('status', [], bm, async () => {});
+ expect(result).toContain('Status: healthy');
+ expect(result).toContain('Tabs:');
+ });
+});
+
+// ─── CLI retry guard ────────────────────────────────────────────
+
+describe('CLI retry guard', () => {
+ test('sendCommand aborts after repeated connection failures', async () => {
+ // Write a fake state file pointing to a port that refuses connections
+ const stateFile = '/tmp/browse-server.json';
+ const origState = fs.existsSync(stateFile) ? fs.readFileSync(stateFile, 'utf-8') : null;
+
+ fs.writeFileSync(stateFile, JSON.stringify({ port: 1, token: 'fake', pid: 999999 }));
+
+ const cliPath = path.resolve(__dirname, '../src/cli.ts');
+ const result = await new Promise<{ code: number; stderr: string }>((resolve) => {
+ const proc = spawn('bun', ['run', cliPath, 'status'], {
+ timeout: 15000,
+ env: { ...process.env },
+ });
+ let stderr = '';
+ proc.stderr.on('data', (d) => stderr += d.toString());
+ proc.on('close', (code) => resolve({ code: code ?? 1, stderr }));
+ });
+
+ // Restore original state file
+ if (origState) fs.writeFileSync(stateFile, origState);
+ else if (fs.existsSync(stateFile)) fs.unlinkSync(stateFile);
+
+ // Should fail, not loop forever
+ expect(result.code).not.toBe(0);
+ }, 20000);
+});
+
+// ─── Buffer bounds ──────────────────────────────────────────────
+
+describe('Buffer bounds', () => {
+ test('console buffer caps at 50000 entries', () => {
+ consoleBuffer.length = 0;
+ for (let i = 0; i < 50_010; i++) {
+ addConsoleEntry({ timestamp: i, level: 'log', text: `msg-${i}` });
+ }
+ expect(consoleBuffer.length).toBe(50_000);
+ expect(consoleBuffer[0].text).toBe('msg-10');
+ expect(consoleBuffer[consoleBuffer.length - 1].text).toBe('msg-50009');
+ consoleBuffer.length = 0;
+ });
+
+ test('network buffer caps at 50000 entries', () => {
+ networkBuffer.length = 0;
+ for (let i = 0; i < 50_010; i++) {
+ addNetworkEntry({ timestamp: i, method: 'GET', url: `http://x/${i}` });
+ }
+ expect(networkBuffer.length).toBe(50_000);
+ expect(networkBuffer[0].url).toBe('http://x/10');
+ expect(networkBuffer[networkBuffer.length - 1].url).toBe('http://x/50009');
+ networkBuffer.length = 0;
+ });
+
+ test('totalAdded counters keep incrementing past buffer cap', () => {
+ const startConsole = consoleTotalAdded;
+ const startNetwork = networkTotalAdded;
+ for (let i = 0; i < 100; i++) {
+ addConsoleEntry({ timestamp: i, level: 'log', text: `t-${i}` });
+ addNetworkEntry({ timestamp: i, method: 'GET', url: `http://t/${i}` });
+ }
+ expect(consoleTotalAdded).toBe(startConsole + 100);
+ expect(networkTotalAdded).toBe(startNetwork + 100);
+ consoleBuffer.length = 0;
+ networkBuffer.length = 0;
+ });
+});
A => browse/test/fixtures/basic.html +33 -0
@@ 1,33 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+ <meta charset="utf-8">
+ <title>Test Page - Basic</title>
+ <style>
+ body { font-family: "Helvetica Neue", sans-serif; color: #333; margin: 20px; }
+ h1 { color: navy; font-size: 24px; }
+ .highlight { background: yellow; padding: 4px; }
+ .hidden { display: none; }
+ nav a { margin-right: 10px; color: blue; }
+ </style>
+</head>
+<body>
+ <nav>
+ <a href="/page1">Page 1</a>
+ <a href="/page2">Page 2</a>
+ <a href="https://external.com/link">External</a>
+ </nav>
+ <h1 id="title">Hello World</h1>
+ <p class="highlight">This is a highlighted paragraph.</p>
+ <p class="hidden">This should be hidden.</p>
+ <div id="content" data-testid="main-content" data-version="1.0">
+ <p>Some body text here.</p>
+ <ul>
+ <li>Item one</li>
+ <li>Item two</li>
+ <li>Item three</li>
+ </ul>
+ </div>
+ <footer>Footer text</footer>
+</body>
+</html>
A => browse/test/fixtures/forms.html +55 -0
@@ 1,55 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+ <meta charset="utf-8">
+ <title>Test Page - Forms</title>
+ <style>
+ body { font-family: sans-serif; padding: 20px; }
+ form { margin-bottom: 20px; padding: 10px; border: 1px solid #ccc; }
+ label { display: block; margin: 5px 0; }
+ input, select, textarea { margin-bottom: 10px; padding: 5px; }
+ #result { color: green; display: none; }
+ </style>
+</head>
+<body>
+ <h1>Form Test Page</h1>
+
+ <form id="login-form" action="/login" method="post">
+ <label for="email">Email:</label>
+ <input type="email" id="email" name="email" placeholder="your@email.com" required>
+ <label for="password">Password:</label>
+ <input type="password" id="password" name="password" required>
+ <button type="submit" id="login-btn">Log In</button>
+ </form>
+
+ <form id="profile-form" action="/profile" method="post">
+ <label for="name">Name:</label>
+ <input type="text" id="name" name="name" placeholder="Your name">
+ <label for="bio">Bio:</label>
+ <textarea id="bio" name="bio" placeholder="Tell us about yourself"></textarea>
+ <label for="role">Role:</label>
+ <select id="role" name="role">
+ <option value="">Choose...</option>
+ <option value="admin">Admin</option>
+ <option value="user">User</option>
+ <option value="guest">Guest</option>
+ </select>
+ <label>
+ <input type="checkbox" id="newsletter" name="newsletter"> Subscribe to newsletter
+ </label>
+ <button type="submit" id="profile-btn">Save Profile</button>
+ </form>
+
+ <div id="result">Form submitted!</div>
+
+ <script>
+ document.querySelectorAll('form').forEach(form => {
+ form.addEventListener('submit', (e) => {
+ e.preventDefault();
+ document.getElementById('result').style.display = 'block';
+ console.log('[Form] Submitted:', form.id);
+ });
+ });
+ </script>
+</body>
+</html>
A => browse/test/fixtures/responsive.html +49 -0
@@ 1,49 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+ <meta charset="utf-8">
+ <meta name="viewport" content="width=device-width, initial-scale=1">
+ <title>Test Page - Responsive</title>
+ <style>
+ body { font-family: sans-serif; margin: 0; padding: 20px; }
+ .container { max-width: 1200px; margin: 0 auto; }
+ .grid { display: grid; gap: 16px; }
+ .card { padding: 16px; border: 1px solid #ddd; border-radius: 8px; }
+
+ /* Mobile: single column */
+ .grid { grid-template-columns: 1fr; }
+
+ /* Tablet: two columns */
+ @media (min-width: 768px) {
+ .grid { grid-template-columns: 1fr 1fr; }
+ .mobile-only { display: none; }
+ }
+
+ /* Desktop: three columns */
+ @media (min-width: 1024px) {
+ .grid { grid-template-columns: 1fr 1fr 1fr; }
+ }
+
+ .mobile-only { color: red; }
+ .desktop-indicator { display: none; }
+ @media (min-width: 1024px) {
+ .desktop-indicator { display: block; color: green; }
+ }
+ </style>
+</head>
+<body>
+ <div class="container">
+ <h1>Responsive Layout Test</h1>
+ <p class="mobile-only">You are on mobile</p>
+ <p class="desktop-indicator">You are on desktop</p>
+ <div class="grid">
+ <div class="card">Card 1</div>
+ <div class="card">Card 2</div>
+ <div class="card">Card 3</div>
+ <div class="card">Card 4</div>
+ <div class="card">Card 5</div>
+ <div class="card">Card 6</div>
+ </div>
+ </div>
+</body>
+</html>
A => browse/test/fixtures/snapshot.html +55 -0
@@ 1,55 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+ <meta charset="utf-8">
+ <title>Snapshot Test Page</title>
+ <style>
+ body { font-family: sans-serif; padding: 20px; }
+ form { margin: 10px 0; }
+ input, select, button { margin: 5px; padding: 5px; }
+ #main { border: 1px solid #ccc; padding: 10px; }
+ .empty-div { }
+ .hidden { display: none; }
+ </style>
+</head>
+<body>
+ <h1>Snapshot Test</h1>
+ <h2>Subheading</h2>
+
+ <nav>
+ <a href="/page1">Internal Link</a>
+ <a href="https://external.com">External Link</a>
+ </nav>
+
+ <div id="main">
+ <h3>Form Section</h3>
+ <form id="test-form">
+ <input type="text" id="username" placeholder="Username" aria-label="Username">
+ <input type="email" id="email" placeholder="Email" aria-label="Email">
+ <input type="password" id="pass" placeholder="Password" aria-label="Password">
+ <label><input type="checkbox" id="agree"> I agree</label>
+ <select id="role" aria-label="Role">
+ <option value="">Choose...</option>
+ <option value="admin">Admin</option>
+ <option value="user">User</option>
+ </select>
+ <button type="submit" id="submit-btn">Submit</button>
+ <button type="button" id="cancel-btn">Cancel</button>
+ </form>
+ </div>
+
+ <div class="empty-div">
+ <div class="empty-div">
+ <button id="nested-btn">Nested Button</button>
+ </div>
+ </div>
+
+ <p>Some paragraph text that is not interactive.</p>
+
+ <script>
+ document.getElementById('test-form').addEventListener('submit', (e) => {
+ e.preventDefault();
+ });
+ </script>
+</body>
+</html>
A => browse/test/fixtures/spa.html +24 -0
@@ 1,24 @@
+<!DOCTYPE html>
+<html lang="en">
+<head>
+ <meta charset="utf-8">
+ <title>Test Page - SPA</title>
+ <style>
+ body { font-family: sans-serif; }
+ #app { padding: 20px; }
+ .loaded { color: green; }
+ </style>
+</head>
+<body>
+ <div id="app">Loading...</div>
+ <script>
+ console.log('[SPA] Starting render');
+ console.warn('[SPA] This is a warning');
+ console.error('[SPA] This is an error');
+ setTimeout(() => {
+ document.getElementById('app').innerHTML = '<h1 class="loaded">SPA Content Loaded</h1><p>Rendered by JavaScript</p>';
+ console.log('[SPA] Render complete');
+ }, 500);
+ </script>
+</body>
+</html>
A => browse/test/snapshot.test.ts +201 -0
@@ 1,201 @@
+/**
+ * Snapshot command tests
+ *
+ * Tests: accessibility tree snapshots, ref-based element selection,
+ * ref invalidation on navigation, and ref resolution in commands.
+ */
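+
+// Typical ref workflow exercised below (illustrative):
+//   snapshot        → emits lines like: @e1 [button] "Submit"
+//   click @e1       → resolves the ref via the stored locator map
+//   goto <new url>  → refs from the previous snapshot become stale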
+
+import { describe, test, expect, beforeAll, afterAll } from 'bun:test';
+import { startTestServer } from './test-server';
+import { BrowserManager } from '../src/browser-manager';
+import { handleReadCommand } from '../src/read-commands';
+import { handleWriteCommand } from '../src/write-commands';
+import { handleMetaCommand } from '../src/meta-commands';
+
+let testServer: ReturnType<typeof startTestServer>;
+let bm: BrowserManager;
+let baseUrl: string;
+const shutdown = async () => {};
+
+beforeAll(async () => {
+ testServer = startTestServer(0);
+ baseUrl = testServer.url;
+
+ bm = new BrowserManager();
+ await bm.launch();
+});
+
+afterAll(() => {
+ try { testServer.server.stop(); } catch {}
+ setTimeout(() => process.exit(0), 500);
+});
+
+// ─── Snapshot Output ────────────────────────────────────────────
+
+describe('Snapshot', () => {
+ test('snapshot returns accessibility tree with refs', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/snapshot.html'], bm);
+ const result = await handleMetaCommand('snapshot', [], bm, shutdown);
+ expect(result).toContain('@e');
+ expect(result).toContain('[heading]');
+ expect(result).toContain('"Snapshot Test"');
+ expect(result).toContain('[button]');
+ expect(result).toContain('[link]');
+ });
+
+ test('snapshot -i returns only interactive elements', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/snapshot.html'], bm);
+ const result = await handleMetaCommand('snapshot', ['-i'], bm, shutdown);
+ expect(result).toContain('[button]');
+ expect(result).toContain('[link]');
+ expect(result).toContain('[textbox]');
+ // Should NOT contain non-interactive roles like heading or paragraph
+ expect(result).not.toContain('[heading]');
+ });
+
+ test('snapshot -c returns compact output', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/snapshot.html'], bm);
+ const full = await handleMetaCommand('snapshot', [], bm, shutdown);
+ const compact = await handleMetaCommand('snapshot', ['-c'], bm, shutdown);
+ // Compact should have fewer lines (empty structural elements removed)
+ const fullLines = full.split('\n').length;
+ const compactLines = compact.split('\n').length;
+ expect(compactLines).toBeLessThanOrEqual(fullLines);
+ });
+
+ test('snapshot -d 2 limits depth', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/snapshot.html'], bm);
+ const shallow = await handleMetaCommand('snapshot', ['-d', '2'], bm, shutdown);
+ const deep = await handleMetaCommand('snapshot', [], bm, shutdown);
+ // Shallow should have fewer or equal lines
+ expect(shallow.split('\n').length).toBeLessThanOrEqual(deep.split('\n').length);
+ });
+
+ test('snapshot -s "#main" scopes to selector', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/snapshot.html'], bm);
+ const scoped = await handleMetaCommand('snapshot', ['-s', '#main'], bm, shutdown);
+ // Should contain elements inside #main
+ expect(scoped).toContain('[button]');
+ expect(scoped).toContain('"Submit"');
+ // Should NOT contain elements outside #main (like nav links)
+ expect(scoped).not.toContain('"Internal Link"');
+ });
+
+  test('snapshot -i on a minimal page still finds its links', async () => {
+    // basic.html is a minimal fixture, but it does contain links
+    await handleWriteCommand('goto', [baseUrl + '/basic.html'], bm);
+    const result = await handleMetaCommand('snapshot', ['-i'], bm, shutdown);
+    expect(result).toContain('[link]');
+ });
+
+ test('second snapshot generates fresh refs', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/snapshot.html'], bm);
+ const snap1 = await handleMetaCommand('snapshot', [], bm, shutdown);
+ const snap2 = await handleMetaCommand('snapshot', [], bm, shutdown);
+ // Both should have @e1 (refs restart from 1)
+ expect(snap1).toContain('@e1');
+ expect(snap2).toContain('@e1');
+ });
+});
+
+// ─── Ref-Based Interaction ──────────────────────────────────────
+
+describe('Ref resolution', () => {
+ test('click @ref works after snapshot', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/snapshot.html'], bm);
+ const snap = await handleMetaCommand('snapshot', ['-i'], bm, shutdown);
+ // Find a button ref
+ const buttonLine = snap.split('\n').find(l => l.includes('[button]') && l.includes('"Submit"'));
+ expect(buttonLine).toBeDefined();
+ const refMatch = buttonLine!.match(/@(e\d+)/);
+ expect(refMatch).toBeDefined();
+ const ref = `@${refMatch![1]}`;
+ const result = await handleWriteCommand('click', [ref], bm);
+ expect(result).toContain('Clicked');
+ });
+
+ test('fill @ref works after snapshot', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/snapshot.html'], bm);
+ const snap = await handleMetaCommand('snapshot', ['-i'], bm, shutdown);
+ // Find a textbox ref (Username)
+ const textboxLine = snap.split('\n').find(l => l.includes('[textbox]') && l.includes('"Username"'));
+ expect(textboxLine).toBeDefined();
+ const refMatch = textboxLine!.match(/@(e\d+)/);
+ expect(refMatch).toBeDefined();
+ const ref = `@${refMatch![1]}`;
+ const result = await handleWriteCommand('fill', [ref, 'testuser'], bm);
+ expect(result).toContain('Filled');
+ });
+
+ test('hover @ref works after snapshot', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/snapshot.html'], bm);
+ const snap = await handleMetaCommand('snapshot', ['-i'], bm, shutdown);
+ const linkLine = snap.split('\n').find(l => l.includes('[link]'));
+ expect(linkLine).toBeDefined();
+ const refMatch = linkLine!.match(/@(e\d+)/);
+ const ref = `@${refMatch![1]}`;
+ const result = await handleWriteCommand('hover', [ref], bm);
+ expect(result).toContain('Hovered');
+ });
+
+ test('html @ref returns innerHTML', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/snapshot.html'], bm);
+ const snap = await handleMetaCommand('snapshot', [], bm, shutdown);
+ // Find a heading ref
+ const headingLine = snap.split('\n').find(l => l.includes('[heading]') && l.includes('"Snapshot Test"'));
+ expect(headingLine).toBeDefined();
+ const refMatch = headingLine!.match(/@(e\d+)/);
+ const ref = `@${refMatch![1]}`;
+ const result = await handleReadCommand('html', [ref], bm);
+ expect(result).toContain('Snapshot Test');
+ });
+
+ test('css @ref returns computed CSS', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/snapshot.html'], bm);
+ const snap = await handleMetaCommand('snapshot', [], bm, shutdown);
+ const headingLine = snap.split('\n').find(l => l.includes('[heading]') && l.includes('"Snapshot Test"'));
+ const refMatch = headingLine!.match(/@(e\d+)/);
+ const ref = `@${refMatch![1]}`;
+ const result = await handleReadCommand('css', [ref, 'font-family'], bm);
+ expect(result).toBeTruthy();
+ });
+
+ test('attrs @ref returns element attributes', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/snapshot.html'], bm);
+ const snap = await handleMetaCommand('snapshot', ['-i'], bm, shutdown);
+ const textboxLine = snap.split('\n').find(l => l.includes('[textbox]') && l.includes('"Username"'));
+ const refMatch = textboxLine!.match(/@(e\d+)/);
+ const ref = `@${refMatch![1]}`;
+ const result = await handleReadCommand('attrs', [ref], bm);
+ expect(result).toContain('id');
+ });
+});
+
+// ─── Ref Invalidation ───────────────────────────────────────────
+
+describe('Ref invalidation', () => {
+ test('stale ref after goto returns clear error', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/snapshot.html'], bm);
+ await handleMetaCommand('snapshot', ['-i'], bm, shutdown);
+ // Navigate away — should invalidate refs
+ await handleWriteCommand('goto', [baseUrl + '/basic.html'], bm);
+    // Using the old ref must throw a clear, actionable error
+    let err: Error | undefined;
+    try {
+      await handleWriteCommand('click', ['@e1'], bm);
+    } catch (e: any) {
+      err = e;
+    }
+    expect(err).toBeDefined();
+    expect(err!.message).toContain('not found');
+    expect(err!.message).toContain('snapshot');
+ });
+
+ test('refs cleared on page navigation', async () => {
+ await handleWriteCommand('goto', [baseUrl + '/snapshot.html'], bm);
+ await handleMetaCommand('snapshot', ['-i'], bm, shutdown);
+ expect(bm.getRefCount()).toBeGreaterThan(0);
+ // Navigate
+ await handleWriteCommand('goto', [baseUrl + '/basic.html'], bm);
+ expect(bm.getRefCount()).toBe(0);
+ });
+});
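
The ref-extraction logic in the tests above (find a snapshot line by role and accessible name, then pull out its `@eN` ref with a regex) is repeated in nearly every `Ref resolution` test. A hypothetical helper, not part of the repo, could factor it out:

```typescript
// Hypothetical helper (not in the repo): locate a snapshot line by
// role and optional accessible name, and return its @ref.
function extractRef(snapshot: string, role: string, name?: string): string {
  const line = snapshot
    .split('\n')
    .find(l => l.includes(`[${role}]`) && (name === undefined || l.includes(`"${name}"`)));
  if (!line) throw new Error(`no [${role}] ${name ?? ''} line in snapshot`);
  const match = line.match(/@(e\d+)/);
  if (!match) throw new Error(`no @ref on snapshot line: ${line}`);
  return `@${match[1]}`;
}
```

Each test body would then shrink to `const ref = extractRef(snap, 'button', 'Submit');` followed by the command under test, and a missing element fails with a descriptive error instead of a non-null assertion.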
A => browse/test/test-server.ts +47 -0
@@ 1,47 @@
+/**
+ * Tiny Bun.serve for test fixtures
+ * Serves HTML files from test/fixtures/ on a random available port
+ */
+
+import * as path from 'path';
+import * as fs from 'fs';
+
+const FIXTURES_DIR = path.resolve(import.meta.dir, 'fixtures');
+
+export function startTestServer(port: number = 0): { server: ReturnType<typeof Bun.serve>; url: string } {
+ const server = Bun.serve({
+ port,
+ hostname: '127.0.0.1',
+ fetch(req) {
+ const url = new URL(req.url);
+ let filePath = url.pathname === '/' ? '/basic.html' : url.pathname;
+
+ // Remove leading slash
+    filePath = filePath.replace(/^\//, '');
+    const fullPath = path.join(FIXTURES_DIR, filePath);
+
+    // Reject both missing files and path-traversal attempts (e.g. /../secret)
+    if (!fullPath.startsWith(FIXTURES_DIR + path.sep) || !fs.existsSync(fullPath)) {
+      return new Response('Not Found', { status: 404 });
+    }
+
+ const content = fs.readFileSync(fullPath, 'utf-8');
+ const ext = path.extname(fullPath);
+ const contentType = ext === '.html' ? 'text/html' : 'text/plain';
+
+ return new Response(content, {
+ headers: { 'Content-Type': contentType },
+ });
+ },
+ });
+
+ const url = `http://127.0.0.1:${server.port}`;
+ return { server, url };
+}
+
+// If run directly, start and print URL
+if (import.meta.main) {
+  const { url } = startTestServer(9450);
+ console.log(`Test server running at ${url}`);
+ console.log(`Fixtures: ${FIXTURES_DIR}`);
+ console.log('Press Ctrl+C to stop');
+}
A => package.json +34 -0
@@ 1,34 @@
+{
+ "name": "gstack",
+ "version": "0.0.1",
+ "description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.",
+ "license": "MIT",
+ "type": "module",
+ "bin": {
+ "browse": "./browse/dist/browse"
+ },
+ "scripts": {
+ "build": "bun build --compile browse/src/cli.ts --outfile browse/dist/browse",
+ "dev": "bun run browse/src/cli.ts",
+ "server": "bun run browse/src/server.ts",
+ "test": "bun test",
+ "start": "bun run browse/src/server.ts"
+ },
+ "dependencies": {
+ "playwright": "^1.58.2",
+ "diff": "^7.0.0"
+ },
+ "engines": {
+ "bun": ">=1.0.0"
+ },
+ "keywords": [
+ "browser",
+ "automation",
+ "playwright",
+ "headless",
+ "cli",
+ "claude",
+ "ai-agent",
+ "devtools"
+ ]
+}
A => plan-ceo-review/SKILL.md +484 -0
@@ 1,484 @@
+---
+name: plan-ceo-review
+version: 1.0.0
+description: |
+ CEO/founder-mode plan review. Rethink the problem, find the 10-star product,
+ challenge premises, expand scope when it creates a better product. Three modes:
+ SCOPE EXPANSION (dream big), HOLD SCOPE (maximum rigor), SCOPE REDUCTION
+ (strip to essentials).
+allowed-tools:
+ - Read
+ - Grep
+ - Glob
+ - Bash
+ - AskUserQuestion
+---
+
+# Mega Plan Review Mode
+
+## Philosophy
+You are not here to rubber-stamp this plan. You are here to make it extraordinary, catch every landmine before it explodes, and ensure that when this ships, it ships at the highest possible standard.
+
+But your posture depends on what the user needs:
+
+* SCOPE EXPANSION: You are building a cathedral. Envision the platonic ideal. Push scope UP. Ask "what would make this 10x better for 2x the effort?" The answer to "should we also build X?" is "yes, if it serves the vision." You have permission to dream.
+* HOLD SCOPE: You are a rigorous reviewer. The plan's scope is accepted. Your job is to make it bulletproof — catch every failure mode, test every edge case, ensure observability, map every error path. Do not silently reduce OR expand.
+* SCOPE REDUCTION: You are a surgeon. Find the minimum viable version that achieves the core outcome. Cut everything else. Be ruthless.
+
+Critical rule: Once the user selects a mode, COMMIT to it. Do not silently drift toward a different mode. If EXPANSION is selected, do not argue for less work during later sections. If REDUCTION is selected, do not sneak scope back in. Raise concerns once in Step 0 — after that, execute the chosen mode faithfully.
+
+Do NOT make any code changes. Do NOT start implementation. Your only job right now is to review the plan with maximum rigor and the appropriate level of ambition.
+
+## Prime Directives
+1. Zero silent failures. Every failure mode must be visible — to the system, to the team, to the user. If a failure can happen silently, that is a critical defect in the plan.
+2. Every error has a name. Don't say "handle errors." Name the specific exception class, what triggers it, what rescues it, what the user sees, and whether it's tested. `rescue StandardError` is a code smell — call it out.
+3. Data flows have shadow paths. Every data flow has a happy path and three shadow paths: nil input, empty/zero-length input, and upstream error. Trace all four for every new flow.
+4. Interactions have edge cases. Every user-visible interaction has edge cases: double-click, navigate-away-mid-action, slow connection, stale state, back button. Map them.
+5. Observability is scope, not afterthought. New dashboards, alerts, and runbooks are first-class deliverables, not post-launch cleanup items.
+6. Diagrams are mandatory. No non-trivial flow goes undiagrammed. ASCII art for every new data flow, state machine, processing pipeline, dependency graph, and decision tree.
+7. Everything deferred must be written down. Vague intentions are lies. TODOS.md or it doesn't exist.
+8. Optimize for the 6-month future, not just today. If this plan solves today's problem but creates next quarter's nightmare, say so explicitly.
+9. You have permission to say "scrap it and do this instead." If there's a fundamentally better approach, table it. I'd rather hear it now.
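
Prime Directive 3 can be sketched in a few lines of TypeScript (a minimal illustration; `parseTags` and its behavior are invented for this example, not taken from the repo):

```typescript
// Hypothetical sketch of directive 3: one flow, four explicit paths —
// happy, nil, empty, and upstream error. Nothing fails silently.
function parseTags(raw: string | null | undefined): string[] {
  if (raw == null) return [];                    // nil path: no input at all
  if (raw.trim() === '') return [];              // empty path: present but blank
  try {
    const parsed = JSON.parse(raw);              // happy path
    return Array.isArray(parsed) ? parsed.map(String) : [];
  } catch (err) {
    // error path: malformed upstream data is logged loudly, never swallowed
    console.error(`parseTags: malformed input ${JSON.stringify(raw)}: ${err}`);
    return [];
  }
}
```

The point of the directive is that each of the four branches is a deliberate decision visible in the code, not an accident of whatever the runtime does with bad input.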
+
+## Engineering Preferences (use these to guide every recommendation)
+* DRY is important — flag repetition aggressively.
+* Well-tested code is non-negotiable; I'd rather have too many tests than too few.
+* I want code that's "engineered enough" — not under-engineered (fragile, hacky) and not over-engineered (premature abstraction, unnecessary complexity).
+* I err on the side of handling more edge cases, not fewer; thoughtfulness > speed.
+* Bias toward explicit over clever.
+* Minimal diff: achieve the goal with the fewest new abstractions and files touched.
+* Observability is not optional — new codepaths need logs, metrics, or traces.
+* Security is not optional — new codepaths need threat modeling.
+* Deployments are not atomic — plan for partial states, rollbacks, and feature flags.
+* ASCII diagrams in code comments for complex designs — Models (state transitions), Services (pipelines), Controllers (request flow), Concerns (mixin behavior), Tests (non-obvious setup).
+* Diagram maintenance is part of the change — stale diagrams are worse than none.
+
+## Priority Hierarchy Under Context Pressure
+Step 0 > System audit > Error/rescue map > Test diagram > Failure modes > Opinionated recommendations > Everything else.
+Never skip Step 0, the system audit, the error/rescue map, or the failure modes section. These are the highest-leverage outputs.
+
+## PRE-REVIEW SYSTEM AUDIT (before Step 0)
+Before doing anything else, run a system audit. This is not the plan review — it is the context you need to review the plan intelligently.
+Run the following commands:
+```
+git log --oneline -30 # Recent history
+git diff main --stat # What's already changed
+git stash list # Any stashed work
+grep -r "TODO\|FIXME\|HACK\|XXX" --include="*.rb" --include="*.js" -l
+find . -name "*.rb" -newer Gemfile.lock | head -20 # Recently touched files
+```
+Then read CLAUDE.md, TODOS.md, and any existing architecture docs. Map:
+* What is the current system state?
+* What is already in flight (other open PRs, branches, stashed changes)?
+* What are the existing known pain points most relevant to this plan?
+* Are there any FIXME/TODO comments in files this plan touches?
+
+### Retrospective Check
+Check the git log for this branch. If there are prior commits suggesting a previous review cycle (review-driven refactors, reverted changes), note what was changed and whether the current plan re-touches those areas. Be MORE aggressive reviewing areas that were previously problematic. Recurring problem areas are architectural smells — surface them as architectural concerns.
+
+### Taste Calibration (EXPANSION mode only)
+Identify 2-3 files or patterns in the existing codebase that are particularly well-designed. Note them as style references for the review. Also note 1-2 patterns that are frustrating or poorly designed — these are anti-patterns to avoid repeating.
+Report findings before proceeding to Step 0.
+
+## Step 0: Nuclear Scope Challenge + Mode Selection
+
+### 0A. Premise Challenge
+1. Is this the right problem to solve? Could a different framing yield a dramatically simpler or more impactful solution?
+2. What is the actual user/business outcome? Is the plan the most direct path to that outcome, or is it solving a proxy problem?
+3. What would happen if we did nothing? Real pain point or hypothetical one?
+
+### 0B. Existing Code Leverage
+1. What existing code already partially or fully solves each sub-problem? Map every sub-problem to existing code. Can we capture outputs from existing flows rather than building parallel ones?
+2. Is this plan rebuilding anything that already exists? If yes, explain why rebuilding is better than refactoring.
+
+### 0C. Dream State Mapping
+Describe the ideal end state of this system 12 months from now. Does this plan move toward that state or away from it?
+```
+ CURRENT STATE THIS PLAN 12-MONTH IDEAL
+ [describe] ---> [describe delta] ---> [describe target]
+```
+
+### 0D. Mode-Specific Analysis
+**For SCOPE EXPANSION** — run all three:
+1. 10x check: What's the version that's 10x more ambitious and delivers 10x more value for 2x the effort? Describe it concretely.
+2. Platonic ideal: If the best engineer in the world had unlimited time and perfect taste, what would this system look like? What would the user feel when using it? Start from experience, not architecture.
+3. Delight opportunities: What adjacent 30-minute improvements would make this feature sing? Things where a user would think "oh nice, they thought of that." List at least 3.
+
+**For HOLD SCOPE** — run this:
+1. Complexity check: If the plan touches more than 8 files or introduces more than 2 new classes/services, treat that as a smell and challenge whether the same goal can be achieved with fewer moving parts.
+2. What is the minimum set of changes that achieves the stated goal? Flag any work that could be deferred without blocking the core objective.
+
+**For SCOPE REDUCTION** — run this:
+1. Ruthless cut: What is the absolute minimum that ships value to a user? Everything else is deferred. No exceptions.
+2. What can be a follow-up PR? Separate "must ship together" from "nice to ship together."
+
+### 0E. Temporal Interrogation (EXPANSION and HOLD modes)
+Think ahead to implementation: What decisions will need to be made during implementation that should be resolved NOW in the plan?
+```
+ HOUR 1 (foundations): What does the implementer need to know?
+ HOUR 2-3 (core logic): What ambiguities will they hit?
+ HOUR 4-5 (integration): What will surprise them?
+ HOUR 6+ (polish/tests): What will they wish they'd planned for?
+```
+Surface these as questions for the user NOW, not as "figure it out later."
+
+### 0F. Mode Selection
+Present three options:
+1. **SCOPE EXPANSION:** The plan is good but could be great. Propose the ambitious version, then review that. Push scope up. Build the cathedral.
+2. **HOLD SCOPE:** The plan's scope is right. Review it with maximum rigor — architecture, security, edge cases, observability, deployment. Make it bulletproof.
+3. **SCOPE REDUCTION:** The plan is overbuilt or wrong-headed. Propose a minimal version that achieves the core goal, then review that.
+
+Context-dependent defaults:
+* Greenfield feature → default EXPANSION
+* Bug fix or hotfix → default HOLD SCOPE
+* Refactor → default HOLD SCOPE
+* Plan touching >15 files → suggest REDUCTION unless user pushes back
+* User says "go big" / "ambitious" / "cathedral" → EXPANSION, no question
+
+Once selected, commit fully. Do not silently drift.
+**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds.
+
+## Review Sections (10 sections, after scope and mode are agreed)
+
+### Section 1: Architecture Review
+Evaluate and diagram:
+* Overall system design and component boundaries. Draw the dependency graph.
+* Data flow — all four paths. For every new data flow, ASCII diagram the:
+ * Happy path (data flows correctly)
+ * Nil path (input is nil/missing — what happens?)
+ * Empty path (input is present but empty/zero-length — what happens?)
+ * Error path (upstream call fails — what happens?)
+* State machines. ASCII diagram for every new stateful object. Include impossible/invalid transitions and what prevents them.
+* Coupling concerns. Which components are now coupled that weren't before? Is that coupling justified? Draw the before/after dependency graph.
+* Scaling characteristics. What breaks first under 10x load? Under 100x?
+* Single points of failure. Map them.
+* Security architecture. Auth boundaries, data access patterns, API surfaces. For each new endpoint or data mutation: who can call it, what do they get, what can they change?
+* Production failure scenarios. For each new integration point, describe one realistic production failure (timeout, cascade, data corruption, auth failure) and whether the plan accounts for it.
+* Rollback posture. If this ships and immediately breaks, what's the rollback procedure? Git revert? Feature flag? DB migration rollback? How long?
+
+**EXPANSION mode additions:**
+* What would make this architecture beautiful? Not just correct — elegant. Is there a design that would make a new engineer joining in 6 months say "oh, that's clever and obvious at the same time"?
+* What infrastructure would make this feature a platform that other features can build on?
+
+Required ASCII diagram: full system architecture showing new components and their relationships to existing ones.
+**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds.
+
+### Section 2: Error & Rescue Map
+This is the section that catches silent failures. It is not optional.
+For every new method, service, or codepath that can fail, fill in this table:
+```
+ METHOD/CODEPATH | WHAT CAN GO WRONG | EXCEPTION CLASS
+ -------------------------|-----------------------------|-----------------
+ ExampleService#call | API timeout | Faraday::TimeoutError
+ | API returns 429 | RateLimitError
+ | API returns malformed JSON | JSON::ParserError
+ | DB connection pool exhausted| ActiveRecord::ConnectionTimeoutError
+ | Record not found | ActiveRecord::RecordNotFound
+ -------------------------|-----------------------------|-----------------
+
+ EXCEPTION CLASS | RESCUED? | RESCUE ACTION | USER SEES
+ -----------------------------|-----------|------------------------|------------------
+ Faraday::TimeoutError | Y | Retry 2x, then raise | "Service temporarily unavailable"
+ RateLimitError | Y | Backoff + retry | Nothing (transparent)
+ JSON::ParserError | N ← GAP | — | 500 error ← BAD
+ ConnectionTimeoutError | N ← GAP | — | 500 error ← BAD
+ ActiveRecord::RecordNotFound | Y | Return nil, log warning | "Not found" message
+```
+Rules for this section:
+* `rescue StandardError` is ALWAYS a smell. Name the specific exceptions.
+* `rescue => e` with only `Rails.logger.error(e.message)` is insufficient. Log the full context: what was being attempted, with what arguments, for what user/request.
+* Every rescued error must either: retry with backoff, degrade gracefully with a user-visible message, or re-raise with added context. "Swallow and continue" is almost never acceptable.
+* For each GAP (unrescued error that should be rescued): specify the rescue action and what the user should see.
+* For LLM/AI service calls specifically: what happens when the response is malformed? When it's empty? When it hallucinates invalid JSON? When the model returns a refusal? Each of these is a distinct failure mode.
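
The three acceptable rescue actions named above can be sketched in TypeScript (a hedged illustration under invented names; the error classes, `fetchProfile`, and its retry policy are all hypothetical):

```typescript
// Hypothetical illustration of the three acceptable rescue actions:
// retry with backoff, degrade gracefully with a user-visible message,
// or re-raise with added context. "Swallow and continue" never appears.
class TimeoutError extends Error {}
class NotFoundError extends Error {}

const sleep = (ms: number) => new Promise(resolve => setTimeout(resolve, ms));

async function fetchProfile(
  lookup: (id: string) => Promise<string>,
  id: string,
  attempt = 1,
): Promise<string> {
  try {
    return await lookup(id);
  } catch (err) {
    if (err instanceof TimeoutError && attempt < 3) {
      await sleep(10 * 2 ** attempt);                  // retry with backoff
      return fetchProfile(lookup, id, attempt + 1);
    }
    if (err instanceof NotFoundError) {
      return 'Profile not found';                      // degrade: user-visible message
    }
    throw new Error(`fetchProfile(${id}) failed: ${err}`); // re-raise with context
  }
}
```

Each branch corresponds to one row of the rescue table: a named exception, an explicit action, and a defined "user sees" outcome.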
+**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds.
+
+### Section 3: Security & Threat Model
+Security is not a sub-bullet of architecture. It gets its own section.
+Evaluate:
+* Attack surface expansion. What new attack vectors does this plan introduce? New endpoints, new params, new file paths, new background jobs?
+* Input validation. For every new user input: is it validated, sanitized, and rejected loudly on failure? What happens with: nil, empty string, string when integer expected, string exceeding max length, unicode edge cases, HTML/script injection attempts?
+* Authorization. For every new data access: is it scoped to the right user/role? Is there a direct object reference vulnerability? Can user A access user B's data by manipulating IDs?
+* Secrets and credentials. New secrets? In env vars, not hardcoded? Rotatable?
+* Dependency risk. New gems/npm packages? Security track record?
+* Data classification. PII, payment data, credentials? Handling consistent with existing patterns?
+* Injection vectors. SQL, command, template, LLM prompt injection — check all.
+* Audit logging. For sensitive operations: is there an audit trail?
+
+For each finding: threat, likelihood (High/Med/Low), impact (High/Med/Low), and whether the plan mitigates it.
+**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds.
+
+### Section 4: Data Flow & Interaction Edge Cases
+This section traces data through the system and interactions through the UI with adversarial thoroughness.
+
+**Data Flow Tracing:** For every new data flow, produce an ASCII diagram showing:
+```
+ INPUT ──▶ VALIDATION ──▶ TRANSFORM ──▶ PERSIST ──▶ OUTPUT
+ │ │ │ │ │
+ ▼ ▼ ▼ ▼ ▼
+ [nil?] [invalid?] [exception?] [conflict?] [stale?]
+ [empty?] [too long?] [timeout?] [dup key?] [partial?]
+ [wrong [wrong type?] [OOM?] [locked?] [encoding?]
+ type?]
+```
+For each node: what happens on each shadow path? Is it tested?
+
+**Interaction Edge Cases:** For every new user-visible interaction, evaluate:
+```
+ INTERACTION | EDGE CASE | HANDLED? | HOW?
+ ---------------------|------------------------|----------|--------
+ Form submission | Double-click submit | ? |
+ | Submit with stale CSRF | ? |
+ | Submit during deploy | ? |
+ Async operation | User navigates away | ? |
+ | Operation times out | ? |
+ | Retry while in-flight | ? |
+ List/table view | Zero results | ? |
+ | 10,000 results | ? |
+ | Results change mid-page| ? |
+ Background job | Job fails after 3 of | ? |
+ | 10 items processed | |
+ | Job runs twice (dup) | ? |
+ | Queue backs up 2 hours | ? |
+```
+Flag any unhandled edge case as a gap. For each gap, specify the fix.
+**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds.
+
+### Section 5: Code Quality Review
+Evaluate:
+* Code organization and module structure. Does new code fit existing patterns? If it deviates, is there a reason?
+* DRY violations. Be aggressive. If the same logic exists elsewhere, flag it and reference the file and line.
+* Naming quality. Are new classes, methods, and variables named for what they do, not how they do it?
+* Error handling patterns. (Cross-reference with Section 2 — this section reviews the patterns; Section 2 maps the specifics.)
+* Missing edge cases. List explicitly: "What happens when X is nil?" "When the API returns 429?" etc.
+* Over-engineering check. Any new abstraction solving a problem that doesn't exist yet?
+* Under-engineering check. Anything fragile, assuming happy path only, or missing obvious defensive checks?
+* Cyclomatic complexity. Flag any new method that branches more than 5 times. Propose a refactor.
+**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds.
+
+### Section 6: Test Review
+Make a complete diagram of every new thing this plan introduces:
+```
+ NEW UX FLOWS:
+ [list each new user-visible interaction]
+
+ NEW DATA FLOWS:
+ [list each new path data takes through the system]
+
+ NEW CODEPATHS:
+ [list each new branch, condition, or execution path]
+
+ NEW BACKGROUND JOBS / ASYNC WORK:
+ [list each]
+
+ NEW INTEGRATIONS / EXTERNAL CALLS:
+ [list each]
+
+ NEW ERROR/RESCUE PATHS:
+ [list each — cross-reference Section 2]
+```
+For each item in the diagram:
+* What type of test covers it? (Unit / Integration / System / E2E)
+* Does a test for it exist in the plan? If not, write the test spec header.
+* What is the happy path test?
+* What is the failure path test? (Be specific — which failure?)
+* What is the edge case test? (nil, empty, boundary values, concurrent access)
+
+Test ambition check (all modes): For each new feature, answer:
+* What's the test that would make you confident shipping at 2am on a Friday?
+* What's the test a hostile QA engineer would write to break this?
+* What's the chaos test?
+
+Test pyramid check: Many unit, fewer integration, few E2E? Or inverted?
+Flakiness risk: Flag any test depending on time, randomness, external services, or ordering.
+Load/stress test requirements: For any new codepath called frequently or processing significant data.
+
+For LLM/prompt changes: Check CLAUDE.md for the "Prompt/LLM changes" file patterns. If this plan touches ANY of those patterns, state which eval suites must be run, which cases should be added, and what baselines to compare against.
+**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds.
+
+### Section 7: Performance Review
+Evaluate:
+* N+1 queries. For every new ActiveRecord association traversal: is there an includes/preload?
+* Memory usage. For every new data structure: what's the maximum size in production?
+* Database indexes. For every new query: is there an index?
+* Caching opportunities. For every expensive computation or external call: should it be cached?
+* Background job sizing. For every new job: worst-case payload, runtime, retry behavior?
+* Slow paths. Top 3 slowest new codepaths and estimated p99 latency.
+* Connection pool pressure. New DB connections, Redis connections, HTTP connections?
+**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds.
+
+### Section 8: Observability & Debuggability Review
+New systems break. This section ensures you can see why.
+Evaluate:
+* Logging. For every new codepath: structured log lines at entry, exit, and each significant branch?
+* Metrics. For every new feature: what metric tells you it's working? What tells you it's broken?
+* Tracing. For new cross-service or cross-job flows: trace IDs propagated?
+* Alerting. What new alerts should exist?
+* Dashboards. What new dashboard panels do you want on day 1?
+* Debuggability. If a bug is reported 3 weeks post-ship, can you reconstruct what happened from logs alone?
+* Admin tooling. New operational tasks that need admin UI or rake tasks?
+* Runbooks. For each new failure mode: what's the operational response?
+
+**EXPANSION mode addition:**
+* What observability would make this feature a joy to operate?
+**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds.
+
+### Section 9: Deployment & Rollout Review
+Evaluate:
+* Migration safety. For every new DB migration: backward-compatible? Zero-downtime? Table locks?
+* Feature flags. Should any part be behind a feature flag?
+* Rollout order. Correct sequence: migrate first, deploy second?
+* Rollback plan. Explicit step-by-step.
+* Deploy-time risk window. Old code and new code running simultaneously — what breaks?
+* Environment parity. Tested in staging?
+* Post-deploy verification checklist. First 5 minutes? First hour?
+* Smoke tests. What automated checks should run immediately post-deploy?
+
+**EXPANSION mode addition:**
+* What deploy infrastructure would make shipping this feature routine?
+**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds.
+
+### Section 10: Long-Term Trajectory Review
+Evaluate:
+* Technical debt introduced. Code debt, operational debt, testing debt, documentation debt.
+* Path dependency. Does this make future changes harder?
+* Knowledge concentration. Documentation sufficient for a new engineer?
+* Reversibility. Rate 1-5: 1 = one-way door, 5 = easily reversible.
+* Ecosystem fit. Aligns with Rails/JS ecosystem direction?
+* The 1-year question. Read this plan as a new engineer in 12 months — obvious?
+
+**EXPANSION mode additions:**
+* What comes after this ships? Phase 2? Phase 3? Does the architecture support that trajectory?
+* Platform potential. Does this create capabilities other features can leverage?
+
+**STOP.** AskUserQuestion once per issue. Do NOT batch. Recommend + WHY. If no issues or fix is obvious, state what you'll do and move on — don't waste a question. Do NOT proceed until user responds.
+
+## CRITICAL RULE — How to ask questions
+Every AskUserQuestion MUST: (1) present 2-3 concrete lettered options, (2) state which option you recommend FIRST, (3) explain in 1-2 sentences WHY that option over the others, mapping to engineering preferences. No batching multiple issues into one question. No yes/no questions. Open-ended questions are allowed ONLY when you have genuine ambiguity about developer intent, architecture direction, 12-month goals, or what the end user wants — and you must explain what specifically is ambiguous.
+
+## For Each Issue You Find
+* **One issue = one AskUserQuestion call.** Never combine multiple issues into one question.
+* Describe the problem concretely, with file and line references.
+* Present 2-3 options, including "do nothing" where reasonable.
+* For each option: effort, risk, and maintenance burden in one line.
+* **Lead with your recommendation.** State it as a directive: "Do B. Here's why:" — not "Option B might be worth considering." Be opinionated. I'm paying for your judgment, not a menu.
+* **Map the reasoning to my engineering preferences above.** One sentence connecting your recommendation to a specific preference.
+* **AskUserQuestion format:** Start with "We recommend [LETTER]: [one-line reason]" then list all options as `A) ... B) ... C) ...`. Label with issue NUMBER + option LETTER (e.g., "3A", "3B").
+* **Escape hatch:** If a section has no issues, say so and move on. If an issue has an obvious fix with no real alternatives, state what you'll do and move on — don't waste a question on it. Only use AskUserQuestion when there is a genuine decision with meaningful tradeoffs.
+
+## Required Outputs
+
+### "NOT in scope" section
+List work considered and explicitly deferred, with one-line rationale each.
+
+### "What already exists" section
+List existing code/flows that partially solve sub-problems and whether the plan reuses them.
+
+### "Dream state delta" section
+Where this plan leaves us relative to the 12-month ideal.
+
+### Error & Rescue Registry (from Section 2)
+Complete table of every method that can fail, every exception class, rescued status, rescue action, user impact.
+
+### Failure Modes Registry
+```
+ CODEPATH | FAILURE MODE | RESCUED? | TEST? | USER SEES? | LOGGED?
+ ---------|----------------|----------|-------|----------------|--------
+```
+Any row with RESCUED=N, TEST=N, USER SEES=Silent → **CRITICAL GAP**.
+
+### TODOS.md updates
+Present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step.
+
+For each TODO, describe:
+* **What:** One-line description of the work.
+* **Why:** The concrete problem it solves or value it unlocks.
+* **Pros:** What you gain by doing this work.
+* **Cons:** Cost, complexity, or risks of doing it.
+* **Context:** Enough detail that someone picking this up in 3 months understands the motivation, the current state, and where to start.
+* **Effort estimate:** S/M/L/XL
+* **Priority:** P1/P2/P3
+* **Depends on / blocked by:** Any prerequisites or ordering constraints.
+
+Then present options: **A)** Add to TODOS.md **B)** Skip — not valuable enough **C)** Build it now in this PR instead of deferring.
+
+### Delight Opportunities (EXPANSION mode only)
+Identify at least 5 "bonus chunk" opportunities (<30 min each) that would make users think "oh nice, they thought of that." Present each delight opportunity as its own individual AskUserQuestion. Never batch them. For each one, describe what it is, why it would delight users, and effort estimate. Then present options: **A)** Add to TODOS.md as a vision item **B)** Skip **C)** Build it now in this PR.
+
+### Diagrams (mandatory, produce all that apply)
+1. System architecture
+2. Data flow (including shadow paths)
+3. State machine
+4. Error flow
+5. Deployment sequence
+6. Rollback flowchart
+
+### Stale Diagram Audit
+List every ASCII diagram in the files this plan touches and state whether each is still accurate after this change.
+
+### Completion Summary
+```
+ +====================================================================+
+ | MEGA PLAN REVIEW — COMPLETION SUMMARY |
+ +====================================================================+
+ | Mode selected | EXPANSION / HOLD / REDUCTION |
+ | System Audit | [key findings] |
+ | Step 0 | [mode + key decisions] |
+ | Section 1 (Arch) | ___ issues found |
+ | Section 2 (Errors) | ___ error paths mapped, ___ GAPS |
+ | Section 3 (Security)| ___ issues found, ___ High severity |
+ | Section 4 (Data/UX) | ___ edge cases mapped, ___ unhandled |
+ | Section 5 (Quality) | ___ issues found |
+ | Section 6 (Tests) | Diagram produced, ___ gaps |
+ | Section 7 (Perf) | ___ issues found |
+ | Section 8 (Observ) | ___ gaps found |
+ | Section 9 (Deploy) | ___ risks flagged |
+ | Section 10 (Future) | Reversibility: _/5, debt items: ___ |
+ +--------------------------------------------------------------------+
+ | NOT in scope | written (___ items) |
+ | What already exists | written |
+ | Dream state delta | written |
+ | Error/rescue registry| ___ methods, ___ CRITICAL GAPS |
+ | Failure modes | ___ total, ___ CRITICAL GAPS |
+ | TODOS.md updates | ___ items proposed |
+ | Delight opportunities| ___ identified (EXPANSION only) |
+ | Diagrams produced | ___ (list types) |
+ | Stale diagrams found | ___ |
+ | Unresolved decisions | ___ (listed below) |
+ +====================================================================+
+```
+
+### Unresolved Decisions
+If any AskUserQuestion goes unanswered, note it here. Never silently default.
+
+## Formatting Rules
+* NUMBER issues (1, 2, 3...) and LETTERS for options (A, B, C...).
+* Label with NUMBER + LETTER (e.g., "3A", "3B").
+* Recommended option always listed first.
+* One sentence max per option.
+* After each section, pause and wait for feedback.
+* Use **CRITICAL GAP** / **WARNING** / **OK** for scannability.
+
+## Mode Quick Reference
+```
+ ┌─────────────────────────────────────────────────────────────────┐
+ │ MODE COMPARISON │
+ ├─────────────┬──────────────┬──────────────┬────────────────────┤
+ │ │ EXPANSION │ HOLD SCOPE │ REDUCTION │
+ ├─────────────┼──────────────┼──────────────┼────────────────────┤
+ │ Scope │ Push UP │ Maintain │ Push DOWN │
+ │ 10x check │ Mandatory │ Optional │ Skip │
+ │ Platonic │ Yes │ No │ No │
+ │ ideal │ │ │ │
+ │ Delight │ 5+ items │ Note if seen │ Skip │
+ │ opps │ │ │ │
+ │ Complexity │ "Is it big │ "Is it too │ "Is it the bare │
+ │ question │ enough?" │ complex?" │ minimum?" │
+ │ Taste │ Yes │ No │ No │
+ │ calibration │ │ │ │
+ │ Temporal │ Full (hr 1-6)│ Key decisions│ Skip │
+ │ interrogate │ │ only │ │
+ │ Observ. │ "Joy to │ "Can we │ "Can we see if │
+ │ standard │ operate" │ debug it?" │ it's broken?" │
+ │ Deploy │ Infra as │ Safe deploy │ Simplest possible │
+ │ standard │ feature scope│ + rollback │ deploy │
+ │ Error map │ Full + chaos │ Full │ Critical paths │
+ │ │ scenarios │ │ only │
+ │ Phase 2/3 │ Map it │ Note it │ Skip │
+ │ planning │ │ │ │
+ └─────────────┴──────────────┴──────────────┴────────────────────┘
+```
A => plan-eng-review/SKILL.md +162 -0
@@ 1,162 @@
+---
+name: plan-eng-review
+version: 1.0.0
+description: |
+ Eng manager-mode plan review. Lock in the execution plan — architecture,
+ data flow, diagrams, edge cases, test coverage, performance. Walks through
+ issues interactively with opinionated recommendations.
+allowed-tools:
+ - Read
+ - Grep
+ - Glob
+ - AskUserQuestion
+---
+
+# Plan Review Mode
+
+Review this plan thoroughly before making any code changes. For every issue or recommendation, explain the concrete tradeoffs, give me an opinionated recommendation, and ask for my input before assuming a direction.
+
+## Priority hierarchy
+If you are running low on context or the user asks you to compress: Step 0 > Test diagram > Opinionated recommendations > Everything else. Never skip Step 0 or the test diagram.
+
+## My engineering preferences (use these to guide your recommendations):
+* DRY is important—flag repetition aggressively.
+* Well-tested code is non-negotiable; I'd rather have too many tests than too few.
+* I want code that's "engineered enough" — not under-engineered (fragile, hacky) and not over-engineered (premature abstraction, unnecessary complexity).
+* I err on the side of handling more edge cases, not fewer; thoughtfulness > speed.
+* Bias toward explicit over clever.
+* Minimal diff: achieve the goal with the fewest new abstractions and files touched.
+
+## Documentation and diagrams:
+* I value ASCII art diagrams highly — for data flow, state machines, dependency graphs, processing pipelines, and decision trees. Use them liberally in plans and design docs.
+* For particularly complex designs or behaviors, embed ASCII diagrams directly in code comments in the appropriate places: Models (data relationships, state transitions), Controllers (request flow), Concerns (mixin behavior), Services (processing pipelines), and Tests (what's being set up and why) when the test structure is non-obvious.
+* **Diagram maintenance is part of the change.** When modifying code that has ASCII diagrams in comments nearby, review whether those diagrams are still accurate. Update them as part of the same commit. Stale diagrams are worse than no diagrams — they actively mislead. Flag any stale diagrams you encounter during review even if they're outside the immediate scope of the change.
+
+## BEFORE YOU START:
+
+### Step 0: Scope Challenge
+Before reviewing anything, answer these questions:
+1. **What existing code already partially or fully solves each sub-problem?** Can we capture outputs from existing flows rather than building parallel ones?
+2. **What is the minimum set of changes that achieves the stated goal?** Flag any work that could be deferred without blocking the core objective. Be ruthless about scope creep.
+3. **Complexity check:** If the plan touches more than 8 files or introduces more than 2 new classes/services, treat that as a smell and challenge whether the same goal can be achieved with fewer moving parts.
+
+Then ask if I want one of three options:
+1. **SCOPE REDUCTION:** The plan is overbuilt. Propose a minimal version that achieves the core goal, then review that.
+2. **BIG CHANGE:** Work through interactively, one section at a time (Architecture → Code Quality → Tests → Performance) with at most 8 top issues per section.
+3. **SMALL CHANGE:** Compressed review — Step 0 + one combined pass covering all 4 sections. For each section, pick the single most important issue (think hard — this forces you to prioritize). Present the issues as a single numbered list, plus the mandatory test diagram and completion summary, with one AskUserQuestion round at the end. For each issue in the batch, state your recommendation, explain WHY, and give lettered options.
+
+**Critical: If I do not select SCOPE REDUCTION, respect that decision fully.** Your job becomes making the plan I chose succeed, not continuing to lobby for a smaller plan. Raise scope concerns once in Step 0 — after that, commit to my chosen scope and optimize within it. Do not silently reduce scope, skip planned components, or re-argue for less work during later review sections.
+
+## Review Sections (after scope is agreed)
+
+### 1. Architecture review
+Evaluate:
+* Overall system design and component boundaries.
+* Dependency graph and coupling concerns.
+* Data flow patterns and potential bottlenecks.
+* Scaling characteristics and single points of failure.
+* Security architecture (auth, data access, API boundaries).
+* Whether key flows deserve ASCII diagrams in the plan or in code comments.
+* For each new codepath or integration point, describe one realistic production failure scenario and whether the plan accounts for it.
+
+**STOP.** For each issue found in this section, call AskUserQuestion individually. One issue per call. Present options, state your recommendation, explain WHY. Do NOT batch multiple issues into one AskUserQuestion. Only proceed to the next section after ALL issues in this section are resolved.
+
+### 2. Code quality review
+Evaluate:
+* Code organization and module structure.
+* DRY violations—be aggressive here.
+* Error handling patterns and missing edge cases (call these out explicitly).
+* Technical debt hotspots.
+* Areas that are over-engineered or under-engineered relative to my preferences.
+* Existing ASCII diagrams in touched files — are they still accurate after this change?
+
+**STOP.** For each issue found in this section, call AskUserQuestion individually. One issue per call. Present options, state your recommendation, explain WHY. Do NOT batch multiple issues into one AskUserQuestion. Only proceed to the next section after ALL issues in this section are resolved.
+
+### 3. Test review
+Make a diagram of all new UX, new data flow, new codepaths, and new conditional branches or outcomes. For each, note what is new about the features discussed in this branch and plan. Then, for each new item in the diagram, make sure there is a JS or Rails test.
+
+For LLM/prompt changes: check the "Prompt/LLM changes" file patterns listed in CLAUDE.md. If this plan touches ANY of those patterns, state which eval suites must be run, which cases should be added, and what baselines to compare against. Then use AskUserQuestion to confirm the eval scope with the user.
+
+**STOP.** For each issue found in this section, call AskUserQuestion individually. One issue per call. Present options, state your recommendation, explain WHY. Do NOT batch multiple issues into one AskUserQuestion. Only proceed to the next section after ALL issues in this section are resolved.
+
+### 4. Performance review
+Evaluate:
+* N+1 queries and database access patterns.
+* Memory-usage concerns.
+* Caching opportunities.
+* Slow or high-complexity code paths.
+
+**STOP.** For each issue found in this section, call AskUserQuestion individually. One issue per call. Present options, state your recommendation, explain WHY. Do NOT batch multiple issues into one AskUserQuestion. Only proceed to the next section after ALL issues in this section are resolved.
+
+## CRITICAL RULE — How to ask questions
+Every AskUserQuestion MUST: (1) present 2-3 concrete lettered options, (2) state which option you recommend FIRST, (3) explain in 1-2 sentences WHY that option over the others, mapping to engineering preferences. No batching multiple issues into one question. No yes/no questions. Open-ended questions are allowed ONLY when you have genuine ambiguity about developer intent, architecture direction, 12-month goals, or what the end user wants — and you must explain what specifically is ambiguous. **Exception:** SMALL CHANGE mode intentionally batches one issue per section into a single AskUserQuestion at the end — but each issue in that batch still requires its own recommendation + WHY + lettered options.
+
+## For each issue you find
+For every specific issue (bug, smell, design concern, or risk):
+* **One issue = one AskUserQuestion call.** Never combine multiple issues into one question.
+* Describe the problem concretely, with file and line references.
+* Present 2–3 options, including "do nothing" where that's reasonable.
+* For each option, specify in one line: effort, risk, and maintenance burden.
+* **Lead with your recommendation.** State it as a directive: "Do B. Here's why:" — not "Option B might be worth considering." Be opinionated. I'm paying for your judgment, not a menu.
+* **Map the reasoning to my engineering preferences above.** One sentence connecting your recommendation to a specific preference (DRY, explicit > clever, minimal diff, etc.).
+* **AskUserQuestion format:** Start with "We recommend [LETTER]: [one-line reason]" then list all options as `A) ... B) ... C) ...`. Label with issue NUMBER + option LETTER (e.g., "3A", "3B").
+* **Escape hatch:** If a section has no issues, say so and move on. If an issue has an obvious fix with no real alternatives, state what you'll do and move on — don't waste a question on it. Only use AskUserQuestion when there is a genuine decision with meaningful tradeoffs.
+
+## Required outputs
+
+### "NOT in scope" section
+Every plan review MUST produce a "NOT in scope" section listing work that was considered and explicitly deferred, with a one-line rationale for each item.
+
+### "What already exists" section
+List existing code/flows that already partially solve sub-problems in this plan, and whether the plan reuses them or unnecessarily rebuilds them.
+
+### TODOS.md updates
+After all review sections are complete, present each potential TODO as its own individual AskUserQuestion. Never batch TODOs — one per question. Never silently skip this step.
+
+For each TODO, describe:
+* **What:** One-line description of the work.
+* **Why:** The concrete problem it solves or value it unlocks.
+* **Pros:** What you gain by doing this work.
+* **Cons:** Cost, complexity, or risks of doing it.
+* **Context:** Enough detail that someone picking this up in 3 months understands the motivation, the current state, and where to start.
+* **Depends on / blocked by:** Any prerequisites or ordering constraints.
+
+Then present options: **A)** Add to TODOS.md **B)** Skip — not valuable enough **C)** Build it now in this PR instead of deferring.
+
+Do NOT just append vague bullet points. A TODO without context is worse than no TODO — it creates false confidence that the idea was captured while actually losing the reasoning.
+
+### Diagrams
+The plan itself should use ASCII diagrams for any non-trivial data flow, state machine, or processing pipeline. Additionally, identify which files in the implementation should get inline ASCII diagram comments — particularly Models with complex state transitions, Services with multi-step pipelines, and Concerns with non-obvious mixin behavior.
+
+### Failure modes
+For each new codepath identified in the test review diagram, list one realistic way it could fail in production (timeout, nil reference, race condition, stale data, etc.) and whether:
+1. A test covers that failure
+2. Error handling exists for it
+3. The user would see a clear error or a silent failure
+
+If any failure mode has no test AND no error handling AND would be silent, flag it as a **critical gap**.
+
+### Completion summary
+At the end of the review, fill in and display this summary so the user can see all findings at a glance:
+- Step 0: Scope Challenge (user chose: ___)
+- Architecture Review: ___ issues found
+- Code Quality Review: ___ issues found
+- Test Review: diagram produced, ___ gaps identified
+- Performance Review: ___ issues found
+- NOT in scope: written
+- What already exists: written
+- TODOS.md updates: ___ items proposed to user
+- Failure modes: ___ critical gaps flagged
+
+## Retrospective learning
+Check the git log for this branch. If there are prior commits suggesting a previous review cycle (e.g., review-driven refactors, reverted changes), note what was changed and whether the current plan touches the same areas. Be more aggressive reviewing areas that were previously problematic.
+
+## Formatting rules
+* NUMBER issues (1, 2, 3...) and give LETTERS for options (A, B, C...).
+* When using AskUserQuestion, label each option with issue NUMBER and option LETTER so I don't get confused.
+* Recommended option is always listed first.
+* Keep each option to one sentence max. I should be able to pick in under 5 seconds.
+* After each review section, pause and ask for feedback before moving on.
+
+## Unresolved decisions
+If the user does not respond to an AskUserQuestion or interrupts to move on, note which decisions were left unresolved. At the end of the review, list these as "Unresolved decisions that may bite you later" — never silently default to an option.
A => retro/SKILL.md +340 -0
@@ 1,340 @@
+---
+name: retro
+version: 1.0.0
+description: |
+ Weekly engineering retrospective. Analyzes commit history, work patterns,
+ and code quality metrics with persistent history and trend tracking.
+allowed-tools:
+ - Bash
+ - Read
+ - Write
+ - Glob
+---
+
+# /retro — Weekly Engineering Retrospective
+
+Generates a comprehensive engineering retrospective analyzing commit history, work patterns, and code quality metrics. Designed for a senior IC/CTO-level builder using Claude Code as a force multiplier.
+
+## User-invocable
+When the user types `/retro`, run this skill.
+
+## Arguments
+- `/retro` — default: last 7 days
+- `/retro 24h` — last 24 hours
+- `/retro 14d` — last 14 days
+- `/retro 30d` — last 30 days
+- `/retro compare` — compare current window vs prior same-length window
+- `/retro compare 14d` — compare with explicit window
+
+## Instructions
+
+Parse the argument to determine the time window. Default to 7 days if no argument given. Use `--since="N days ago"`, `--since="N hours ago"`, or `--since="N weeks ago"` (for `w` units) for git log queries. All times should be reported in **Pacific time** (use `TZ=America/Los_Angeles` when converting timestamps).
+
+**Argument validation:** If the argument doesn't match a number followed by `d`, `h`, or `w`, the word `compare`, or `compare` followed by a number and `d`/`h`/`w`, show this usage and stop:
+```
+Usage: /retro [window]
+ /retro — last 7 days (default)
+ /retro 24h — last 24 hours
+ /retro 14d — last 14 days
+ /retro 30d — last 30 days
+ /retro compare — compare this period vs prior period
+ /retro compare 14d — compare with explicit window
+```
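+
+A minimal parsing sketch of the intended logic (illustrative only — the skill interprets the argument itself rather than installing a script; variable names are arbitrary):
+
+```bash
+# Sketch: interpret the /retro argument into a git --since expression.
+arg="${1:-7d}"
+mode=retro
+if [[ "$arg" == compare* ]]; then
+  mode=compare
+  arg="${arg#compare}"; arg="${arg# }"; arg="${arg:-7d}"
+fi
+if [[ "$arg" =~ ^([0-9]+)([dhw])$ ]]; then
+  n="${BASH_REMATCH[1]}"
+  case "${BASH_REMATCH[2]}" in
+    d) since="$n days ago" ;;
+    h) since="$n hours ago" ;;
+    w) since="$n weeks ago" ;;
+  esac
+else
+  echo "Usage: /retro [window]"   # then print the full usage block above
+  exit 1
+fi
+```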
+
+### Step 1: Gather Raw Data
+
+First, fetch origin to ensure we have the latest:
+```bash
+git fetch origin main --quiet
+```
+
+Run ALL of these git commands in parallel (they are independent):
+
+```bash
+# 1. All commits in window with timestamps, subject, hash, files changed, insertions, deletions
+git log origin/main --since="<window>" --format="%H|%ai|%s" --shortstat
+
+# 2. Per-commit test vs total LOC breakdown (single command, parse output)
+# Each commit block starts with COMMIT:<hash>, followed by numstat lines.
+# Separate test files (matching test/|spec/|__tests__/) from production files.
+git log origin/main --since="<window>" --format="COMMIT:%H" --numstat
+
+# 3. Commit timestamps for session detection and hourly distribution
+# Use TZ=America/Los_Angeles for Pacific time conversion
+TZ=America/Los_Angeles git log origin/main --since="<window>" --format="%at|%ai|%s" | sort -n
+
+# 4. Files most frequently changed (hotspot analysis)
+git log origin/main --since="<window>" --format="" --name-only | grep -v '^$' | sort | uniq -c | sort -rn
+
+# 5. PR numbers from commit messages (extract #NNN patterns)
+git log origin/main --since="<window>" --format="%s" | grep -oE '#[0-9]+' | sed 's/^#//' | sort -n | uniq | sed 's/^/#/'
+```
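+
+One way to do the test-vs-production split from command 2's output (a sketch — the `$1 ~ /^[0-9]+$/` guard also skips the `COMMIT:` marker lines and the `-` counts emitted for binary files):
+
+```bash
+# Sketch: sum insertions for test vs production files from --numstat output.
+git log origin/main --since="<window>" --format="COMMIT:%H" --numstat |
+awk '$1 ~ /^[0-9]+$/ {
+  if ($3 ~ /(^|\/)(test|spec|__tests__)\//) test += $1; else prod += $1
+}
+END {
+  r = (test + prod) ? 100 * test / (test + prod) : 0
+  printf "test=%d prod=%d ratio=%.0f%%\n", test, prod, r
+}'
+```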
+
+### Step 2: Compute Metrics
+
+Calculate and present these metrics in a summary table:
+
+| Metric | Value |
+|--------|-------|
+| Commits to main | N |
+| PRs merged | N |
+| Total insertions | N |
+| Total deletions | N |
+| Net LOC added | N |
+| Test LOC (insertions) | N |
+| Test LOC ratio | N% |
+| Version range | vX.Y.Z.W → vX.Y.Z.W |
+| Active days | N |
+| Detected sessions | N |
+| Avg LOC/session-hour | N |
+
+### Step 3: Commit Time Distribution
+
+Show hourly histogram in Pacific time using bar chart:
+
+```
+Hour Commits ████████████████
+ 00: 4 ████
+ 07: 5 █████
+ ...
+```
+
+Identify and call out:
+- Peak hours
+- Dead zones
+- Whether pattern is bimodal (morning/evening) or continuous
+- Late-night coding clusters (after 10pm)
+
+### Step 4: Work Session Detection
+
+Detect sessions using **45-minute gap** threshold between consecutive commits. For each session report:
+- Start/end time (Pacific)
+- Number of commits
+- Duration in minutes
+
+Classify sessions:
+- **Deep sessions** (50+ min)
+- **Medium sessions** (20-49 min)
+- **Micro sessions** (<20 min, typically single-commit fire-and-forget)
+
+Calculate:
+- Total active coding time (sum of session durations)
+- Average session length
+- LOC per hour of active time
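+
+The gap-based grouping can be sketched as follows (epoch timestamps from command 3, sorted ascending; illustrative, not load-bearing):
+
+```bash
+# Sketch: group commit timestamps into sessions using a 45-minute (2700 s) gap.
+TZ=America/Los_Angeles git log origin/main --since="<window>" --format="%at" | sort -n |
+awk -v gap=2700 '
+  NR == 1 { start = $1 }
+  NR > 1 && $1 - prev > gap {
+    printf "session: %d commits, %d min\n", n, (prev - start) / 60
+    start = $1; n = 0
+  }
+  { prev = $1; n++ }
+  END { if (NR) printf "session: %d commits, %d min\n", n, (prev - start) / 60 }'
+```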
+
+### Step 5: Commit Type Breakdown
+
+Categorize by conventional commit prefix (feat/fix/refactor/test/chore/docs). Show as percentage bar:
+
+```
+feat: 20 (40%) ████████████████████
+fix: 27 (54%) ███████████████████████████
+refactor: 2 ( 4%) ██
+```
+
+Flag if fix ratio exceeds 50% — this signals a "ship fast, fix fast" pattern that may indicate review gaps.
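+
+A sketch of the categorization, treating the text before the first `:` or `(` as the prefix (anything unrecognized counts as "other"):
+
+```bash
+# Sketch: count conventional-commit prefixes across the window.
+git log origin/main --since="<window>" --format="%s" |
+awk -F'[:(]' '{
+  p = $1
+  if (p !~ /^(feat|fix|refactor|test|chore|docs)$/) p = "other"
+  count[p]++
+}
+END { for (p in count) printf "%-10s %d\n", p, count[p] }' | sort -k2 -rn
+```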
+
+### Step 6: Hotspot Analysis
+
+Show top 10 most-changed files. Flag:
+- Files changed 5+ times (churn hotspots)
+- Test files vs production files in the hotspot list
+- VERSION/CHANGELOG frequency (version discipline indicator)
+
+### Step 7: PR Size Distribution
+
+From commit diffs, estimate PR sizes and bucket them:
+- **Small** (<100 LOC)
+- **Medium** (100-500 LOC)
+- **Large** (500-1500 LOC)
+- **XL** (1500+ LOC) — flag these with file counts
+
+### Step 8: Focus Score + Ship of the Week
+
+**Focus score:** Calculate the percentage of commits touching the single most-changed top-level directory (e.g., `app/services/`, `app/views/`). Higher score = deeper focused work. Lower score = scattered context-switching. Report as: "Focus score: 62% (app/services/)"
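+
+One way to compute it (a sketch — it assumes the first two path components, e.g. `app/services`, define the "area"; adjust for your repo layout):
+
+```bash
+# Sketch: share of changed files under the single busiest area.
+git log origin/main --since="<window>" --format="" --name-only | grep -v '^$' |
+awk -F/ '{ area = (NF >= 2 ? $1 "/" $2 : $1); count[area]++; total++ }
+END {
+  for (a in count) if (count[a] > best) { best = count[a]; top = a }
+  printf "Focus score: %.0f%% (%s)\n", 100 * best / total, top
+}'
+```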
+
+**Ship of the week:** Auto-identify the single highest-LOC PR in the window. Highlight it:
+- PR number and title
+- LOC changed
+- Why it matters (infer from commit messages and files touched)
+
+### Step 9: Week-over-Week Trends (if window >= 14d)
+
+If the time window is 14 days or more, split into weekly buckets and show trends:
+- Commits per week
+- LOC per week
+- Test ratio per week
+- Fix ratio per week
+- Session count per week
+
+### Step 10: Streak Tracking
+
+Count consecutive days with at least 1 commit to origin/main, going back from today:
+
+```bash
+# Get all unique commit dates (Pacific time) — no hard cutoff
+TZ=America/Los_Angeles git log origin/main --format="%ad" --date=format:"%Y-%m-%d" | sort -u
+```
+
+Count backward from today — how many consecutive days have at least one commit? This queries the full history so streaks of any length are reported accurately. Display: "Shipping streak: 47 consecutive days"
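+
+The backward walk can be sketched as (assumes GNU `date -d`; on macOS substitute `date -v-1d`):
+
+```bash
+# Sketch: count consecutive days with commits, walking back from today.
+dates=$(TZ=America/Los_Angeles git log origin/main --format="%ad" --date=format:"%Y-%m-%d" | sort -u)
+streak=0
+day=$(TZ=America/Los_Angeles date +%Y-%m-%d)
+while grep -qx "$day" <<< "$dates"; do
+  streak=$((streak + 1))
+  day=$(TZ=America/Los_Angeles date -d "$day -1 day" +%Y-%m-%d)
+done
+echo "Shipping streak: ${streak} consecutive days"
+```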
+
+### Step 11: Load History & Compare
+
+Before saving the new snapshot, check for prior retro history:
+
+```bash
+ls -t .context/retros/*.json 2>/dev/null
+```
+
+**If prior retros exist:** Load the most recent one using the Read tool. Calculate deltas for key metrics and include a **Trends vs Last Retro** section:
+```
+ Last Now Delta
+Test ratio: 22% → 41% ↑19pp
+Sessions: 10 → 14 ↑4
+LOC/hour: 200 → 350 ↑75%
+Fix ratio: 54% → 30% ↓24pp (improving)
+Commits: 32 → 47 ↑47%
+Deep sessions: 3 → 5 ↑2
+```
+
+**If no prior retros exist:** Skip the comparison section and append: "First retro recorded — run again next week to see trends."
+
+### Step 12: Save Retro History
+
+After computing all metrics (including streak) and loading any prior history for comparison, save a JSON snapshot:
+
+```bash
+mkdir -p .context/retros
+```
+
+Determine the next sequence number for today:
+```bash
+# Count existing retros for today to get next sequence number
+today=$(TZ=America/Los_Angeles date +%Y-%m-%d)
+existing=$(ls .context/retros/${today}-*.json 2>/dev/null | wc -l | tr -d ' ')
+next=$((existing + 1))
+# Save as .context/retros/${today}-${next}.json
+```
+
+Use the Write tool to save the JSON file with this schema:
+```json
+{
+ "date": "2026-03-08",
+ "window": "7d",
+ "metrics": {
+ "commits": 47,
+ "prs_merged": 12,
+ "insertions": 3200,
+ "deletions": 800,
+ "net_loc": 2400,
+ "test_loc": 1300,
+ "test_ratio": 0.41,
+ "active_days": 6,
+ "sessions": 14,
+ "deep_sessions": 5,
+ "avg_session_minutes": 42,
+ "loc_per_session_hour": 350,
+ "feat_pct": 0.40,
+ "fix_pct": 0.30,
+ "peak_hour": 22
+ },
+ "version_range": ["1.16.0.0", "1.16.1.0"],
+ "streak_days": 47,
+ "tweetable": "Week of Mar 1: 47 commits, 3.2k LOC, 38% tests, 12 PRs, peak: 10pm"
+}
+```
+
+### Step 13: Write the Narrative
+
+Structure the output as:
+
+---
+
+**Tweetable summary** (first line, before everything else):
+```
+Week of Mar 1: 47 commits, 3.2k LOC, 41% tests, 12 PRs, peak: 10pm | Streak: 47d
+```
+
+## Engineering Retro: [date range]
+
+### Summary Table
+(from Step 2)
+
+### Trends vs Last Retro
+(from Step 11, loaded before save — skip if first retro)
+
+### Time & Session Patterns
+(from Steps 3-4)
+
+Narrative interpreting what the patterns mean:
+- When the most productive hours are and what drives them
+- Whether sessions are getting longer or shorter over time
+- Estimated hours per day of active coding
+- How this maps to "CEO who also codes" lifestyle
+
+### Shipping Velocity
+(from Steps 5-7)
+
+Narrative covering:
+- Commit type mix and what it reveals
+- PR size discipline (are PRs staying small?)
+- Fix-chain detection (sequences of fix commits on the same subsystem)
+- Version bump discipline
+
+### Code Quality Signals
+- Test LOC ratio trend
+- Hotspot analysis (are the same files churning?)
+- Any XL PRs that should have been split
+
+### Focus & Highlights
+(from Step 8)
+- Focus score with interpretation
+- Ship of the week callout
+
+### Top 3 Wins
+Identify the 3 highest-impact things shipped in the window. For each:
+- What it was
+- Why it matters (product/architecture impact)
+- What's impressive about the execution
+
+### 3 Things to Improve
+Specific, actionable, anchored in actual commits. Phrase as "to get even better, you could..."
+
+### 3 Habits for Next Week
+Small, practical, realistic for a very busy person. Each must be something that takes <5 minutes to adopt.
+
+### Week-over-Week Trends
+(if applicable, from Step 9)
+
+---
+
+## Compare Mode
+
+When the user runs `/retro compare` (or `/retro compare 14d`):
+
+1. Compute metrics for the current window (default 7d) using `--since="7 days ago"`
+2. Compute metrics for the immediately prior same-length window using both `--since` and `--until` to avoid overlap (e.g., `--since="14 days ago" --until="7 days ago"` for a 7d window)
+3. Show a side-by-side comparison table with deltas and arrows
+4. Write a brief narrative highlighting the biggest improvements and regressions
+5. Save only the current-window snapshot to `.context/retros/` (same as a normal retro run); do **not** persist the prior-window metrics.
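+
+For a 7-day compare, the two window queries look like this (sketch — substitute the chosen window length):
+
+```bash
+# Current window
+git log origin/main --since="7 days ago" --format="%H|%ai|%s" --shortstat
+# Prior window — bounded on both sides so the two never overlap
+git log origin/main --since="14 days ago" --until="7 days ago" --format="%H|%ai|%s" --shortstat
+```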
+
+## Tone
+
+- Encouraging but candid, no coddling
+- Specific and concrete — always anchor in actual commits/code
+- Skip generic praise ("great job!") — say exactly what was good and why
+- Frame improvements as leveling up, not criticism
+- Keep total output around 2500-3500 words
+- Use markdown tables and code blocks for data, prose for narrative
+- Output directly to the conversation — do NOT write to filesystem (except the `.context/retros/` JSON snapshot)
+
+## Important Rules
+
+- ALL narrative output goes directly to the user in the conversation. The ONLY file written is the `.context/retros/` JSON snapshot.
+- Use `origin/main` for all git queries (not local main which may be stale)
+- Convert all timestamps to Pacific time for display (use `TZ=America/Los_Angeles`)
+- If the window has zero commits, say so and suggest a different window
+- Round LOC/hour to nearest 50
+- Treat merge commits as PR boundaries
+- Do not read CLAUDE.md or other docs — this skill is self-contained
+- On first run (no prior retros), skip comparison sections gracefully
A => review/SKILL.md +78 -0
@@ 1,78 @@
+---
+name: review
+version: 1.0.0
+description: |
+ Pre-landing PR review. Analyzes diff against main for SQL safety, LLM trust
+ boundary violations, conditional side effects, and other structural issues.
+allowed-tools:
+ - Bash
+ - Read
+ - Edit
+ - Write
+ - Grep
+ - Glob
+ - AskUserQuestion
+---
+
+# Pre-Landing PR Review
+
+You are running the `/review` workflow. Analyze the current branch's diff against main for structural issues that tests don't catch.
+
+---
+
+## Step 1: Check branch
+
+1. Run `git branch --show-current` to get the current branch.
+2. If on `main`, output: **"Nothing to review — you're on main or have no changes against main."** and stop.
+3. Run `git fetch origin main --quiet && git diff origin/main --stat` to check if there's a diff. If no diff, output the same message and stop.
+
+---
+
+## Step 2: Read the checklist
+
+Read `.claude/skills/review/checklist.md`.
+
+**If the file cannot be read, STOP and report the error.** Do not proceed without the checklist.
+
+---
+
+## Step 3: Get the diff
+
+Fetch the latest main to avoid false positives from a stale local main:
+
+```bash
+git fetch origin main --quiet
+```
+
+Run `git diff origin/main` to get the full diff. This includes both committed and uncommitted changes against the latest main.
+
+---
+
+## Step 4: Two-pass review
+
+Apply the checklist against the diff in two passes:
+
+1. **Pass 1 (CRITICAL):** SQL & Data Safety, Race Conditions & Concurrency, LLM Output Trust Boundary
+2. **Pass 2 (INFORMATIONAL):** Conditional Side Effects, Magic Numbers & String Coupling, Dead Code & Consistency, LLM Prompt Issues, Test Gaps, Crypto & Entropy, Time Window Safety, Type Coercion at Boundaries, View/Frontend
+
+Follow the output format specified in the checklist. Respect the suppressions — do NOT flag items listed in the "DO NOT flag" section.
+
+---
+
+## Step 5: Output findings
+
+**Always output ALL findings** — both critical and informational. The user must see every issue.
+
+- If CRITICAL issues found: output all findings, then for EACH critical issue use a separate AskUserQuestion with the problem, your recommended fix, and options (A: Fix it now, B: Acknowledge, C: False positive — skip).
+ After all critical questions are answered, output a summary of what the user chose for each issue. If the user chose A (fix) on any issue, apply the recommended fixes. If only B/C were chosen, no action needed.
+- If only non-critical issues found: output findings. No further action needed.
+- If no issues found: output `Pre-Landing Review: No issues found.`
+
+---
+
+## Important Rules
+
+- **Read the FULL diff before commenting.** Do not flag issues already addressed in the diff.
+- **Read-only by default.** Only modify files if the user explicitly chooses "Fix it now" on a critical issue. Never commit, push, or create PRs.
+- **Be terse.** One line problem, one line fix. No preamble.
+- **Only flag real problems.** Skip anything that's fine.
A => review/checklist.md +125 -0
@@ 1,125 @@
+# Pre-Landing Review Checklist
+
+## Instructions
+
+Review the `git diff origin/main` output for the issues listed below. Be specific — cite `file:line` and suggest fixes. Skip anything that's fine. Only flag real problems.
+
+**Two-pass review:**
+- **Pass 1 (CRITICAL):** Run SQL & Data Safety, Race Conditions & Concurrency, and LLM Output Trust Boundary first. These can block `/ship`.
+- **Pass 2 (INFORMATIONAL):** Run all remaining categories. These are included in the PR body but do not block.
+
+**Output format:**
+
+```
+Pre-Landing Review: N issues (X critical, Y informational)
+
+**CRITICAL** (blocking /ship):
+- [file:line] Problem description
+ Fix: suggested fix
+
+**Issues** (non-blocking):
+- [file:line] Problem description
+ Fix: suggested fix
+```
+
+If no issues found: `Pre-Landing Review: No issues found.`
+
+Be terse. For each issue: one line describing the problem, one line with the fix. No preamble, no summaries, no "looks good overall."
+
+---
+
+## Review Categories
+
+### Pass 1 — CRITICAL
+
+#### SQL & Data Safety
+- String interpolation in SQL (even if values are `.to_i`/`.to_f` — use `sanitize_sql_array` or Arel)
+- TOCTOU races: check-then-set patterns that should be atomic `WHERE` + `update_all`
+- `update_column`/`update_columns` bypassing validations on fields that have or should have constraints
+- N+1 queries: `.includes()` missing for associations used in loops/views (especially avatar, attachments)
+- `html_safe` on user-controlled data (XSS) — check any `.html_safe`, `raw()`, or string interpolation into `html_safe` output
+
+#### Race Conditions & Concurrency
+- Read-check-write without uniqueness constraint or `rescue RecordNotUnique; retry` (e.g., `where(hash:).first` then `save!` without handling concurrent insert)
+- `find_or_create_by` on columns without unique DB index — concurrent calls can create duplicates
+- Status transitions that don't use atomic `WHERE old_status = ? UPDATE SET new_status` — concurrent updates can skip or double-apply transitions
+
+#### LLM Output Trust Boundary
+- LLM-generated values (emails, URLs, names) written to DB or passed to mailers without format validation. Add lightweight guards (`EMAIL_REGEXP`, `URI.parse`, `.strip`) before persisting.
+- Structured tool output (arrays, hashes) accepted without type/shape checks before database writes.
+
+### Pass 2 — INFORMATIONAL
+
+#### Conditional Side Effects
+- Code paths that branch on a condition but forget to apply a side effect on one branch. Example: item promoted to verified but URL only attached when a secondary condition is true — the other branch promotes without the URL, creating an inconsistent record.
+- Log messages that claim an action happened but the action was conditionally skipped. The log should reflect what actually occurred.
+
+#### Magic Numbers & String Coupling
+- Bare numeric literals used in multiple files — should be named constants documented together
+- Error message strings used as query filters elsewhere (grep for the string — is anything matching on it?)
+
+#### Dead Code & Consistency
+- Variables assigned but never read
+- Version mismatch between PR title and VERSION/CHANGELOG files
+- CHANGELOG entries that describe changes inaccurately (e.g., "changed from X to Y" when X never existed)
+- Comments/docstrings that describe old behavior after the code changed
+
+#### LLM Prompt Issues
+- 0-indexed lists in prompts (LLMs reliably return 1-indexed)
+- Prompt text listing available tools/capabilities that don't match what's actually wired up in the `tool_classes`/`tools` array
+- Word/token limits stated in multiple places that could drift
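+
+The 1-indexing point, sketched in plain Ruby (illustrative, not project code):
+
+```ruby
+options = ["rewrite", "summarize", "translate"]
+
+# Number prompt lists from 1, not 0: LLMs reliably answer with 1-based indices.
+numbered = options.map.with_index(1) { |opt, i| "#{i}. #{opt}" }
+
+numbered  # => ["1. rewrite", "2. summarize", "3. translate"]
+```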
+
+#### Test Gaps
+- Negative-path tests that assert type/status but not the side effects (URL attached? field populated? callback fired?)
+- Assertions on string content without checking format (e.g., asserting title present but not URL format)
+- `.expects(:something).never` missing when a code path should explicitly NOT call an external service
+- Security enforcement features (blocking, rate limiting, auth) without integration tests verifying the enforcement path works end-to-end
+
+#### Crypto & Entropy
+- Truncation of data instead of hashing (last N chars instead of SHA-256) — less entropy, easier collisions
+- `rand()` / `Random.rand` for security-sensitive values — use `SecureRandom` instead
+- Non-constant-time comparisons (`==`) on secrets or tokens — vulnerable to timing attacks
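+
+The items above, illustrated with stdlib Ruby (a sketch, not project code):
+
+```ruby
+require "securerandom"
+require "digest"
+require "openssl"
+
+# SecureRandom, not rand, for anything security-sensitive.
+token = SecureRandom.hex(16)            # 32 hex chars from a CSPRNG
+
+# Hash rather than truncate: a full digest, not the last N characters.
+fingerprint = Digest::SHA256.hexdigest(token)
+
+# Constant-time comparison for secrets (inputs must be the same length).
+OpenSSL.fixed_length_secure_compare(fingerprint, fingerprint)  # => true
+```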
+
+#### Time Window Safety
+- Date-key lookups that assume "today" covers 24h — report at 8am PT only sees midnight→8am under today's key
+- Mismatched time windows between related features — one uses hourly buckets, another uses daily keys for the same data
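+
+A quick plain-Ruby illustration of the "today" pitfall (times are made up):
+
+```ruby
+require "time"
+
+# A report run at 8am that buckets by today's date key only covers
+# midnight through 8am, not a trailing 24 hours.
+now = Time.parse("2026-02-03 08:00:00 -08:00")
+day_start = Time.new(now.year, now.month, now.day, 0, 0, 0, now.utc_offset)
+hours_under_todays_key = ((now - day_start) / 3600).round  # => 8, not 24
+```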
+
+#### Type Coercion at Boundaries
+- Values crossing Ruby→JSON→JS boundaries where type could change (numeric vs string) — hash/digest inputs must normalize types
+- Hash/digest inputs that don't call `.to_s` or equivalent before serialization — `{ cores: 8 }` vs `{ cores: "8" }` produce different hashes
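+
+The digest-input point can be demonstrated with stdlib Ruby:
+
+```ruby
+require "json"
+require "digest"
+
+# Same logical payload, different value types, different digests.
+a = Digest::SHA256.hexdigest({ cores: 8 }.to_json)    # {"cores":8}
+b = Digest::SHA256.hexdigest({ cores: "8" }.to_json)  # {"cores":"8"}
+a == b  # => false
+
+# Normalize value types before hashing so both sides of a boundary agree.
+norm = ->(h) { h.transform_values(&:to_s) }
+Digest::SHA256.hexdigest(norm.call({ cores: 8 }).to_json) ==
+  Digest::SHA256.hexdigest(norm.call({ cores: "8" }).to_json)  # => true
+```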
+
+#### View/Frontend
+- Inline `<style>` blocks in partials (re-parsed every render)
+- O(n*m) lookups in views (`Array#find` in a loop instead of `index_by` hash)
+- Ruby-side `.select{}` filtering on DB results that could be a `WHERE` clause (unless intentionally avoiding leading-wildcard `LIKE`)
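+
+The O(n*m) lookup item, sketched in plain Ruby (`index_by` is ActiveSupport; `to_h` is the stdlib equivalent):
+
+```ruby
+comments = [{ id: 1, author_id: 10 }, { id: 2, author_id: 11 }]
+authors  = [{ id: 10, name: "ada" }, { id: 11, name: "grace" }]
+
+# O(n*m): Array#find inside the loop rescans authors for every comment.
+slow = comments.map { |c| authors.find { |a| a[:id] == c[:author_id] }[:name] }
+
+# O(n+m): build a hash once (index_by in Rails), then O(1) lookups.
+by_id = authors.to_h { |a| [a[:id], a] }
+fast  = comments.map { |c| by_id[c[:author_id]][:name] }
+
+fast  # => ["ada", "grace"]
+```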
+
+---
+
+## Gate Classification
+
+```
+CRITICAL (blocks /ship): INFORMATIONAL (in PR body):
+├─ SQL & Data Safety ├─ Conditional Side Effects
+├─ Race Conditions & Concurrency ├─ Magic Numbers & String Coupling
+└─ LLM Output Trust Boundary ├─ Dead Code & Consistency
+ ├─ LLM Prompt Issues
+ ├─ Test Gaps
+ ├─ Crypto & Entropy
+ ├─ Time Window Safety
+ ├─ Type Coercion at Boundaries
+ └─ View/Frontend
+```
+
+---
+
+## Suppressions — DO NOT flag these
+
+- "X is redundant with Y" when the redundancy is harmless and aids readability (e.g., `present?` redundant with `length > 20`)
+- "Add a comment explaining why this threshold/constant was chosen" — thresholds change during tuning, comments rot
+- "This assertion could be tighter" when the assertion already covers the behavior
+- Suggesting consistency-only changes (wrapping a value in a conditional to match how another constant is guarded)
+- "Regex doesn't handle edge case X" when the input is constrained and X never occurs in practice
+- "Test exercises multiple guards simultaneously" — that's fine, tests don't need to isolate every guard
+- Eval threshold changes (max_actionable, min scores) — these are tuned empirically and change constantly
+- Harmless no-ops (e.g., `.reject` on an element that's never in the array)
+- ANYTHING already addressed in the diff you're reviewing — read the FULL diff before commenting
A => setup +41 -0
@@ 1,41 @@
+#!/usr/bin/env bash
+# gstack setup — build browser binary + register all skills with Claude Code
+set -e
+
+GSTACK_DIR="$(cd "$(dirname "$0")" && pwd)"
+SKILLS_DIR="$(dirname "$GSTACK_DIR")"
+
+# 1. Build browse binary if needed
+if [ ! -x "$GSTACK_DIR/browse/dist/browse" ]; then
+ echo "Building browse binary..."
+ cd "$GSTACK_DIR" && bun install && bun run build
+fi
+
+# 2. Only create skill symlinks if we're inside a .claude/skills directory
+SKILLS_BASENAME="$(basename "$SKILLS_DIR")"
+if [ "$SKILLS_BASENAME" = "skills" ]; then
+ linked=()
+ for skill_dir in "$GSTACK_DIR"/*/; do
+ if [ -f "$skill_dir/SKILL.md" ]; then
+ skill_name="$(basename "$skill_dir")"
+ # Skip node_modules
+ [ "$skill_name" = "node_modules" ] && continue
+ target="$SKILLS_DIR/$skill_name"
+ # Create or update symlink; skip if a real file/directory exists
+ if [ -L "$target" ] || [ ! -e "$target" ]; then
+      ln -snf "$(basename "$GSTACK_DIR")/$skill_name" "$target"
+ linked+=("$skill_name")
+ fi
+ fi
+ done
+
+ echo "gstack ready."
+ echo " browse: $GSTACK_DIR/browse/dist/browse"
+ if [ ${#linked[@]} -gt 0 ]; then
+ echo " linked skills: ${linked[*]}"
+ fi
+else
+ echo "gstack ready."
+ echo " browse: $GSTACK_DIR/browse/dist/browse"
+ echo " (skipped skill symlinks — not inside .claude/skills/)"
+fi
A => ship/SKILL.md +300 -0
@@ 1,300 @@
+---
+name: ship
+version: 1.0.0
+description: |
+ Ship workflow: merge main, run tests, review diff, bump VERSION, update CHANGELOG, commit, push, create PR.
+allowed-tools:
+ - Bash
+ - Read
+ - Write
+ - Edit
+ - Grep
+ - Glob
+ - AskUserQuestion
+---
+
+# Ship: Fully Automated Ship Workflow
+
+You are running the `/ship` workflow. This is a **non-interactive, fully automated** workflow. Do NOT ask for confirmation at any step. The user said `/ship`, which means DO IT. Run straight through and output the PR URL at the end.
+
+**Only stop for:**
+- On `main` branch (abort)
+- Merge conflicts that can't be auto-resolved (stop, show conflicts)
+- Test failures (stop, show failures)
+- Pre-landing review finds CRITICAL issues and user chooses to fix (not acknowledge or skip)
+- MINOR or MAJOR version bump needed (ask — see Step 4)
+
+**Never stop for:**
+- Uncommitted changes (always include them)
+- Version bump choice (auto-pick MICRO or PATCH — see Step 4)
+- CHANGELOG content (auto-generate from diff)
+- Commit message approval (auto-commit)
+- Multi-file changesets (auto-split into bisectable commits)
+
+---
+
+## Step 1: Pre-flight
+
+1. Check the current branch. If on `main`, **abort**: "You're on main. Ship from a feature branch."
+
+2. Run `git status` (never use `-uall`). Uncommitted changes are always included — no need to ask.
+
+3. Run `git diff main...HEAD --stat` and `git log main..HEAD --oneline` to understand what's being shipped.
+
+---
+
+## Step 2: Merge origin/main (BEFORE tests)
+
+Fetch and merge `origin/main` into the feature branch so tests run against the merged state:
+
+```bash
+git fetch origin main && git merge origin/main --no-edit
+```
+
+**If there are merge conflicts:** Try to auto-resolve if they are simple (VERSION, schema.rb, CHANGELOG ordering). If conflicts are complex or ambiguous, **STOP** and show them.
+
+**If already up to date:** Continue silently.
+
+---
+
+## Step 3: Run tests (on merged code)
+
+**Do NOT run `RAILS_ENV=test bin/rails db:migrate`** — `bin/test-lane` already calls
+`db:test:prepare` internally, which loads the schema into the correct lane database.
+Running bare test migrations without INSTANCE hits an orphan DB and corrupts structure.sql.
+
+Run both test suites in parallel:
+
+```bash
+bin/test-lane 2>&1 | tee /tmp/ship_tests.txt &
+npm run test 2>&1 | tee /tmp/ship_vitest.txt &
+wait
+```
+
+After both complete, read the output files and check pass/fail.
+
+**If any test fails:** Show the failures and **STOP**. Do not proceed.
+
+**If all pass:** Note the pass counts briefly and continue.
+
+---
+
+## Step 3.25: Eval Suites (conditional)
+
+Evals are mandatory when prompt-related files change. Skip this step entirely if no prompt files are in the diff.
+
+**1. Check if the diff touches prompt-related files:**
+
+```bash
+git diff origin/main --name-only
+```
+
+Match against these patterns (from CLAUDE.md):
+- `app/services/*_prompt_builder.rb`
+- `app/services/*_generation_service.rb`, `*_writer_service.rb`, `*_designer_service.rb`
+- `app/services/*_evaluator.rb`, `*_scorer.rb`, `*_classifier_service.rb`, `*_analyzer.rb`
+- `app/services/concerns/*voice*.rb`, `*writing*.rb`, `*prompt*.rb`, `*token*.rb`
+- `app/services/chat_tools/*.rb`, `app/services/x_thread_tools/*.rb`
+- `config/system_prompts/*.txt`
+- `test/evals/**/*` (eval infrastructure changes affect all suites)
+
+**If no matches:** Print "No prompt-related files changed — skipping evals." and continue to Step 3.5.
+
+**2. Identify affected eval suites:**
+
+Each eval runner (`test/evals/*_eval_runner.rb`) declares `PROMPT_SOURCE_FILES` listing which source files affect it. Grep these to find which suites match the changed files:
+
+```bash
+grep -l "changed_file_basename" test/evals/*_eval_runner.rb
+```
+
+Map runner → test file: `post_generation_eval_runner.rb` → `post_generation_eval_test.rb`.
+
+**Special cases:**
+- Changes to `test/evals/judges/*.rb`, `test/evals/support/*.rb`, or `test/evals/fixtures/` affect ALL suites that use those judges/support files. Check imports in the eval test files to determine which.
+- Changes to `config/system_prompts/*.txt` — grep eval runners for the prompt filename to find affected suites.
+- If unsure which suites are affected, run ALL suites that could plausibly be impacted. Over-testing is better than missing a regression.
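+
+The runner-to-test mapping above is a simple filename rewrite; in plain Ruby:
+
+```ruby
+# Map an eval runner filename to its test filename (sketch of the convention).
+def runner_to_test(runner)
+  runner.sub(/_runner\.rb\z/, "_test.rb")
+end
+
+runner_to_test("post_generation_eval_runner.rb")
+# => "post_generation_eval_test.rb"
+```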
+
+**3. Run affected suites at `EVAL_JUDGE_TIER=full`:**
+
+`/ship` is a pre-merge gate, so always use full tier (Sonnet structural + Opus persona judges).
+
+```bash
+EVAL_JUDGE_TIER=full EVAL_VERBOSE=1 bin/test-lane --eval test/evals/<suite>_eval_test.rb 2>&1 | tee /tmp/ship_evals.txt
+```
+
+If multiple suites need to run, run them sequentially (each needs a test lane). If the first suite fails, stop immediately — don't burn API cost on remaining suites.
+
+**4. Check results:**
+
+- **If any eval fails:** Show the failures, the cost dashboard, and **STOP**. Do not proceed.
+- **If all pass:** Note pass counts and cost. Continue to Step 3.5.
+
+**5. Save eval output** — include eval results and cost dashboard in the PR body (Step 8).
+
+**Tier reference (for context — /ship always uses `full`):**
+| Tier | When | Speed (cached) | Cost |
+|------|------|----------------|------|
+| `fast` (Haiku) | Dev iteration, smoke tests | ~5s (14x faster) | ~$0.07/run |
+| `standard` (Sonnet) | Default dev, `bin/test-lane --eval` | ~17s (4x faster) | ~$0.37/run |
+| `full` (Opus persona) | **`/ship` and pre-merge** | ~72s (baseline) | ~$1.27/run |
+
+---
+
+## Step 3.5: Pre-Landing Review
+
+Review the diff for structural issues that tests don't catch.
+
+1. Read `.claude/skills/review/checklist.md`. If the file cannot be read, **STOP** and report the error.
+
+2. Run `git diff origin/main` to get the full diff (scoped to feature changes against the freshly-fetched remote main).
+
+3. Apply the review checklist in two passes:
+   - **Pass 1 (CRITICAL):** SQL & Data Safety, Race Conditions & Concurrency, LLM Output Trust Boundary
+ - **Pass 2 (INFORMATIONAL):** All remaining categories
+
+4. **Always output ALL findings** — both critical and informational. The user must see every issue found.
+
+5. Output a summary header: `Pre-Landing Review: N issues (X critical, Y informational)`
+
+6. **If CRITICAL issues found:** For EACH critical issue, use a separate AskUserQuestion with:
+ - The problem (`file:line` + description)
+ - Your recommended fix
+ - Options: A) Fix it now (recommend), B) Acknowledge and ship anyway, C) It's a false positive — skip
+ After resolving all critical issues: if the user chose A (fix) on any issue, apply the recommended fixes, then commit only the fixed files by name (`git add <fixed-files> && git commit -m "fix: apply pre-landing review fixes"`), then **STOP** and tell the user to run `/ship` again to re-test with the fixes applied. If the user chose only B (acknowledge) or C (false positive) on all issues, continue with Step 4.
+
+7. **If only non-critical issues found:** Output them and continue. They will be included in the PR body at Step 8.
+
+8. **If no issues found:** Output `Pre-Landing Review: No issues found.` and continue.
+
+Save the review output — it goes into the PR body in Step 8.
+
+---
+
+## Step 4: Version bump (auto-decide)
+
+1. Read the current `VERSION` file (4-digit format: `MAJOR.MINOR.PATCH.MICRO`)
+
+2. **Auto-decide the bump level based on the diff:**
+   - Count lines changed: `git diff origin/main --stat | tail -1` (two-dot form, so uncommitted changes, which always ship, are counted)
+ - **MICRO** (4th digit): < 50 lines changed, trivial tweaks, typos, config
+ - **PATCH** (3rd digit): 50+ lines changed, bug fixes, small-medium features
+ - **MINOR** (2nd digit): **ASK the user** — only for major features or significant architectural changes
+ - **MAJOR** (1st digit): **ASK the user** — only for milestones or breaking changes
+
+3. Compute the new version:
+ - Bumping a digit resets all digits to its right to 0
+ - Example: `0.19.1.0` + PATCH → `0.19.2.0`
+
+4. Write the new version to the `VERSION` file.
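+
+The bump-and-reset rule can be sketched in plain Ruby (a hedged illustration of the convention, not shipped code):
+
+```ruby
+LEVELS = { major: 0, minor: 1, patch: 2, micro: 3 }.freeze
+
+# Bump one digit of a 4-digit version and zero everything to its right.
+def bump(version, level)
+  parts = version.split(".").map(&:to_i)
+  idx = LEVELS.fetch(level)
+  parts[idx] += 1
+  (idx + 1...parts.length).each { |i| parts[i] = 0 }
+  parts.join(".")
+end
+
+bump("0.19.1.0", :patch)  # => "0.19.2.0"
+bump("0.19.1.0", :minor)  # => "0.20.0.0"
+```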
+
+---
+
+## Step 5: CHANGELOG (auto-generate)
+
+1. Read `CHANGELOG.md` header to know the format.
+
+2. Auto-generate the entry from **ALL commits on the branch** (not just recent ones):
+ - Use `git log main..HEAD --oneline` to see every commit being shipped
+ - Use `git diff main...HEAD` to see the full diff against main
+ - The CHANGELOG entry must be comprehensive of ALL changes going into the PR
+ - If existing CHANGELOG entries on the branch already cover some commits, replace them with one unified entry for the new version
+ - Categorize changes into applicable sections:
+ - `### Added` — new features
+ - `### Changed` — changes to existing functionality
+ - `### Fixed` — bug fixes
+ - `### Removed` — removed features
+ - Write concise, descriptive bullet points
+ - Insert after the file header (line 5), dated today
+ - Format: `## [X.Y.Z.W] - YYYY-MM-DD`
+
+**Do NOT ask the user to describe changes.** Infer from the diff and commit history.
+
+---
+
+## Step 6: Commit (bisectable chunks)
+
+**Goal:** Create small, logical commits that work well with `git bisect` and help LLMs understand what changed.
+
+1. Analyze the diff and group changes into logical commits. Each commit should represent **one coherent change** — not one file, but one logical unit.
+
+2. **Commit ordering** (earlier commits first):
+ - **Infrastructure:** migrations, config changes, route additions
+ - **Models & services:** new models, services, concerns (with their tests)
+ - **Controllers & views:** controllers, views, JS/React components (with their tests)
+ - **VERSION + CHANGELOG:** always in the final commit
+
+3. **Rules for splitting:**
+ - A model and its test file go in the same commit
+ - A service and its test file go in the same commit
+ - A controller, its views, and its test go in the same commit
+ - Migrations are their own commit (or grouped with the model they support)
+ - Config/route changes can group with the feature they enable
+ - If the total diff is small (< 50 lines across < 4 files), a single commit is fine
+
+4. **Each commit must be independently valid** — no broken imports, no references to code that doesn't exist yet. Order commits so dependencies come first.
+
+5. Compose each commit message:
+ - First line: `<type>: <summary>` (type = feat/fix/chore/refactor/docs)
+ - Body: brief description of what this commit contains
+ - Only the **final commit** (VERSION + CHANGELOG) gets the version tag and co-author trailer:
+
+```bash
+git commit -m "$(cat <<'EOF'
+chore: bump version and changelog (vX.Y.Z.W)
+
+Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
+EOF
+)"
+```
+
+---
+
+## Step 7: Push
+
+Push to the remote with upstream tracking:
+
+```bash
+git push -u origin <branch-name>
+```
+
+---
+
+## Step 8: Create PR
+
+Create a pull request using `gh`:
+
+```bash
+gh pr create --title "<type>: <summary>" --body "$(cat <<'EOF'
+## Summary
+<bullet points from CHANGELOG>
+
+## Pre-Landing Review
+<findings from Step 3.5, or "No issues found.">
+
+## Eval Results
+<If evals ran: suite names, pass/fail counts, cost dashboard summary. If skipped: "No prompt-related files changed — evals skipped.">
+
+## Test plan
+- [x] All Rails tests pass (N runs, 0 failures)
+- [x] All Vitest tests pass (N tests)
+
+🤖 Generated with [Claude Code](https://claude.com/claude-code)
+EOF
+)"
+```
+
+**Output the PR URL** — this should be the final output the user sees.
+
+---
+
+## Important Rules
+
+- **Never skip tests.** If tests fail, stop.
+- **Never skip the pre-landing review.** If checklist.md is unreadable, stop.
+- **Never force push.** Use regular `git push` only.
+- **Never ask for confirmation** except for MINOR/MAJOR version bumps and CRITICAL review findings (one AskUserQuestion per critical issue with fix recommendation).
+- **Always use the 4-digit version format** from the VERSION file.
+- **Date format in CHANGELOG:** `YYYY-MM-DD`
+- **Split commits for bisectability** — each commit = one logical change.
+- **The goal:** the user says `/ship`, and the next thing they see is the review and the PR URL.