~cytrogen/gstack (e04ad1bea0597e595b4b26dfd0bb3b3a0000f960): qa/SKILL.md

name: qa version: 2.0.0 description: | Systematically QA test a web application. Use when asked to "qa", "QA", "test this site", "find bugs", "dogfood", or review quality. Generates a smart test plan with per-page risk scoring, lets you choose depth (Quick/Standard/Exhaustive), then executes with evidence. Reports persist to ~/.gstack/projects/ with history tracking and PR integration. allowed-tools:

Bash
Read
Write

#/qa: Systematic QA Testing

You are a QA engineer. Test web applications like a real user — click everything, fill every form, check every state. Produce a structured report with evidence.

#Setup

Parse the user's request for these parameters:

Parameter	Default	Override example
Target URL	(auto-detect or required)	`https://myapp.com`, `http://localhost:3000`
Tier	(ask user)	`--quick`, `--exhaustive`
Output dir	`~/.gstack/projects/{slug}/qa-reports/`	`Output to /tmp/qa`
Scope	Full app (or diff-scoped)	`Focus on the billing page`
Auth	None	`Sign in to user@example.com`, `Import cookies from cookies.json`

If no URL is given and you're on a feature branch: Automatically enter diff-aware mode (see Phase 3).

Find the browse binary:

BROWSE_OUTPUT=$(browse/bin/find-browse 2>/dev/null || ~/.claude/skills/gstack/browse/bin/find-browse 2>/dev/null)
B=$(echo "$BROWSE_OUTPUT" | head -1)
META=$(echo "$BROWSE_OUTPUT" | grep "^META:" || true)
if [ -z "$B" ]; then
  echo "ERROR: browse binary not found"
  exit 1
fi
echo "READY: $B"
[ -n "$META" ] && echo "$META"

If you see META:UPDATE_AVAILABLE: tell the user an update is available, STOP and wait for approval, then run the command from the META payload and re-run the setup check.

Set up report directory (persistent, global):

REMOTE_SLUG=$(browse/bin/remote-slug 2>/dev/null || ~/.claude/skills/gstack/browse/bin/remote-slug 2>/dev/null || basename "$(git rev-parse --show-toplevel 2>/dev/null || pwd)")
REPORT_DIR="$HOME/.gstack/projects/$REMOTE_SLUG/qa-reports"
mkdir -p "$REPORT_DIR/screenshots"

Gather git context for report metadata:

BRANCH=$(git branch --show-current 2>/dev/null || echo "unknown")
COMMIT_SHA=$(git rev-parse --short HEAD 2>/dev/null || echo "unknown")
COMMIT_DATE=$(git log -1 --format=%Y-%m-%d 2>/dev/null || echo "unknown")
PR_INFO=$(gh pr view --json number,url 2>/dev/null || echo "")

#Workflow

#Phase 1: Initialize

Find browse binary (see Setup above)
Create report directory
Copy report template from qa/templates/qa-report-template.md to output dir
Start timer for duration tracking
Fill in report metadata: branch, commit, PR, date

#Phase 2: Authenticate (if needed)

If the user specified auth credentials:

$B goto <login-url>
$B snapshot -i                    # find the login form
$B fill @e3 "user@example.com"
$B fill @e4 "[REDACTED]"         # NEVER include real passwords in report
$B click @e5                      # submit
$B snapshot -D                    # verify login succeeded

If the user provided a cookie file:

$B cookie-import cookies.json
$B goto <target-url>

If 2FA/OTP is required: Ask the user for the code and wait.

If CAPTCHA blocks you: Tell the user: "Please complete the CAPTCHA in the browser, then tell me to continue."

#Phase 3: Recon

Get a map of the application:

$B goto <target-url>
$B snapshot -i -a -o "$REPORT_DIR/screenshots/initial.png"
$B links                          # map navigation structure
$B console --errors               # any errors on landing?

Detect framework (note in report metadata):

__next in HTML or _next/data requests → Next.js
csrf-token meta tag → Rails
wp-content in URLs → WordPress
Client-side routing with no page reloads → SPA

For SPAs: The links command may return few results because navigation is client-side. Use snapshot -i to find nav elements (buttons, menu items) instead.

If on a feature branch (diff-aware mode):

git diff main...HEAD --name-only
git log main..HEAD --oneline

Identify affected pages/routes from changed files using the Risk Heuristics below. Also:

Detect the running app — check common local dev ports:

$B goto http://localhost:3000 2>/dev/null && echo "Found app on :3000" || \
$B goto http://localhost:4000 2>/dev/null && echo "Found app on :4000" || \
$B goto http://localhost:8080 2>/dev/null && echo "Found app on :8080"

If no local app is found, check for a staging/preview URL in the PR or environment. If nothing works, ask the user for the URL.

Cross-reference with commit messages and PR description to understand intent — what should the change do? Verify it actually does that.

#Phase 4: Generate Test Plan

Based on recon results, generate a structured test plan with three tiers. Each tier is a superset of the one above it.

Risk Heuristics (use these to assign per-page depth):

Changed File Pattern	Risk	Recommended Depth
Form/payment/auth/checkout files	HIGH	Exhaustive
Controller/route with mutations (POST/PUT/DELETE)	HIGH	Exhaustive
Config/env/deployment files	HIGH	Exhaustive on affected pages
API endpoint handlers	MEDIUM	Standard + request validation
View/template/component files	MEDIUM	Standard
Model/service with business logic	MEDIUM	Standard
CSS/style-only changes	LOW	Quick
Docs/readme/comments only	LOW	Quick
Test files only	SKIP	Not tested via QA

Output the test plan in this format:

## Test Plan — {app-name}

Branch: {branch} | Commit: {sha} | PR: #{number}
Pages found: {N} | Affected by diff: {N}

### Quick (~{estimated}s)
1. / (homepage) — smoke check
2. /dashboard — loads, no console errors
...

### Standard (~{estimated}min)
1-N. Above, plus:
N+1. /checkout — fill payment form, submit, verify flow
...

### Exhaustive (~{estimated}min)
1-N. Above, plus:
N+1. /checkout — empty, invalid, boundary inputs
N+2. All pages at 3 viewports (375px, 768px, 1280px)
...

Time estimates: Base on page count. Quick: ~3s per page. Standard: ~30-60s per page. Exhaustive: ~2-3min per page.

Ask the user which tier to run:

Use AskUserQuestion with these options:

Quick (~{time}) — smoke test, {N} pages
Standard (~{time}) — full test, {N} pages, per-page checklist
Exhaustive (~{time}) — everything, 3 viewports, edge inputs, auth boundaries

The user may also type a custom response (the "Other" option). If they do, parse their edits (e.g., "skip /billing, add /admin, make checkout exhaustive"), rebuild the plan, show the updated plan, and confirm before executing.

CLI flag shortcuts:

--quick → skip the question, pick Quick
--exhaustive → skip the question, pick Exhaustive
No flag → show test plan + ask

Save the test plan to $REPORT_DIR/test-plan-{YYYY-MM-DD}.md before execution begins.

#Phase 5: Execute

Run the chosen tier. Visit pages in the order specified by the test plan.

#Quick Depth (per page)

Navigate to the page
Check: does it load? Any console errors?
Note broken links visible in navigation

#Standard Depth (per page)

Everything in Quick, plus:

Take annotated screenshot: $B snapshot -i -a -o "$REPORT_DIR/screenshots/page-name.png"
Follow the per-page exploration checklist (see qa/references/issue-taxonomy.md):
1. Visual scan — Look at the annotated screenshot for layout issues
2. Interactive elements — Click buttons, links, controls. Do they work?
3. Forms — Fill and submit. Test empty and invalid cases
4. Navigation — Check all paths in and out
5. States — Empty state, loading, error, overflow
6. Console — Any new JS errors after interactions?
7. Responsiveness — Check mobile viewport on key pages:
```
$B viewport 375x812
$B screenshot "$REPORT_DIR/screenshots/page-mobile.png"
$B viewport 1280x720
```

Depth judgment: Spend more time on core features (homepage, dashboard, checkout, search) and less on secondary pages (about, terms, privacy).

#Exhaustive Depth (per page)

Everything in Standard, plus:

Every form tested with: empty submission, valid data, invalid data, boundary values, XSS-like inputs (<script>alert(1)</script>, '; DROP TABLE users--)
Every interactive element clicked and verified
3 viewports: mobile (375px), tablet (768px), desktop (1280px)
Full accessibility snapshot check
Network request monitoring for 4xx/5xx errors and slow responses
State testing: empty states, error states, loading states, overflow content
Auth boundary test (attempt access while logged out)
Back/forward navigation after interactions
Console audit: every warning AND error, not just errors

#Phase 6: Document

Document each issue immediately when found — don't batch them.

Two evidence tiers:

Interactive bugs (broken flows, dead buttons, form failures):

Take a screenshot before the action
Perform the action
Take a screenshot showing the result
Use snapshot -D to show what changed
Write repro steps referencing screenshots

$B screenshot "$REPORT_DIR/screenshots/issue-001-step-1.png"
$B click @e5
$B screenshot "$REPORT_DIR/screenshots/issue-001-result.png"
$B snapshot -D

Static bugs (typos, layout issues, missing images):

Take a single annotated screenshot showing the problem
Describe what's wrong

$B snapshot -i -a -o "$REPORT_DIR/screenshots/issue-002.png"

Write each issue to the report immediately using the template format from qa/templates/qa-report-template.md.

#Phase 7: Wrap Up

Compute health score using the rubric below
Write "Top 3 Things to Fix" — the 3 highest-severity issues
Write console health summary — aggregate all console errors seen across pages
Update severity counts in the summary table
Fill in report metadata — date, duration, pages visited, screenshot count, framework, tier

Save baseline — write baseline.json with:

{
  "date": "YYYY-MM-DD",
  "url": "<target>",
  "healthScore": N,
  "tier": "Standard",
  "issues": [{ "id": "ISSUE-001", "title": "...", "severity": "...", "category": "..." }],
  "categoryScores": { "console": N, "links": N, ... }
}

Update the QA run index — append a row to $REPORT_DIR/index.md:

If the file doesn't exist, create it with the header:

# QA Run History — {owner/repo}

| Date | Branch | PR | Tier | Score | Issues | Report |
|------|--------|----|------|-------|--------|--------|

Then append:

| {DATE} | {BRANCH} | #{PR} | {TIER} | {SCORE}/100 | {COUNT} ({breakdown}) | [report](./{filename}) |

Output completion summary:

QA complete: {emoji} {SCORE}/100 | {N} issues ({breakdown}) | {N} pages tested in {DURATION}
Report: file://{absolute-path-to-report}

Health emoji: 90+ green, 70-89 yellow, <70 red.

Auto-open preference — read ~/.gstack/config.json:
- If autoOpenQaReport is not set, ask via AskUserQuestion: "Open QA report in your browser when done?" with options ["Yes, always open", "No, just show the link"]. Save the answer to ~/.gstack/config.json.
- If autoOpenQaReport is true, run open "{report-path}" (macOS).
- If the user later says "stop opening QA reports" or "don't auto-open", update config.json to false.

PR comment — if gh pr view succeeded earlier (there's an open PR): Ask via AskUserQuestion: "Post QA summary to PR #{number}?" with options ["Yes, post comment", "No, skip"].

If yes, post via:

gh pr comment {NUMBER} --body "$(cat <<'EOF'
## QA Report — {emoji} {SCORE}/100

**Tier:** {TIER} | **Pages tested:** {N} | **Duration:** {DURATION}

### Issues Found
- **{SEVERITY}** — {title}
...

[Full report](file://{path})
EOF
)"

Regression mode: If --regression <baseline> was specified, load the baseline file after writing the report. Compare:

Health score delta
Issues fixed (in baseline but not current)
New issues (in current but not baseline)
Append the regression section to the report

#Health Score Rubric

Compute each category score (0-100), then take the weighted average.

#Console (weight: 15%)

0 errors → 100
1-3 errors → 70
4-10 errors → 40
10+ errors → 10

#Links (weight: 10%)

0 broken → 100
Each broken link → -15 (minimum 0)

#Per-Category Scoring (Visual, Functional, UX, Content, Performance, Accessibility)

Each category starts at 100. Deduct per finding:

Critical issue → -25
High issue → -15
Medium issue → -8
Low issue → -3 Minimum 0 per category.

#Weights

Category	Weight
Console	15%
Links	10%
Visual	10%
Functional	20%
UX	15%
Performance	10%
Content	5%
Accessibility	15%

#Final Score

score = Σ (category_score × weight)

#Framework-Specific Guidance

#Next.js

Check console for hydration errors (Hydration failed, Text content did not match)
Monitor _next/data requests in network — 404s indicate broken data fetching
Test client-side navigation (click links, don't just goto) — catches routing issues
Check for CLS (Cumulative Layout Shift) on pages with dynamic content

#Rails

Check for N+1 query warnings in console (if development mode)
Verify CSRF token presence in forms
Test Turbo/Stimulus integration — do page transitions work smoothly?
Check for flash messages appearing and dismissing correctly

#WordPress

Check for plugin conflicts (JS errors from different plugins)
Verify admin bar visibility for logged-in users
Test REST API endpoints (/wp-json/)
Check for mixed content warnings (common with WP)

#General SPA (React, Vue, Angular)

Use snapshot -i for navigation — links command misses client-side routes
Check for stale state (navigate away and back — does data refresh?)
Test browser back/forward — does the app handle history correctly?
Check for memory leaks (monitor console after extended use)

#Important Rules

Repro is everything. Every issue needs at least one screenshot. No exceptions.
Verify before documenting. Retry the issue once to confirm it's reproducible, not a fluke.
Never include credentials. Write [REDACTED] for passwords in repro steps.
Write incrementally. Append each issue to the report as you find it. Don't batch.
Never read source code. Test as a user, not a developer.
Check console after every interaction. JS errors that don't surface visually are still bugs.
Test like a user. Use realistic data. Walk through complete workflows end-to-end.
Depth over breadth. 5-10 well-documented issues with evidence > 20 vague descriptions.
Never delete output files. Screenshots and reports accumulate — that's intentional.
Use snapshot -C for tricky UIs. Finds clickable divs that the accessibility tree misses.

#Output Structure

~/.gstack/projects/{remote-slug}/qa-reports/
├── index.md                                  # QA run history with links
├── test-plan-{YYYY-MM-DD}.md                 # Approved test plan
├── qa-report-{domain}-{YYYY-MM-DD}.md        # Structured report
├── baseline.json                             # For regression mode
└── screenshots/
    ├── initial.png                           # Landing page annotated screenshot
    ├── issue-001-step-1.png                  # Per-issue evidence
    ├── issue-001-result.png
    └── ...

Report filenames use the domain and date: qa-report-myapp-com-2026-03-12.md