~cytrogen/gstack

3501f5dd0388c8c065ade8364c3b7c909be035a6 — Garry Tan a month ago dc5e053
fix: Windows browse — health-check-first ensureServer, detached startServer, Windows process mgmt (v0.11.11.0) (#431)

* fix: Windows browse — health-check-first ensureServer, detached startServer, Windows process mgmt

Three compounding bugs made browse completely broken on Windows:

1. Bun.spawn().unref() doesn't truly detach on Windows — server dies when
   CLI exits. Fix: use Node's child_process.spawn with { detached: true }
   via a launcher script. Credit: PR #191 by @fqueiro for the approach.

2. process.kill(pid, 0) is broken in Bun's compiled binary on Windows —
   ensureServer() never reaches the health check. Fix: restructure to
   health-check-first (HTTP is definitive proof on all platforms). Extract
   isServerHealthy() helper for DRY (4 call sites).

3. Windows process management: isProcessAlive() falls back to tasklist,
   killServer() uses taskkill /T /F (kills process tree including Chromium),
   cleanupLegacyState() skips on Windows (no /tmp, no ps).
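
Fix 1 uses Node's `child_process.spawn` options to detach the server instead of relying on Bun's `unref()`. Here is a minimal sketch of that pattern, assuming a Node-compatible runtime. `spawnDetached` is a hypothetical helper; the commit itself inlines the same `spawn` call into a `node -e` launcher string.

```typescript
import { spawn } from "node:child_process";

// Hypothetical helper (not part of gstack): start a long-lived child process
// that can outlive the process that launched it.
export function spawnDetached(
  command: string,
  args: string[],
  extraEnv: Record<string, string> = {},
): number | undefined {
  const child = spawn(command, args, {
    // POSIX: the child becomes the leader of a new process group and session.
    // Windows: the child can keep running after the parent exits.
    detached: true,
    // No stdio connected to the parent. Node's docs note that a detached child
    // only stays running in the background if its stdio is not connected to the parent.
    stdio: "ignore",
    env: { ...process.env, ...extraEnv },
  });
  // Let the parent's event loop exit without waiting for the child.
  child.unref();
  return child.pid;
}
```

A short-lived CLI can call this and return immediately with the child's PID. In the commit, the equivalent call runs inside `node -e` so the spawn happens under Node rather than Bun, and `BROWSE_STATE_FILE` is passed through the environment.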

Also: hard-fail on Windows if server-node.mjs is missing instead of
silently falling back to the known-broken Bun path.
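
Fix 2 reorders `ensureServer()` so the HTTP health check is the gate and the PID is used only for killing. The sketch below shows that decision order and assumes the state-file fields from ARCHITECTURE.md. `ensureServerSketch` and the `EnsureDeps` bundle are hypothetical names introduced so the ordering can be exercised with fakes. The real function's lock handling is omitted.

```typescript
// Fields of .gstack/browse.json that this decision needs.
interface ServerState {
  pid: number;
  port: number;
  binaryVersion?: string;
}

// Hypothetical dependency bundle: each member stands in for the cli.ts helper
// with the same role, so the ordering can be tested without real processes.
interface EnsureDeps {
  readState(): ServerState | null;
  isServerHealthy(port: number): Promise<boolean>;
  currentVersion(): string | null;
  killServer(pid: number): Promise<void>;
  startServer(): Promise<ServerState>;
}

export async function ensureServerSketch(deps: EnsureDeps): Promise<ServerState> {
  const state = deps.readState();
  // Gate on HTTP only. There is no process.kill(pid, 0) probe, which is
  // broken in Bun's compiled binary on Windows.
  if (state && (await deps.isServerHealthy(state.port))) {
    const current = deps.currentVersion();
    if (current && state.binaryVersion && current !== state.binaryVersion) {
      // Healthy but started from an older binary: the PID is used only to stop it.
      await deps.killServer(state.pid);
      return deps.startServer();
    }
    return state;
  }
  // No state file, or the recorded port does not answer /health: start fresh.
  return deps.startServer();
}
```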

Fixes #342.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: disable Chromium sandbox on Windows

Chromium's sandbox fails when the server is spawned through the
Bun→Node process chain on Windows (GitHub #276). Disable
chromiumSandbox on Windows at both launch sites (headless + headed).

Safe: the server is a local daemon browsing user-specified URLs, and Playwright docs
recommend disabling the sandbox in CI/container environments.
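
Both launch sites apply the same platform check. The sketch below keeps that rule in one place. `buildLaunchOptions` is a hypothetical helper. `LaunchSketchOptions` declares only the two `chromium.launch()` options the commit sets (`headless`, `chromiumSandbox`), so the block runs without Playwright installed.

```typescript
// Local stand-in for the subset of Playwright's launch options used here.
interface LaunchSketchOptions {
  headless: boolean;
  chromiumSandbox: boolean;
}

// Hypothetical helper shared by the headless and headed launch sites, so the
// Windows exception cannot drift between them.
export function buildLaunchOptions(
  headless: boolean,
  platform: string = process.platform,
): LaunchSketchOptions {
  return {
    headless,
    // Sandbox off only on Windows (GitHub #276); enabled on every other platform.
    chromiumSandbox: platform !== "win32",
  };
}
```

The result would be passed to `chromium.launch(...)`, with call-site extras such as `timeout` or `args` spread in alongside it.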

Fixes #276.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: startup error log + Windows exit handler for browse server

On Windows, the CLI can't capture stderr from the server (stdio: 'ignore'
required for process detachment). Write startup errors to
.gstack/browse-startup-error.log so the CLI can report them on timeout.

Also add process.on('exit') handler on Windows as defense-in-depth for
state file cleanup (primary mechanism is CLI's stale-state detection).
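
The startup log turns a detached server's failure into something the CLI can report. The sketch below covers both halves and assumes the file name and the `<ISO timestamp> <message>\n<stack>` layout used in the commit. The three function names are hypothetical, and `stateDir` stands in for gstack's `config.stateDir`.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

const LOG_NAME = "browse-startup-error.log";

// Server side: record why startup failed, best effort.
export function writeStartupError(stateDir: string, err: Error): void {
  try {
    fs.mkdirSync(stateDir, { recursive: true });
    fs.writeFileSync(
      path.join(stateDir, LOG_NAME),
      `${new Date().toISOString()} ${err.message}\n${err.stack ?? ""}\n`,
    );
  } catch {
    // If the log cannot be written, there is no other channel left.
  }
}

// CLI side, after a startup timeout: return the log text, or null if none exists.
export function readStartupError(stateDir: string): string | null {
  try {
    const text = fs.readFileSync(path.join(stateDir, LOG_NAME), "utf-8").trim();
    return text.length > 0 ? text : null;
  } catch (e) {
    if ((e as { code?: string }).code === "ENOENT") return null;
    throw e;
  }
}

// CLI side, before spawning: remove a log left over from an earlier failed start.
export function clearStartupError(stateDir: string): void {
  fs.rmSync(path.join(stateDir, LOG_NAME), { force: true });
}
```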

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* test: add isServerHealthy + startup error log tests

Tests for the new cross-platform health check helper (isServerHealthy)
that replaces PID-based liveness checks in all polling loops. Covers
healthy, unhealthy, unreachable, and error response cases.

Also tests the startup error log write/read format used on Windows.
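
With `isServerHealthy()` as the liveness test, each polling loop in `startServer()` and `ensureServer()` has the same shape. The generic sketch below assumes the health check is injected as a function, so it can be the real `isServerHealthy` or a fake. `waitForHealthy` is a hypothetical name. The 100 ms default interval matches the `startServer()` loop in the diff.

```typescript
// Hypothetical generic form of the polling loops: re-read the port on every
// round (the state file may not exist yet) and return it as soon as the health
// check passes, or null once maxWaitMs has elapsed.
export async function waitForHealthy(
  readPort: () => number | null,
  isHealthy: (port: number) => Promise<boolean>,
  maxWaitMs: number,
  intervalMs = 100,
): Promise<number | null> {
  const start = Date.now();
  while (Date.now() - start < maxWaitMs) {
    const port = readPort();
    if (port !== null && (await isHealthy(port))) return port;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  return null;
}
```

Because a refused connection fails almost instantly, each round costs little more than the sleep. That is why the health check can replace the slow `tasklist` probe in these loops.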

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.11.11.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: sync ARCHITECTURE.md with health-check-first ensureServer

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
M ARCHITECTURE.md => ARCHITECTURE.md +1 -1
@@ 69,7 69,7 @@ The server writes `.gstack/browse.json` (atomic write via tmp + rename, mode 0o6
{ "pid": 12345, "port": 34567, "token": "uuid-v4", "startedAt": "...", "binaryVersion": "abc123" }
```

-The CLI reads this file to find the server. If the file is missing, stale, or the PID is dead, the CLI spawns a new server.
+The CLI reads this file to find the server. If the file is missing or the server fails an HTTP health check, the CLI spawns a new server. On Windows, PID-based process detection is unreliable in Bun binaries, so the health check (GET /health) is the primary liveness signal on all platforms.

### Port selection


M CHANGELOG.md => CHANGELOG.md +13 -0
@@ 1,5 1,18 @@
# Changelog

## [0.11.14.0] - 2026-03-24 — Windows Browse Fix

### Fixed

- **Browse engine now works on Windows.** Three compounding bugs blocked all Windows `/browse` users: the server process died when the CLI exited (Bun's `unref()` doesn't truly detach on Windows), the health check never ran because `process.kill(pid, 0)` is broken in Bun binaries on Windows, and Chromium's sandbox failed when spawned through the Bun→Node process chain. All three are now fixed. Credits to @fqueiro (PR #191) for identifying the `detached: true` approach.
- **Health check runs first on all platforms.** `ensureServer()` now tries an HTTP health check before falling back to PID-based detection — more reliable on every OS, not just Windows.
- **Startup errors are logged to disk.** When the server fails to start, errors are written to `~/.gstack/browse-startup-error.log` so Windows users (who lose stderr due to process detachment) can debug.
- **Chromium sandbox disabled on Windows.** Chromium's sandbox requires elevated privileges when spawned through the Bun→Node chain — now disabled on Windows only.

### For contributors

- New tests for `isServerHealthy()` and startup error logging in `browse/test/config.test.ts`

## [0.11.13.0] - 2026-03-24 — Worktree Isolation + Infrastructure Elegance

### Added

M VERSION => VERSION +1 -1
@@ 1,1 1,1 @@
-0.11.13.0
+0.11.14.0

M browse/src/browser-manager.ts => browse/src/browser-manager.ts +9 -1
@@ 89,6 89,10 @@ export class BrowserManager {

    this.browser = await chromium.launch({
      headless: useHeadless,
      // On Windows, Chromium's sandbox fails when the server is spawned through
      // the Bun→Node process chain (GitHub #276). Disable it — local daemon
      // browsing user-specified URLs has marginal sandbox benefit.
      chromiumSandbox: process.platform !== 'win32',
      ...(launchArgs.length > 0 ? { args: launchArgs } : {}),
    });



@@ 492,7 496,11 @@ export class BrowserManager {
    // 2. Launch new headed browser (try-catch — if this fails, headless stays running)
    let newBrowser: Browser;
    try {
-      newBrowser = await chromium.launch({ headless: false, timeout: 15000 });
+      newBrowser = await chromium.launch({
+        headless: false,
+        timeout: 15000,
+        chromiumSandbox: process.platform !== 'win32',
+      });
    } catch (err: unknown) {
      const msg = err instanceof Error ? err.message : String(err);
      return `ERROR: Cannot open headed browser — ${msg}. Headless browser still running.`;

M browse/src/cli.ts => browse/src/cli.ts +107 -41
@@ 76,6 76,13 @@ export function resolveNodeServerScript(

const NODE_SERVER_SCRIPT = IS_WINDOWS ? resolveNodeServerScript() : null;

// On Windows, hard-fail if server-node.mjs is missing — the Bun path is known broken.
if (IS_WINDOWS && !NODE_SERVER_SCRIPT) {
  throw new Error(
    'server-node.mjs not found. Run `bun run build` to generate the Windows server bundle.'
  );
}

interface ServerState {
  pid: number;
  port: number;


@@ 96,6 103,19 @@ function readState(): ServerState | null {
}

function isProcessAlive(pid: number): boolean {
  if (IS_WINDOWS) {
    // Bun's compiled binary can't signal Windows PIDs (always throws ESRCH).
    // Use tasklist as a fallback. Only for one-shot calls — too slow for polling loops.
    try {
      const result = Bun.spawnSync(
        ['tasklist', '/FI', `PID eq ${pid}`, '/NH', '/FO', 'CSV'],
        { stdout: 'pipe', stderr: 'pipe', timeout: 3000 }
      );
      return result.stdout.toString().includes(`"${pid}"`);
    } catch {
      return false;
    }
  }
  try {
    process.kill(pid, 0);
    return true;


@@ 104,10 124,42 @@ function isProcessAlive(pid: number): boolean {
  }
}

/**
 * HTTP health check — definitive proof the server is alive and responsive.
 * Used in all polling loops instead of isProcessAlive() (which is slow on Windows).
 */
export async function isServerHealthy(port: number): Promise<boolean> {
  try {
    const resp = await fetch(`http://127.0.0.1:${port}/health`, {
      signal: AbortSignal.timeout(2000),
    });
    if (!resp.ok) return false;
    const health = await resp.json() as any;
    return health.status === 'healthy';
  } catch {
    return false;
  }
}

// ─── Process Management ─────────────────────────────────────────
async function killServer(pid: number): Promise<void> {
  if (!isProcessAlive(pid)) return;

  if (IS_WINDOWS) {
    // taskkill /T /F kills the process tree (Node + Chromium)
    try {
      Bun.spawnSync(
        ['taskkill', '/PID', String(pid), '/T', '/F'],
        { stdout: 'pipe', stderr: 'pipe', timeout: 5000 }
      );
    } catch {}
    const deadline = Date.now() + 2000;
    while (Date.now() < deadline && isProcessAlive(pid)) {
      await Bun.sleep(100);
    }
    return;
  }

  try { process.kill(pid, 'SIGTERM'); } catch { return; }

  // Wait up to 2s for graceful shutdown


@@ 127,6 179,10 @@ async function killServer(pid: number): Promise<void> {
 * Verifies PID ownership before sending signals.
 */
function cleanupLegacyState(): void {
  // No legacy state on Windows — /tmp and `ps` don't exist, and gstack
  // never ran on Windows before the Node.js fallback was added.
  if (IS_WINDOWS) return;

  try {
    const files = fs.readdirSync('/tmp').filter(f => f.startsWith('browse-server') && f.endsWith('.json'));
    for (const file of files) {


@@ 164,44 220,65 @@ function cleanupLegacyState(): void {
async function startServer(): Promise<ServerState> {
  ensureStateDir(config);

-  // Clean up stale state file
+  // Clean up stale state file and error log
  try { fs.unlinkSync(config.stateFile); } catch {}
+  try { fs.unlinkSync(path.join(config.stateDir, 'browse-startup-error.log')); } catch {}

+  let proc: any = null;
+
+  if (IS_WINDOWS && NODE_SERVER_SCRIPT) {
+    // Windows: Bun.spawn() + proc.unref() doesn't truly detach on Windows —
+    // when the CLI exits, the server dies with it. Use Node's child_process.spawn
+    // with { detached: true } instead, which is the gold standard for Windows
+    // process independence. Credit: PR #191 by @fqueiro.
+    const launcherCode =
+      `const{spawn}=require('child_process');` +
+      `spawn(process.execPath,[${JSON.stringify(NODE_SERVER_SCRIPT)}],` +
+      `{detached:true,stdio:'ignore',env:Object.assign({},process.env,` +
+      `{BROWSE_STATE_FILE:${JSON.stringify(config.stateFile)}})}).unref()`;
+    Bun.spawnSync(['node', '-e', launcherCode], { stdio: 'ignore' });
+  } else {
+    // macOS/Linux: Bun.spawn + unref works correctly
+    proc = Bun.spawn(['bun', 'run', SERVER_SCRIPT], {
+      stdio: ['ignore', 'pipe', 'pipe'],
+      env: { ...process.env, BROWSE_STATE_FILE: config.stateFile },
+    });
+    proc.unref();
+  }
+
-  // Start server as detached background process.
-  // On Windows, Bun can't launch/connect to Playwright's Chromium (oven-sh/bun#4253, #9911).
-  // Fall back to running the server under Node.js with Bun API polyfills.
-  const useNode = IS_WINDOWS && NODE_SERVER_SCRIPT;
-  const serverCmd = useNode
-    ? ['node', NODE_SERVER_SCRIPT]
-    : ['bun', 'run', SERVER_SCRIPT];
-  const proc = Bun.spawn(serverCmd, {
-    stdio: ['ignore', 'pipe', 'pipe'],
-    env: { ...process.env, BROWSE_STATE_FILE: config.stateFile },
-  });
-
-  // Don't hold the CLI open
-  proc.unref();
-
-  // Wait for state file to appear
+  // Wait for server to become healthy.
+  // Use HTTP health check (not isProcessAlive) — it's fast (~instant ECONNREFUSED)
+  // and works reliably on all platforms including Windows.
  const start = Date.now();
  while (Date.now() - start < MAX_START_WAIT) {
    const state = readState();
-    if (state && isProcessAlive(state.pid)) {
+    if (state && await isServerHealthy(state.port)) {
      return state;
    }
    await Bun.sleep(100);
  }

-  // If we get here, server didn't start in time
-  // Try to read stderr for error message
-  const stderr = proc.stderr;
-  if (stderr) {
-    const reader = stderr.getReader();
+  // Server didn't start in time — try to get error details
+  if (proc?.stderr) {
+    // macOS/Linux: read stderr from the spawned process
+    const reader = proc.stderr.getReader();
    const { value } = await reader.read();
    if (value) {
      const errText = new TextDecoder().decode(value);
      throw new Error(`Server failed to start:\n${errText}`);
    }
+  } else {
+    // Windows: check startup error log (server writes errors to disk since
+    // stderr is unavailable due to stdio: 'ignore' for detachment)
+    const errorLogPath = path.join(config.stateDir, 'browse-startup-error.log');
+    try {
+      const errorLog = fs.readFileSync(errorLogPath, 'utf-8').trim();
+      if (errorLog) {
+        throw new Error(`Server failed to start:\n${errorLog}`);
+      }
+    } catch (e: any) {
+      if (e.code !== 'ENOENT') throw e;
+    }
  }
  throw new Error(`Server failed to start within ${MAX_START_WAIT / 1000}s`);
}


@@ 237,7 314,10 @@ function acquireServerLock(): (() => void) | null {
async function ensureServer(): Promise<ServerState> {
  const state = readState();

-  if (state && isProcessAlive(state.pid)) {
+  // Health-check-first: HTTP is definitive proof the server is alive and responsive.
+  // This replaces the PID-gated approach which breaks on Windows (Bun's process.kill
+  // always throws ESRCH for Windows PIDs in compiled binaries).
+  if (state && await isServerHealthy(state.port)) {
    // Check for binary version mismatch (auto-restart on update)
    const currentVersion = readVersionHash();
    if (currentVersion && state.binaryVersion && currentVersion !== state.binaryVersion) {


@@ 245,21 325,7 @@ async function ensureServer(): Promise<ServerState> {
      await killServer(state.pid);
      return startServer();
    }

-    // Server appears alive — do a health check
-    try {
-      const resp = await fetch(`http://127.0.0.1:${state.port}/health`, {
-        signal: AbortSignal.timeout(2000),
-      });
-      if (resp.ok) {
-        const health = await resp.json() as any;
-        if (health.status === 'healthy') {
-          return state;
-        }
-      }
-    } catch {
-      // Health check failed — server is dead or unhealthy
-    }
+    return state;
  }

  // Ensure state directory exists before lock acquisition (lock file lives there)


@@ 273,7 339,7 @@ async function ensureServer(): Promise<ServerState> {
    const start = Date.now();
    while (Date.now() - start < MAX_START_WAIT) {
      const freshState = readState();
-      if (freshState && isProcessAlive(freshState.pid)) return freshState;
+      if (freshState && await isServerHealthy(freshState.port)) return freshState;
      await Bun.sleep(200);
    }
    throw new Error('Timed out waiting for another instance to start the server');


@@ 282,7 348,7 @@ async function ensureServer(): Promise<ServerState> {
  try {
    // Re-read state under lock in case another process just started the server
    const freshState = readState();
-    if (freshState && isProcessAlive(freshState.pid)) {
+    if (freshState && await isServerHealthy(freshState.port)) {
      return freshState;
    }
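
The Windows branches of `isProcessAlive()` and `killServer()` shell out to `tasklist` and `taskkill`. The sketch below separates argument building and output parsing from the spawn, so those parts can be tested on any OS. The helper names are hypothetical. Unlike the diff, which searches the whole line for the quoted PID, the parser compares only the PID column (the second CSV field), so a value in another column, such as the session number, is never mistaken for the PID.

```typescript
import { spawnSync } from "node:child_process";

// argv for: tasklist /FI "PID eq <pid>" /NH /FO CSV
export function tasklistArgs(pid: number): string[] {
  return ["tasklist", "/FI", `PID eq ${pid}`, "/NH", "/FO", "CSV"];
}

// Each match is one CSV row such as "node.exe","12345","Console","1","50,000 K".
// With no match, tasklist prints an INFO line, which has no second field.
export function tasklistHasPid(stdout: string, pid: number): boolean {
  return stdout
    .split(/\r?\n/)
    .some((line) => line.trim().split('","')[1] === String(pid));
}

// argv for: taskkill /PID <pid> /T /F  (/T = include child processes, /F = force)
export function taskkillArgs(pid: number): string[] {
  return ["taskkill", "/PID", String(pid), "/T", "/F"];
}

// Windows-only liveness probe. Each call spawns a process, so use it for
// one-shot checks rather than polling loops.
export function isProcessAliveWindows(pid: number): boolean {
  const [cmd, ...args] = tasklistArgs(pid);
  const result = spawnSync(cmd, args, { encoding: "utf-8", timeout: 3000 });
  return result.status === 0 && tasklistHasPid(result.stdout, pid);
}
```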


M browse/src/server.ts => browse/src/server.ts +16 -0
@@ 286,6 286,13 @@ async function shutdown() {
// Handle signals
process.on('SIGTERM', shutdown);
process.on('SIGINT', shutdown);
// Windows: taskkill /F bypasses SIGTERM, but 'exit' fires for some shutdown paths.
// Defense-in-depth — primary cleanup is the CLI's stale-state detection via health check.
if (process.platform === 'win32') {
  process.on('exit', () => {
    try { fs.unlinkSync(config.stateFile); } catch {}
  });
}

// ─── Start ─────────────────────────────────────────────────────
async function start() {


@@ 365,5 372,14 @@ async function start() {

start().catch((err) => {
  console.error(`[browse] Failed to start: ${err.message}`);
  // Write error to disk for the CLI to read — on Windows, the CLI can't capture
  // stderr because the server is launched with detached: true, stdio: 'ignore'.
  try {
    const errorLogPath = path.join(config.stateDir, 'browse-startup-error.log');
    fs.mkdirSync(config.stateDir, { recursive: true });
    fs.writeFileSync(errorLogPath, `${new Date().toISOString()} ${err.message}\n${err.stack || ''}\n`);
  } catch {
    // stateDir may not exist — nothing more we can do
  }
  process.exit(1);
});

M browse/test/config.test.ts => browse/test/config.test.ts +66 -0
@@ 248,3 248,69 @@ describe('version mismatch detection', () => {
    expect(shouldRestart).toBe(false);
  });
});

describe('isServerHealthy', () => {
  const { isServerHealthy } = require('../src/cli');
  const http = require('http');

  test('returns true for a healthy server', async () => {
    const server = http.createServer((_req: any, res: any) => {
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ status: 'healthy' }));
    });
    await new Promise<void>(resolve => server.listen(0, resolve));
    const port = server.address().port;
    try {
      expect(await isServerHealthy(port)).toBe(true);
    } finally {
      server.close();
    }
  });

  test('returns false for an unhealthy server', async () => {
    const server = http.createServer((_req: any, res: any) => {
      res.writeHead(200, { 'Content-Type': 'application/json' });
      res.end(JSON.stringify({ status: 'unhealthy' }));
    });
    await new Promise<void>(resolve => server.listen(0, resolve));
    const port = server.address().port;
    try {
      expect(await isServerHealthy(port)).toBe(false);
    } finally {
      server.close();
    }
  });

  test('returns false when server is not running', async () => {
    // Use a port that's almost certainly not in use
    expect(await isServerHealthy(59999)).toBe(false);
  });

  test('returns false on non-200 response', async () => {
    const server = http.createServer((_req: any, res: any) => {
      res.writeHead(500);
      res.end('Internal Server Error');
    });
    await new Promise<void>(resolve => server.listen(0, resolve));
    const port = server.address().port;
    try {
      expect(await isServerHealthy(port)).toBe(false);
    } finally {
      server.close();
    }
  });
});

describe('startup error log', () => {
  test('write and read error log', () => {
    const tmpDir = path.join(os.tmpdir(), `browse-error-log-test-${Date.now()}`);
    fs.mkdirSync(tmpDir, { recursive: true });
    const errorLogPath = path.join(tmpDir, 'browse-startup-error.log');
    const errorMsg = 'Cannot find module playwright';
    fs.writeFileSync(errorLogPath, `2026-03-23T00:00:00.000Z ${errorMsg}\n`);
    const content = fs.readFileSync(errorLogPath, 'utf-8').trim();
    expect(content).toContain(errorMsg);
    expect(content).toMatch(/^\d{4}-\d{2}-\d{2}T/); // ISO timestamp prefix
    fs.rmSync(tmpDir, { recursive: true, force: true });
  });
});

M package.json => package.json +1 -1
@@ 1,6 1,6 @@
{
  "name": "gstack",
  "version": "0.11.13.0",
  "version": "0.11.14.0",
  "description": "Garry's Stack — Claude Code skills + fast headless browser. One repo, one install, entire AI engineering workflow.",
  "license": "MIT",
  "type": "module",