~cytrogen/gstack (8115951284474ae15a90162d33383ca61eb82d28): docs/designs/SELF_LEARNING_V0.md

#Design: GStack Self-Learning Infrastructure

Generated by /office-hours + /plan-ceo-review + /plan-eng-review on 2026-03-28 Branch: garrytan/ce-features Repo: gstack Status: ACTIVE Mode: Open Source / Community

#Problem Statement

GStack runs 30+ skills across sessions but learns nothing between them. A /review session catches an N+1 query pattern, and the next /review on the same codebase starts from scratch. A /ship run discovers the test command, and every future /ship re-discovers it. A /investigate finds a tricky race condition, and no future session knows about it.

Every AI coding tool has this problem. Cursor has per-user memory. Claude Code has CLAUDE.md. Windsurf has persistent context. But none of them compound. None of them structure what they learn. None of them share knowledge across skills.

#What We're Building

Per-project institutional knowledge that compounds across sessions and skills. Structured, typed, confidence-scored learnings that every gstack skill can read and write. The goal: after 20 sessions on the same codebase, gstack knows every architectural decision, every past bug pattern, and every time it was wrong.

#North Star

/autoship (Release 4). A full engineering team in one command. Describe a feature, approve the plan, everything else is automatic. /autoship can't work without learnings, because without memory it repeats the same mistakes. Releases 1-3 are the infrastructure that makes /autoship actually work.

#Audience

YC founders building with AI. The people who run gstack on real codebases 20+ times a week and notice when it asks the same question twice.

#Differentiation

Tool	Memory model	Scope	Structure
Cursor	Per-user chat memory	Per-session	Unstructured
CLAUDE.md	Static file	Per-project	Manual
Windsurf	Persistent context	Per-session	Unstructured
GStack	Per-project JSONL	Cross-session, cross-skill	Typed, scored, decaying

#Release Roadmap

#Release 1: "GStack Learns" (v0.14)

Headline: Every session makes the next one smarter.

What ships:

Learnings persistence at ~/.gstack/projects/{slug}/learnings.jsonl
/learn skill for manual review, search, prune, export
Confidence calibration on all review findings (1-10 scores with display rules)
Confidence decay for observed/inferred learnings (1pt/30d)
Cross-project learnings discovery (opt-in, AskUserQuestion consent)
"Learning applied" callouts when reviews match past learnings
Integration into /review, /ship, /plan-*, /office-hours, /investigate, /retro

Schema (Supabase-compatible):

{
  "ts": "2026-03-28T12:00:00Z",
  "skill": "review",
  "type": "pitfall",
  "key": "n-plus-one-activerecord",
  "insight": "Always check includes() for has_many in list endpoints",
  "confidence": 8,
  "source": "observed",
  "branch": "feature-x",
  "commit": "abc1234",
  "files": ["app/models/user.rb"]
}

Architecture: append-only JSONL. Duplicates resolved at read time ("latest winner" per key+type). No write-time mutation, no race conditions. Follows the existing gstack-review-log pattern.

#Release 2: "Review Army" (v0.15)

Headline: 10 specialist reviewers on every PR.

What ships:

7 parallel specialist subagents: always-on (testing, maintainability) + conditional (security, performance, data-migration, API contract, design) + red team (large diffs / critical findings)
JSON-structured findings with confidence scores + fingerprint dedup across agents
PR quality score (0-10) logged per review + /retro trending (E2)
Learning-informed specialist prompts — past pitfalls injected per domain (E4)
Multi-specialist consensus highlighting — confirmed findings get boosted (E6)
Enhanced Delivery Integrity via PLAN_COMPLETION_AUDIT — investigation depth, commit message fallback, plan-file learnings logging
Checklist refactored: CRITICAL categories stay in main pass, specialist categories extracted to focused checklists in review/specialists/

#Release 2.5: "Review Army Expansions" (v0.15.x)

Headline: Ship after R2 proves stable. Check in on how the core loop is performing.

Pre-check: review R2 quality metrics (PR quality scores, specialist hit rates, false positive rates, E2E test stability). If core loop has issues, fix those first.

What ships:

E1: Adaptive specialist gating — auto-skip specialists with 0-finding track record. Store per-project hit rates via gstack-learnings-log. User can force with --security etc.
E3: Test stub generation — each specialist outputs TEST_STUB alongside findings. Framework detected from project (Jest/Vitest/RSpec/pytest/Go test). Flows into Fix-First: AUTO-FIX applies fix + creates test file.
E5: Cross-review finding dedup — read gstack-review-read for prior review entries. Suppress findings matching a prior user-skipped finding.
E7: Specialist performance tracking — log per-specialist metrics via gstack-review-log. /retro integration: "Top finding specialist: Performance (7 findings)."

#Release 3: "Smart Ceremony" (v0.16)

Headline: GStack respects your time.

What ships:

Scope assessment (TINY/SMALL/MEDIUM/LARGE) in /review, /ship, /autoplan
Ceremony skipping based on diff size and scope category
File-based todo lifecycle (/triage for interactive approval, /resolve for batch resolution via parallel agents)

#Release 4: "/autoship — One Command, Full Feature" (v0.17)

Headline: Describe a feature. Approve the plan. Everything else is automatic.

What ships:

/autoship autonomous pipeline: office-hours → autoplan → build → review → qa → ship → learn. 7 phases, 1 approval gate (the plan).
/ideate brainstorming skill (parallel divergent agents + adversarial filtering)
Research agents in /plan-eng-review (codebase analyst, history analyst, best practices researcher, learnings researcher)

#Release 5: "Studio" (v0.18)

Headline: The full-stack AI engineering studio.

What ships:

Figma design sync (pixel-matching iteration loop)
Feature video recording (auto-generated PR demos)
PR feedback resolution (parallel comment resolver)
Swarm orchestration (multi-worktree parallel builds)
/onboard (auto-generated contributor guide)
/triage-prs (batch PR triage for maintainers)
Codex build delegation (delegate implementation to Codex CLI)
Cross-platform portability (Copilot, Kiro, Windsurf output)

#Acknowledged Inspiration

The self-learning roadmap was inspired by ideas from the Compound Engineering project by Nico Bailon. Their exploration of learnings persistence, parallel review agents, and autonomous pipelines catalyzed the design of GStack's approach. We adapted every concept to fit GStack's template system, voice, and architecture rather than porting directly.