Generated by /office-hours + /plan-ceo-review + /plan-eng-review on 2026-03-28
Updated: 2026-04-01 (post-Session Intelligence, reviewed by Codex)
Branch: garrytan/ce-features
Repo: gstack
Status: ACTIVE
Mode: Open Source / Community
GStack runs 30+ skills across sessions but learns nothing between them. A /review session catches an N+1 query pattern, and the next /review on the same codebase starts from scratch. A /ship run discovers the test command, and every future /ship re-discovers it. An /investigate run finds a tricky race condition, and no future session knows about it.
Every AI coding tool has this problem. Cursor has per-user memory. Claude Code has CLAUDE.md. Windsurf has persistent context. But none of them compound. None of them structure what they learn. None of them share knowledge across skills.
Per-project institutional knowledge that compounds across sessions and skills. Structured, typed, confidence-scored learnings that every gstack skill can read and write. The goal: after 20 sessions on the same codebase, gstack knows every architectural decision, every past bug pattern, and every time it was wrong.
/autoship (Release 5). A full engineering team in one command. Describe a feature, approve the plan, everything else is automatic. /autoship can't work without learnings (R1), review quality (R2), session persistence (R3), and adaptive ceremony (R4). Releases 1-4 are the infrastructure that makes /autoship actually work.
YC founders building with AI. The people who run gstack on real codebases 20+ times a week and notice when it asks the same question twice.
| Tool | Memory model | Scope | Structure |
|---|---|---|---|
| Cursor | Per-user chat memory | Per-session | Unstructured |
| CLAUDE.md | Static file | Per-project | Manual |
| Windsurf | Persistent context | Per-session | Unstructured |
| GStack | Per-project JSONL | Cross-session, cross-skill | Typed, scored, decaying |
gstack has four distinct persistence layers. They share a storage location
(~/.gstack/projects/$SLUG/, JSONL for all but checkpoints) but serve different purposes:
| System | File | What it stores | Written by | Read by |
|---|---|---|---|---|
| Learnings | learnings.jsonl | Institutional knowledge (pitfalls, patterns, preferences) | All skills | All skills (preamble) |
| Timeline | timeline.jsonl | Event history (skill start/complete, branch, outcome) | Preamble (automatic) | /retro, preamble context recovery |
| Checkpoints | checkpoints/*.md | Working state snapshots (decisions, remaining work, files) | /checkpoint, /ship, /investigate | Preamble context recovery, /checkpoint resume |
| Health | health-history.jsonl | Code quality scores over time (per-tool, composite) | /health | /retro, /ship (gate), /health (trends) |
These are not overlapping. Learnings = what you know. Timeline = what happened. Checkpoints = where you are. Health = how good the code is. Each answers a different question.
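As a rough illustration of how the four layers sit side by side on disk, the sketch below resolves each layer to its path. The file names come from the table above; the `layer_path` helper and its `root` parameter are hypothetical and exist only for this example.

```python
from pathlib import Path
from typing import Optional

# File names are taken from the table above; the mapping itself is illustrative.
LAYER_FILES = {
    "learnings": "learnings.jsonl",
    "timeline": "timeline.jsonl",
    "checkpoints": "checkpoints",  # a directory of *.md snapshots
    "health": "health-history.jsonl",
}


def layer_path(slug: str, layer: str, root: Optional[Path] = None) -> Path:
    """Return where a persistence layer lives for one project (hypothetical helper)."""
    base = (root if root is not None else Path.home() / ".gstack") / "projects" / slug
    return base / LAYER_FILES[layer]
```

All four layers resolve under the same project directory, which is the only thing they share; what each one stores stays separate.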
Headline: Every session makes the next one smarter.
What shipped:
- ~/.gstack/projects/{slug}/learnings.jsonl
- /learn skill for manual review, search, prune, export

Schema:
```json
{
  "ts": "2026-03-28T12:00:00Z",
  "skill": "review",
  "type": "pitfall",
  "key": "n-plus-one-activerecord",
  "insight": "Always check includes() for has_many in list endpoints",
  "confidence": 8,
  "source": "observed",
  "branch": "feature-x",
  "commit": "abc1234",
  "files": ["app/models/user.rb"]
}
```
Types: pattern | pitfall | preference | architecture | tool
Sources: observed | user-stated | inferred | cross-model
Architecture: append-only JSONL. Duplicates resolved at read time ("latest winner" per key+type). No write-time mutation, no race conditions.
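The append-only write path and the read-time "latest winner" resolution can be sketched roughly as follows. This is an illustration in Python, not gstack's actual code: `append_learning` and `resolve_learnings` are hypothetical names, while the field names and the allowed types and sources come from the schema above. It assumes timestamps are ISO-8601 UTC strings in one uniform format, so plain string comparison orders them, and that a later line wins a timestamp tie.

```python
import json
from pathlib import Path

TYPES = {"pattern", "pitfall", "preference", "architecture", "tool"}
SOURCES = {"observed", "user-stated", "inferred", "cross-model"}


def append_learning(path: Path, entry: dict) -> None:
    """Validate an entry and append it as one JSON line; existing lines are never rewritten."""
    if entry["type"] not in TYPES:
        raise ValueError(f"unknown type: {entry['type']}")
    if entry["source"] not in SOURCES:
        raise ValueError(f"unknown source: {entry['source']}")
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def resolve_learnings(path: Path) -> list[dict]:
    """Read every line and keep only the latest entry per (key, type)."""
    if not path.exists():
        return []
    latest: dict[tuple[str, str], dict] = {}
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            entry = json.loads(line)
            slot = (entry["key"], entry["type"])
            current = latest.get(slot)
            # >= lets a later line win when two timestamps are equal.
            if current is None or entry["ts"] >= current["ts"]:
                latest[slot] = entry
    return list(latest.values())
```

Because writers only append, a correction is just a newer line with the same key and type; the older line stays in the file as history and is ignored at read time.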
Headline: 10 specialist reviewers on every PR.
What shipped:
Headline: Ship after R2 proves stable. Check in on how the core loop is performing.
Pre-check: review R2 quality metrics (PR quality scores, specialist hit rates, false positive rates, E2E test stability). If core loop has issues, fix those first.
What ships:
Headline: Your AI sessions remember what happened.
What shipped:
- ~/.gstack/projects/$SLUG/timeline.jsonl. Local-only, never sent anywhere, always on regardless of telemetry setting.
- /checkpoint skill: save/resume/list working state snapshots. Cross-branch listing for Conductor workspace handoff between agents.
- /health skill: code quality scorekeeper wrapping project tools (tsc, biome, knip, shellcheck, tests). Composite 0-10 score, trend tracking, improvement suggestions when scores drop.
- bin/gstack-timeline-log and bin/gstack-timeline-read.

Design doc: docs/designs/SESSION_INTELLIGENCE.md
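To make the composite score concrete, here is one way it could be computed. The source only says that /health wraps project tools, produces a composite 0-10 score, and tracks trends; the unweighted mean, the `drop_threshold` default, and both function names below are assumptions made for this sketch.

```python
from statistics import mean


def composite_score(tool_scores: dict[str, float]) -> float:
    """Average per-tool scores (each 0-10) into one composite, rounded to one decimal."""
    if not tool_scores:
        raise ValueError("no tool scores to combine")
    for tool, score in tool_scores.items():
        if not 0 <= score <= 10:
            raise ValueError(f"{tool} score out of range: {score}")
    return round(mean(tool_scores.values()), 1)


def dropped(history: list[float], drop_threshold: float = 0.5) -> bool:
    """True when the latest composite fell by more than the threshold since the previous run."""
    if len(history) < 2:
        return False
    return history[-2] - history[-1] > drop_threshold
```

A real implementation might weight tools differently (for example, failing tests mattering more than lint warnings); the equal weighting here is only the simplest choice.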
Headline: GStack respects your time without compromising your safety.
Ceremony and trust are separate concerns. Ceremony = the set of review/test/QA steps a PR goes through. Trust = a policy engine that determines which ceremony level applies. They interact but don't merge.
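The separation can be sketched as two pieces that only meet at a level name. Everything specific below is hypothetical: the level names (`full`, `standard`, `fast`), the `ChangeSignals` fields, and the thresholds are placeholders chosen for illustration, not gstack's actual policy.

```python
from dataclasses import dataclass

# Ceremony: each level is just a named set of steps (hypothetical levels).
CEREMONY = {
    "full": ("review", "test", "qa"),
    "standard": ("review", "test"),
    "fast": ("test",),
}


@dataclass(frozen=True)
class ChangeSignals:
    """Hypothetical inputs a trust policy might consider."""
    files_changed: int
    touches_sensitive_paths: bool
    health_score: float  # composite 0-10 from /health


def trust_policy(signals: ChangeSignals) -> str:
    """Trust: pick a ceremony level without knowing which steps it contains."""
    if signals.touches_sensitive_paths or signals.health_score < 7.0:
        return "full"
    if signals.files_changed <= 3:
        return "fast"
    return "standard"


def steps_for(signals: ChangeSignals) -> tuple[str, ...]:
    """The only point where the two concerns interact: a lookup by level name."""
    return CEREMONY[trust_policy(signals)]
```

Keeping the policy ignorant of the step lists means either side can change without touching the other: a new QA step is added to `CEREMONY`, a stricter rule is added to `trust_policy`.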
What ships:
Ceremony levels:
Trust policy engine:
Scope assessment:
TODO lifecycle:
Headline: Describe a feature. Approve the plan. Everything else is automatic.
/autoship is a resumable state machine, not a linear pipeline. Review and QA can send work back to build/fix. Compaction can interrupt any phase. The system must recover gracefully.
```
      ┌──────────┐
      │  START   │
      └────┬─────┘
           │
      ┌────▼─────┐
      │ /office- │
      │  hours   │
      └────┬─────┘
           │
      ┌────▼─────┐
      │/autoplan │ ◄── single approval gate
      └────┬─────┘
           │
┌──────────▼──────────┐
│        BUILD        │ ◄── /checkpoint auto-save
└──────────┬──────────┘
           │
┌──────────▼──────────┐
│       /health       │ ◄── quality gate
│   (score >= 7.0)    │
└──────────┬──────────┘
           │ fail → back to BUILD
┌──────────▼──────────┐
│       /review       │
└──────────┬──────────┘
           │ ASK items → back to BUILD
┌──────────▼──────────┐
│         /qa         │
└──────────┬──────────┘
           │ bugs found → back to BUILD
┌──────────▼──────────┐
│        /ship        │
└──────────┬──────────┘
           │
┌──────────▼──────────┐
│ /checkpoint archive │ ◄── preserve, don't destroy
└─────────────────────┘
```
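The transitions in the diagram can be sketched as a small resumable state machine. The phases and the back-edges to BUILD are read off the diagram; the `advance`, `save_state`, and `load_state` helpers and the one-field JSON state file are hypothetical stand-ins for whatever /checkpoint actually persists.

```python
import json
from pathlib import Path

# Forward edges, in the order the diagram shows them.
NEXT = {
    "office-hours": "autoplan",
    "autoplan": "build",
    "build": "health",
    "health": "review",
    "review": "qa",
    "qa": "ship",
    "ship": "archive",
}
# Phases whose failure sends work back to BUILD.
SENDS_BACK_TO_BUILD = {"health", "review", "qa"}


def advance(phase: str, passed: bool) -> str:
    """Return the next phase given the outcome of the current one."""
    if phase == "archive":
        return "archive"  # terminal state
    if not passed:
        # Gates loop back to build; any other phase is simply retried.
        return "build" if phase in SENDS_BACK_TO_BUILD else phase
    return NEXT[phase]


def save_state(path: Path, phase: str) -> None:
    """Persist the current phase so an interrupted run can resume (hypothetical format)."""
    path.write_text(json.dumps({"phase": phase}), encoding="utf-8")


def load_state(path: Path) -> str:
    """Resume from the saved phase, or start from the beginning if nothing was saved."""
    if not path.exists():
        return "office-hours"
    return json.loads(path.read_text(encoding="utf-8"))["phase"]
```

Saving after every transition is what makes the machine resumable: if compaction interrupts a phase, the next session calls `load_state` and re-enters at the last recorded phase instead of starting over.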
What ships:
Depends on: R1 (learnings for research agents), R2 (review army for quality), R3 (session intelligence for persistence), R4 (adaptive ceremony for speed).
Headline: Parallel execution infrastructure.
What ships:
Headline: Visual design integration.
What ships:
(Identified by Codex review, 2026-04-01)
/health scores, clean review history, and timeline patterns are useful signals. They are not proof of safety. If those signals feed ceremony reduction AND /autoship, the failure mode is rare, silent, high-severity mistakes. Mitigations:
(Identified by Codex review, 2026-04-01)
Context recovery can inject wrong-branch state, obsolete plans, or invalid checkpoints. Mitigations:
(Identified by Codex review, 2026-04-01)
Before shipping R4 (Adaptive Ceremony), measure:
These metrics should be collected during R3 usage and reviewed before R4 ships.
The self-learning roadmap was inspired by ideas from the Compound Engineering project by Nico Bailon. Their exploration of learnings persistence, parallel review agents, and autonomous pipelines catalyzed the design of GStack's approach. We adapted every concept to fit GStack's template system, voice, and architecture rather than porting directly.