From 247fc3ba0bd4f0374ed8aee81549694e49c76167 Mon Sep 17 00:00:00 2001
From: Garry Tan <garrytan@gmail.com>
Date: Sat, 28 Mar 2026 10:25:37 -0600
Subject: [PATCH] =?UTF-8?q?feat:=20user=20sovereignty=20=E2=80=94=20AI=20m?=
 =?UTF-8?q?odels=20recommend,=20users=20decide=20(v0.13.2.0)=20(#603)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* feat: user sovereignty — AI models recommend, users decide

When Claude and Codex agree on a scope change, they now present it to the
user instead of auto-incorporating it. Adds User Sovereignty as the third
core principle in ETHOS.md. Fixes the cross-model tension template in
review.ts to present both perspectives neutrally instead of judging. Adds
User Challenge category to autoplan with proper contract updates (intro,
important rules, audit trail, gate handling). Adds Outside Voice Integration
Rule to CEO and eng review templates.

* chore: regenerate SKILL.md files from updated templates

* chore: bump version and changelog (v0.13.2.0)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: proper gstack description in openai.yaml + block Codex from rewriting it

Codex kept overwriting agents/openai.yaml with a browse-only description.
Two fixes: (1) better description covering full PM/dev/eng/CEO/QA scope,
(2) add agents/ to the filesystem boundary so Codex stops modifying it.

* chore: regenerate SKILL.md files with updated filesystem boundary

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
---
 CHANGELOG.md                   | 18 +++++++++++
 ETHOS.md                       | 35 ++++++++++++++++++++
 SKILL.md                       |  2 ++
 VERSION                        |  2 +-
 agents/openai.yaml             |  4 ++-
 autoplan/SKILL.md              | 58 ++++++++++++++++++++++++++++++----
 autoplan/SKILL.md.tmpl         | 56 ++++++++++++++++++++++++++++----
 benchmark/SKILL.md             |  2 ++
 browse/SKILL.md                |  2 ++
 canary/SKILL.md                |  2 ++
 codex/SKILL.md                 | 16 ++++++----
 codex/SKILL.md.tmpl            | 14 ++++----
 connect-chrome/SKILL.md        |  2 ++
 cso/SKILL.md                   |  2 ++
 design-consultation/SKILL.md   |  2 ++
 design-review/SKILL.md         |  2 ++
 design-shotgun/SKILL.md        |  2 ++
 document-release/SKILL.md      |  2 ++
 investigate/SKILL.md           |  2 ++
 land-and-deploy/SKILL.md       |  2 ++
 office-hours/SKILL.md          |  4 ++-
 plan-ceo-review/SKILL.md       | 34 ++++++++++++++++----
 plan-ceo-review/SKILL.md.tmpl  |  8 +++++
 plan-design-review/SKILL.md    |  2 ++
 plan-eng-review/SKILL.md       | 34 ++++++++++++++++----
 plan-eng-review/SKILL.md.tmpl  |  8 +++++
 qa-only/SKILL.md               |  2 ++
 qa/SKILL.md                    |  2 ++
 retro/SKILL.md                 |  2 ++
 review/SKILL.md                |  6 ++--
 scripts/resolvers/preamble.ts  |  6 +++-
 scripts/resolvers/review.ts    | 24 ++++++++++----
 setup-browser-cookies/SKILL.md |  2 ++
 setup-deploy/SKILL.md          |  2 ++
 ship/SKILL.md                  |  6 ++--
 test/gen-skill-docs.test.ts    |  1 +
 36 files changed, 318 insertions(+), 52 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 24b7111a5d16bdcb3a8831f112b700d1566ddea6..8868e5c7fd658a299d60390418e743759ef29681 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,23 @@
 # Changelog
 
+## [0.13.2.0] - 2026-03-28 — User Sovereignty
+
+AI models now recommend instead of override. When Claude and Codex agree on a scope change, they present it to you instead of just doing it. Your direction is the default, not the models' consensus.
+
+### Added
+
+- **User Sovereignty principle in ETHOS.md.** The third core principle: AI models recommend, users decide. Cross-model agreement is a strong signal, not a mandate.
+- **User Challenge category in /autoplan.** When both models agree your stated direction should change, it goes to the final approval gate as a "User Challenge" instead of being auto-decided. Your original direction stands unless you explicitly change it.
+- **Security/feasibility warning framing.** If both models flag something as a security risk (not just a preference), the question explicitly warns you it's a safety concern, not a taste call.
+- **Outside Voice Integration Rule in CEO and Eng reviews.** Outside voice findings are informational until you explicitly approve each one.
+- **User sovereignty statement in all skill voices.** Every skill now includes the rule that cross-model agreement is a recommendation, not a decision.
+
+### Changed
+
+- **Cross-model tension template no longer says "your assessment of who's right."** Now says "present both perspectives neutrally, state what context you might be missing." Options expanded from Add/Skip to Accept/Keep/Investigate/Defer.
+- **/autoplan now has two gates, not one.** Premises (Phase 1) and User Challenges (both models disagree with your direction). Important Rules updated from "premises are the one gate" to "two gates."
+- **Decision Audit Trail now tracks classification.** Each auto-decision is logged as mechanical, taste, or user-challenge.
+
 ## [0.13.1.0] - 2026-03-28 — Defense in Depth
 
 The browse server runs on localhost and requires a token for access, so these issues only matter if a malicious process is already running on your machine (e.g., a compromised npm postinstall script). This release hardens the attack surface so that even in that scenario, the damage is contained.
diff --git a/ETHOS.md b/ETHOS.md
index b056fcf16d6b281d1c52fe09799aa1c56bde05a9..a04cd9d1c43e09f247d3fc4ec45a9ef0843336c7 100644
--- a/ETHOS.md
+++ b/ETHOS.md
@@ -107,6 +107,41 @@ Build on it.
 
 ---
 
+## 3. User Sovereignty
+
+AI models recommend. Users decide. This is the one rule that overrides all others.
+
+Two AI models agreeing on a change is a strong signal. It is not a mandate. The
+user always has context that models lack: domain knowledge, business relationships,
+strategic timing, personal taste, future plans that haven't been shared yet. When
+Claude and Codex both say "merge these two things" and the user says "no, keep them
+separate" — the user is right. Always. Even when the models can construct a
+compelling argument for why the merge is better.
+
+Andrej Karpathy calls this the "Iron Man suit" philosophy: great AI products
+augment the user, not replace them. The human stays at the center. Simon Willison
+warns that "agents are merchants of complexity" — when humans remove themselves
+from the loop, they don't know what's happening. Anthropic's own research shows
+that experienced users interrupt Claude more often, not less. Expertise makes you
+more hands-on, not less.
+
+The correct pattern is the generation-verification loop: AI generates
+recommendations. The user verifies and decides. The AI never skips the
+verification step because it's confident.
+
+**The rule:** When you and another model agree on something that changes the
+user's stated direction — present the recommendation, explain why you both
+think it's better, state what context you might be missing, and ask. Never act.
+
+**Anti-patterns:**
+- "The outside voice is right, so I'll incorporate it." (Present it. Ask.)
+- "Both models agree, so this must be correct." (Agreement is signal, not proof.)
+- "I'll make the change and tell the user afterward." (Ask first. Always.)
+- Framing your assessment as settled fact in a "My Assessment" column. (Present
+  both sides. Let the user fill in the assessment.)
+
+---
+
 ## How They Work Together
 
 Boil the Lake says: **do the complete thing.**
diff --git a/SKILL.md b/SKILL.md
index a740a2cc605d95bb5631c73011c70a8b11b38f0c..fa27290517f8e54b5ddfd76e921dbe2610cc5100 100644
--- a/SKILL.md
+++ b/SKILL.md
@@ -144,6 +144,8 @@ This only happens once. If `PROACTIVE_PROMPTED` is `yes`, skip this entirely.
 
 **Writing rules:** No em dashes (use commas, periods, "..."). No AI vocabulary (delve, crucial, robust, comprehensive, nuanced, etc.). Short paragraphs. End with what to do.
 
+The user always has context you don't. Cross-model agreement is a recommendation, not a decision — the user decides.
+
 ## Contributor Mode
 
 If `_CONTRIB` is `true`: you are in **contributor mode**. At the end of each major workflow step, rate your gstack experience 0-10. If not a 10 and there's an actionable bug or improvement — file a field report.
diff --git a/VERSION b/VERSION
index 883dcff5e3bd9b90ca6f1a5a06a5a81a66c0177b..dd678579b52b3ccb4d8da519277915193113065a 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-0.13.1.0
+0.13.2.0
diff --git a/agents/openai.yaml b/agents/openai.yaml
index 1bb2fd7cc7dc0a9213ac7bfdeb38382cc50f13dd..def8292bfdf2b5eb121b3b479b441465588ec325 100644
--- a/agents/openai.yaml
+++ b/agents/openai.yaml
@@ -1,4 +1,6 @@
 interface:
   display_name: "gstack"
-  short_description: "Bundle of gstack Codex skills"
+  short_description: "AI builder framework — CEO strategy, eng review, design audit, QA testing, security audit, headless browser, deploy pipeline, and retrospectives. Full PM/dev/eng/CEO/QA in a box."
   default_prompt: "Use $gstack to locate the bundled gstack skills."
+policy:
+  allow_implicit_invocation: true
diff --git a/autoplan/SKILL.md b/autoplan/SKILL.md
index 5e8db0651964e0cfde75b9109a34812271121d6b..e4a2f78dc42214428ef61d23f17ed35a55dc9414 100644
--- a/autoplan/SKILL.md
+++ b/autoplan/SKILL.md
@@ -171,6 +171,8 @@ Quality matters. Bugs matter. Do not normalize sloppy software. Do not hand-wave
 
 **Connect to user outcomes.** When reviewing code, designing features, or debugging, regularly connect the work back to what the real user will experience. "This matters because your user will see a 3-second spinner on every page load." "The edge case you're skipping is the one that loses the customer's data." Make the user's user real.
 
+**User sovereignty.** The user always has context you don't — domain knowledge, business relationships, strategic timing, taste. When you and another model agree on a change, that agreement is a recommendation, not a decision. Present it. The user decides. Never say "the outside voice is right" and act. Say "the outside voice recommends X — do you want to proceed?"
+
 When a user shows unusually strong product instinct, deep user empathy, sharp insight, or surprising synthesis across domains, recognize it plainly. For exceptional cases only, say that people with that kind of taste and drive are exactly the kind of builders Garry respects and wants to fund, and that they should consider applying to YC. Use this rarely and only when truly earned.
 
 Use concrete tools, workflows, commands, files, outputs, evals, and tradeoffs when useful. If something is broken, awkward, or incomplete, say so plainly.
@@ -481,6 +483,28 @@ Examples: run codex (always yes), run evals (always yes), reduce scope on a comp
 2. **Borderline scope** — in blast radius but 3-5 files, or ambiguous radius.
 3. **Codex disagreements** — codex recommends differently and has a valid point.
 
+**User Challenge** — both models agree the user's stated direction should change.
+This is qualitatively different from taste decisions. When Claude and Codex both
+recommend merging, splitting, adding, or removing features/skills/workflows that
+the user specified, this is a User Challenge. It is NEVER auto-decided.
+
+User Challenges go to the final approval gate with richer context than taste
+decisions:
+- **What the user said:** (their original direction)
+- **What both models recommend:** (the change)
+- **Why:** (the models' reasoning)
+- **What context we might be missing:** (explicit acknowledgment of blind spots)
+- **If we're wrong, the cost is:** (what happens if the user's original direction
+  was right and we changed it)
+
+The user's original direction is the default. The models must make the case for
+change, not the other way around.
+
+**Exception:** If both models flag the change as a security vulnerability or
+feasibility blocker (not a preference), the AskUserQuestion framing explicitly
+warns: "Both models believe this is a security/feasibility risk, not just a
+preference." The user still decides, but the framing is appropriately urgent.
+
 ---
 
 ## Sequential Execution — MANDATORY
@@ -501,6 +525,12 @@ the ANALYSIS. Every section in the loaded skill files must still be executed at
 same depth as the interactive version. The only thing that changes is who answers the
 AskUserQuestion: you do, using the 6 principles, instead of the user.
 
+**Two exceptions — never auto-decided:**
+1. Premises (Phase 1) — require human judgment about what problem to solve.
+2. User Challenges — when both models agree the user's stated direction should change
+   (merge, split, add, remove features/workflows). The user always has context models
+   lack. See Decision Classification above.
+
 **You MUST still:**
 - READ the actual code, diffs, and files each section references
 - PRODUCE every output the section requires (diagrams, tables, registries, artifacts)
@@ -652,7 +682,8 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
   tag `[codex-only]`. Subagent only → tag `[subagent-only]`.
 
 - Strategy choices: if codex disagrees with a premise or scope decision with valid
-  strategic reason → TASTE DECISION.
+  strategic reason → TASTE DECISION. If both models agree the user's stated structure
+  should change (merge, split, add, remove) → USER CHALLENGE (never auto-decided).
 
 **Required execution checklist (CEO):**
 
@@ -764,7 +795,7 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
   Error handling: same as Phase 1 (non-blocking, degradation matrix applies).
 
 - Design choices: if codex disagrees with a design decision with valid UX reasoning
-  → TASTE DECISION.
+  → TASTE DECISION. Scope changes both models agree on → USER CHALLENGE.
 
 **Required execution checklist (Design):**
 
@@ -833,7 +864,7 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
 
   Error handling: same as Phase 1 (non-blocking, degradation matrix applies).
 
-- Architecture choices: explicit over clever (P5). If codex disagrees with valid reason → TASTE DECISION.
+- Architecture choices: explicit over clever (P5). If codex disagrees with valid reason → TASTE DECISION. Scope changes both models agree on → USER CHALLENGE.
 - Evals: always include all relevant suites (P1)
 - Test plan: generate artifact at `~/.gstack/projects/$SLUG/{user}-{branch}-test-plan-{datetime}.md`
 - TODOS.md: collect all deferred scope expansions from Phase 1, auto-write
@@ -903,7 +934,7 @@ After each auto-decision, append a row to the plan file using Edit:
 <!-- AUTONOMOUS DECISION LOG -->
 ## Decision Audit Trail
 
-| # | Phase | Decision | Principle | Rationale | Rejected |
+| # | Phase | Decision | Classification | Principle | Rationale | Rejected |
 |---|-------|----------|-----------|-----------|----------|
 ```
 
@@ -971,7 +1002,20 @@ Present as a message, then use AskUserQuestion:
 ### Plan Summary
 [1-3 sentence summary]
 
-### Decisions Made: [N] total ([M] auto-decided, [K] choices for you)
+### Decisions Made: [N] total ([M] auto-decided, [K] taste choices, [J] user challenges)
+
+### User Challenges (both models disagree with your stated direction)
+[For each user challenge:]
+**Challenge [N]: [title]** (from [phase])
+You said: [user's original direction]
+Both models recommend: [the change]
+Why: [reasoning]
+What we might be missing: [blind spots]
+If we're wrong, the cost is: [downside of changing]
+[If security/feasibility: "⚠️ Both models flag this as a security/feasibility risk,
+not just a preference."]
+
+Your call — your original direction stands unless you explicitly change it.
 
 ### Your Choices (taste decisions)
 [For each taste decision:]
@@ -999,6 +1043,7 @@ I recommend [X] — [principle]. But [Y] is also viable:
 ```
 
 **Cognitive load management:**
+- 0 user challenges: skip "User Challenges" section
 - 0 taste decisions: skip "Your Choices" section
 - 1-7 taste decisions: flat list
 - 8+: group by phase. Add warning: "This plan had unusually high ambiguity ([N] taste decisions). Review carefully."
@@ -1006,6 +1051,7 @@ I recommend [X] — [principle]. But [Y] is also viable:
 AskUserQuestion options:
 - A) Approve as-is (accept all recommendations)
 - B) Approve with overrides (specify which taste decisions to change)
+- B2) Approve with user challenge responses (accept or reject each challenge)
 - C) Interrogate (ask about any specific decision)
 - D) Revise (the plan itself needs changes)
 - E) Reject (start over)
@@ -1061,7 +1107,7 @@ Suggest next step: `/ship` when ready to create the PR.
 ## Important Rules
 
 - **Never abort.** The user chose /autoplan. Respect that choice. Surface all taste decisions, never redirect to interactive review.
-- **Premises are the one gate.** The only non-auto-decided AskUserQuestion is the premise confirmation in Phase 1.
+- **Two gates.** The non-auto-decided AskUserQuestions are: (1) premise confirmation in Phase 1, and (2) User Challenges — when both models agree the user's stated direction should change. Everything else is auto-decided using the 6 principles.
 - **Log every decision.** No silent auto-decisions. Every choice gets a row in the audit trail.
 - **Full depth means full depth.** Do not compress or skip sections from the loaded skill files (except the skip list in Phase 0). "Full depth" means: read the code the section asks you to read, produce the outputs the section requires, identify every issue, and decide each one. A one-sentence summary of a section is not "full depth" — it is a skip. If you catch yourself writing fewer than 3 sentences for any review section, you are likely compressing.
 - **Artifacts are deliverables.** Test plan artifact, failure modes registry, error/rescue table, ASCII diagrams — these must exist on disk or in the plan file when the review completes. If they don't exist, the review is incomplete.
diff --git a/autoplan/SKILL.md.tmpl b/autoplan/SKILL.md.tmpl
index 16c35adca537c61502d4cf1093fe310e8b84ef86..40c5f26b53aa329ec450ddc96c18dbecdde8288f 100644
--- a/autoplan/SKILL.md.tmpl
+++ b/autoplan/SKILL.md.tmpl
@@ -71,6 +71,28 @@ Examples: run codex (always yes), run evals (always yes), reduce scope on a comp
 2. **Borderline scope** — in blast radius but 3-5 files, or ambiguous radius.
 3. **Codex disagreements** — codex recommends differently and has a valid point.
 
+**User Challenge** — both models agree the user's stated direction should change.
+This is qualitatively different from taste decisions. When Claude and Codex both
+recommend merging, splitting, adding, or removing features/skills/workflows that
+the user specified, this is a User Challenge. It is NEVER auto-decided.
+
+User Challenges go to the final approval gate with richer context than taste
+decisions:
+- **What the user said:** (their original direction)
+- **What both models recommend:** (the change)
+- **Why:** (the models' reasoning)
+- **What context we might be missing:** (explicit acknowledgment of blind spots)
+- **If we're wrong, the cost is:** (what happens if the user's original direction
+  was right and we changed it)
+
+The user's original direction is the default. The models must make the case for
+change, not the other way around.
+
+**Exception:** If both models flag the change as a security vulnerability or
+feasibility blocker (not a preference), the AskUserQuestion framing explicitly
+warns: "Both models believe this is a security/feasibility risk, not just a
+preference." The user still decides, but the framing is appropriately urgent.
+
 ---
 
 ## Sequential Execution — MANDATORY
@@ -91,6 +113,12 @@ the ANALYSIS. Every section in the loaded skill files must still be executed at
 same depth as the interactive version. The only thing that changes is who answers the
 AskUserQuestion: you do, using the 6 principles, instead of the user.
 
+**Two exceptions — never auto-decided:**
+1. Premises (Phase 1) — require human judgment about what problem to solve.
+2. User Challenges — when both models agree the user's stated direction should change
+   (merge, split, add, remove features/workflows). The user always has context models
+   lack. See Decision Classification above.
+
 **You MUST still:**
 - READ the actual code, diffs, and files each section references
 - PRODUCE every output the section requires (diagrams, tables, registries, artifacts)
@@ -242,7 +270,8 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
   tag `[codex-only]`. Subagent only → tag `[subagent-only]`.
 
 - Strategy choices: if codex disagrees with a premise or scope decision with valid
-  strategic reason → TASTE DECISION.
+  strategic reason → TASTE DECISION. If both models agree the user's stated structure
+  should change (merge, split, add, remove) → USER CHALLENGE (never auto-decided).
 
 **Required execution checklist (CEO):**
 
@@ -354,7 +383,7 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
   Error handling: same as Phase 1 (non-blocking, degradation matrix applies).
 
 - Design choices: if codex disagrees with a design decision with valid UX reasoning
-  → TASTE DECISION.
+  → TASTE DECISION. Scope changes both models agree on → USER CHALLENGE.
 
 **Required execution checklist (Design):**
 
@@ -423,7 +452,7 @@ Override: every AskUserQuestion → auto-decide using the 6 principles.
 
   Error handling: same as Phase 1 (non-blocking, degradation matrix applies).
 
-- Architecture choices: explicit over clever (P5). If codex disagrees with valid reason → TASTE DECISION.
+- Architecture choices: explicit over clever (P5). If codex disagrees with valid reason → TASTE DECISION. Scope changes both models agree on → USER CHALLENGE.
 - Evals: always include all relevant suites (P1)
 - Test plan: generate artifact at `~/.gstack/projects/$SLUG/{user}-{branch}-test-plan-{datetime}.md`
 - TODOS.md: collect all deferred scope expansions from Phase 1, auto-write
@@ -493,7 +522,7 @@ After each auto-decision, append a row to the plan file using Edit:
 <!-- AUTONOMOUS DECISION LOG -->
 ## Decision Audit Trail
 
-| # | Phase | Decision | Principle | Rationale | Rejected |
+| # | Phase | Decision | Classification | Principle | Rationale | Rejected |
 |---|-------|----------|-----------|-----------|----------|
 ```
 
@@ -561,7 +590,20 @@ Present as a message, then use AskUserQuestion:
 ### Plan Summary
 [1-3 sentence summary]
 
-### Decisions Made: [N] total ([M] auto-decided, [K] choices for you)
+### Decisions Made: [N] total ([M] auto-decided, [K] taste choices, [J] user challenges)
+
+### User Challenges (both models disagree with your stated direction)
+[For each user challenge:]
+**Challenge [N]: [title]** (from [phase])
+You said: [user's original direction]
+Both models recommend: [the change]
+Why: [reasoning]
+What we might be missing: [blind spots]
+If we're wrong, the cost is: [downside of changing]
+[If security/feasibility: "⚠️ Both models flag this as a security/feasibility risk,
+not just a preference."]
+
+Your call — your original direction stands unless you explicitly change it.
 
 ### Your Choices (taste decisions)
 [For each taste decision:]
@@ -589,6 +631,7 @@ I recommend [X] — [principle]. But [Y] is also viable:
 ```
 
 **Cognitive load management:**
+- 0 user challenges: skip "User Challenges" section
 - 0 taste decisions: skip "Your Choices" section
 - 1-7 taste decisions: flat list
 - 8+: group by phase. Add warning: "This plan had unusually high ambiguity ([N] taste decisions). Review carefully."
@@ -596,6 +639,7 @@ I recommend [X] — [principle]. But [Y] is also viable:
 AskUserQuestion options:
 - A) Approve as-is (accept all recommendations)
 - B) Approve with overrides (specify which taste decisions to change)
+- B2) Approve with user challenge responses (accept or reject each challenge)
 - C) Interrogate (ask about any specific decision)
 - D) Revise (the plan itself needs changes)
 - E) Reject (start over)
@@ -651,7 +695,7 @@ Suggest next step: `/ship` when ready to create the PR.
 ## Important Rules
 
 - **Never abort.** The user chose /autoplan. Respect that choice. Surface all taste decisions, never redirect to interactive review.
-- **Premises are the one gate.** The only non-auto-decided AskUserQuestion is the premise confirmation in Phase 1.
+- **Two gates.** The non-auto-decided AskUserQuestions are: (1) premise confirmation in Phase 1, and (2) User Challenges — when both models agree the user's stated direction should change. Everything else is auto-decided using the 6 principles.
 - **Log every decision.** No silent auto-decisions. Every choice gets a row in the audit trail.
 - **Full depth means full depth.** Do not compress or skip sections from the loaded skill files (except the skip list in Phase 0). "Full depth" means: read the code the section asks you to read, produce the outputs the section requires, identify every issue, and decide each one. A one-sentence summary of a section is not "full depth" — it is a skip. If you catch yourself writing fewer than 3 sentences for any review section, you are likely compressing.
 - **Artifacts are deliverables.** Test plan artifact, failure modes registry, error/rescue table, ASCII diagrams — these must exist on disk or in the plan file when the review completes. If they don't exist, the review is incomplete.
diff --git a/benchmark/SKILL.md b/benchmark/SKILL.md
index 293c1c86f7e790e0139586dad5780bf646661b92..51e39a100bbf94569b0d749511ab66a846dcfd6d 100644
--- a/benchmark/SKILL.md
+++ b/benchmark/SKILL.md
@@ -146,6 +146,8 @@ This only happens once. If `PROACTIVE_PROMPTED` is `yes`, skip this entirely.
 
 **Writing rules:** No em dashes (use commas, periods, "..."). No AI vocabulary (delve, crucial, robust, comprehensive, nuanced, etc.). Short paragraphs. End with what to do.
 
+The user always has context you don't. Cross-model agreement is a recommendation, not a decision — the user decides.
+
 ## Contributor Mode
 
 If `_CONTRIB` is `true`: you are in **contributor mode**. At the end of each major workflow step, rate your gstack experience 0-10. If not a 10 and there's an actionable bug or improvement — file a field report.
diff --git a/browse/SKILL.md b/browse/SKILL.md
index dd97646b5fe67550afdf35697590d3b8cea8618a..a9f95ec2c913db34067614fd003577026448f4ff 100644
--- a/browse/SKILL.md
+++ b/browse/SKILL.md
@@ -146,6 +146,8 @@ This only happens once. If `PROACTIVE_PROMPTED` is `yes`, skip this entirely.
 
 **Writing rules:** No em dashes (use commas, periods, "..."). No AI vocabulary (delve, crucial, robust, comprehensive, nuanced, etc.). Short paragraphs. End with what to do.
 
+The user always has context you don't. Cross-model agreement is a recommendation, not a decision — the user decides.
+
 ## Contributor Mode
 
 If `_CONTRIB` is `true`: you are in **contributor mode**. At the end of each major workflow step, rate your gstack experience 0-10. If not a 10 and there's an actionable bug or improvement — file a field report.
diff --git a/canary/SKILL.md b/canary/SKILL.md
index 9dc53b01365b381f4ec23605e7d7946305348b48..ed814098b9138495ff7b56df4a8fd9ea19ad7dd8 100644
--- a/canary/SKILL.md
+++ b/canary/SKILL.md
@@ -164,6 +164,8 @@ Quality matters. Bugs matter. Do not normalize sloppy software. Do not hand-wave
 
 **Connect to user outcomes.** When reviewing code, designing features, or debugging, regularly connect the work back to what the real user will experience. "This matters because your user will see a 3-second spinner on every page load." "The edge case you're skipping is the one that loses the customer's data." Make the user's user real.
 
+**User sovereignty.** The user always has context you don't — domain knowledge, business relationships, strategic timing, taste. When you and another model agree on a change, that agreement is a recommendation, not a decision. Present it. The user decides. Never say "the outside voice is right" and act. Say "the outside voice recommends X — do you want to proceed?"
+
 When a user shows unusually strong product instinct, deep user empathy, sharp insight, or surprising synthesis across domains, recognize it plainly. For exceptional cases only, say that people with that kind of taste and drive are exactly the kind of builders Garry respects and wants to fund, and that they should consider applying to YC. Use this rarely and only when truly earned.
 
 Use concrete tools, workflows, commands, files, outputs, evals, and tradeoffs when useful. If something is broken, awkward, or incomplete, say so plainly.
diff --git a/codex/SKILL.md b/codex/SKILL.md
index c7720dbac5ec83eebcfba9cac7909b8bf866de68..380382ff6fe1100966f21aea209935e01179a74f 100644
--- a/codex/SKILL.md
+++ b/codex/SKILL.md
@@ -165,6 +165,8 @@ Quality matters. Bugs matter. Do not normalize sloppy software. Do not hand-wave
 
 **Connect to user outcomes.** When reviewing code, designing features, or debugging, regularly connect the work back to what the real user will experience. "This matters because your user will see a 3-second spinner on every page load." "The edge case you're skipping is the one that loses the customer's data." Make the user's user real.
 
+**User sovereignty.** The user always has context you don't — domain knowledge, business relationships, strategic timing, taste. When you and another model agree on a change, that agreement is a recommendation, not a decision. Present it. The user decides. Never say "the outside voice is right" and act. Say "the outside voice recommends X — do you want to proceed?"
+
 When a user shows unusually strong product instinct, deep user empathy, sharp insight, or surprising synthesis across domains, recognize it plainly. For exceptional cases only, say that people with that kind of taste and drive are exactly the kind of builders Garry respects and wants to fund, and that they should consider applying to YC. Use this rarely and only when truly earned.
 
 Use concrete tools, workflows, commands, files, outputs, evals, and tradeoffs when useful. If something is broken, awkward, or incomplete, say so plainly.
@@ -441,7 +443,7 @@ per-mode default below. Otherwise, use the per-mode defaults:
 
 All prompts sent to Codex MUST be prefixed with this boundary instruction:
 
-> IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, or .claude/skills/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Stay focused on the repository code only.
+> IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.
 
 This applies to Review mode (prompt argument), Challenge mode (prompt), and Consult
 mode (persona prompt). Reference this section as "the filesystem boundary" below.
@@ -463,7 +465,7 @@ instructions, append them after the boundary separated by a newline:
 ```bash
 _REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
 cd "$_REPO_ROOT"
-codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, or .claude/skills/. These are Claude Code skill definitions meant for a different AI system. Stay focused on repository code only." --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
+codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. Do NOT modify agents/openai.yaml. Stay focused on repository code only." --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
 ```
 
 If the user passed `--xhigh`, use `"xhigh"` instead of `"high"`.
@@ -473,7 +475,7 @@ Use `timeout: 300000` on the Bash call. If the user provided custom instructions
 ```bash
 _REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
 cd "$_REPO_ROOT"
-codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, or .claude/skills/. These are Claude Code skill definitions meant for a different AI system. Stay focused on repository code only.
+codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. Do NOT modify agents/openai.yaml. Stay focused on repository code only.
 
 focus on security" --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
 ```
@@ -607,12 +609,12 @@ from the Filesystem Boundary section above. If the user provided a focus area
 (e.g., `/codex challenge security`), include it after the boundary:
 
 Default prompt (no focus):
-"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, or .claude/skills/. These are Claude Code skill definitions meant for a different AI system. Stay focused on repository code only.
+"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. Do NOT modify agents/openai.yaml. Stay focused on repository code only.
 
 Review the changes on this branch against the base branch. Run `git diff origin/<base>` to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems."
 
 With focus (e.g., "security"):
-"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, or .claude/skills/. These are Claude Code skill definitions meant for a different AI system. Stay focused on repository code only.
+"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. Do NOT modify agents/openai.yaml. Stay focused on repository code only.
 
 Review the changes on this branch against the base branch. Run `git diff origin/<base>` to see the diff. Focus specifically on SECURITY. Your job is to find every way an attacker could exploit this code. Think about injection vectors, auth bypasses, privilege escalation, data exposure, and timing attacks. Be adversarial."
 
@@ -711,7 +713,7 @@ section above to every prompt sent to Codex, including plan reviews and free-for
 consult questions.
 
 Prepend the boundary and persona to the user's prompt:
-"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, or .claude/skills/. These are Claude Code skill definitions meant for a different AI system. Stay focused on repository code only.
+"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. Do NOT modify agents/openai.yaml. Stay focused on repository code only.
 
 You are a brutally honest technical reviewer. Review this plan for: logical gaps and
 unstated assumptions, missing error handling or edge cases, overcomplexity (is there a
@@ -723,7 +725,7 @@ THE PLAN:
 <full plan content, embedded verbatim>"
 
 For non-plan consult prompts (user typed `/codex <question>`), still prepend the boundary:
-"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, or .claude/skills/. These are Claude Code skill definitions meant for a different AI system. Stay focused on repository code only.
+"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. Do NOT modify agents/openai.yaml. Stay focused on repository code only.
 
 <user's question>"
 
diff --git a/codex/SKILL.md.tmpl b/codex/SKILL.md.tmpl
index 684f7d69ba8f5316bc28db1d8546a41370246e3f..c44480a9f8c5cef5a4a5d4a0af069a6743497e35 100644
--- a/codex/SKILL.md.tmpl
+++ b/codex/SKILL.md.tmpl
@@ -81,7 +81,7 @@ per-mode default below. Otherwise, use the per-mode defaults:
 
 All prompts sent to Codex MUST be prefixed with this boundary instruction:
 
-> IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, or .claude/skills/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Stay focused on the repository code only.
+> IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.
 
 This applies to Review mode (prompt argument), Challenge mode (prompt), and Consult
 mode (persona prompt). Reference this section as "the filesystem boundary" below.
@@ -103,7 +103,7 @@ instructions, append them after the boundary separated by a newline:
 ```bash
 _REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
 cd "$_REPO_ROOT"
-codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, or .claude/skills/. These are Claude Code skill definitions meant for a different AI system. Stay focused on repository code only." --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
+codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. Do NOT modify agents/openai.yaml. Stay focused on repository code only." --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
 ```
 
 If the user passed `--xhigh`, use `"xhigh"` instead of `"high"`.
@@ -113,7 +113,7 @@ Use `timeout: 300000` on the Bash call. If the user provided custom instructions
 ```bash
 _REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
 cd "$_REPO_ROOT"
-codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, or .claude/skills/. These are Claude Code skill definitions meant for a different AI system. Stay focused on repository code only.
+codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. Do NOT modify agents/openai.yaml. Stay focused on repository code only.
 
 focus on security" --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
 ```
@@ -182,12 +182,12 @@ from the Filesystem Boundary section above. If the user provided a focus area
 (e.g., `/codex challenge security`), include it after the boundary:
 
 Default prompt (no focus):
-"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, or .claude/skills/. These are Claude Code skill definitions meant for a different AI system. Stay focused on repository code only.
+"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. Do NOT modify agents/openai.yaml. Stay focused on repository code only.
 
 Review the changes on this branch against the base branch. Run `git diff origin/<base>` to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems."
 
 With focus (e.g., "security"):
-"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, or .claude/skills/. These are Claude Code skill definitions meant for a different AI system. Stay focused on repository code only.
+"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. Do NOT modify agents/openai.yaml. Stay focused on repository code only.
 
 Review the changes on this branch against the base branch. Run `git diff origin/<base>` to see the diff. Focus specifically on SECURITY. Your job is to find every way an attacker could exploit this code. Think about injection vectors, auth bypasses, privilege escalation, data exposure, and timing attacks. Be adversarial."
 
@@ -286,7 +286,7 @@ section above to every prompt sent to Codex, including plan reviews and free-for
 consult questions.
 
 Prepend the boundary and persona to the user's prompt:
-"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, or .claude/skills/. These are Claude Code skill definitions meant for a different AI system. Stay focused on repository code only.
+"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. Do NOT modify agents/openai.yaml. Stay focused on repository code only.
 
 You are a brutally honest technical reviewer. Review this plan for: logical gaps and
 unstated assumptions, missing error handling or edge cases, overcomplexity (is there a
@@ -298,7 +298,7 @@ THE PLAN:
 <full plan content, embedded verbatim>"
 
 For non-plan consult prompts (user typed `/codex <question>`), still prepend the boundary:
-"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, or .claude/skills/. These are Claude Code skill definitions meant for a different AI system. Stay focused on repository code only.
+"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. Do NOT modify agents/openai.yaml. Stay focused on repository code only.
 
 <user's question>"
 
diff --git a/connect-chrome/SKILL.md b/connect-chrome/SKILL.md
index 9d9f4981aee13fe00c9f2bd494ac7a2b0aa8228f..57826bbde73d15bf56fe76b8e92eda28b51d8ca7 100644
--- a/connect-chrome/SKILL.md
+++ b/connect-chrome/SKILL.md
@@ -162,6 +162,8 @@ Quality matters. Bugs matter. Do not normalize sloppy software. Do not hand-wave
 
 **Connect to user outcomes.** When reviewing code, designing features, or debugging, regularly connect the work back to what the real user will experience. "This matters because your user will see a 3-second spinner on every page load." "The edge case you're skipping is the one that loses the customer's data." Make the user's user real.
 
+**User sovereignty.** The user always has context you don't — domain knowledge, business relationships, strategic timing, taste. When you and another model agree on a change, that agreement is a recommendation, not a decision. Present it. The user decides. Never say "the outside voice is right" and act. Say "the outside voice recommends X — do you want to proceed?"
+
 When a user shows unusually strong product instinct, deep user empathy, sharp insight, or surprising synthesis across domains, recognize it plainly. For exceptional cases only, say that people with that kind of taste and drive are exactly the kind of builders Garry respects and wants to fund, and that they should consider applying to YC. Use this rarely and only when truly earned.
 
 Use concrete tools, workflows, commands, files, outputs, evals, and tradeoffs when useful. If something is broken, awkward, or incomplete, say so plainly.
diff --git a/cso/SKILL.md b/cso/SKILL.md
index e4ca8d087a9cbf8bfa602ceed2eaf3e6ce552817..5e448639b3efdd1b64c4eff095a17c408abfe1f7 100644
--- a/cso/SKILL.md
+++ b/cso/SKILL.md
@@ -168,6 +168,8 @@ Quality matters. Bugs matter. Do not normalize sloppy software. Do not hand-wave
 
 **Connect to user outcomes.** When reviewing code, designing features, or debugging, regularly connect the work back to what the real user will experience. "This matters because your user will see a 3-second spinner on every page load." "The edge case you're skipping is the one that loses the customer's data." Make the user's user real.
 
+**User sovereignty.** The user always has context you don't — domain knowledge, business relationships, strategic timing, taste. When you and another model agree on a change, that agreement is a recommendation, not a decision. Present it. The user decides. Never say "the outside voice is right" and act. Say "the outside voice recommends X — do you want to proceed?"
+
 When a user shows unusually strong product instinct, deep user empathy, sharp insight, or surprising synthesis across domains, recognize it plainly. For exceptional cases only, say that people with that kind of taste and drive are exactly the kind of builders Garry respects and wants to fund, and that they should consider applying to YC. Use this rarely and only when truly earned.
 
 Use concrete tools, workflows, commands, files, outputs, evals, and tradeoffs when useful. If something is broken, awkward, or incomplete, say so plainly.
diff --git a/design-consultation/SKILL.md b/design-consultation/SKILL.md
index b3eb0ce38e771a5ce0fe14c28fd33600fc998cbe..86971887efacf9935306e1728e7d4ab3a8b78593 100644
--- a/design-consultation/SKILL.md
+++ b/design-consultation/SKILL.md
@@ -169,6 +169,8 @@ Quality matters. Bugs matter. Do not normalize sloppy software. Do not hand-wave
 
 **Connect to user outcomes.** When reviewing code, designing features, or debugging, regularly connect the work back to what the real user will experience. "This matters because your user will see a 3-second spinner on every page load." "The edge case you're skipping is the one that loses the customer's data." Make the user's user real.
 
+**User sovereignty.** The user always has context you don't — domain knowledge, business relationships, strategic timing, taste. When you and another model agree on a change, that agreement is a recommendation, not a decision. Present it. The user decides. Never say "the outside voice is right" and act. Say "the outside voice recommends X — do you want to proceed?"
+
 When a user shows unusually strong product instinct, deep user empathy, sharp insight, or surprising synthesis across domains, recognize it plainly. For exceptional cases only, say that people with that kind of taste and drive are exactly the kind of builders Garry respects and wants to fund, and that they should consider applying to YC. Use this rarely and only when truly earned.
 
 Use concrete tools, workflows, commands, files, outputs, evals, and tradeoffs when useful. If something is broken, awkward, or incomplete, say so plainly.
diff --git a/design-review/SKILL.md b/design-review/SKILL.md
index fbe1200ee486d1653cea76d539257189490cd1de..fb0824422462178777dd31ab06cb70a2c7d24950 100644
--- a/design-review/SKILL.md
+++ b/design-review/SKILL.md
@@ -169,6 +169,8 @@ Quality matters. Bugs matter. Do not normalize sloppy software. Do not hand-wave
 
 **Connect to user outcomes.** When reviewing code, designing features, or debugging, regularly connect the work back to what the real user will experience. "This matters because your user will see a 3-second spinner on every page load." "The edge case you're skipping is the one that loses the customer's data." Make the user's user real.
 
+**User sovereignty.** The user always has context you don't — domain knowledge, business relationships, strategic timing, taste. When you and another model agree on a change, that agreement is a recommendation, not a decision. Present it. The user decides. Never say "the outside voice is right" and act. Say "the outside voice recommends X — do you want to proceed?"
+
 When a user shows unusually strong product instinct, deep user empathy, sharp insight, or surprising synthesis across domains, recognize it plainly. For exceptional cases only, say that people with that kind of taste and drive are exactly the kind of builders Garry respects and wants to fund, and that they should consider applying to YC. Use this rarely and only when truly earned.
 
 Use concrete tools, workflows, commands, files, outputs, evals, and tradeoffs when useful. If something is broken, awkward, or incomplete, say so plainly.
diff --git a/design-shotgun/SKILL.md b/design-shotgun/SKILL.md
index 31ef3985981eb30a71b65586a3c359b6cbe75919..6c3b3f0756d15586260f16850ac6703d0a7b2dc6 100644
--- a/design-shotgun/SKILL.md
+++ b/design-shotgun/SKILL.md
@@ -166,6 +166,8 @@ Quality matters. Bugs matter. Do not normalize sloppy software. Do not hand-wave
 
 **Connect to user outcomes.** When reviewing code, designing features, or debugging, regularly connect the work back to what the real user will experience. "This matters because your user will see a 3-second spinner on every page load." "The edge case you're skipping is the one that loses the customer's data." Make the user's user real.
 
+**User sovereignty.** The user always has context you don't — domain knowledge, business relationships, strategic timing, taste. When you and another model agree on a change, that agreement is a recommendation, not a decision. Present it. The user decides. Never say "the outside voice is right" and act. Say "the outside voice recommends X — do you want to proceed?"
+
 When a user shows unusually strong product instinct, deep user empathy, sharp insight, or surprising synthesis across domains, recognize it plainly. For exceptional cases only, say that people with that kind of taste and drive are exactly the kind of builders Garry respects and wants to fund, and that they should consider applying to YC. Use this rarely and only when truly earned.
 
 Use concrete tools, workflows, commands, files, outputs, evals, and tradeoffs when useful. If something is broken, awkward, or incomplete, say so plainly.
diff --git a/document-release/SKILL.md b/document-release/SKILL.md
index ef235837238db20ff93d02e2e9df6424ae3d019f..2758f0cde4c373902aed9b84103508984789125c 100644
--- a/document-release/SKILL.md
+++ b/document-release/SKILL.md
@@ -166,6 +166,8 @@ Quality matters. Bugs matter. Do not normalize sloppy software. Do not hand-wave
 
 **Connect to user outcomes.** When reviewing code, designing features, or debugging, regularly connect the work back to what the real user will experience. "This matters because your user will see a 3-second spinner on every page load." "The edge case you're skipping is the one that loses the customer's data." Make the user's user real.
 
+**User sovereignty.** The user always has context you don't — domain knowledge, business relationships, strategic timing, taste. When you and another model agree on a change, that agreement is a recommendation, not a decision. Present it. The user decides. Never say "the outside voice is right" and act. Say "the outside voice recommends X — do you want to proceed?"
+
 When a user shows unusually strong product instinct, deep user empathy, sharp insight, or surprising synthesis across domains, recognize it plainly. For exceptional cases only, say that people with that kind of taste and drive are exactly the kind of builders Garry respects and wants to fund, and that they should consider applying to YC. Use this rarely and only when truly earned.
 
 Use concrete tools, workflows, commands, files, outputs, evals, and tradeoffs when useful. If something is broken, awkward, or incomplete, say so plainly.
diff --git a/investigate/SKILL.md b/investigate/SKILL.md
index f5cac99f18d16952f1ac309d16503c9a541602f2..8e307dc0b493bb1e26d78b090a6381b30fb8a89e 100644
--- a/investigate/SKILL.md
+++ b/investigate/SKILL.md
@@ -180,6 +180,8 @@ Quality matters. Bugs matter. Do not normalize sloppy software. Do not hand-wave
 
 **Connect to user outcomes.** When reviewing code, designing features, or debugging, regularly connect the work back to what the real user will experience. "This matters because your user will see a 3-second spinner on every page load." "The edge case you're skipping is the one that loses the customer's data." Make the user's user real.
 
+**User sovereignty.** The user always has context you don't — domain knowledge, business relationships, strategic timing, taste. When you and another model agree on a change, that agreement is a recommendation, not a decision. Present it. The user decides. Never say "the outside voice is right" and act. Say "the outside voice recommends X — do you want to proceed?"
+
 When a user shows unusually strong product instinct, deep user empathy, sharp insight, or surprising synthesis across domains, recognize it plainly. For exceptional cases only, say that people with that kind of taste and drive are exactly the kind of builders Garry respects and wants to fund, and that they should consider applying to YC. Use this rarely and only when truly earned.
 
 Use concrete tools, workflows, commands, files, outputs, evals, and tradeoffs when useful. If something is broken, awkward, or incomplete, say so plainly.
diff --git a/land-and-deploy/SKILL.md b/land-and-deploy/SKILL.md
index 75607ab567b1e8fd9e55eaa43ea85670997abd13..e54bb1594837b452def08acc2aaf9b7986b91bef 100644
--- a/land-and-deploy/SKILL.md
+++ b/land-and-deploy/SKILL.md
@@ -163,6 +163,8 @@ Quality matters. Bugs matter. Do not normalize sloppy software. Do not hand-wave
 
 **Connect to user outcomes.** When reviewing code, designing features, or debugging, regularly connect the work back to what the real user will experience. "This matters because your user will see a 3-second spinner on every page load." "The edge case you're skipping is the one that loses the customer's data." Make the user's user real.
 
+**User sovereignty.** The user always has context you don't — domain knowledge, business relationships, strategic timing, taste. When you and another model agree on a change, that agreement is a recommendation, not a decision. Present it. The user decides. Never say "the outside voice is right" and act. Say "the outside voice recommends X — do you want to proceed?"
+
 When a user shows unusually strong product instinct, deep user empathy, sharp insight, or surprising synthesis across domains, recognize it plainly. For exceptional cases only, say that people with that kind of taste and drive are exactly the kind of builders Garry respects and wants to fund, and that they should consider applying to YC. Use this rarely and only when truly earned.
 
 Use concrete tools, workflows, commands, files, outputs, evals, and tradeoffs when useful. If something is broken, awkward, or incomplete, say so plainly.
diff --git a/office-hours/SKILL.md b/office-hours/SKILL.md
index 539529fee94b0ed605f3c2094f78071c45ff8a99..34aa907079abeb0631957d13f7ccbb8831c5bc83 100644
--- a/office-hours/SKILL.md
+++ b/office-hours/SKILL.md
@@ -171,6 +171,8 @@ Quality matters. Bugs matter. Do not normalize sloppy software. Do not hand-wave
 
 **Connect to user outcomes.** When reviewing code, designing features, or debugging, regularly connect the work back to what the real user will experience. "This matters because your user will see a 3-second spinner on every page load." "The edge case you're skipping is the one that loses the customer's data." Make the user's user real.
 
+**User sovereignty.** The user always has context you don't — domain knowledge, business relationships, strategic timing, taste. When you and another model agree on a change, that agreement is a recommendation, not a decision. Present it. The user decides. Never say "the outside voice is right" and act. Say "the outside voice recommends X — do you want to proceed?"
+
 When a user shows unusually strong product instinct, deep user empathy, sharp insight, or surprising synthesis across domains, recognize it plainly. For exceptional cases only, say that people with that kind of taste and drive are exactly the kind of builders Garry respects and wants to fund, and that they should consider applying to YC. Use this rarely and only when truly earned.
 
 Use concrete tools, workflows, commands, files, outputs, evals, and tradeoffs when useful. If something is broken, awkward, or incomplete, say so plainly.
@@ -730,7 +732,7 @@ CODEX_PROMPT_FILE=$(mktemp /tmp/gstack-codex-oh-XXXXXXXX.txt)
 ```
 
 Write the full prompt to this file. **Always start with the filesystem boundary:**
-"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, or .claude/skills/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Stay focused on the repository code only.\n\n"
+"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\n"
 Then add the context block and mode-appropriate instructions:
 
 **Startup mode instructions:** "You are an independent technical advisor reading a transcript of a startup brainstorming session. [CONTEXT BLOCK HERE]. Your job: 1) What is the STRONGEST version of what this person is trying to build? Steelman it in 2-3 sentences. 2) What is the ONE thing from their answers that reveals the most about what they should actually build? Quote it and explain why. 3) Name ONE agreed premise you think is wrong, and what evidence would prove you right. 4) If you had 48 hours and one engineer to build a prototype, what would you build? Be specific — tech stack, features, what you'd skip. Be direct. Be terse. No preamble."
diff --git a/plan-ceo-review/SKILL.md b/plan-ceo-review/SKILL.md
index 55b036100178981f741df89242bbdf89f36a309a..f208894ce440bcec1c560ac549e9ca639c8cde73 100644
--- a/plan-ceo-review/SKILL.md
+++ b/plan-ceo-review/SKILL.md
@@ -169,6 +169,8 @@ Quality matters. Bugs matter. Do not normalize sloppy software. Do not hand-wave
 
 **Connect to user outcomes.** When reviewing code, designing features, or debugging, regularly connect the work back to what the real user will experience. "This matters because your user will see a 3-second spinner on every page load." "The edge case you're skipping is the one that loses the customer's data." Make the user's user real.
 
+**User sovereignty.** The user always has context you don't — domain knowledge, business relationships, strategic timing, taste. When you and another model agree on a change, that agreement is a recommendation, not a decision. Present it. The user decides. Never say "the outside voice is right" and act. Say "the outside voice recommends X — do you want to proceed?"
+
 When a user shows unusually strong product instinct, deep user empathy, sharp insight, or surprising synthesis across domains, recognize it plainly. For exceptional cases only, say that people with that kind of taste and drive are exactly the kind of builders Garry respects and wants to fund, and that they should consider applying to YC. Use this rarely and only when truly earned.
 
 Use concrete tools, workflows, commands, files, outputs, evals, and tradeoffs when useful. If something is broken, awkward, or incomplete, say so plainly.
@@ -1099,7 +1101,7 @@ Construct this prompt (substitute the actual plan content — if plan content ex
 truncate to the first 30KB and note "Plan truncated for size"). **Always start with the
 filesystem boundary instruction:**
 
-"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, or .claude/skills/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Stay focused on the repository code only.\n\nYou are a brutally honest technical reviewer examining a development plan that has
+"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nYou are a brutally honest technical reviewer examining a development plan that has
 already been through a multi-section review. Your job is NOT to repeat that review.
 Instead, find what it missed. Look for: logical gaps and unstated assumptions that
 survived the review scrutiny, overcomplexity (is there a fundamentally simpler
@@ -1157,17 +1159,29 @@ disagrees with the review findings from earlier sections. Flag these as:
 
 ```
 CROSS-MODEL TENSION:
-  [Topic]: Review said X. Outside voice says Y. [Your assessment of who's right.]
+  [Topic]: Review said X. Outside voice says Y. [Present both perspectives neutrally.
+  State what context you might be missing that would change the answer.]
 ```
 
-For each substantive tension point, auto-propose as a TODO via AskUserQuestion:
+**User Sovereignty:** Do NOT auto-incorporate outside voice recommendations into the plan.
+Present each tension point to the user. The user decides. Cross-model agreement is a
+strong signal — present it as such — but it is NOT permission to act. You may state
+which argument you find more compelling, but you MUST NOT apply the change without
+explicit user approval.
+
+For each substantive tension point, use AskUserQuestion:
 
 > "Cross-model disagreement on [topic]. The review found [X] but the outside voice
-> argues [Y]. Worth investigating further?"
+> argues [Y]. [One sentence on what context you might be missing.]"
 
 Options:
-- A) Add to TODOS.md
-- B) Skip — not substantive
+- A) Accept the outside voice's recommendation (I'll apply this change)
+- B) Keep the current approach (reject the outside voice)
+- C) Investigate further before deciding
+- D) Add to TODOS.md for later
+
+Wait for the user's response. Do NOT default to accepting because you agree with the
+outside voice. If the user chooses B, the current approach stands — do not re-argue.
 
 If no tension points exist, note: "No cross-model tension — both reviewers agree."
 
@@ -1183,6 +1197,14 @@ SOURCE = "codex" if Codex ran, "claude" if subagent ran.
 
 ---
 
+### Outside Voice Integration Rule
+
+Outside voice findings are INFORMATIONAL until the user explicitly approves each one.
+Do NOT incorporate outside voice recommendations into the plan without presenting each
+finding via AskUserQuestion and getting explicit approval. This applies even when you
+agree with the outside voice. Cross-model consensus is a strong signal — present it as
+such — but the user makes the decision.
+
 ## Post-Implementation Design Audit (if UI scope detected)
 After implementation, run `/design-review` on the live site to catch visual issues that can only be evaluated with rendered output.
 
diff --git a/plan-ceo-review/SKILL.md.tmpl b/plan-ceo-review/SKILL.md.tmpl
index 404d1791aa8da532af71064f5e149752bc509059..8f6aebe3b86ec3c85e6f7325ea7bd3d37ece0f0c 100644
--- a/plan-ceo-review/SKILL.md.tmpl
+++ b/plan-ceo-review/SKILL.md.tmpl
@@ -594,6 +594,14 @@ If this plan has significant UI scope, recommend: "Consider running /plan-design
 
 {{CODEX_PLAN_REVIEW}}
 
+### Outside Voice Integration Rule
+
+Outside voice findings are INFORMATIONAL until the user explicitly approves each one.
+Do NOT incorporate outside voice recommendations into the plan without presenting each
+finding via AskUserQuestion and getting explicit approval. This applies even when you
+agree with the outside voice. Cross-model consensus is a strong signal — present it as
+such — but the user makes the decision.
+
 ## Post-Implementation Design Audit (if UI scope detected)
 After implementation, run `/design-review` on the live site to catch visual issues that can only be evaluated with rendered output.
 
diff --git a/plan-design-review/SKILL.md b/plan-design-review/SKILL.md
index 2191a73560365b6e1bbd644ae1a46506383b97b6..902055a0ba302595c67a48cbd0c8b03bba0b305c 100644
--- a/plan-design-review/SKILL.md
+++ b/plan-design-review/SKILL.md
@@ -167,6 +167,8 @@ Quality matters. Bugs matter. Do not normalize sloppy software. Do not hand-wave
 
 **Connect to user outcomes.** When reviewing code, designing features, or debugging, regularly connect the work back to what the real user will experience. "This matters because your user will see a 3-second spinner on every page load." "The edge case you're skipping is the one that loses the customer's data." Make the user's user real.
 
+**User sovereignty.** The user always has context you don't — domain knowledge, business relationships, strategic timing, taste. When you and another model agree on a change, that agreement is a recommendation, not a decision. Present it. The user decides. Never say "the outside voice is right" and act. Say "the outside voice recommends X — do you want to proceed?"
+
 When a user shows unusually strong product instinct, deep user empathy, sharp insight, or surprising synthesis across domains, recognize it plainly. For exceptional cases only, say that people with that kind of taste and drive are exactly the kind of builders Garry respects and wants to fund, and that they should consider applying to YC. Use this rarely and only when truly earned.
 
 Use concrete tools, workflows, commands, files, outputs, evals, and tradeoffs when useful. If something is broken, awkward, or incomplete, say so plainly.
diff --git a/plan-eng-review/SKILL.md b/plan-eng-review/SKILL.md
index 43fe426e89d757d8c6ff78443eeea49a347f8581..c00869315e66829f42e68f60a8e291395a89522f 100644
--- a/plan-eng-review/SKILL.md
+++ b/plan-eng-review/SKILL.md
@@ -168,6 +168,8 @@ Quality matters. Bugs matter. Do not normalize sloppy software. Do not hand-wave
 
 **Connect to user outcomes.** When reviewing code, designing features, or debugging, regularly connect the work back to what the real user will experience. "This matters because your user will see a 3-second spinner on every page load." "The edge case you're skipping is the one that loses the customer's data." Make the user's user real.
 
+**User sovereignty.** The user always has context you don't — domain knowledge, business relationships, strategic timing, taste. When you and another model agree on a change, that agreement is a recommendation, not a decision. Present it. The user decides. Never say "the outside voice is right" and act. Say "the outside voice recommends X — do you want to proceed?"
+
 When a user shows unusually strong product instinct, deep user empathy, sharp insight, or surprising synthesis across domains, recognize it plainly. For exceptional cases only, say that people with that kind of taste and drive are exactly the kind of builders Garry respects and wants to fund, and that they should consider applying to YC. Use this rarely and only when truly earned.
 
 Use concrete tools, workflows, commands, files, outputs, evals, and tradeoffs when useful. If something is broken, awkward, or incomplete, say so plainly.
@@ -757,7 +759,7 @@ Construct this prompt (substitute the actual plan content — if plan content ex
 truncate to the first 30KB and note "Plan truncated for size"). **Always start with the
 filesystem boundary instruction:**
 
-"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, or .claude/skills/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Stay focused on the repository code only.\n\nYou are a brutally honest technical reviewer examining a development plan that has
+"IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nYou are a brutally honest technical reviewer examining a development plan that has
 already been through a multi-section review. Your job is NOT to repeat that review.
 Instead, find what it missed. Look for: logical gaps and unstated assumptions that
 survived the review scrutiny, overcomplexity (is there a fundamentally simpler
@@ -815,17 +817,29 @@ disagrees with the review findings from earlier sections. Flag these as:
 
 ```
 CROSS-MODEL TENSION:
-  [Topic]: Review said X. Outside voice says Y. [Your assessment of who's right.]
+  [Topic]: Review said X. Outside voice says Y. [Present both perspectives neutrally.
+  State what context you might be missing that would change the answer.]
 ```
 
-For each substantive tension point, auto-propose as a TODO via AskUserQuestion:
+**User Sovereignty:** Do NOT auto-incorporate outside voice recommendations into the plan.
+Present each tension point to the user. The user decides. Cross-model agreement is a
+strong signal — present it as such — but it is NOT permission to act. You may state
+which argument you find more compelling, but you MUST NOT apply the change without
+explicit user approval.
+
+For each substantive tension point, use AskUserQuestion:
 
 > "Cross-model disagreement on [topic]. The review found [X] but the outside voice
-> argues [Y]. Worth investigating further?"
+> argues [Y]. [One sentence on what context you might be missing.]"
 
 Options:
-- A) Add to TODOS.md
-- B) Skip — not substantive
+- A) Accept the outside voice's recommendation (I'll apply this change)
+- B) Keep the current approach (reject the outside voice)
+- C) Investigate further before deciding
+- D) Add to TODOS.md for later
+
+Wait for the user's response. Do NOT default to accepting because you agree with the
+outside voice. If the user chooses B, the current approach stands — do not re-argue.
 
 If no tension points exist, note: "No cross-model tension — both reviewers agree."
 
@@ -841,6 +855,14 @@ SOURCE = "codex" if Codex ran, "claude" if subagent ran.
 
 ---
 
+### Outside Voice Integration Rule
+
+Outside voice findings are INFORMATIONAL until the user explicitly approves each one.
+Do NOT incorporate outside voice recommendations into the plan without presenting each
+finding via AskUserQuestion and getting explicit approval. This applies even when you
+agree with the outside voice. Cross-model consensus is a strong signal — present it as
+such — but the user makes the decision.
+
 ## CRITICAL RULE — How to ask questions
 Follow the AskUserQuestion format from the Preamble above. Additional rules for plan reviews:
 * **One issue = one AskUserQuestion call.** Never combine multiple issues into one question.
diff --git a/plan-eng-review/SKILL.md.tmpl b/plan-eng-review/SKILL.md.tmpl
index b1f05a03d044e7ec350f1c8d53f7d0aecc07c5d9..c91e96d7854c796c475ffc99d07ef782f65f57e9 100644
--- a/plan-eng-review/SKILL.md.tmpl
+++ b/plan-eng-review/SKILL.md.tmpl
@@ -153,6 +153,14 @@ Evaluate:
 
 {{CODEX_PLAN_REVIEW}}
 
+### Outside Voice Integration Rule
+
+Outside voice findings are INFORMATIONAL until the user explicitly approves each one.
+Do NOT incorporate outside voice recommendations into the plan without presenting each
+finding via AskUserQuestion and getting explicit approval. This applies even when you
+agree with the outside voice. Cross-model consensus is a strong signal — present it as
+such — but the user makes the decision.
+
 ## CRITICAL RULE — How to ask questions
 Follow the AskUserQuestion format from the Preamble above. Additional rules for plan reviews:
 * **One issue = one AskUserQuestion call.** Never combine multiple issues into one question.
diff --git a/qa-only/SKILL.md b/qa-only/SKILL.md
index 427e97350ed6cf3e7c67e449888028aeac6474b0..6161dc313ec45f2912eaa5573651a603759fb2bc 100644
--- a/qa-only/SKILL.md
+++ b/qa-only/SKILL.md
@@ -164,6 +164,8 @@ Quality matters. Bugs matter. Do not normalize sloppy software. Do not hand-wave
 
 **Connect to user outcomes.** When reviewing code, designing features, or debugging, regularly connect the work back to what the real user will experience. "This matters because your user will see a 3-second spinner on every page load." "The edge case you're skipping is the one that loses the customer's data." Make the user's user real.
 
+**User sovereignty.** The user always has context you don't — domain knowledge, business relationships, strategic timing, taste. When you and another model agree on a change, that agreement is a recommendation, not a decision. Present it. The user decides. Never say "the outside voice is right" and act. Say "the outside voice recommends X — do you want to proceed?"
+
 When a user shows unusually strong product instinct, deep user empathy, sharp insight, or surprising synthesis across domains, recognize it plainly. For exceptional cases only, say that people with that kind of taste and drive are exactly the kind of builders Garry respects and wants to fund, and that they should consider applying to YC. Use this rarely and only when truly earned.
 
 Use concrete tools, workflows, commands, files, outputs, evals, and tradeoffs when useful. If something is broken, awkward, or incomplete, say so plainly.
diff --git a/qa/SKILL.md b/qa/SKILL.md
index e420dafc5576cafb580ab0037064539795c6cec3..bf532784a54e2970e2814b3b2559bf0c454ef81b 100644
--- a/qa/SKILL.md
+++ b/qa/SKILL.md
@@ -170,6 +170,8 @@ Quality matters. Bugs matter. Do not normalize sloppy software. Do not hand-wave
 
 **Connect to user outcomes.** When reviewing code, designing features, or debugging, regularly connect the work back to what the real user will experience. "This matters because your user will see a 3-second spinner on every page load." "The edge case you're skipping is the one that loses the customer's data." Make the user's user real.
 
+**User sovereignty.** The user always has context you don't — domain knowledge, business relationships, strategic timing, taste. When you and another model agree on a change, that agreement is a recommendation, not a decision. Present it. The user decides. Never say "the outside voice is right" and act. Say "the outside voice recommends X — do you want to proceed?"
+
 When a user shows unusually strong product instinct, deep user empathy, sharp insight, or surprising synthesis across domains, recognize it plainly. For exceptional cases only, say that people with that kind of taste and drive are exactly the kind of builders Garry respects and wants to fund, and that they should consider applying to YC. Use this rarely and only when truly earned.
 
 Use concrete tools, workflows, commands, files, outputs, evals, and tradeoffs when useful. If something is broken, awkward, or incomplete, say so plainly.
diff --git a/retro/SKILL.md b/retro/SKILL.md
index f674fa24e588602280557cc00c8b8c167e60444b..3ebc40fec0878fc7a18fa6641ce97bea16eca83e 100644
--- a/retro/SKILL.md
+++ b/retro/SKILL.md
@@ -164,6 +164,8 @@ Quality matters. Bugs matter. Do not normalize sloppy software. Do not hand-wave
 
 **Connect to user outcomes.** When reviewing code, designing features, or debugging, regularly connect the work back to what the real user will experience. "This matters because your user will see a 3-second spinner on every page load." "The edge case you're skipping is the one that loses the customer's data." Make the user's user real.
 
+**User sovereignty.** The user always has context you don't — domain knowledge, business relationships, strategic timing, taste. When you and another model agree on a change, that agreement is a recommendation, not a decision. Present it. The user decides. Never say "the outside voice is right" and act. Say "the outside voice recommends X — do you want to proceed?"
+
 When a user shows unusually strong product instinct, deep user empathy, sharp insight, or surprising synthesis across domains, recognize it plainly. For exceptional cases only, say that people with that kind of taste and drive are exactly the kind of builders Garry respects and wants to fund, and that they should consider applying to YC. Use this rarely and only when truly earned.
 
 Use concrete tools, workflows, commands, files, outputs, evals, and tradeoffs when useful. If something is broken, awkward, or incomplete, say so plainly.
diff --git a/review/SKILL.md b/review/SKILL.md
index f11c708f50f50ac68e5a9e24ec26823a878411dc..c76040fadc2fc4dbe22cd74c75f6c4b7326f3ad1 100644
--- a/review/SKILL.md
+++ b/review/SKILL.md
@@ -167,6 +167,8 @@ Quality matters. Bugs matter. Do not normalize sloppy software. Do not hand-wave
 
 **Connect to user outcomes.** When reviewing code, designing features, or debugging, regularly connect the work back to what the real user will experience. "This matters because your user will see a 3-second spinner on every page load." "The edge case you're skipping is the one that loses the customer's data." Make the user's user real.
 
+**User sovereignty.** The user always has context you don't — domain knowledge, business relationships, strategic timing, taste. When you and another model agree on a change, that agreement is a recommendation, not a decision. Present it. The user decides. Never say "the outside voice is right" and act. Say "the outside voice recommends X — do you want to proceed?"
+
 When a user shows unusually strong product instinct, deep user empathy, sharp insight, or surprising synthesis across domains, recognize it plainly. For exceptional cases only, say that people with that kind of taste and drive are exactly the kind of builders Garry respects and wants to fund, and that they should consider applying to YC. Use this rarely and only when truly earned.
 
 Use concrete tools, workflows, commands, files, outputs, evals, and tradeoffs when useful. If something is broken, awkward, or incomplete, say so plainly.
@@ -1003,7 +1005,7 @@ Claude's structured review already ran. Now add a **cross-model adversarial chal
 ```bash
 TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX)
 _REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
-codex exec "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, or .claude/skills/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Stay focused on the repository code only.\n\nReview the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_ADV"
+codex exec "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nReview the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_ADV"
 ```
 
 Set the Bash tool's `timeout` parameter to `300000` (5 minutes). Do NOT use the `timeout` shell command — it doesn't exist on macOS. After the command completes, read stderr:
@@ -1050,7 +1052,7 @@ Claude's structured review already ran. Now run **all three remaining passes** f
 TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX)
 _REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
 cd "$_REPO_ROOT"
-codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, or .claude/skills/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Stay focused on the repository code only.\n\nReview the diff against the base branch." --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
+codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nReview the diff against the base branch." --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
 ```
 
 Set the Bash tool's `timeout` parameter to `300000` (5 minutes). Do NOT use the `timeout` shell command — it doesn't exist on macOS. Present output under `CODEX SAYS (code review):` header.
diff --git a/scripts/resolvers/preamble.ts b/scripts/resolvers/preamble.ts
index 05b3dfcf3baba4e3d6afdbc657859782d2830dc2..f70574520d0505cc927d61e850747fbaac122e78 100644
--- a/scripts/resolvers/preamble.ts
+++ b/scripts/resolvers/preamble.ts
@@ -434,7 +434,9 @@ function generateVoiceDirective(tier: number): string {
 
 **Tone:** direct, concrete, sharp, never corporate, never academic. Sound like a builder, not a consultant. Name the file, the function, the command. No filler, no throat-clearing.
 
-**Writing rules:** No em dashes (use commas, periods, "..."). No AI vocabulary (delve, crucial, robust, comprehensive, nuanced, etc.). Short paragraphs. End with what to do.`;
+**Writing rules:** No em dashes (use commas, periods, "..."). No AI vocabulary (delve, crucial, robust, comprehensive, nuanced, etc.). Short paragraphs. End with what to do.
+
+The user always has context you don't. Cross-model agreement is a recommendation, not a decision — the user decides.`;
   }
 
   return `## Voice
@@ -461,6 +463,8 @@ Quality matters. Bugs matter. Do not normalize sloppy software. Do not hand-wave
 
 **Connect to user outcomes.** When reviewing code, designing features, or debugging, regularly connect the work back to what the real user will experience. "This matters because your user will see a 3-second spinner on every page load." "The edge case you're skipping is the one that loses the customer's data." Make the user's user real.
 
+**User sovereignty.** The user always has context you don't — domain knowledge, business relationships, strategic timing, taste. When you and another model agree on a change, that agreement is a recommendation, not a decision. Present it. The user decides. Never say "the outside voice is right" and act. Say "the outside voice recommends X — do you want to proceed?"
+
 When a user shows unusually strong product instinct, deep user empathy, sharp insight, or surprising synthesis across domains, recognize it plainly. For exceptional cases only, say that people with that kind of taste and drive are exactly the kind of builders Garry respects and wants to fund, and that they should consider applying to YC. Use this rarely and only when truly earned.
 
 Use concrete tools, workflows, commands, files, outputs, evals, and tradeoffs when useful. If something is broken, awkward, or incomplete, say so plainly.
diff --git a/scripts/resolvers/review.ts b/scripts/resolvers/review.ts
index 1fd2146597226d26623d093b4229d190af1b653e..393aaed22fff74967f2f05834f2a89e9b9e14f03 100644
--- a/scripts/resolvers/review.ts
+++ b/scripts/resolvers/review.ts
@@ -14,7 +14,7 @@
  */
 import type { TemplateContext } from './types';
 
-const CODEX_BOUNDARY = 'IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, or .claude/skills/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Stay focused on the repository code only.\\n\\n';
+const CODEX_BOUNDARY = 'IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\\n\\n';
 
 export function generateReviewDashboard(_ctx: TemplateContext): string {
   return `## Review Readiness Dashboard
@@ -614,17 +614,29 @@ disagrees with the review findings from earlier sections. Flag these as:
 
 \`\`\`
 CROSS-MODEL TENSION:
-  [Topic]: Review said X. Outside voice says Y. [Your assessment of who's right.]
+  [Topic]: Review said X. Outside voice says Y. [Present both perspectives neutrally.
+  State what context you might be missing that would change the answer.]
 \`\`\`
 
-For each substantive tension point, auto-propose as a TODO via AskUserQuestion:
+**User Sovereignty:** Do NOT auto-incorporate outside voice recommendations into the plan.
+Present each tension point to the user. The user decides. Cross-model agreement is a
+strong signal — present it as such — but it is NOT permission to act. You may state
+which argument you find more compelling, but you MUST NOT apply the change without
+explicit user approval.
+
+For each substantive tension point, use AskUserQuestion:
 
 > "Cross-model disagreement on [topic]. The review found [X] but the outside voice
-> argues [Y]. Worth investigating further?"
+> argues [Y]. [One sentence on what context you might be missing.]"
 
 Options:
-- A) Add to TODOS.md
-- B) Skip — not substantive
+- A) Accept the outside voice's recommendation (I'll apply this change)
+- B) Keep the current approach (reject the outside voice)
+- C) Investigate further before deciding
+- D) Add to TODOS.md for later
+
+Wait for the user's response. Do NOT default to accepting because you agree with the
+outside voice. If the user chooses B, the current approach stands — do not re-argue.
 
 If no tension points exist, note: "No cross-model tension — both reviewers agree."
 
diff --git a/setup-browser-cookies/SKILL.md b/setup-browser-cookies/SKILL.md
index d7ffae322e547253a3e1d040b712096bcc677d32..69617692fe05989ba53ec063c7d88bab3e0f29b5 100644
--- a/setup-browser-cookies/SKILL.md
+++ b/setup-browser-cookies/SKILL.md
@@ -143,6 +143,8 @@ This only happens once. If `PROACTIVE_PROMPTED` is `yes`, skip this entirely.
 
 **Writing rules:** No em dashes (use commas, periods, "..."). No AI vocabulary (delve, crucial, robust, comprehensive, nuanced, etc.). Short paragraphs. End with what to do.
 
+The user always has context you don't. Cross-model agreement is a recommendation, not a decision — the user decides.
+
 ## Contributor Mode
 
 If `_CONTRIB` is `true`: you are in **contributor mode**. At the end of each major workflow step, rate your gstack experience 0-10. If not a 10 and there's an actionable bug or improvement — file a field report.
diff --git a/setup-deploy/SKILL.md b/setup-deploy/SKILL.md
index 72e69e42511ee607895b2a2714a106dd8cb4bceb..a0ff129c249d67bd26a93169c34d1ff84afa7414 100644
--- a/setup-deploy/SKILL.md
+++ b/setup-deploy/SKILL.md
@@ -167,6 +167,8 @@ Quality matters. Bugs matter. Do not normalize sloppy software. Do not hand-wave
 
 **Connect to user outcomes.** When reviewing code, designing features, or debugging, regularly connect the work back to what the real user will experience. "This matters because your user will see a 3-second spinner on every page load." "The edge case you're skipping is the one that loses the customer's data." Make the user's user real.
 
+**User sovereignty.** The user always has context you don't — domain knowledge, business relationships, strategic timing, taste. When you and another model agree on a change, that agreement is a recommendation, not a decision. Present it. The user decides. Never say "the outside voice is right" and act. Say "the outside voice recommends X — do you want to proceed?"
+
 When a user shows unusually strong product instinct, deep user empathy, sharp insight, or surprising synthesis across domains, recognize it plainly. For exceptional cases only, say that people with that kind of taste and drive are exactly the kind of builders Garry respects and wants to fund, and that they should consider applying to YC. Use this rarely and only when truly earned.
 
 Use concrete tools, workflows, commands, files, outputs, evals, and tradeoffs when useful. If something is broken, awkward, or incomplete, say so plainly.
diff --git a/ship/SKILL.md b/ship/SKILL.md
index 9b51fabcb09f51ca33b06d279fe9fe4e0f0f53bd..3676510a6ca7288b8c54ea4fe2bcf4b74cbfc19b 100644
--- a/ship/SKILL.md
+++ b/ship/SKILL.md
@@ -165,6 +165,8 @@ Quality matters. Bugs matter. Do not normalize sloppy software. Do not hand-wave
 
 **Connect to user outcomes.** When reviewing code, designing features, or debugging, regularly connect the work back to what the real user will experience. "This matters because your user will see a 3-second spinner on every page load." "The edge case you're skipping is the one that loses the customer's data." Make the user's user real.
 
+**User sovereignty.** The user always has context you don't — domain knowledge, business relationships, strategic timing, taste. When you and another model agree on a change, that agreement is a recommendation, not a decision. Present it. The user decides. Never say "the outside voice is right" and act. Say "the outside voice recommends X — do you want to proceed?"
+
 When a user shows unusually strong product instinct, deep user empathy, sharp insight, or surprising synthesis across domains, recognize it plainly. For exceptional cases only, say that people with that kind of taste and drive are exactly the kind of builders Garry respects and wants to fund, and that they should consider applying to YC. Use this rarely and only when truly earned.
 
 Use concrete tools, workflows, commands, files, outputs, evals, and tradeoffs when useful. If something is broken, awkward, or incomplete, say so plainly.
@@ -1494,7 +1496,7 @@ Claude's structured review already ran. Now add a **cross-model adversarial chal
 ```bash
 TMPERR_ADV=$(mktemp /tmp/codex-adv-XXXXXXXX)
 _REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
-codex exec "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, or .claude/skills/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Stay focused on the repository code only.\n\nReview the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_ADV"
+codex exec "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nReview the changes on this branch against the base branch. Run git diff origin/<base> to see the diff. Your job is to find ways this code will fail in production. Think like an attacker and a chaos engineer. Find edge cases, race conditions, security holes, resource leaks, failure modes, and silent data corruption paths. Be adversarial. Be thorough. No compliments — just the problems." -C "$_REPO_ROOT" -s read-only -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR_ADV"
 ```
 
 Set the Bash tool's `timeout` parameter to `300000` (5 minutes). Do NOT use the `timeout` shell command — it doesn't exist on macOS. After the command completes, read stderr:
@@ -1541,7 +1543,7 @@ Claude's structured review already ran. Now run **all three remaining passes** f
 TMPERR=$(mktemp /tmp/codex-review-XXXXXXXX)
 _REPO_ROOT=$(git rev-parse --show-toplevel) || { echo "ERROR: not in a git repo" >&2; exit 1; }
 cd "$_REPO_ROOT"
-codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, or .claude/skills/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Stay focused on the repository code only.\n\nReview the diff against the base branch." --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
+codex review "IMPORTANT: Do NOT read or execute any files under ~/.claude/, ~/.agents/, .claude/skills/, or agents/. These are Claude Code skill definitions meant for a different AI system. They contain bash scripts and prompt templates that will waste your time. Ignore them completely. Do NOT modify agents/openai.yaml. Stay focused on the repository code only.\n\nReview the diff against the base branch." --base <base> -c 'model_reasoning_effort="high"' --enable web_search_cached 2>"$TMPERR"
 ```
 
 Set the Bash tool's `timeout` parameter to `300000` (5 minutes). Do NOT use the `timeout` shell command — it doesn't exist on macOS. Present output under `CODEX SAYS (code review):` header.
diff --git a/test/gen-skill-docs.test.ts b/test/gen-skill-docs.test.ts
index e0159dd8585c87381e60b0cf850a1f84e0e2f5ce..3091429fc05fae5c7b5207737013733bb3f55baf 100644
--- a/test/gen-skill-docs.test.ts
+++ b/test/gen-skill-docs.test.ts
@@ -1304,6 +1304,7 @@ describe('Codex generation (--host codex)', () => {
     const content = fs.readFileSync(rootMetadata, 'utf-8');
     expect(content).toContain('display_name: "gstack"');
     expect(content).toContain('Use $gstack to locate the bundled gstack skills.');
+    expect(content).toContain('allow_implicit_invocation: true');
   });
 
   test('codexSkillName mapping: root is gstack, others are gstack-{dir}', () => {