~cytrogen/gstack

ref: f3ee0ee28a2c77922302d6e8c30130325737baf5 gstack/review/checklist.md -rw-r--r-- 6.7 KiB
f3ee0ee2 — Garry Tan feat: QA restructure, browser ref staleness, eval efficiency metrics (v0.4.0) (#83) a month ago

#Pre-Landing Review Checklist

#Instructions

Review the git diff origin/main output for the issues listed below. Be specific — cite file:line and suggest fixes. Skip anything that's fine. Only flag real problems.

Two-pass review:

  • Pass 1 (CRITICAL): Run SQL & Data Safety and LLM Output Trust Boundary first. These can block /ship.
  • Pass 2 (INFORMATIONAL): Run all remaining categories. These are included in the PR body but do not block.

Output format:

Pre-Landing Review: N issues (X critical, Y informational)

**CRITICAL** (blocking /ship):
- [file:line] Problem description
  Fix: suggested fix

**Issues** (non-blocking):
- [file:line] Problem description
  Fix: suggested fix

If no issues found: Pre-Landing Review: No issues found.

Be terse. For each issue: one line describing the problem, one line with the fix. No preamble, no summaries, no "looks good overall."


#Review Categories

#Pass 1 — CRITICAL

#SQL & Data Safety
  • String interpolation in SQL (even if values are .to_i/.to_f — use sanitize_sql_array or Arel)
  • TOCTOU races: check-then-set patterns that should be atomic WHERE + update_all
  • update_column/update_columns bypassing validations on fields that have or should have constraints
  • N+1 queries: .includes() missing for associations used in loops/views (especially avatar, attachments)
#Race Conditions & Concurrency
  • Read-check-write without uniqueness constraint or rescue RecordNotUnique; retry (e.g., where(hash:).first then save! without handling concurrent insert)
  • find_or_create_by on columns without unique DB index — concurrent calls can create duplicates
  • Status transitions that don't use atomic WHERE old_status = ? UPDATE SET new_status — concurrent updates can skip or double-apply transitions
  • html_safe on user-controlled data (XSS) — check any .html_safe, raw(), or string interpolation into html_safe output
#LLM Output Trust Boundary
  • LLM-generated values (emails, URLs, names) written to DB or passed to mailers without format validation. Add lightweight guards (EMAIL_REGEXP, URI.parse, .strip) before persisting.
  • Structured tool output (arrays, hashes) accepted without type/shape checks before database writes.

#Pass 2 — INFORMATIONAL

#Conditional Side Effects
  • Code paths that branch on a condition but forget to apply a side effect on one branch. Example: item promoted to verified but URL only attached when a secondary condition is true — the other branch promotes without the URL, creating an inconsistent record.
  • Log messages that claim an action happened but the action was conditionally skipped. The log should reflect what actually occurred.
#Magic Numbers & String Coupling
  • Bare numeric literals used in multiple files — should be named constants documented together
  • Error message strings used as query filters elsewhere (grep for the string — is anything matching on it?)
#Dead Code & Consistency
  • Variables assigned but never read
  • Version mismatch between PR title and VERSION/CHANGELOG files
  • CHANGELOG entries that describe changes inaccurately (e.g., "changed from X to Y" when X never existed)
  • Comments/docstrings that describe old behavior after the code changed
#LLM Prompt Issues
  • 0-indexed lists in prompts (LLMs reliably return 1-indexed)
  • Prompt text listing available tools/capabilities that don't match what's actually wired up in the tool_classes/tools array
  • Word/token limits stated in multiple places that could drift
#Test Gaps
  • Negative-path tests that assert type/status but not the side effects (URL attached? field populated? callback fired?)
  • Assertions on string content without checking format (e.g., asserting title present but not URL format)
  • .expects(:something).never missing when a code path should explicitly NOT call an external service
  • Security enforcement features (blocking, rate limiting, auth) without integration tests verifying the enforcement path works end-to-end
#Crypto & Entropy
  • Truncation of data instead of hashing (last N chars instead of SHA-256) — less entropy, easier collisions
  • rand() / Random.rand for security-sensitive values — use SecureRandom instead
  • Non-constant-time comparisons (==) on secrets or tokens — vulnerable to timing attacks
#Time Window Safety
  • Date-key lookups that assume "today" covers 24h — report at 8am PT only sees midnight→8am under today's key
  • Mismatched time windows between related features — one uses hourly buckets, another uses daily keys for the same data
#Type Coercion at Boundaries
  • Values crossing Ruby→JSON→JS boundaries where type could change (numeric vs string) — hash/digest inputs must normalize types
  • Hash/digest inputs that don't call .to_s or equivalent before serialization — { cores: 8 } vs { cores: "8" } produce different hashes
#View/Frontend
  • Inline <style> blocks in partials (re-parsed every render)
  • O(n*m) lookups in views (Array#find in a loop instead of index_by hash)
  • Ruby-side .select{} filtering on DB results that could be a WHERE clause (unless intentionally avoiding leading-wildcard LIKE)

#Gate Classification

CRITICAL (blocks /ship):          INFORMATIONAL (in PR body):
├─ SQL & Data Safety              ├─ Conditional Side Effects
├─ Race Conditions & Concurrency  ├─ Magic Numbers & String Coupling
└─ LLM Output Trust Boundary      ├─ Dead Code & Consistency
                                   ├─ LLM Prompt Issues
                                   ├─ Test Gaps
                                   ├─ Crypto & Entropy
                                   ├─ Time Window Safety
                                   ├─ Type Coercion at Boundaries
                                   └─ View/Frontend

#Suppressions — DO NOT flag these

  • "X is redundant with Y" when the redundancy is harmless and aids readability (e.g., present? redundant with length > 20)
  • "Add a comment explaining why this threshold/constant was chosen" — thresholds change during tuning, comments rot
  • "This assertion could be tighter" when the assertion already covers the behavior
  • Suggesting consistency-only changes (wrapping a value in a conditional to match how another constant is guarded)
  • "Regex doesn't handle edge case X" when the input is constrained and X never occurs in practice
  • "Test exercises multiple guards simultaneously" — that's fine, tests don't need to isolate every guard
  • Eval threshold changes (max_actionable, min scores) — these are tuned empirically and change constantly
  • Harmless no-ops (e.g., .reject on an element that's never in the array)
  • ANYTHING already addressed in the diff you're reviewing — read the FULL diff before commenting