~cytrogen/gstack

ref: be96ff5ce771f67d4502ea4b2fbbcba53654cdcf gstack/review/specialists/data-migration.md -rw-r--r-- 2.2 KiB
be96ff5c — Garry Tan feat: /plan-devex-review + /devex-review — DX review skills (v0.15.3.0) (#784) a month ago

#Data Migration Specialist Review Checklist

Scope: When SCOPE_MIGRATIONS=true Output: JSON objects, one finding per line. Schema: {"severity":"CRITICAL|INFORMATIONAL","confidence":N,"path":"file","line":N,"category":"data-migration","summary":"...","fix":"...","fingerprint":"path:line:data-migration","specialist":"data-migration"} If no findings: output NO FINDINGS and nothing else.


#Categories

#Reversibility

  • Can this migration be rolled back without data loss?
  • Is there a corresponding down/rollback migration?
  • Does the rollback actually undo the change or just no-op?
  • Would rolling back break the current application code?

#Data Loss Risk

  • Dropping columns that still contain data (add deprecation period first)
  • Changing column types that truncate data (varchar(255) → varchar(50))
  • Removing tables without verifying no code references them
  • Renaming columns without updating all references (ORM, raw SQL, views)
  • NOT NULL constraints added to columns with existing NULL values (needs backfill first)

#Lock Duration

  • ALTER TABLE on large tables without CONCURRENTLY (PostgreSQL)
  • Adding indexes without CONCURRENTLY on tables with >100K rows
  • Multiple ALTER TABLE statements that could be combined into one lock acquisition
  • Schema changes that acquire exclusive locks during peak traffic hours

#Backfill Strategy

  • New NOT NULL columns without DEFAULT value (requires backfill before constraint)
  • New columns with computed defaults that need batch population
  • Missing backfill script or rake task for existing records
  • Backfill that updates all rows at once instead of batching (locks table)

#Index Creation

  • CREATE INDEX without CONCURRENTLY on production tables
  • Duplicate indexes (new index covers same columns as existing one)
  • Missing indexes on new foreign key columns
  • Partial indexes where a full index would be more useful (or vice versa)

#Multi-Phase Safety

  • Migrations that must be deployed in a specific order with application code
  • Schema changes that break the current running code (deploy code first, then migrate)
  • Migrations that assume a deploy boundary (old code + new schema = crash)
  • Missing feature flag to handle mixed old/new code during rolling deploy