Database Migration Strategies: How to Change Your Schema Without Breaking Production

Database schema migrations are among the highest-risk operations in web application development: a poorly executed migration can take down production, cause data loss, or leave you with hours of downtime. The patterns below let you change a schema while production keeps serving traffic.

Why Database Migrations Are Dangerous

Production databases are live: new data keeps arriving while you change the schema. If you add a NOT NULL column without a default, existing rows have no valid value for it; PostgreSQL, for example, rejects the statement outright on a non-empty table. If you rename a column and deploy the application code and the migration separately, whichever code version runs against the other version's schema will query a column that does not exist (old code looks for the old name after the rename, or new code looks for the new name before it). At scale, an ALTER TABLE that rewrites a 50-million-row table can hold an exclusive lock for minutes; application requests queue up behind it, connection pools are exhausted, and the failure cascades. The fundamental constraint: in most deployment systems, database migrations and application code cannot be deployed atomically, so there is always a window in which either the new code runs against the old schema or the new schema exists before the new code has deployed.
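The deployment window is easiest to see with a toy database. The minimal sketch below uses Python's built-in `sqlite3` module and an in-memory database as a stand-in for a production server (an assumption for illustration; PostgreSQL and MySQL fail the same way). The `users` table, its `name` column, and both helper functions are hypothetical. It applies a one-step rename and then runs the query that the previously deployed code would still be issuing.

```python
from __future__ import annotations

import sqlite3


def old_code_query(conn: sqlite3.Connection) -> list[str]:
    # Hypothetical query issued by the application version that is still deployed.
    return [row[0] for row in conn.execute("SELECT name FROM users ORDER BY id")]


def demo_one_step_rename() -> str:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("ada",), ("grace",)])

    # Before the migration, the old code works.
    assert old_code_query(conn) == ["ada", "grace"]

    # The migration runs first (RENAME COLUMN needs SQLite 3.25+).
    conn.execute("ALTER TABLE users RENAME COLUMN name TO full_name")

    # The old code keeps serving traffic until the new version is deployed.
    try:
        old_code_query(conn)
    except sqlite3.OperationalError as exc:
        return f"old code broke: {exc}"
    finally:
        conn.close()
    return "old code still works"


if __name__ == "__main__":
    print(demo_one_step_rename())
```

Reversing the order (deploying code that reads `full_name` before the rename runs) fails the same way, which is why neither ordering is safe on its own.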

The Expand-Contract Pattern

The expand-contract pattern (also called parallel change) is the safest approach for zero-downtime schema changes. Taking a column rename as the example, it has three phases.

Phase 1 (Expand): add the new schema elements without removing the old ones. Add the new column (nullable or with a default), deploy code that writes to both the old and the new column, and backfill existing rows into the new column. The old application code keeps working because the old column still exists.

Phase 2 (Migrate): once every row is backfilled and both columns are in sync, deploy application code that reads from the new column instead of the old one. Keep writing to both columns during this phase so that rolling back to the previous version stays safe.

Phase 3 (Contract): once no deployed code reads the old column, stop writing to it and drop it. The migration is complete.

Timeline: a rename that would otherwise be a single ALTER statement can take weeks in production when done safely via expand-contract. That is the right trade-off when zero downtime is required.
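A minimal sketch of the three phases for renaming `users.name` to `users.full_name`, again using Python's `sqlite3` with an in-memory database as a stand-in for the real server (the table, the columns, and every function name are hypothetical). Each function corresponds to one phase, or to the application code that runs during it; in a real system each phase would be a separate deploy, often days apart. The backfill runs in small batches so that no single transaction holds locks for long.

```python
from __future__ import annotations

import sqlite3


def expand(conn: sqlite3.Connection) -> None:
    # Phase 1: add the new column as nullable; the old column stays in place.
    conn.execute("ALTER TABLE users ADD COLUMN full_name TEXT")


def insert_user_dual_write(conn: sqlite3.Connection, value: str) -> int:
    # Application code during phases 1 and 2 writes both columns.
    cur = conn.execute(
        "INSERT INTO users (name, full_name) VALUES (?, ?)", (value, value)
    )
    return cur.lastrowid


def backfill(conn: sqlite3.Connection, batch_size: int = 1000) -> None:
    # Phase 1, continued: copy old values in small batches so each transaction
    # stays short. Rows whose old value is NULL are skipped; otherwise they
    # would match the WHERE clause forever.
    while True:
        with conn:  # commits each batch
            cur = conn.execute(
                "UPDATE users SET full_name = name WHERE id IN ("
                " SELECT id FROM users"
                " WHERE full_name IS NULL AND name IS NOT NULL LIMIT ?)",
                (batch_size,),
            )
        if cur.rowcount == 0:
            break


def in_sync(conn: sqlite3.Connection) -> bool:
    # Gate for phase 2: no row may differ (IS NOT is SQLite's NULL-safe comparison).
    (mismatches,) = conn.execute(
        "SELECT COUNT(*) FROM users WHERE full_name IS NOT name"
    ).fetchone()
    return mismatches == 0


def read_names_new_code(conn: sqlite3.Connection) -> list[str]:
    # Phase 2: the new application version reads only the new column.
    return [row[0] for row in conn.execute("SELECT full_name FROM users ORDER BY id")]


def contract(conn: sqlite3.Connection) -> bool:
    # Phase 3: drop the old column once no deployed code reads or writes it.
    # DROP COLUMN needs SQLite 3.35+, so older builds skip this step.
    if sqlite3.sqlite_version_info < (3, 35, 0):
        return False
    conn.execute("ALTER TABLE users DROP COLUMN name")
    return True


def run_demo() -> tuple[list[str], list[str]]:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO users (name) VALUES (?)", [("ada",), ("grace",), ("linus",)]
    )
    expand(conn)
    insert_user_dual_write(conn, "barbara")  # a write that arrives mid-migration
    backfill(conn, batch_size=2)
    assert in_sync(conn)
    names = read_names_new_code(conn)
    contract(conn)
    columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
    conn.close()
    return names, columns
```

The `in_sync` check is the gate for Phase 2: reads switch to the new column only once it returns `True`, and `contract` runs only after no deployed code touches `name` any more.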

Specific Safe Migration Techniques

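Adding a NOT NULL column: never add a NOT NULL column without a default to a large table in a single migration. The safe sequence is: add the column as nullable (`ALTER TABLE users ADD COLUMN new_field VARCHAR(255)`); deploy application code that writes `new_field` on every insert and update, so no new NULLs appear; backfill existing rows in small batches; then add the NOT NULL constraint. In PostgreSQL, `ALTER COLUMN ... SET NOT NULL` scans the whole table while holding an exclusive lock; on PostgreSQL 12+ that scan is skipped if a validated `CHECK (new_field IS NOT NULL)` constraint already exists, and such a constraint can be added as `NOT VALID` and validated without blocking writes. Since PostgreSQL 11, adding a column with a non-volatile default (a constant such as `DEFAULT 'unknown'`, as opposed to `DEFAULT random()`) does not rewrite the table, so if a constant default is acceptable for existing rows, adding the column with that default avoids the backfill entirely.

Renaming a column: use the expand-contract pattern described above. Never rename directly in one step.

Adding an index: in PostgreSQL, use `CREATE INDEX CONCURRENTLY`, which builds the index without blocking concurrent writes. It is slower than a plain `CREATE INDEX`, cannot run inside a transaction block (so migration tools that wrap each migration in a transaction must have that disabled for this one), and leaves an invalid index behind if it fails, which must be dropped before retrying. In MySQL/MariaDB, pt-online-schema-change or gh-ost (GitHub's online schema change tool) rebuild large tables in the background and block writes only briefly during the final cut-over.

Dropping a column: safe only after the column has been removed from all application code. A common mistake is dropping the column in the same deploy that removes it from the ORM model: instances still running the old code, and ORMs that cache the column list at startup (ActiveRecord, for example), keep referencing the column and fail. Remove the column from the ORM first and deploy, then drop it in a later migration.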
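For PostgreSQL, the NOT NULL sequence and a non-blocking index build can be written down as an ordered plan. The sketch below only generates SQL strings; executing them through a database driver is left out. It assumes a table with an integer primary key named `id`, a `backfill_expr` that never evaluates to NULL (otherwise the repeated batch would never reach zero rows), and `sql_type`/`backfill_expr` values that come from trusted code, since they are interpolated verbatim. The `Step` type, the 5-second `lock_timeout`, and the constraint and index names are illustrative choices, not a standard.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Only plain lowercase identifiers are accepted, because they are
# interpolated into SQL text below (no quoting is attempted).
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


@dataclass(frozen=True)
class Step:
    sql: str
    # False for statements PostgreSQL refuses to run inside a transaction block.
    transactional: bool = True
    # True for the batched backfill: re-run until it updates zero rows.
    repeat: bool = False


def not_null_column_plan(table: str, column: str, sql_type: str,
                         backfill_expr: str, batch_size: int = 1000) -> list[Step]:
    for name in (table, column):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    check = f"{table}_{column}_not_null"
    return [
        # Fail fast instead of queueing behind long-running transactions.
        Step("SET lock_timeout = '5s'"),
        # 1. Nullable column: existing rows stay valid, no table rewrite.
        Step(f"ALTER TABLE {table} ADD COLUMN {column} {sql_type}"),
        # (Deploy application code that writes the column on every insert/update here.)
        # 2. Backfill in small batches so each transaction stays short.
        Step(
            f"UPDATE {table} SET {column} = {backfill_expr} "
            f"WHERE id IN (SELECT id FROM {table} WHERE {column} IS NULL "
            f"LIMIT {int(batch_size)})",
            repeat=True,
        ),
        # 3. NOT VALID skips checking existing rows, so the exclusive lock is brief.
        Step(f"ALTER TABLE {table} ADD CONSTRAINT {check} "
             f"CHECK ({column} IS NOT NULL) NOT VALID"),
        # 4. Validation scans the table but does not block inserts or updates.
        Step(f"ALTER TABLE {table} VALIDATE CONSTRAINT {check}"),
        # 5. On PostgreSQL 12+, the validated CHECK lets this skip its own scan.
        Step(f"ALTER TABLE {table} ALTER COLUMN {column} SET NOT NULL"),
        Step(f"ALTER TABLE {table} DROP CONSTRAINT {check}"),
        # 6. Optional index, built without blocking writes.
        Step(f"CREATE INDEX CONCURRENTLY {table}_{column}_idx ON {table} ({column})",
             transactional=False),
    ]
```

A runner would execute the `repeat=True` step in a loop until it reports zero updated rows, and would send the non-transactional step outside any `BEGIN`/`COMMIT`.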

Migration Tools and Practices

Tools: Flyway and Liquibase are the standards in the Java ecosystem; Alembic serves Python/SQLAlchemy; Rails has ActiveRecord migrations; Prisma Migrate covers Node.js/TypeScript. All of them keep versioned migration files and record which ones have been applied. Support for reverse ("down") migrations varies: Alembic (`downgrade()`) and Rails (reversible `change` methods) support them directly, while Prisma Migrate generates forward-only SQL and Flyway's undo migrations are limited to its paid editions.

The critical practices: never edit a migration that has already run in any environment; create a new migration to fix it. Test migrations on a production-size copy of the data before running them in production, because lock duration and run time depend on dataset size. Always have a rollback plan, and remember that it is not always a `down` migration: sometimes the rollback is redeploying the previous application version. Monitor the migration while it runs in production and be prepared to kill it if it causes unexpected locking.
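The rule about never editing an applied migration is usually enforced with checksums: Flyway and Liquibase both store a checksum for every migration they apply and refuse to continue if that migration changes afterwards. The minimal sketch below imitates the idea in Python with `sqlite3`; the `schema_history` table, the `migrate` function, and the dictionary of single-statement migrations are hypothetical, not any tool's actual API. Each migration commits together with its history row, which works here because SQLite supports transactional DDL (PostgreSQL does too; MySQL commits implicitly on DDL).

```python
from __future__ import annotations

import hashlib
import sqlite3


def checksum(sql: str) -> str:
    # Any edit to the migration text, even whitespace, changes this value.
    return hashlib.sha256(sql.encode("utf-8")).hexdigest()


def migrate(conn: sqlite3.Connection, migrations: dict[int, str]) -> list[int]:
    """Apply pending single-statement migrations in version order.

    Expects a connection opened with isolation_level=None so that the
    explicit BEGIN/COMMIT below control the transactions.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_history ("
        "version INTEGER PRIMARY KEY, checksum TEXT NOT NULL)"
    )
    applied = dict(conn.execute("SELECT version, checksum FROM schema_history"))

    # Refuse to run if an already-applied migration was edited or deleted.
    for version, recorded in applied.items():
        if version not in migrations:
            raise RuntimeError(f"applied migration {version} is missing")
        if checksum(migrations[version]) != recorded:
            raise RuntimeError(
                f"migration {version} changed after it was applied; "
                "add a new migration instead of editing this one"
            )

    newly_applied: list[int] = []
    for version in sorted(migrations):
        if version in applied:
            continue
        # The schema change and its history row commit or roll back together.
        conn.execute("BEGIN")
        try:
            conn.execute(migrations[version])
            conn.execute(
                "INSERT INTO schema_history (version, checksum) VALUES (?, ?)",
                (version, checksum(migrations[version])),
            )
        except Exception:
            if conn.in_transaction:
                conn.execute("ROLLBACK")
            raise
        conn.execute("COMMIT")
        newly_applied.append(version)
    return newly_applied


if __name__ == "__main__":
    db = sqlite3.connect(":memory:", isolation_level=None)
    print(migrate(db, {1: "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)"}))  # prints [1]
```

Fixing a bad migration then means adding version N+1, which is exactly what the checksum guard forces.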