Expand and Contract

September 8, 2024

Most breaking changes aren’t hard because the destination is unclear. They’re hard because the old shape is still live while you’re trying to introduce the new one.

Expand and contract, sometimes called parallel change, is the pattern I reach for when I need to change an interface or schema without taking the system down. Don’t replace the old thing in one shot. Add the new thing beside it, migrate traffic in stages, then remove the old one.

Expand

The expand phase adds the new shape in parallel with the old one. If this is an API, that might mean a new field or a compatibility endpoint. If it’s a database, it usually means adding new nullable columns or a new table instead of mutating the old structure in place.

A small schema change makes it concrete. Say users.name needs to become users.first_name and users.last_name.

The schema grows first. The application still reads from the old column, but writes go to both places.

  flowchart LR
    App["<strong>Application</strong>"] -->|read| Old[("<strong>Old</strong><br/><span class='mermaid-detail'>users.name</span>")]
    App -->|write| Old
    App -->|write| New[("<strong>New</strong><br/><span class='mermaid-detail'>users.first_name<br/>users.last_name</span>")]

That keeps external behavior stable while the new path starts receiving live data.
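
To make the expand step concrete, here is a minimal sketch in Python against an in-memory SQLite database. It assumes a users table keyed by an integer id, and a hypothetical split_name rule that cuts at the first space. The function names are illustrative, not taken from any real codebase. The expansion only adds nullable columns, so existing queries against users.name keep working.

```python
import sqlite3


def split_name(full_name: str) -> tuple[str, str]:
    # Hypothetical split rule: text before the first space is the first name,
    # the rest is the last name. Real names need a more careful rule.
    first, _, last = full_name.strip().partition(" ")
    return first, last


def expand_schema(conn: sqlite3.Connection) -> None:
    # Additive only: new nullable columns, so existing INSERTs and SELECTs keep working.
    conn.execute("ALTER TABLE users ADD COLUMN first_name TEXT")
    conn.execute("ALTER TABLE users ADD COLUMN last_name TEXT")


def create_user(conn: sqlite3.Connection, full_name: str) -> int:
    # Dual write: the old column stays authoritative, the new columns get a copy.
    first, last = split_name(full_name)
    cur = conn.execute(
        "INSERT INTO users (name, first_name, last_name) VALUES (?, ?, ?)",
        (full_name, first, last),
    )
    return cur.lastrowid


def get_display_name(conn: sqlite3.Connection, user_id: int) -> str:
    # Reads still come from the old column during the expand phase.
    row = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (name) VALUES ('Ada Lovelace')")  # row that predates the change
expand_schema(conn)
new_id = create_user(conn, "Grace Hopper")
conn.commit()
```

The row that existed before the expansion still has NULL in the new columns. Filling those in is the backfill’s job.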

Migrate

Migration is where most of the real work lives. You backfill existing data and move callers over gradually while production stays stable. Feature flags are useful here because they let you separate write cutover from read cutover. Dual writes can start early. Reads can lag until you’re confident.
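
One way to picture that separation is two independent flags, one per cutover. This is a sketch with hypothetical flag names (dual_write, read_from_new) held in a plain dataclass. A real system would read them from whatever flag service it already uses.

```python
from dataclasses import dataclass


@dataclass
class MigrationFlags:
    # Hypothetical flags; in practice these would come from a flag service.
    dual_write: bool = False     # write cutover: start populating the new shape
    read_from_new: bool = False  # read cutover: flip only after verification


def write_plan(flags: MigrationFlags) -> list[str]:
    """Columns a write should touch under the current flags."""
    columns = ["name"]  # the old column keeps being written until the contract phase
    if flags.dual_write:
        columns += ["first_name", "last_name"]
    return columns


def read_source(flags: MigrationFlags) -> str:
    """Which shape a read should come from under the current flags."""
    if flags.read_from_new and not flags.dual_write:
        # Reading from columns nothing writes to would return stale or missing data.
        raise ValueError("read cutover requires dual writes to be on")
    return "first_name/last_name" if flags.read_from_new else "name"
```

Turning on dual_write first and read_from_new later gives you two small, separately reversible steps instead of one large one. The guard in read_source encodes the one ordering that should never happen: reading from columns nothing is writing to.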

For the same schema change, migration usually means running a backfill for old rows while production keeps dual-writing. Reads can stay on users.name until you’ve verified that the split columns are correct.

  flowchart LR
    Backfill["<strong>Backfill job</strong>"] -->|copy and split| New[("<strong>New</strong><br/><span class='mermaid-detail'>users.first_name<br/>users.last_name</span>")]
    App["<strong>Application</strong>"] -->|read| Old[("<strong>Old</strong><br/><span class='mermaid-detail'>users.name</span>")]
    App -->|dual write| Old
    App -->|dual write| New
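
A backfill for this example might look like the following sketch, again assuming SQLite, an integer id column, and the same hypothetical first-space split rule. It processes rows in batches and only touches rows whose new columns are still NULL, so re-running it is harmless and it won’t overwrite values that dual writes already set. The mismatch count is one possible check to run before moving reads off users.name.

```python
import sqlite3


def split_name(full_name: str) -> tuple[str, str]:
    # Hypothetical split rule; it must match whatever the dual-write path uses.
    first, _, last = full_name.strip().partition(" ")
    return first, last


def backfill(conn: sqlite3.Connection, batch_size: int = 1000) -> int:
    """Fill the new columns for rows written before dual writes began.

    Works in small batches so each transaction stays short, and only touches
    rows whose new columns are still NULL, so it is safe to re-run.
    """
    total = 0
    while True:
        rows = conn.execute(
            "SELECT id, name FROM users WHERE first_name IS NULL ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return total
        with conn:  # commit each batch as its own transaction
            for user_id, name in rows:
                first, last = split_name(name)
                conn.execute(
                    "UPDATE users SET first_name = ?, last_name = ? "
                    "WHERE id = ? AND first_name IS NULL",
                    (first, last, user_id),
                )
        total += len(rows)


def count_mismatches(conn: sqlite3.Connection) -> int:
    """Rows where the split columns disagree with the old column.

    Reads should stay on users.name until this returns 0.
    """
    mismatches = 0
    for name, first, last in conn.execute("SELECT name, first_name, last_name FROM users"):
        if (first, last) != split_name(name):
            mismatches += 1
    return mismatches
```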

The exact order can vary. Sometimes you dual-write first and then backfill. Sometimes you backfill first and only then update clients. What matters is that each intermediate state is deployable and backward compatible.

This is also the part teams underestimate. During migration you often need software that clearly doesn’t belong in the final design: adapters, compatibility endpoints, event transformers, the kind of glue code that exists only to bridge two shapes. Transitional Architecture gives this work the right name. It’s scaffolding. You build it to make the change safe, then you remove it when the change is done. If you’ve seen a Strangler Fig migration, it’s the same idea.
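
As an illustration of that kind of scaffolding, here is a sketch of an event transformer. It assumes, hypothetically, that new producers publish user events with first_name and last_name while some consumers still expect a single name field. The event shape is made up for the example.

```python
from typing import Any

# Hypothetical event shapes: new producers emit the split fields, legacy
# consumers still expect a single "name". Neither comes from a real system.
NewUserEvent = dict[str, Any]
LegacyUserEvent = dict[str, Any]


def to_legacy_event(event: NewUserEvent) -> LegacyUserEvent:
    """Transitional adapter: present a new-shape event in the old shape.

    It exists only to bridge the two shapes; delete it once no consumer
    reads the legacy format.
    """
    # Join the non-empty parts so a missing last name doesn't leave a trailing space.
    full_name = " ".join(part for part in (event["first_name"], event["last_name"]) if part)
    # Copy every other field through unchanged.
    legacy = {k: v for k, v in event.items() if k not in ("first_name", "last_name")}
    legacy["name"] = full_name
    return legacy
```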

It’s also worth being honest about rollback. Early in the migration, backing out is usually straightforward. Once the new path starts storing information the old path can’t represent, rollback gets more expensive and may mean losing newer data. That’s normal, but it should be a conscious step.
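
Here is a concrete case of where rollback starts losing data, assuming the same hypothetical first-space split rule as the dual-write path. Once users can edit the split fields directly, someone can store “Mary Ann” as a first name. Rolling back keeps only “Mary Ann Smith” in users.name, and re-deriving the split gives “Mary” and “Ann Smith”. A check like this sketch lets you count such rows before deciding to back out.

```python
def split_name(full_name: str) -> tuple[str, str]:
    # Hypothetical split rule used by dual writes and the backfill.
    first, _, last = full_name.strip().partition(" ")
    return first, last


def survives_rollback(first_name: str, last_name: str) -> bool:
    """Would this row survive a rollback to the single users.name column?

    Rolling back keeps only "first last" and re-derives the split with the
    old rule. If that doesn't reproduce the stored values, the new columns
    hold information the old shape can't represent.
    """
    joined = f"{first_name} {last_name}".strip()
    return split_name(joined) == (first_name, last_name)
```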

Contract

The contract phase is where you collect the win. Once every caller reads and writes through the new path, you can stop carrying the old one. Drop the legacy columns, remove the compatibility code, and let the temporary abstraction collapse.

  flowchart LR
    App["<strong>Application</strong>"] -->|read and write| New[("<strong>New</strong><br/><span class='mermaid-detail'>users.first_name<br/>users.last_name</span>")]
    Cleanup["<strong>Cleanup</strong>"] -->|drop| Old[("<strong>Old</strong><br/><span class='mermaid-detail'>users.name</span>")]
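
A contract step for the SQLite version of this example could be sketched as follows. It uses the create, copy, drop, rename table rebuild, which also works on older SQLite versions that lack ALTER TABLE ... DROP COLUMN, and it refuses to run while any row is still missing the split values. The id INTEGER PRIMARY KEY column is an assumption, and a real table would also need its indexes and triggers recreated.

```python
import sqlite3


def contract(conn: sqlite3.Connection) -> None:
    """Drop users.name once every caller uses the split columns."""
    # Precondition: an incomplete backfill means dropping the old column loses data.
    missing = conn.execute(
        "SELECT COUNT(*) FROM users WHERE first_name IS NULL OR last_name IS NULL"
    ).fetchone()[0]
    if missing:
        raise RuntimeError(f"{missing} rows not backfilled; not contracting")
    with conn:  # copy, drop, and rename commit together, or roll back together
        conn.execute(
            "CREATE TABLE users_new (id INTEGER PRIMARY KEY, "
            "first_name TEXT NOT NULL, last_name TEXT NOT NULL)"
        )
        conn.execute(
            "INSERT INTO users_new (id, first_name, last_name) "
            "SELECT id, first_name, last_name FROM users"
        )
        # The old table is only dropped after the copy succeeded.
        conn.execute("DROP TABLE users")
        conn.execute("ALTER TABLE users_new RENAME TO users")
```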

This sounds obvious, but it’s the phase that gets skipped most often. Teams finish the cutover, leave the old path in place for “just a bit longer”, and silently lock in permanent complexity. Dual writes are a tool, not the destination. Expand and contract only works if you actually contract.

Mental model

That temporary work can feel wasteful because you know some of it will be deleted. In practice it’s usually a bargain. You get smaller releases and easier rollback. Parts of the new system can ship before the full migration is complete, so users get value sooner and you can watch the new path handle real traffic while the old one is still there to fall back on.

The mechanics of dual writes and backfills matter less than the discipline behind them: make a large change safe by turning it into a series of small reversible ones. Each phase should be releasable on its own, and every temporary component should have an owner and a removal plan. With that in place, the scaffolding is a deliberate tool for evolving a live system, not a mess you apologize for later.
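
One lightweight way to keep that discipline honest is a register of scaffolding that a CI job could check. This is a hypothetical sketch: the field names, owners, and dates are invented for the users.name example and aren’t drawn from any real project.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TemporaryComponent:
    """One piece of transitional scaffolding. Field names are illustrative."""
    name: str
    owner: str
    remove_by: date
    removal_step: str


def overdue(components: list[TemporaryComponent], today: date) -> list[str]:
    """Names of scaffolding that has outlived its planned removal date."""
    return [c.name for c in components if c.remove_by < today]


# Hypothetical entries for the users.name migration; owners and dates are made up.
REGISTER = [
    TemporaryComponent("dual-write to users.first_name/last_name", "accounts-team",
                       date(2024, 10, 1), "stop writing users.name"),
    TemporaryComponent("legacy user event adapter", "events-team",
                       date(2024, 11, 1), "delete once no consumer reads 'name'"),
    TemporaryComponent("users.name column", "accounts-team",
                       date(2024, 12, 1), "drop column in the contract phase"),
]
```

Failing a build when overdue() returns anything turns “just a bit longer” into a decision someone has to make explicitly.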