Expand and Contract
Most breaking changes aren’t hard because the destination is unclear. They’re hard because the old shape is still live while you’re trying to introduce the new one.
Expand and contract, sometimes called parallel change, is the pattern I reach for when I need to change an interface or schema without taking the system down. The core move is simple: don’t replace the old thing in one shot. Add the new thing beside it, migrate traffic in stages, then remove the old one.
Expand
The expand phase adds the new shape in parallel with the old one. If this is an API, that might mean a new field or a compatibility endpoint. If it’s a database, it usually means adding new nullable columns or a new table instead of mutating the old structure in place.
A small schema change makes it concrete. Say users.name needs to become users.first_name and users.last_name.
The schema grows first. The application still reads from the old column, but writes go to both places.
flowchart LR
App[Application] -->|read| Old[(users.name)]
App -->|write| Old
App -->|write| New[(users.first_name<br/>users.last_name)]
That keeps external behavior stable while the new path starts receiving live data.
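Here’s a minimal sketch of that first step, using Python’s built-in sqlite3 module so it runs anywhere. The helper names (`split_name`, `expand_schema`, `create_user`, `display_name`) and the naive split on the first space are my own assumptions for illustration, not a recommendation for how to parse real names.

```python
import sqlite3


def split_name(full_name: str) -> tuple[str, str]:
    # Assumption: split on the first space. Real names need a better rule.
    first, _, last = full_name.strip().partition(" ")
    return first, last.strip()


def expand_schema(conn: sqlite3.Connection) -> None:
    # Expand: new columns are nullable, so existing rows and old writers stay valid.
    conn.execute("ALTER TABLE users ADD COLUMN first_name TEXT")
    conn.execute("ALTER TABLE users ADD COLUMN last_name TEXT")


def create_user(conn: sqlite3.Connection, full_name: str) -> int:
    # Dual write: the old column and the new columns are filled in one statement.
    first, last = split_name(full_name)
    cur = conn.execute(
        "INSERT INTO users (name, first_name, last_name) VALUES (?, ?, ?)",
        (full_name, first, last),
    )
    conn.commit()
    return cur.lastrowid


def display_name(conn: sqlite3.Connection, user_id: int) -> str:
    # Reads still come from the old column during the expand phase.
    row = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (name) VALUES ('Ada Lovelace')")  # pre-existing row
expand_schema(conn)
new_id = create_user(conn, "Grace Hopper")
```

The pre-existing row still has NULL in the new columns at this point. That is expected: filling it in is the backfill’s job, not the expand step’s.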
Migrate
Migration is where most of the real work lives. You backfill existing data and move callers over gradually while production stays stable. Feature flags are useful here because they let you separate write cutover from read cutover. Dual writes can start early. Reads can lag until you’re confident.
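To show what separating the two cutovers can look like, here is a small sketch with two independent flags. Everything in it is hypothetical: the flag names, the `UserStore` class, and the in-memory dictionaries standing in for the old and new storage. The fallback to the old value when the new one is missing is one possible design choice, not the only one.

```python
from dataclasses import dataclass, field


@dataclass
class MigrationFlags:
    # Hypothetical flags; each can be flipped (and flipped back) on its own.
    dual_write: bool = False
    read_from_new: bool = False


@dataclass
class UserStore:
    flags: MigrationFlags
    old: dict = field(default_factory=dict)  # user_id -> full name (old shape)
    new: dict = field(default_factory=dict)  # user_id -> (first, last) (new shape)

    def save(self, user_id: int, full_name: str) -> None:
        # The old path is always written until the contract phase.
        self.old[user_id] = full_name
        if self.flags.dual_write:
            first, _, last = full_name.strip().partition(" ")
            self.new[user_id] = (first, last.strip())

    def display_name(self, user_id: int) -> str:
        # Read cutover is gated separately and falls back to the old shape
        # for records the backfill has not reached yet.
        if self.flags.read_from_new and user_id in self.new:
            first, last = self.new[user_id]
            return f"{first} {last}".strip()
        return self.old[user_id]


flags = MigrationFlags()
store = UserStore(flags)
store.save(1, "Ada Lovelace")   # written before dual writes were enabled
flags.dual_write = True         # write cutover starts early
store.save(2, "Grace Hopper")
flags.read_from_new = True      # read cutover happens later
```

With both flags on, user 2 is served from the new shape while user 1 still falls back to the old one.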
For the same schema change, migration usually means running a backfill for old rows while production keeps dual-writing. Reads can stay on users.name until you’ve verified that the split columns are correct.
flowchart LR
Backfill[Backfill job] -->|copy and split| New[(users.first_name<br/>users.last_name)]
App[Application] -->|read| Old[(users.name)]
App -->|dual write| Old
App -->|dual write| New
The exact order can vary. Sometimes you dual-write first and then backfill. Sometimes you backfill first and only then update clients. What matters is that each intermediate state is deployable and backward compatible.
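A backfill that can run alongside dual writes might look like the sketch below. It again uses sqlite3 for the sake of a runnable example. The function name `backfill_names`, the batch size, and the first-space split are assumptions of mine. The important properties are that it works in small batches, only touches rows that are still unmigrated, and can be rerun safely.

```python
import sqlite3


def backfill_names(conn: sqlite3.Connection, batch_size: int = 500) -> int:
    """Copy and split users.name into the new columns; returns rows updated."""
    updated = 0
    while True:
        rows = conn.execute(
            "SELECT id, name FROM users WHERE first_name IS NULL "
            "ORDER BY id LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return updated
        for user_id, name in rows:
            # Assumption: naive split on the first space.
            first, _, last = (name or "").strip().partition(" ")
            cur = conn.execute(
                # The IS NULL guard avoids overwriting a dual write that
                # landed between the SELECT and this UPDATE.
                "UPDATE users SET first_name = ?, last_name = ? "
                "WHERE id = ? AND first_name IS NULL",
                (first, last.strip(), user_id),
            )
            updated += cur.rowcount
        conn.commit()  # one short transaction per batch


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, "
    "first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [("Ada Lovelace",), ("Grace Hopper",), ("Cher",)],
)
# This row was already dual-written by the application.
conn.execute(
    "INSERT INTO users (name, first_name, last_name) "
    "VALUES ('Alan Turing', 'Alan', 'Turing')"
)
migrated = backfill_names(conn, batch_size=2)
```

Running the job a second time updates nothing, which is what makes it safe to restart after a failure.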
This is also the part teams underestimate. During migration you often need software that clearly doesn’t belong in the final design: adapters, compatibility endpoints, event transformers, the kind of glue code that exists only to bridge two shapes. Transitional Architecture gives this work the right name. It’s scaffolding. You build it to make the change safe, then you remove it when the change is done. If you’ve seen a Strangler Fig migration, you’ve seen the same kind of scaffolding.
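As an illustration of that kind of glue, here is a sketch of an adapter pair for an API that has to speak both shapes for a while. The `User` type, the payload layout, and the function names `with_legacy_name` and `parse_user` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    first_name: str
    last_name: str


def with_legacy_name(user: User) -> dict:
    """Transitional: emit the new fields plus the legacy 'name' field.

    Scaffolding only. Delete once no client reads 'name'.
    """
    return {
        "first_name": user.first_name,
        "last_name": user.last_name,
        "name": f"{user.first_name} {user.last_name}".strip(),
    }


def parse_user(payload: dict) -> User:
    """Transitional: accept either the new shape or the legacy one."""
    if "first_name" in payload:
        return User(payload["first_name"], payload.get("last_name", ""))
    # Legacy clients still send a single 'name'; assume a first-space split.
    first, _, last = payload["name"].strip().partition(" ")
    return User(first, last.strip())


legacy = with_legacy_name(User("Ada", "Lovelace"))
parsed = parse_user({"name": "Grace Hopper"})
```

Neither function belongs in the final design, which is the point: they exist so old and new clients can coexist during the migration.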
It’s also worth being honest about rollback. Early in the migration, backing out is usually straightforward. Once the new path starts storing information the old path can’t represent, rollback gets more expensive and may mean losing newer data. That’s normal, but it should be a conscious step.
Contract
The contract phase is where you collect the win. Once every caller reads and writes through the new path, you can stop carrying the old one. Drop the legacy columns, remove the compatibility code, and let the temporary abstraction collapse.
flowchart LR
App[Application] -->|read and write| New[(users.first_name<br/>users.last_name)]
Cleanup[Cleanup] -->|drop| Old[(users.name)]
This sounds obvious, but it’s the phase that gets skipped most often. Teams finish the cutover, leave the old path in place for “just a bit longer”, and silently lock in permanent complexity. Dual writes are a tool, not the destination. Expand and contract only works if you actually contract.
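One way to make contracting less scary is to gate the drop behind a check. The sketch below is an assumption-laden example in sqlite3: `find_mismatches` and `contract` are hypothetical names, and the consistency rule assumes the first-space split used throughout this example. It rebuilds the table so it also runs on older SQLite versions; on most databases the drop itself is a single `ALTER TABLE ... DROP COLUMN` statement.

```python
import sqlite3


def find_mismatches(conn: sqlite3.Connection) -> list[int]:
    """Ids whose split columns are missing or disagree with users.name."""
    rows = conn.execute(
        "SELECT id FROM users "
        "WHERE first_name IS NULL OR last_name IS NULL "
        "OR name <> TRIM(first_name || ' ' || last_name) "
        "ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]


def contract(conn: sqlite3.Connection) -> None:
    bad = find_mismatches(conn)
    if bad:
        raise RuntimeError(f"refusing to drop users.name: {len(bad)} rows not migrated")
    # Rebuild without the legacy column (portable across SQLite versions).
    conn.executescript(
        """
        BEGIN;
        CREATE TABLE users_contracted (
            id INTEGER PRIMARY KEY,
            first_name TEXT NOT NULL,
            last_name TEXT NOT NULL
        );
        INSERT INTO users_contracted SELECT id, first_name, last_name FROM users;
        DROP TABLE users;
        ALTER TABLE users_contracted RENAME TO users;
        COMMIT;
        """
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, "
    "first_name TEXT, last_name TEXT)"
)
conn.execute(
    "INSERT INTO users (name, first_name, last_name) "
    "VALUES ('Ada Lovelace', 'Ada', 'Lovelace')"
)
conn.execute("INSERT INTO users (name) VALUES ('Grace Hopper')")  # missed by the backfill
conn.commit()

blocked = find_mismatches(conn)  # the unmigrated row blocks the drop
conn.execute("UPDATE users SET first_name = 'Grace', last_name = 'Hopper' WHERE id = 2")
conn.commit()
contract(conn)
columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
```

The check also gives the contract step a clear definition of done: zero mismatches, then the old column goes away.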
Mental model
The temporary work a migration needs can feel wasteful because you know some of it will be deleted. In practice it’s usually a bargain. You get smaller releases, and each step is easier to roll back than a single big-bang change would be. Parts of the new system can ship before the full migration is complete, which means earlier value and better observability during the cutover.
Expand and contract is less about the mechanics of dual writes or backfills and more about a mindset: make large changes safe by turning them into a series of small ones, each as reversible as you can make it. Each phase should be releasable on its own. Each temporary component should have an owner and a removal plan. If you do that, temporary code stops being a messy compromise and starts being a deliberate tool for evolving live systems.