
Unique keys are not optional in analytical incrementals

Incremental analytical models need an explicit notion of row identity. Without it, merges drift, updates go missing, and review of correctness turns into guesswork.

By Ivan Richter

Last updated: Mar 24, 2026



The rule

Analytical incrementals need an explicit unique key.

If we can’t say what one row represents and how that row should be matched on a later run, we don’t have a safe incremental model. We have a table that’s hoping append logic will somehow behave like state management.

That’s part of the broader reason we prefer structured, reviewable modeling over ad hoc transformation code. Once shared data products start changing over time, structure stops being aesthetic and starts being operational.

Row identity is the contract

A unique key isn’t just something the warehouse needs for a merge statement. It’s the contract that explains what one row means.

If a row represents one order, the key should identify that order. If it represents one order line, the key should identify that order line. If it represents one user-day state, the key should identify that user and that day. The key should match the grain of the model instead of being a random technical convenience somebody pasted in to make the merge compile.
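A minimal Python sketch of that idea, with illustrative column names: the declared key columns simply mirror the grain of each model, so the identity tuple for a row is readable straight from the model's intent.

```python
# Hypothetical rows at three different grains; all names are illustrative.
order_row      = {"order_id": "o-17", "total": 99.0}
order_line_row = {"order_id": "o-17", "line_no": 2, "sku": "A-1"}
user_day_row   = {"user_id": "u-3", "as_of_date": "2026-03-01", "status": "active"}

def row_key(row, key_columns):
    """Build the identity tuple for a row from its declared key columns."""
    return tuple(row[c] for c in key_columns)

# The key columns match the grain of each model, not a technical convenience.
assert row_key(order_row, ["order_id"]) == ("o-17",)
assert row_key(order_line_row, ["order_id", "line_no"]) == ("o-17", 2)
assert row_key(user_day_row, ["user_id", "as_of_date"]) == ("u-3", "2026-03-01")
```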

Without that contract, incremental behavior gets vague fast. Which row gets updated? Which row should survive? What counts as a correction versus a new fact? Reviewers can’t answer those questions from the model because the model never made identity explicit in the first place.

A merge only works if the key is honest

People talk about “doing a merge” like the merge itself solved the hard part. It didn’t.

A merge is only as sound as the identity rule behind it. If the key doesn’t actually represent the analytical entity in the table, the merge just applies confidence to the wrong boundary. The system still updates rows. It just isn’t clear that it’s updating the right ones.

That’s why weak keys are so dangerous. They make the model look more disciplined than it is. The SQL seems formal. The materialization seems intentional. Meanwhile the table is drifting because the identity rule was never strong enough to support the behavior people expected from it.
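A toy upsert makes the failure concrete. This is a sketch, not any warehouse's actual merge implementation: with an honest key at the order-line grain, both lines survive; with a key that is too coarse, the merge still "works" and silently overwrites a sibling row.

```python
def merge(target, batch, key_columns):
    """Toy upsert: match batch rows into target on the declared key."""
    for row in batch:
        target[tuple(row[c] for c in key_columns)] = row
    return target

# Two lines of the same order (illustrative data).
batch = [
    {"order_id": "o-1", "line_no": 1, "qty": 2},
    {"order_id": "o-1", "line_no": 2, "qty": 5},
]

# Honest key at the order-line grain: one row per line survives.
by_line = merge({}, batch, ["order_id", "line_no"])
assert len(by_line) == 2

# Too-coarse key: the second line silently overwrites the first.
by_order = merge({}, batch, ["order_id"])
assert len(by_order) == 1
assert by_order[("o-1",)]["line_no"] == 2
```

Nothing errors in the second case, which is exactly the danger: the system updates rows, just not the right ones.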

Late change still needs a path

A unique key is necessary, but it isn’t enough by itself.

If source records can change after first arrival, the model also needs a reliable way to notice those changes. Late-arriving dimensions, corrected statuses, refunded orders, changed child entities, and source replays do not care that you wrote a merge. They only care whether the incremental logic knows which existing rows need another pass.

That’s why change detection matters. A key tells you how to match. Change detection tells you when to revisit. You need both.

Without that second piece, the model might preserve identity cleanly and still miss the moments where an old row stopped being current.
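One common shape for that second piece is a watermark plus a lookback window. The sketch below assumes the source exposes a trustworthy `updated_at` column, which is itself an assumption worth checking; the lookback re-selects recent rows so late corrections get another pass.

```python
from datetime import datetime, timedelta

def rows_to_reprocess(source_rows, high_watermark, lookback=timedelta(days=3)):
    """Select rows changed since the watermark, minus a lookback window
    that also re-picks recent rows in case of late corrections."""
    cutoff = high_watermark - lookback
    return [r for r in source_rows if r["updated_at"] > cutoff]

rows = [
    {"id": 1, "updated_at": datetime(2026, 3, 10)},  # old, already settled
    {"id": 2, "updated_at": datetime(2026, 3, 22)},  # inside the lookback window
    {"id": 3, "updated_at": datetime(2026, 3, 24)},  # genuinely new
]
picked = rows_to_reprocess(rows, high_watermark=datetime(2026, 3, 23))
assert [r["id"] for r in picked] == [2, 3]
```

The key then decides what happens to each re-picked row: it matches back to the existing row instead of creating a duplicate.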

Weak identity creates ugly failure modes

A lot of ugly incremental behavior starts with pretending a key is close enough.

Maybe the model keys on order_id even though one order can fan out to multiple rows. Maybe it keys on a synthetic hash that changes whenever a non-essential field moves. Maybe it appends blindly because nobody could agree what should count as the same record. None of those are harmless shortcuts. They’re how teams end up with duplicated facts, partial updates, and tables that quietly get less trustworthy over time.
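The hash case is worth seeing concretely. In this sketch (column names are illustrative), hashing a volatile field makes the "same" record look brand new on every run, so the merge inserts duplicates instead of updating; hashing only the identity columns keeps the key stable.

```python
import hashlib

def hash_key(row, columns):
    """Derive a surrogate key by hashing the listed columns."""
    payload = "|".join(str(row[c]) for c in columns)
    return hashlib.sha256(payload.encode()).hexdigest()

# The same order line observed on two runs; only a volatile field moved.
row_v1 = {"order_id": "o-9", "line_no": 1, "last_seen_at": "2026-03-01"}
row_v2 = {"order_id": "o-9", "line_no": 1, "last_seen_at": "2026-03-02"}

# Hashing the volatile field makes the record look new every run ...
volatile = ["order_id", "line_no", "last_seen_at"]
assert hash_key(row_v1, volatile) != hash_key(row_v2, volatile)

# ... while hashing only the identity columns keeps the key stable.
stable = ["order_id", "line_no"]
assert hash_key(row_v1, stable) == hash_key(row_v2, stable)
```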

By the time people notice, the symptom is usually stale rows. The warehouse still returns answers. The answers just get increasingly wrong around the edges where real systems actually change.

The grain has to make the key obvious

The right key starts with the right grain.

We try to model around the business entity or decision the row is supposed to represent, not around whatever cleanup residue happened to fall out of staging. That’s the same stance behind decision boundaries. If a row exists only because a landing step split, flattened, or reformatted something in a convenient way, the key is often compensating for the wrong model shape.

A healthy key usually feels obvious once the model intent is clear. When it doesn’t, that usually means the table still hasn’t decided what one row is supposed to mean.

What we do when there is no real key

If the source doesn’t provide a clean identifier, we don’t wave the problem away and call the model incremental anyway.

Sometimes we build a stable composite key. Sometimes we remodel the grain. Sometimes we decide the transformation hasn’t earned an incremental path yet. Those options may be inconvenient, but they’re still cheaper than carrying a table whose identity rules are fuzzy and only seem to work while nobody looks too closely.
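When we do build a composite key, we also verify it actually holds the grain before trusting a merge to it. A minimal sketch, with made-up column names standing in for whatever stable business columns the source provides:

```python
from collections import Counter

def composite_key(row, columns):
    """Build a composite key from stable business columns."""
    return tuple(row[c] for c in columns)

def duplicate_keys(rows, columns):
    """Return any key values that appear more than once in the batch.
    A non-empty result means the proposed key does not match the grain."""
    counts = Counter(composite_key(r, columns) for r in rows)
    return {k: n for k, n in counts.items() if n > 1}

rows = [
    {"source": "shop", "external_ref": "A1", "event_date": "2026-03-01"},
    {"source": "shop", "external_ref": "A1", "event_date": "2026-03-02"},
]

# (source, external_ref) alone collapses two distinct facts ...
assert duplicate_keys(rows, ["source", "external_ref"]) != {}
# ... adding event_date yields a key that matches the grain.
assert duplicate_keys(rows, ["source", "external_ref", "event_date"]) == {}
```

A check like this belongs in the model's tests, so the identity contract fails loudly instead of drifting quietly.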

The worst option is pretending ambiguity is acceptable because the table “is mostly append-only.” That’s usually how a temporary shortcut gets promoted into platform behavior.

Why this becomes a review problem

A weak or missing key doesn’t just create bad data. It makes the model harder to review.

Once identity is fuzzy, reviewers can’t inspect incremental behavior with confidence because the most basic question stays unresolved. What exactly is being updated over time? If the answer is hand-wavy, the rest of the logic becomes hand-wavy too. Merge rules, replacement logic, late change handling, and stale-row prevention all end up sitting on top of a boundary the model never made explicit.

That’s why unique keys are not a warehouse formality. They’re part of what makes the model legible.

The point

Unique keys are not optional because incrementals are not just about processing less data. They’re about updating the right record on purpose.

If a model doesn’t have a real identity contract, it hasn’t earned an incremental path yet.
