Constraints without enforcement: still worth it?

Non-enforced constraints are useful when they tell the truth. They act as semantic contracts and optimizer hints, but they become actively dangerous the moment the warehouse is asked to trust a lie.


By Ivan Richter

Last updated: Mar 29, 2026




Non-enforced doesn’t mean meaningless

BigQuery doesn’t enforce primary keys or foreign keys. That’s the part everybody repeats, usually with the tone of someone announcing they’ve found the loophole in the system. Fine. The more useful question is what a declared constraint is still worth when the warehouse won’t police it for you.

Quite a lot, if the declaration is honest.

A good constraint does two jobs at once. It tells the next person reading the model what one row is supposed to be, and it tells the engine what shape it’s allowed to assume. A primary key says “this is identity.” A foreign key says “this relationship is stable enough to depend on.” When those claims match reality, you get clearer semantics and, in some cases, better execution: the table is easier to read, and the warehouse has more room to optimize.

The damage starts when people hear “not enforced” and treat declarations like aspirational metadata. A non-enforced constraint is still a claim, and once you publish it, you’re asking both humans and the warehouse to trust that claim.

A key declaration is a statement about identity

Declaring a key is one of the strongest things a model can say about itself.

If you declare a primary key, you’re saying the grain is settled. You’re saying one row represents one thing, that the thing has a stable boundary, and that duplicates aren’t part of normal behavior. If you declare a foreign key, you’re saying the parent-child relationship is real enough that downstream logic can treat it as a dependable shape rather than a best-effort association.

That only helps when the model has already earned the right to say it. If the grain still takes a whiteboard session and three caveats to explain, it isn’t ready. If the entity boundary keeps shifting depending on who built the upstream table, it isn’t ready. If the relationship holds only on clean days and quietly falls apart during backfills, late-arriving data, or partial loads, it isn’t ready.

The dangerous part is not the missing enforcement

The usual complaint is still technically true: BigQuery won’t stop bad data from landing just because you declared a key. Damage starts when a key gets declared anyway and then left in place long after the model stopped deserving it.

BigQuery can use declared constraints for optimization, for example by eliminating or reordering joins on declared keys. That’s helpful when the declaration is true. When it isn’t, you’ve handed the warehouse permission to trust a bad assumption, and a query rewritten on that assumption can return wrong results. The schema has moved from vague to confidently wrong, which is a much worse failure mode. Vagueness slows people down. False certainty lets them break things faster.

So the standard has to be stricter than people usually want. If duplicates still show up as part of normal system behavior, don’t declare uniqueness. If missing parents are common enough that everybody quietly builds around them, don’t declare the foreign key. If the grain still shifts when a new source lands or a reporting definition changes, leave the constraint out and fix the model first. Same rule as with unique keys in incrementals: you don’t publish stability because you’d prefer the table to behave that way. You publish it after the table already does.

Validation has to exist somewhere else

Because BigQuery won’t enforce the rule at write time, you need some other mechanism that does the enforcing: tests, assertions, reconciliation queries, load checks, ownership, review gates, or some combination of them. The model needs a real way to notice drift before drift becomes normal.

Without that, a non-enforced key is just a hopeful sentence in DDL.
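
As one possible shape for that mechanism, here is a minimal sketch using BigQuery’s ASSERT statement. It assumes the mart.orders and mart.order_lines tables defined later in this piece, and it assumes the statements run as a scheduled script after each load. The error messages are illustrative, not a convention from any particular tool.

```sql
-- Fails the script if any order_id appears more than once in mart.orders.
assert (
  (select count(*) from mart.orders)
  = (select count(distinct order_id) from mart.orders)
) as 'mart.orders: order_id is not unique';

-- Fails the script if any order line points at an order_id
-- that has no parent row in mart.orders.
assert not exists (
  select 1
  from mart.order_lines as ol
  left join mart.orders as o
    on o.order_id = ol.order_id
  where o.order_id is null
) as 'mart.order_lines: order_id has no parent in mart.orders';
```

The same two checks could just as well live in a testing framework or a load pipeline. What matters is that a failed check stops something, instead of producing a report nobody reads.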

At that point warehouse design often turns theatrical. Teams declare constraints because the model “should” have them, then never put any validation around the claim. The table looks more finished, everybody gets to feel tidy, and nothing about the actual risk changes. The first time uniqueness drifts or referential integrity starts failing, the constraint stays in place because removing it would force an awkward admission: the schema was describing the model people wanted, not the one they actually had.

That’s useless at best and dangerous at worst. The warehouse isn’t helped by statements that only sound right in a design review.

What truthful constraints buy you

The table becomes easier to understand without reverse-engineering its behavior from sample data and query history. A reader can see the grain. They can see where identity lives. They can see which relationships are supposed to hold. A lot of analytical work gets built on top of tables that technically run but never state their own semantics clearly. Then everyone wonders why downstream logic keeps splintering.

There is also an execution upside. BigQuery can make stronger decisions when it’s allowed to assume the declared shape is true. That’s useful, but it’s secondary. We wouldn’t declare constraints for optimizer hints alone. The contract comes first. Performance is what you get after the contract is already trustworthy.

And none of this rescues a weak model. Constraints don’t fix muddled grain. They don’t clean up unreliable upstream ingestion. They don’t replace sane table design, reviewable transformations, or validation. They help a good model speak plainly. That’s the job.

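-- One row per order. The key is declared, not enforced:
-- BigQuery will not reject duplicate order_id values on write.
create table mart.orders (
  order_id string not null,
  customer_id string,
  order_date date,
  net_revenue numeric,
  primary key (order_id) not enforced
);

-- One row per order line. The foreign key claims that every
-- order_id here has a parent row in mart.orders.
create table mart.order_lines (
  order_line_id string not null,
  order_id string not null,
  sku_id string,
  quantity int64,
  foreign key (order_id) references mart.orders(order_id) not enforced
);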

That schema is useful only if mart.orders.order_id is genuinely unique and mart.order_lines.order_id really does reference valid parent rows under normal operating conditions. If those assumptions fail regularly and the declarations stay in place, the schema has stopped describing the warehouse. Now it’s inventing one.
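
To make that concrete, here is a sketch of two queries whose results can depend on whether the declarations are true. It assumes the optimizer chooses to apply join elimination to them. Whether it does for any given query is BigQuery’s decision, so treat this as an illustration of the risk rather than a guaranteed execution plan.

```sql
-- No column from mart.orders is selected, and order_id is declared as its
-- primary key, so the join may be skipped entirely.
-- If mart.orders actually holds duplicate order_id values, an executed join
-- would repeat order lines; a skipped join returns each line exactly once.
select
  ol.order_line_id,
  ol.quantity
from mart.order_lines as ol
left join mart.orders as o
  on o.order_id = ol.order_id;

-- Only the key column is read from mart.orders, and the foreign key claims
-- every order line has a parent, so this inner join may also be skipped.
-- If orphaned order lines exist, an executed join would drop them;
-- a skipped join keeps them.
select
  ol.order_line_id,
  o.order_id
from mart.order_lines as ol
inner join mart.orders as o
  on o.order_id = ol.order_id;
```

In both cases the query text is identical and the answer changes with the truth of the declaration. That is the failure mode a false constraint buys.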

How we treat them

We don’t declare keys because the model would look more mature with them. We declare them after the identity story is already stable.

That means the grain can be stated plainly. The entity boundary doesn’t change every time somebody asks a new reporting question. Duplicates are actually exceptional. Parent-child relationships hold under real system behavior, including the ugly parts like incremental updates, late data, and operational churn. And there is validation somewhere outside good intentions.
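
One way to check whether duplicates and orphans really are exceptional is to measure them before declaring anything. The queries below are a sketch against the example tables above, run while the constraints are still undeclared. Grouping by order_date is an assumption about where drift would be visible; a load timestamp would serve the same purpose if the table has one.

```sql
-- How many order_id values appear more than once, by order date.
-- A steady trickle here means uniqueness is not yet real.
select
  order_date,
  count(*) as order_ids,
  countif(row_count > 1) as duplicated_order_ids
from (
  select
    order_id,
    min(order_date) as order_date,
    count(*) as row_count
  from mart.orders
  group by order_id
)
group by order_date
order by order_date;

-- How many order lines have no parent order.
-- The distinct subquery keeps duplicate parents from inflating the counts.
select
  count(*) as line_rows,
  countif(o.order_id is null) as orphan_rows
from mart.order_lines as ol
left join (
  select distinct order_id
  from mart.orders
) as o
  on o.order_id = ol.order_id;
```

If those numbers are zero across backfills and late loads, the declaration is describing something that exists. If they aren’t, the model has work to do first.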

If that standard isn’t met, the constraint stays out. No compromise, no schema theater, no pretending the declaration itself will pull the model into shape later. It won’t.

The rule

Non-enforced constraints are worth declaring when they tell the truth and keep telling it.

Declare them after identity is real. Back them with validation. Treat them as claims the warehouse is allowed to trust. If the declaration is still aspirational, leave it out.
