How we decide whether a transformation belongs in SQLX, code, or orchestration
We keep transformations in SQLX by default, move to code when the logic truly stops being legible in SQL, and keep orchestration for sequencing rather than business meaning.
The default
We keep transformations in SQLX by default, use code when code is actually clearer, and keep orchestration for coordination rather than meaning.
That isn’t a language purity rule. It’s a visibility rule. We want the behavior of the data platform to live in the layer where a reviewer would naturally expect to find it.
If someone has to read scheduler branches, helper internals, and runtime flags just to understand what a table means, the boundary is already wrong.
Why SQLX is the default
A large share of transformation work really is best expressed as named models with visible dependencies.
When the job is to define grain, join sources, apply business filters, compute metrics, and materialize a stable table shape, SQLX is usually the honest layer. The model lives where people expect it to live. Its inputs are visible. Its shape is visible. A reviewer can open the file and see the logic that defines the table instead of reconstructing it from scattered steps elsewhere.
That’s the practical benefit behind reviewable transformations. The point isn’t that SQLX is somehow virtuous. The point is that the repo stays inspectable when the transformation sits next to the model it defines.
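As a sketch (model, table, and column names here are hypothetical, not pulled from a real repo), this is the shape we mean: the grain, the inputs, the business filter, and the metric are all visible in one file.

```sqlx
config {
  type: "table",
  schema: "marts",
  description: "One row per order; the revenue contract for downstream reporting.",
  tags: ["daily"]
}

SELECT
  o.order_id,                        -- grain: one row per order
  o.customer_id,
  DATE(o.created_at) AS order_date,
  SUM(li.quantity * li.unit_price) AS gross_revenue  -- metric defined here, not in a helper
FROM ${ref("stg_orders")} AS o       -- dependencies are explicit and reviewable
JOIN ${ref("stg_line_items")} AS li
  ON li.order_id = o.order_id
WHERE o.status != 'cancelled'        -- business filter lives with the model
GROUP BY 1, 2, 3
```

A reviewer who opens this file sees the whole contract: what a row is, where it comes from, and which rows the business counts.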
What belongs in SQLX
Model semantics.
If the work is about what a row represents, how facts get joined, which filters define the business entity, or what columns make up the contract of the downstream table, we want that logic close to the model.
That includes the parts people sometimes try to smuggle elsewhere. Business filters. Join choices. Grain decisions. Metric definitions. Column-level shaping that changes what the table actually says. Those aren’t just implementation details. They’re part of the model.
That’s the same reasoning behind decision boundaries. The model should describe the business shape directly. It shouldn’t outsource meaning to runtime branches in some other layer because the SQL started feeling a little inconvenient.
What belongs in code
Some logic really is better in code.
Complex text normalization, reusable parsing, metadata expansion, API interaction, or helper behavior that would make the SQL materially harder to read may deserve code. Sometimes SQL can express the logic, but only in a way that makes the model worse to review. That’s the point where code earns its keep.
But the boundary still has to stay honest.
Code should support the model, not quietly become the place where the real business behavior lives. If the helper is doing something central to what the table means, and the SQLX now reads like a thin wrapper around mystery meat, the abstraction didn’t help. It just relocated the important part of the system to a place fewer people will inspect.
That’s the same judgment behind earned abstraction. An abstraction helps when it makes the calling layer easier to understand. If it turns the model into a riddle, it failed.
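For contrast, here is a hedged sketch of a helper that earns its keep. In Dataform, the code layer is a JavaScript include; this hypothetical one returns a SQL fragment for phone normalization, so the regex lives in one reviewable place instead of being copy-pasted across models.

```javascript
// includes/clean.js — hypothetical helper, not a real repo file.
// It returns a SQL fragment; the gnarly part lives here, once,
// while the model that calls it stays readable.
function normalizePhone(column) {
  // Strip everything except digits and a leading '+', and
  // collapse empty results to NULL.
  return `NULLIF(REGEXP_REPLACE(${column}, r'[^0-9+]', ''), '')`;
}

module.exports = { normalizePhone };
```

A model would call it as `${clean.normalizePhone("c.raw_phone")}`. The helper shapes one column; the joins, filters, and metrics that define the table stay in the SQLX, where the boundary says they belong.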
What belongs in orchestration
Orchestration should own sequencing, scheduling, dependency execution, retries, and operational controls. It should not own model semantics.
If the real transformation logic only becomes visible once someone reads task branches, runtime arguments, or workflow conditionals, then the workflow layer has become the semantic layer. At that point, the system might still run, but review gets ugly fast because the meaning of the data is no longer where the data is defined.
That’s what we’re trying to avoid with orchestration boundaries. Workflows should tell you when work runs, in what order, and under what operational conditions. They shouldn’t be the place where you discover what the table actually does.
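Concretely, a thin orchestration layer reads something like this hypothetical nightly job (tag names invented, assuming the Dataform CLI's --tags flag). It sequences and fails loudly; it carries no business meaning.

```shell
#!/usr/bin/env bash
# Hypothetical nightly job: orchestration knows *when* and *in what order*,
# never *what* a table means. No business flags, no semantic branches.
set -euo pipefail

dataform run --tags ingest          # 1. land raw sources
dataform run --tags daily_models    # 2. rebuild the daily marts
dataform run --tags data_tests      # 3. assertions run last
```

If a reviewer needs to open this script to learn what gross_revenue means, the boundary has already failed.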
It’s the same instinct behind Pulumi config boundaries. Different stack, same discipline. The layer that sequences or configures work shouldn’t quietly become the place where the real behavior lives.
Why this ages better
Declarative layers age better because they keep intent visible.
A model in SQLX is easier to inspect than behavior scattered across scripts, wrappers, and workflow flags. A reviewer can read the shape of the transformation without having to replay a miniature runtime in their head. That’s not about elegance. It’s about reducing how much hidden state a person has to carry just to review one change properly.
That’s part of why declarative models age better than script-driven piles. They don’t eliminate complexity. They just keep more of the important complexity where people can still see it.
The decision rule
Keep logic in the most declarative layer that can still express it clearly.
If SQLX is honest and readable, keep it there. If code makes the model clearer, use code. If the problem is just coordination, use orchestration and keep it thin.
The wrong answer isn’t using code. The wrong answer is hiding model behavior in a layer nobody would naturally review.