How we decide whether a transformation belongs in SQLX, code, or orchestration
We keep transformations in SQLX by default, move to code when the logic truly stops being legible in SQL, and keep orchestration for sequencing rather than business meaning.
The default
We keep transformations in SQLX by default, use code when code is actually clearer, and keep orchestration for coordination rather than meaning.
That isn’t a language purity rule. It’s a visibility rule. We want the behavior of the data platform to live in the layer where a reviewer would naturally expect to find it.
If someone has to read scheduler branches, helper internals, and runtime flags just to understand what a table means, the boundary is already wrong.
Why SQLX is the default
A large share of transformation work really is best expressed as named models with visible dependencies.
When the job is to define grain, join sources, apply business filters, compute metrics, and materialize a stable table shape, SQLX is usually the honest layer. The model lives where people expect it to live. Its inputs are visible. Its shape is visible. A reviewer can open the file and see the logic that defines the table instead of reconstructing it from scattered steps elsewhere.
That’s the practical benefit behind reviewable transformations. The point isn’t that SQLX is somehow virtuous. The point is that the repo stays inspectable when the transformation sits next to the model it defines.
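As a sketch (model, table, and column names here are hypothetical, not pulled from a real repo), this is the shape we mean: the grain, the inputs, the business filter, and the metric are all visible in one file.

```sqlx
config {
  type: "table",
  schema: "marts",
  description: "One row per order; the revenue contract for downstream reporting.",
  tags: ["daily"]
}

SELECT
  o.order_id,                        -- grain: one row per order
  o.customer_id,
  DATE(o.created_at) AS order_date,
  SUM(li.quantity * li.unit_price) AS gross_revenue  -- metric defined here, not in a helper
FROM ${ref("stg_orders")} AS o       -- dependencies are explicit and reviewable
JOIN ${ref("stg_line_items")} AS li
  ON li.order_id = o.order_id
WHERE o.status != 'cancelled'        -- business filter lives with the model
GROUP BY 1, 2, 3
```

A reviewer who opens this file sees the whole contract: what a row is, where it comes from, and which rows the business counts.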
What belongs in SQLX
Model semantics.
If the work is about what a row represents, how facts get joined, which filters define the business entity, or what columns make up the contract of the downstream table, we want that logic close to the model.
That includes the parts people sometimes try to smuggle elsewhere. Business filters. Join choices. Grain decisions. Metric definitions. Column-level shaping that changes what the table actually says. Those aren’t just implementation details. They’re part of the model.
That’s the same reasoning behind decision boundaries. The model should describe the business shape directly. It shouldn’t outsource meaning to runtime branches in some other layer because the SQL started feeling a little inconvenient.
What belongs in code
Some logic really is better in code.
Complex text normalization, reusable parsing, metadata expansion, API interaction, or helper behavior that would make the SQL materially harder to read may deserve code. Sometimes SQL can express the logic, but only in a way that makes the model worse to review. That’s the point where code earns its keep.
But the boundary still has to stay honest.
Code should support the model, not quietly become the place where the real business behavior lives. If the helper is doing something central to what the table means, and the SQLX now reads like a thin wrapper around mystery meat, the abstraction didn’t help. It just relocated the important part of the system to a place fewer people will inspect.
That’s the same judgment behind earned abstraction. An abstraction helps when it makes the calling layer easier to understand. If it turns the model into a riddle, it failed.
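For contrast, here is a hedged sketch of a helper that earns its keep. In Dataform, the code layer is a JavaScript include; this hypothetical one returns a SQL fragment for phone normalization, so the regex lives in one reviewable place instead of being copy-pasted across models.

```javascript
// includes/clean.js — hypothetical helper, not a real repo file.
// It returns a SQL fragment; the gnarly part lives here, once,
// while the model that calls it stays readable.
function normalizePhone(column) {
  // Strip everything except digits and a leading '+', and
  // collapse empty results to NULL.
  return `NULLIF(REGEXP_REPLACE(${column}, r'[^0-9+]', ''), '')`;
}

module.exports = { normalizePhone };
```

A model would call it as `${clean.normalizePhone("c.raw_phone")}`. The helper shapes one column; the joins, filters, and metrics that define the table stay in the SQLX, where the boundary says they belong.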
What belongs in orchestration
Orchestration should own sequencing, scheduling, dependency execution, retries, and operational controls. It should not own model semantics.
If the real transformation logic only becomes visible once someone reads task branches, runtime arguments, or workflow conditionals, then the workflow layer has become the semantic layer. At that point, the system might still run, but review gets ugly fast because the meaning of the data is no longer where the data is defined.
That’s what we’re trying to avoid with orchestration boundaries. Workflows should tell you when work runs, in what order, and under what operational conditions. They shouldn’t be the place where you discover what the table actually does.
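Concretely, a thin orchestration layer reads something like this hypothetical nightly job (tag names invented, assuming the Dataform CLI's --tags flag). It sequences and fails loudly; it carries no business meaning.

```shell
#!/usr/bin/env bash
# Hypothetical nightly job: orchestration knows *when* and *in what order*,
# never *what* a table means. No business flags, no semantic branches.
set -euo pipefail

dataform run --tags ingest          # 1. land raw sources
dataform run --tags daily_models    # 2. rebuild the daily marts
dataform run --tags data_tests      # 3. assertions run last
```

If a reviewer needs to open this script to learn what gross_revenue means, the boundary has already failed.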
It’s the same instinct behind Pulumi config boundaries. Different stack, same discipline. The layer that sequences or configures work shouldn’t quietly become the place where the real behavior lives.
Why this ages better
Declarative layers age better because they keep intent visible.
A model in SQLX is easier to inspect than behavior scattered across scripts, wrappers, and workflow flags. A reviewer can read the shape of the transformation without having to replay a miniature runtime in their head. That’s not about elegance. It’s about reducing how much hidden state a person has to carry just to review one change properly.
That’s part of why declarative models age better than script-driven piles. They don’t eliminate complexity. They just keep more of the important complexity where people can still see it.
The decision rule
Keep logic in the most declarative layer that can still express it clearly.
If SQLX is honest and readable, keep it there. If code makes the model clearer, use code. If the problem is just coordination, use orchestration and keep it thin.
The wrong answer isn’t using code. The wrong answer is hiding model behavior in a layer nobody would naturally review.