Decision Library

Patterns

Reusable decision frameworks, implementation notes, and system breakdowns from production delivery work.

Decision memo Operations

AlloyDB managed connection pooling: when we'd trust it over PgBouncer

AlloyDB managed pooling is attractive because it removes a moving part, but the real decision is whether the managed path gives enough confidence in session semantics, enough observability, and a predictable enough migration to replace PgBouncer.

Last updated: Apr 4, 2026

alloydb postgres connection-pooling
Implementation note Operations

Cloud SQL to AlloyDB migration: what actually changes, what doesn't, and what we'd test first

A Cloud SQL to AlloyDB move is not a philosophical upgrade. It changes the operational boundary, and the useful work is re-proving the parts of the system that may no longer behave the same.

Last updated: Apr 4, 2026

cloud-sql alloydb migration
Decision memo Operations

Cloud SQL vs AlloyDB: the real difference is operational boundary, not benchmarks

The useful comparison between Cloud SQL and AlloyDB is not raw speed. It is how the operational boundary changes around scaling, pooling, failover, migration, and team burden.

Last updated: Apr 4, 2026

cloud-sql alloydb postgres
Decision memo Infrastructure

How we decide between Cloud SQL connectors, Auth Proxy, and private IP

Cloud SQL connectors, the Auth Proxy, and private IP are not interchangeable secure connection options. They change identity, routing, deployment shape, and how much network plumbing the team actually owns.

Last updated: Apr 4, 2026

cloud-sql cloud-run networking
Implementation note Operations

How we diagnose and fix a "too many connections" incident for Cloud Run + Postgres

A "too many connections" incident is rarely a one-line fix. It usually exposes a bad contract between Cloud Run scaling, app pool behavior, and database capacity.

Last updated: Apr 4, 2026

cloud-run postgres incident-response
Operating principle Infrastructure

IAM DB auth for Cloud SQL: when it simplifies security and when it complicates delivery

IAM DB auth can reduce password sprawl and make revocation cleaner, but it also turns database access into an identity operating model that depends on disciplined service-account boundaries.

Last updated: Apr 4, 2026

cloud-sql iam security
Decision memo Operations

Managed connection pooling in Cloud SQL: when it helps and when it complicates things

Managed connection pooling in Cloud SQL can reduce bursty connection pressure, but it also changes session behavior and should be adopted as a runtime boundary, not as a harmless checkbox.

Last updated: Apr 4, 2026

cloud-sql postgres connection-pooling
Decision memo Infrastructure

Safe scaling defaults for Cloud Run + Postgres

Cloud Run autoscaling is not a database strategy. Safe defaults keep the application from scaling itself into a Postgres incident before the team understands the workload.

Last updated: Apr 4, 2026

cloud-run postgres scaling
Operating principle Operations

Why Cloud Run + Postgres needs a connection budget

Cloud Run and Postgres get fragile when connection growth is left implicit. We treat connections as a finite runtime budget, not as plumbing the app can multiply without consequence.

Last updated: Apr 4, 2026

cloud-run postgres cloud-sql
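
The connection budget reduces to one inequality. Max instances × per-instance pool size must fit inside what the database can give this service after reserving headroom. The sketch below is a minimal version in Python. It assumes a single service owns the budget, and the field names and numbers are hypothetical. Cloud Run request concurrency only enters through the pool size each instance is allowed to open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionBudget:
    """Hypothetical inputs for a Cloud Run + Postgres connection budget."""
    db_max_connections: int      # Postgres max_connections on the instance
    reserved_connections: int    # headroom for admin sessions, migrations, other clients
    pool_size_per_instance: int  # max connections each container's pool may open
    max_instances: int           # Cloud Run max instances for the service

    def available(self) -> int:
        # Connections this service may consume in total.
        return self.db_max_connections - self.reserved_connections

    def worst_case(self) -> int:
        # Every instance running at max scale with a full pool.
        return self.max_instances * self.pool_size_per_instance

    def fits(self) -> bool:
        return self.worst_case() <= self.available()

    def safe_max_instances(self) -> int:
        # Largest max_instances that keeps the worst case inside the budget.
        return self.available() // self.pool_size_per_instance


budget = ConnectionBudget(
    db_max_connections=200,
    reserved_connections=20,
    pool_size_per_instance=5,
    max_instances=50,
)
print(budget.worst_case(), budget.available(), budget.fits())
print(budget.safe_max_instances())
```

With these hypothetical numbers, the worst case is 250 connections against a budget of 180. Either max instances drops to 36 or the per-instance pool shrinks. When several services share one database, split `available()` between them before running the same check.
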
Implementation note Reporting

BI Engine: when it matters, when it's a trap

BI Engine can be useful, but only after you prove it is actually accelerating the workload you care about. Otherwise it turns into configuration thrashing around the wrong problem.

Last updated: Mar 29, 2026

bi-engine bigquery reporting
Operating principle Data

BigQuery cost guardrails that won't break your teams

BigQuery cost control works when guardrails are designed around workload shape and blast radius, not around shaming whoever happened to run the last expensive query.

Last updated: Mar 29, 2026

bigquery cost-control data-platforms
Operating principle Data

Constraints without enforcement: still worth it?

Non-enforced constraints are useful when they tell the truth. They act as semantic contracts and optimizer hints, but they become actively dangerous the moment the warehouse is asked to trust a lie.

Last updated: Mar 29, 2026

bigquery data-modeling data-trust
Decision memo Data

On-demand vs slots: the SME decision boundary

For SMEs, the question is not which BigQuery pricing model is more sophisticated. The question is when workload classes have become distinct enough to deserve different compute lanes.

Last updated: Mar 29, 2026

bigquery cost-control workload-management
Implementation note Data

Partitioning defaults for event tables that don't lie

Partitioning is not just a performance tweak. It is one of the cheapest ways to control scan blast radius, but only if the partition contract matches how the table is actually queried.

Last updated: Mar 29, 2026

bigquery partitioning data-modeling
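
The sketch below models pruning on a day-partitioned table to show why the partition contract matters. The same question reads 7 partitions when it filters on the partition column and the whole table when it filters on anything else. It is a simplified model: it assumes uniform partition sizes and ignores column selection and clustering, which also affect bytes billed. The function name and table sizes are hypothetical.

```python
from datetime import date, timedelta
from typing import List, Optional


def partitions_scanned(
    partition_days: List[date], start: Optional[date], end: Optional[date]
) -> List[date]:
    """Return the day partitions a query reads under partition pruning.

    start/end model a filter on the partitioning column. None means the
    query does not filter on that column, so every partition is read.
    """
    return [
        d
        for d in partition_days
        if (start is None or d >= start) and (end is None or d <= end)
    ]


# Hypothetical event table: one partition per day for 2025, about 2 GB each.
days = [date(2025, 1, 1) + timedelta(days=i) for i in range(365)]
gb_per_partition = 2

# The query filters on the partition column, so only 7 partitions are read.
last_week = partitions_scanned(days, date(2025, 12, 25), date(2025, 12, 31))
# The query filters on some other timestamp column, so nothing is pruned.
unpruned = partitions_scanned(days, None, None)

print(len(last_week) * gb_per_partition, len(unpruned) * gb_per_partition)
```
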
Decision memo Data

Physical vs logical storage: a dataset classification rule for SMEs

Physical versus logical storage billing is not a warehouse philosophy debate. It is a dataset classification choice based on change rate, retention behavior, and how much storage churn the table creates.

Last updated: Mar 29, 2026

bigquery storage cost-control
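
The classification rule can be reduced to a cost comparison. Under physical billing, time-travel and fail-safe bytes are billed alongside compressed current data. High-churn datasets can therefore give back their compression advantage. The sketch below is minimal and rests on a few assumptions:

- Real per-GB rates come from the current BigQuery pricing page.
- It ignores the active versus long-term split.
- The field names and figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetStorage:
    # Hypothetical per-dataset measurements, in GB.
    logical_gb: float      # uncompressed size of current data
    physical_gb: float     # compressed size of current data
    time_travel_gb: float  # compressed bytes retained for time travel
    fail_safe_gb: float    # compressed bytes retained for fail-safe


def compare_billing(s: DatasetStorage, logical_rate: float, physical_rate: float) -> dict:
    # Logical billing: uncompressed current data only.
    logical_cost = s.logical_gb * logical_rate
    # Physical billing: compressed current data plus time-travel and fail-safe bytes.
    physical_cost = (s.physical_gb + s.time_travel_gb + s.fail_safe_gb) * physical_rate
    cheaper = "physical" if physical_cost < logical_cost else "logical"
    return {"logical": logical_cost, "physical": physical_cost, "cheaper": cheaper}


# Assumed relative rates, with physical storage priced higher per GB than logical.
LOGICAL_RATE, PHYSICAL_RATE = 1.0, 2.0

archive = DatasetStorage(logical_gb=1000, physical_gb=100, time_travel_gb=5, fail_safe_gb=5)
churny = DatasetStorage(logical_gb=1000, physical_gb=150, time_travel_gb=400, fail_safe_gb=200)

print(compare_billing(archive, LOGICAL_RATE, PHYSICAL_RATE)["cheaper"])
print(compare_billing(churny, LOGICAL_RATE, PHYSICAL_RATE)["cheaper"])
```

The two hypothetical datasets compress equally well but land on different sides of the rule. Retained history, not compression ratio, decides the churny one.
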
Decision memo Reporting

Precompute ladder: cache -> scheduled tables -> MVs -> extracts

Precompute is not mainly a feature choice. It is a freshness budget decision: use the cheapest mechanism that meets the reporting need, then stop paying live query cost out of habit.

Last updated: Mar 29, 2026

bigquery precompute reporting
Implementation note Data

Reservations for workload isolation: the minimal setup

Reservation design for SMEs is usually not an enterprise org chart. It is a small blast-radius pattern that keeps BI, batch, and sandbox work from bullying each other.

Last updated: Mar 29, 2026

bigquery reservations workload-management
Implementation note Data

Streaming buffer is your hidden constraint

When BigQuery streaming pain shows up as a DML error, the real problem is usually workload shape. Streaming wants append-and-reconcile thinking, not row-by-row sync fantasies.

Last updated: Mar 29, 2026

bigquery streaming data-engineering
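
Here is append-and-reconcile in miniature. Every change lands as a new row, and a separate step collapses the log to the latest version per key instead of issuing UPDATEs against freshly streamed rows. The Python sketch below shows only the reconcile logic. It assumes each event carries a business key and an `updated_at` version. In the warehouse, this step would typically be a scheduled query that keeps `ROW_NUMBER() = 1` per key ordered by `updated_at` descending. The `ChangeEvent` shape is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class ChangeEvent:
    order_id: str
    updated_at: datetime
    status: str


def reconcile_latest(events: List[ChangeEvent]) -> Dict[str, ChangeEvent]:
    """Collapse an append-only change log to the latest event per key.

    Arrival order does not matter: a late, older event never overwrites a newer one.
    """
    latest: Dict[str, ChangeEvent] = {}
    for event in events:
        current = latest.get(event.order_id)
        if current is None or event.updated_at > current.updated_at:
            latest[event.order_id] = event
    return latest


events = [
    ChangeEvent("A", datetime(2026, 3, 1, 9, 0), "created"),
    ChangeEvent("A", datetime(2026, 3, 1, 10, 0), "paid"),
    ChangeEvent("B", datetime(2026, 3, 1, 9, 15), "created"),
    ChangeEvent("A", datetime(2026, 3, 1, 9, 30), "pending"),  # arrives late, is older
]
current_state = reconcile_latest(events)
print({k: v.status for k, v in current_state.items()})
```
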
Implementation note Reporting

Why your BI dashboards melt BigQuery

Dashboards do not passively read data. They generate repeated, variable workload, and that behavior is often the real source of BigQuery cost and latency pain.

Last updated: Mar 29, 2026

bigquery reporting bi
Operating principle Reporting

A dashboard is not an operating system

Dashboards are good at showing state. They are bad at routing action, assigning ownership, and closing operational loops once a metric requires intervention.

Last updated: Mar 26, 2026

reporting operations decision-systems
Decision memo Reporting

How we decide which metrics deserve a dashboard and which deserve a workflow

Some metrics are for observation. Others need ownership, thresholds, timing, and structured action. We decide explicitly which system shape each metric actually deserves.

Last updated: Mar 26, 2026

reporting automation operations
Implementation note Reporting

Looker Studio blending limits expose your real data model problems

When a report starts depending on heroic Looker Studio blending, the issue is usually upstream structure, not dashboard craftsmanship.

Last updated: Mar 26, 2026

looker-studio reporting data-modeling
Decision memo Reporting

What makes a KPI trustworthy enough to automate around

A KPI is not ready to drive action just because it exists on a dashboard. It needs stable meaning, reliable updates, and failure behavior that will not create new chaos.

Last updated: Mar 26, 2026

kpis reporting automation
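
The three conditions translate directly into a guard that runs before any automated action. The sketch below is minimal and assumes each KPI reading carries a definition version and an as-of timestamp. The class, function, threshold, and freshness window are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class KpiReading:
    definition_version: str  # changes whenever the metric's meaning changes
    value: Optional[float]   # None when the upstream load failed
    as_of: datetime          # when the value was last computed


def decide(
    reading: KpiReading,
    expected_version: str,
    threshold: float,
    max_age: timedelta,
    now: datetime,
) -> str:
    # Stable meaning: do not act on a definition the automation was not built for.
    if reading.definition_version != expected_version:
        return "hold: definition changed"
    # Reliable updates: a missing or stale value is not a signal either way.
    if reading.value is None or now - reading.as_of > max_age:
        return "hold: stale or missing"
    # Only a fresh value with the expected meaning can trigger the action.
    return "act" if reading.value > threshold else "no action"


now = datetime(2026, 3, 26, 9, 0)
fresh = KpiReading("v2", 0.12, now - timedelta(hours=1))
stale = KpiReading("v2", 0.12, now - timedelta(days=2))
print(decide(fresh, "v2", 0.10, timedelta(hours=6), now))
print(decide(stale, "v2", 0.10, timedelta(hours=6), now))
```

The design choice is that every failure path resolves to a visible "hold" that a person can route, rather than to an action taken on bad data.
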
Decision memo Reporting

When reporting logic belongs upstream instead of in the BI layer

If reporting logic affects business meaning, reuse, or trust, it usually belongs upstream, where it can be reviewed once and kept consistent across every report that depends on it.

Last updated: Mar 26, 2026

reporting bi data-modeling
Operating principle Reporting

Why freshness matters less than trust in most reporting systems

A slightly delayed metric that people trust is usually more valuable than a real-time metric nobody believes.

Last updated: Mar 26, 2026

reporting data-trust kpis
Operating principle Infrastructure

Cloud Run request timeouts don't kill your code (so your architecture has to)

A Cloud Run request timeout ends the request, not necessarily the work. If the operation can outlive its caller, the system needs explicit job semantics instead of hope.

Last updated: Mar 25, 2026

cloud-run reliability runtime-behavior
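
Explicit job semantics means the work gets an identity and a status that outlive any single request. The sketch below is minimal. It assumes a durable store (a database table, for example) stands behind the in-memory `JobStore` used here, and the class and method names are hypothetical. Retries reuse the same idempotency key, so a caller whose request timed out can look the job up again instead of starting the work twice.

```python
import uuid
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict


class JobStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


@dataclass
class Job:
    job_id: str
    idempotency_key: str
    status: JobStatus = JobStatus.PENDING
    result: object = None


class JobStore:
    """In-memory stand-in for a durable job table."""

    def __init__(self) -> None:
        self._by_key: Dict[str, Job] = {}

    def submit(self, idempotency_key: str) -> Job:
        # A retried request with the same key gets the existing job back.
        if idempotency_key in self._by_key:
            return self._by_key[idempotency_key]
        job = Job(job_id=str(uuid.uuid4()), idempotency_key=idempotency_key)
        self._by_key[idempotency_key] = job
        return job

    def run(self, job: Job, work: Callable[[], object]) -> None:
        # Executed by a worker, not by the request that submitted the job,
        # so the outcome is recorded even if that request has already ended.
        if job.status is not JobStatus.PENDING:
            return  # already started or finished; never run the work twice
        job.status = JobStatus.RUNNING
        try:
            job.result = work()
            job.status = JobStatus.SUCCEEDED
        except Exception:
            job.status = JobStatus.FAILED


store = JobStore()
first = store.submit("export-2026-03-25")
retry = store.submit("export-2026-03-25")  # e.g. the client retried after a timeout
store.run(first, lambda: "exported")
print(first is retry, retry.status.value)
```
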
Operating principle Infrastructure

Cloud Run scaling from zero is a feature until it isn't

Scale to zero is a good default for request-driven services, until startup delay, warm-capacity needs, or instance caps turn it into user-visible reliability behavior instead of a pricing feature.

Last updated: Mar 25, 2026

cloud-run serverless reliability
Decision memo Infrastructure

Direct VPC egress vs Serverless VPC Access for Cloud Run: our default

We default to Direct VPC egress for Cloud Run because it is the cleaner networking shape: fewer moving parts, no connector resource, and costs that scale with the service instead of beside it.

Last updated: Mar 25, 2026

cloud-run networking serverless
Decision memo Infrastructure

GKE Autopilot as the escape hatch from Cloud Run

When Cloud Run stops fitting, the next move is usually GKE Autopilot: more Kubernetes-shaped control without immediately taking on the full burden of Standard clusters.

Last updated: Mar 25, 2026

gke cloud-run kubernetes
System breakdown Infrastructure

"Internal-only" Cloud Run isn't just a checkbox

Making a Cloud Run service private is not one toggle. It is a decision about ingress, routing, caller path, and IAM working together as one access model.

Last updated: Mar 25, 2026

cloud-run networking gcp
Decision memo Infrastructure

Why we default to Cloud Run for SME internal platforms

For SME internal platforms, Cloud Run is our default because it covers a large share of useful workload shapes without forcing teams to own cluster operations before they have earned that surface area.

Last updated: Mar 25, 2026

cloud-run internal-platforms platform-design
Operating principle Data

BigQuery cost spikes usually come from table shape, not queries

When BigQuery spend jumps, the cause usually lies in table shape, weak incremental design, or unnecessary reprocessing long before it comes down to a single bad query.

Last updated: Mar 24, 2026

bigquery data-modeling cost-control
Implementation note Data

Dataform vs. script piles: how we keep transformations reviewable

We prefer a declarative transformation layer over ad hoc script piles once warehouse logic becomes shared, incremental, and worth reviewing as a system.

Last updated: Mar 24, 2026

dataform data-engineering transformations
Decision memo Data

How we decide whether a transformation belongs in SQLX, code, or orchestration

We keep transformations in SQLX by default, move to code when the logic truly stops being legible in SQL, and keep orchestration for sequencing rather than business meaning.

Last updated: Mar 24, 2026

dataform orchestration data-engineering
Implementation note Data

How we prevent stale rows in incremental fact models

Incremental fact models stay trustworthy only when record identity, reprocessing rules, and cleanup boundaries are designed on purpose instead of patched after drift shows up.

Last updated: Mar 24, 2026

incremental-models fact-modeling data-modeling
Operating principle Data

Incremental models are only safe when change detection is explicit

Incremental models are trustworthy only when they can deliberately identify which records need another pass after late or changed upstream data shows up.

Last updated: Mar 24, 2026

incremental-models data-engineering transformations
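
Explicit change detection can be as simple as a watermark plus a lookback window. Each run selects rows whose `updated_at` is later than the previous watermark minus the lookback, so late-arriving changes get another pass. The sketch below uses hypothetical names and assumes the source exposes a trustworthy `updated_at`. Because the lookback deliberately re-selects some rows, the downstream write has to be keyed and idempotent.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class SourceRow:
    id: str
    updated_at: datetime


def rows_to_reprocess(
    rows: List[SourceRow], last_watermark: datetime, lookback: timedelta
) -> List[SourceRow]:
    # A row qualifies because of when it changed, not because of when it was loaded.
    cutoff = last_watermark - lookback
    return [r for r in rows if r.updated_at > cutoff]


def next_watermark(batch: List[SourceRow], last_watermark: datetime) -> datetime:
    # Advance to the newest change seen; an empty batch leaves the watermark alone.
    return max([last_watermark] + [r.updated_at for r in batch])


last_run = datetime(2026, 3, 24, 0, 0)
rows = [
    SourceRow("a", datetime(2026, 3, 23, 20, 0)),   # settled before the window
    SourceRow("b", datetime(2026, 3, 23, 23, 30)),  # late change inside the lookback
    SourceRow("c", datetime(2026, 3, 24, 1, 0)),    # new since the last run
]
batch = rows_to_reprocess(rows, last_run, timedelta(hours=2))
print([r.id for r in batch], next_watermark(batch, last_run))
```
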
Operating principle Data

Reviewability is a data platform feature

Reviewability is not decoration for data work. It is part of whether a shared platform can change safely once more than one person has to reason about the same models and workflows.

Last updated: Mar 24, 2026

data-platforms data-engineering operations
Operating principle Data

Unique keys are not optional in analytical incrementals

Incremental analytical models need an explicit notion of row identity. Without it, merges drift, updates go missing, and review of correctness turns into guesswork.

Last updated: Mar 24, 2026

dataform incremental-models data-modeling
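
The difference a unique key makes shows up as soon as an increment contains a row the target already holds. The toy Python comparison below uses hypothetical data. A blind append double-counts the reprocessed order, while a keyed merge replaces it. In Dataform, declaring `uniqueKey` on an incremental table is what makes it merge on that key instead of appending.

```python
from typing import Dict, List

Row = Dict[str, int]


def append_increment(target: List[Row], batch: List[Row]) -> List[Row]:
    # No row identity: a reprocessed row becomes a second copy.
    return target + batch


def merge_increment(target: List[Row], batch: List[Row], key: str) -> List[Row]:
    # Explicit unique key: a reprocessed row replaces the earlier version.
    by_key = {row[key]: row for row in target}
    for row in batch:
        by_key[row[key]] = row  # update if the key exists, insert otherwise
    return list(by_key.values())


target = [{"order_id": 1, "amount": 10}]
batch = [{"order_id": 1, "amount": 12}, {"order_id": 2, "amount": 5}]

appended = append_increment(target, batch)
merged = merge_increment(target, batch, key="order_id")
print(sum(r["amount"] for r in appended), sum(r["amount"] for r in merged))
```
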
Operating principle Operations

What we keep out of orchestration in data platforms

We use orchestration to sequence work, not to become the real home of model semantics, cleanup logic, or hidden branching behavior in the data platform.

Last updated: Mar 24, 2026

orchestration data-platforms operations
Decision memo Infrastructure

When repeated Pulumi code earns abstraction and when it doesn't

We don't abstract repeated Pulumi code just because it shows up more than once. We do it when the shared shape is real, the behavior is stable enough to deserve a boundary, and the result is easier to read than the duplication it replaces.

Last updated: Mar 24, 2026

pulumi infrastructure
Decision memo Data

Why declarative data models scale better than script-driven pipelines

Declarative modeling scales better because it keeps business shape, dependencies, and reviewable intent visible as the platform and team both grow.

Last updated: Mar 24, 2026

data-engineering data-modeling transformations
Decision memo Data

Why we model around decision boundaries, not source cleanup

We shape analytical models around the business decision or entity they need to represent, not around the temporary cleanup steps needed to tame source data on the way in.

Last updated: Mar 24, 2026

data-modeling analytics-engineering transformations
Decision memo Infrastructure

How we decide between directory per environment and shared stacks in Pulumi

We do not force DRY across environments by default. We keep Pulumi environments separate until shared code, shared rules, and drift risk make consolidation cheaper than duplication.

Last updated: Mar 23, 2026

pulumi infrastructure
Implementation note Infrastructure

How we structure a directory per environment in Pulumi

When we keep Pulumi environments separate, we make the environment boundary obvious in the filesystem and keep shared logic outside it.

Last updated: Mar 23, 2026

pulumi infrastructure
Implementation note Infrastructure

What goes in Pulumi stack config and what doesn't

We use Pulumi stack config for environment-specific values, not as a hiding place for infrastructure logic.

Last updated: Mar 23, 2026

pulumi infrastructure
Operating principle Infrastructure

How we treat Terraform state in team environments

Terraform starts feeling fragile in teams when state is treated like a backend setting instead of a shared dependency for safe change.

Last updated: Mar 22, 2026

terraform infrastructure
Decision memo Infrastructure

Why we usually choose Pulumi over Terraform

Pulumi is our default when infrastructure starts behaving like software. Existing Terraform estates can still be the better decision when the migration cost is higher than the operational gain.

Last updated: Mar 22, 2026

pulumi terraform infrastructure