Drift Detection
Your pipeline ran successfully last week. Something changed. Drift detection tells you what — before your stakeholders notice it in their dashboards.
The problem
What is drift?
Data pipelines are not static. Your source systems change — tables get new columns, data volumes shift, upstream jobs run late, and sensitive data appears where it should not. Without continuous monitoring, these changes go unnoticed until a business user files a bug report or a compliance audit fails.
BoltPipeline monitors all active pipelines continuously across every environment. When something changes from the expected baseline, it raises a drift event — giving your team the information needed to investigate and decide how to respond.
What we monitor
Four types of drift
Schema Drift
A column was added, removed, or renamed in a source table. A data type changed. A table was dropped. Any of these can silently break a downstream pipeline step that was certified against the previous schema.
Examples
- A source table column was renamed from customer_id to cust_id
- A column's data type changed from VARCHAR to INTEGER
- A new non-nullable column was added to a table your pipeline writes to
Impact: Schema drift typically causes pipeline execution failures or silently incorrect data types in your output tables.
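The comparison can be pictured as a diff between two schema snapshots. The sketch below is illustrative only: the function name, the snapshot shape (column name mapped to data type), and the event strings are assumptions for this example, not BoltPipeline's actual API.

```python
def detect_schema_drift(baseline: dict, current: dict) -> list[str]:
    """Return drift events for one table.

    Both arguments map column name -> data type,
    e.g. {"customer_id": "VARCHAR"}.
    """
    events = []
    # Columns that disappeared or changed type since certification
    for col, dtype in baseline.items():
        if col not in current:
            events.append(f"column removed: {col}")
        elif current[col] != dtype:
            events.append(f"type changed: {col} {dtype} -> {current[col]}")
    # Columns that appeared since certification
    for col in current:
        if col not in baseline:
            events.append(f"column added: {col}")
    return events

baseline = {"customer_id": "VARCHAR", "amount": "INTEGER"}
current = {"cust_id": "VARCHAR", "amount": "INTEGER"}
# The customer_id -> cust_id rename surfaces as one removal plus one addition:
print(detect_schema_drift(baseline, current))
```

A rename looks like a paired removal and addition; correlating the two into a single "renamed" event is a heuristic layered on top of this basic diff.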
Volume Drift
The row count of a source or target table is significantly outside the expected range. This can indicate upstream data loading failures, accidental truncations, or runaway data growth.
Examples
- An overnight load produced 10% of the usual row count — the source ETL failed
- A source table grew 50x overnight — a duplicate load ran twice
- A target table is empty after execution — a WHERE clause is unexpectedly filtering all rows
Impact: Volume drift often indicates upstream failures that are not visible in your pipeline's own execution logs.
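At its core this is a threshold check on relative deviation from the baseline row count. The function and tolerance value below are a minimal sketch for illustration; actual thresholds are whatever you configure, not the 50% assumed here.

```python
def volume_drift(baseline_rows: int, current_rows: int,
                 tolerance: float = 0.5) -> bool:
    """True if the current row count deviates from the baseline
    by more than `tolerance` (as a fraction of the baseline)."""
    if baseline_rows == 0:
        # Any rows appearing in a previously empty table counts as drift
        return current_rows != 0
    return abs(current_rows - baseline_rows) / baseline_rows > tolerance

# An overnight load that produced 10% of the usual rows (90% deviation)
# clears a 50% tolerance and is flagged:
print(volume_drift(1_000_000, 100_000))  # True
```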
Freshness Drift
A source table has not been updated within the expected time window. Your pipeline may have executed successfully — but against stale data, producing output that is subtly out of date.
Examples
- A daily refresh job has not run in 36 hours
- A table's latest updated_at timestamp is two days old
- A streaming source has stopped writing new records
Impact: Freshness drift produces pipelines that appear healthy but are feeding stale data to BI tools and downstream consumers.
PII Drift
A column that was not previously classified as containing PII now appears to contain personal data — email addresses, phone numbers, names, or other identifiers. This can happen when source schemas change or new data sources are onboarded.
Examples
- A new column added to a source table contains email addresses
- A column previously holding codes now contains customer names
- A JSON blob column is now storing structured personal data
Impact: Undetected PII drift creates data governance and compliance exposure — particularly for GDPR, CCPA, and SOC 2 obligations.
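One common approach to this kind of classification is pattern matching over a small sample of column values. The sketch below shows the idea for email addresses only; the regex, the sampling threshold, and the function name are assumptions for illustration, not BoltPipeline's actual classifier.

```python
import re

# Deliberately simple email pattern for demonstration purposes
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def looks_like_pii(samples: list[str], threshold: float = 0.5) -> bool:
    """Flag a column if more than `threshold` of sampled values
    match the email pattern."""
    if not samples:
        return False
    hits = sum(1 for v in samples if EMAIL_RE.fullmatch(v))
    return hits / len(samples) > threshold

# Two of three sampled values are emails -> the column is flagged:
print(looks_like_pii(["a@example.com", "b@example.org", "REF-1234"]))  # True
```

A production classifier would combine several detectors (phone numbers, names, national IDs) and weigh them against column names and metadata, but the sampling-plus-threshold structure is the same.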
Under the hood
How drift detection works
Drift detection runs as part of the BoltPipeline agent's continuous operation cycle. The agent collects schema snapshots, row count statistics, freshness timestamps, and column-level samples during execution. These are compared against baselines established during certification and previous runs.
Baseline established at certification
When you certify a pipeline, BoltPipeline records the expected schema, approximate data volumes, and column profiles as the baseline. Drift is measured against this point.
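Conceptually, the baseline is one record per table. The shape below is a hypothetical illustration of what such a record captures; every field name and value is an assumption for this example, not BoltPipeline's actual storage format.

```python
# Hypothetical certification baseline for one output table.
# Field names and values are illustrative only.
baseline = {
    "table": "analytics.orders",
    "schema": {                       # expected columns and types
        "order_id": "INTEGER",
        "customer_id": "VARCHAR",
        "amount": "DECIMAL",
    },
    "row_count": 1_250_000,           # approximate volume at certification
    "freshness_sla_hours": 24,        # expected update cadence
    "pii_columns": [],                # columns classified as PII at certification
}
```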
Checked on every execution
Before and after each pipeline run, the agent compares current state against the baseline. Deviations above configured thresholds raise drift events.
Visible in the Monitor tab
All drift events are surfaced in the BoltPipeline console under the Monitor tab. You can see which pipeline, which step, which column, and what changed.
Your data stays in your warehouse
Drift detection works with schema metadata and statistics — not your row-level data. BoltPipeline never reads your actual business data values (with the limited exception of error message collection).
What makes this different
Drift detection that understands your pipeline
Generic data observability tools monitor your warehouse tables — but they do not know which tables belong to which pipeline, what the expected lineage is, or what the certified baseline looks like.
Because BoltPipeline owns the entire lifecycle — from plan through certification to operation — its drift detection is context-aware. It knows the expected column lineage for every step, the certified baseline schema for every output table, and the dependency graph that determines which upstream change caused a downstream failure.
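The payoff of knowing column lineage can be sketched as a lookup: given a map from pipeline steps to the source columns they read, a missing or renamed column immediately identifies the affected steps. The step names, column identifiers, and `LINEAGE` structure below are invented for this example.

```python
# Hypothetical column lineage: step name -> source columns it reads.
LINEAGE = {
    "clean_customers": ["raw.customers.customer_id", "raw.customers.email"],
    "build_orders": ["raw.orders.order_id", "raw.customers.customer_id"],
    "daily_summary": ["raw.orders.order_id"],
}

def affected_steps(missing_column: str) -> list[str]:
    """Steps that will fail if `missing_column` disappears from the source."""
    return [step for step, cols in LINEAGE.items() if missing_column in cols]

# Renaming raw.customers.customer_id breaks exactly these steps:
print(affected_steps("raw.customers.customer_id"))
# ['clean_customers', 'build_orders']
```

A generic table monitor can only say the column vanished; with this map, the alert names the specific steps that will break before the next run.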
The difference in practice
A source column is renamed
Generic monitoring:
Table monitoring alerts that the column is missing
BoltPipeline:
BoltPipeline traces which pipeline steps read that column, predicts the execution failure, and surfaces the specific step and column in the console before the next run
Row count drops 80%
Generic monitoring:
Volume anomaly alert fires on the affected table
BoltPipeline:
BoltPipeline identifies which upstream load job feeds that table and whether the pipeline's own execution produced the drop or inherited it from a failed source