Drift Detection
Your pipeline ran successfully last week. Something changed. Drift detection tells you what — before your stakeholders notice it in their dashboards.
The problem
What is drift?
Data pipelines are not static. Your source systems change — tables get new columns, data volumes shift, upstream jobs run late, and sensitive data appears where it should not. Without continuous monitoring, these changes go unnoticed until a business user files a bug report or a compliance audit fails.
BoltPipeline monitors all active pipelines continuously across every environment. When something changes from the expected baseline, it raises a drift event — giving your team the information needed to investigate and decide how to respond.
What we monitor
Four types of drift
Schema Drift
A column was added, removed, or renamed in a source table. A data type changed. A table was dropped. Any of these can silently break a downstream pipeline step that was certified against the previous schema.
Examples
- A source table column was renamed from customer_id to cust_id
- A column's data type changed from VARCHAR to INTEGER
- A new non-nullable column was added to a table your pipeline writes to
Impact: Schema drift typically causes pipeline execution failures or silently incorrect data types in your output tables.
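The comparison can be pictured as a diff between two schema snapshots. The sketch below is illustrative only: the function name, the snapshot shape (column name mapped to data type), and the event strings are assumptions for this example, not BoltPipeline's actual API.

```python
def detect_schema_drift(baseline: dict, current: dict) -> list[str]:
    """Return drift events for one table.

    Both arguments map column name -> data type,
    e.g. {"customer_id": "VARCHAR"}.
    """
    events = []
    # Columns that disappeared or changed type since certification
    for col, dtype in baseline.items():
        if col not in current:
            events.append(f"column removed: {col}")
        elif current[col] != dtype:
            events.append(f"type changed: {col} {dtype} -> {current[col]}")
    # Columns that appeared since certification
    for col in current:
        if col not in baseline:
            events.append(f"column added: {col}")
    return events

baseline = {"customer_id": "VARCHAR", "amount": "INTEGER"}
current = {"cust_id": "VARCHAR", "amount": "INTEGER"}
# The customer_id -> cust_id rename surfaces as one removal plus one addition:
print(detect_schema_drift(baseline, current))
```

A rename looks like a paired removal and addition; correlating the two into a single "renamed" event is a heuristic layered on top of this basic diff.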
Volume Drift
The row count of a source or target table is significantly outside the expected range. This can indicate upstream data loading failures, accidental truncations, or runaway data growth.
Examples
- An overnight load produced 10% of the usual row count — the source ETL failed
- A source table grew 50x overnight — a duplicate load ran twice
- A target table is empty after execution — a WHERE clause is unexpectedly filtering all rows
Impact: Volume drift often indicates upstream failures that are not visible in your pipeline's own execution logs.
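At its core this is a threshold check on relative deviation from the baseline row count. The function and tolerance value below are a minimal sketch for illustration; actual thresholds are whatever you configure, not the 50% assumed here.

```python
def volume_drift(baseline_rows: int, current_rows: int,
                 tolerance: float = 0.5) -> bool:
    """True if the current row count deviates from the baseline
    by more than `tolerance` (as a fraction of the baseline)."""
    if baseline_rows == 0:
        # Any rows appearing in a previously empty table counts as drift
        return current_rows != 0
    return abs(current_rows - baseline_rows) / baseline_rows > tolerance

# An overnight load that produced 10% of the usual rows (90% deviation)
# clears a 50% tolerance and is flagged:
print(volume_drift(1_000_000, 100_000))  # True
```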
Freshness Drift
A source table has not been updated within the expected time window. Your pipeline may have executed successfully — but against stale data, producing output that is subtly out of date.
Examples
- A daily refresh job has not run in 36 hours
- A table's latest updated_at timestamp is two days old
- A streaming source has stopped writing new records
Impact: Freshness drift produces pipelines that appear healthy but are feeding stale data to BI tools and downstream consumers.
PII Drift
A column that was not previously classified as containing PII now appears to contain personal data — email addresses, phone numbers, names, or other identifiers. This can happen when source schemas change or new data sources are onboarded.
Examples
- A new column added to a source table contains email addresses
- A column previously holding codes now contains customer names
- A JSON blob column is now storing structured personal data
Impact: Undetected PII drift creates data governance and compliance exposure — particularly for GDPR, CCPA, and SOC 2 obligations.
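One common approach to this kind of classification is pattern matching over a small sample of column values. The sketch below shows the idea for email addresses only; the regex, the sampling threshold, and the function name are assumptions for illustration, not BoltPipeline's actual classifier.

```python
import re

# Deliberately simple email pattern for demonstration purposes
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def looks_like_pii(samples: list[str], threshold: float = 0.5) -> bool:
    """Flag a column if more than `threshold` of sampled values
    match the email pattern."""
    if not samples:
        return False
    hits = sum(1 for v in samples if EMAIL_RE.fullmatch(v))
    return hits / len(samples) > threshold

# Two of three sampled values are emails -> the column is flagged:
print(looks_like_pii(["a@example.com", "b@example.org", "REF-1234"]))  # True
```

A production classifier would combine several detectors (phone numbers, names, national IDs) and weigh them against column names and metadata, but the sampling-plus-threshold structure is the same.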
Under the hood
How drift detection works
Drift detection runs as part of the BoltPipeline agent's continuous operation cycle. The agent collects schema snapshots, row count statistics, freshness timestamps, and column-level samples during execution. These are compared against baselines established during certification and previous runs.
Baseline established at certification
When you certify a pipeline, BoltPipeline records the expected schema, approximate data volumes, and column profiles as the baseline. Drift is measured against this point.
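Conceptually, the baseline is one record per table. The shape below is a hypothetical illustration of what such a record captures; every field name and value is an assumption for this example, not BoltPipeline's actual storage format.

```python
# Hypothetical certification baseline for one output table.
# Field names and values are illustrative only.
baseline = {
    "table": "analytics.orders",
    "schema": {                       # expected columns and types
        "order_id": "INTEGER",
        "customer_id": "VARCHAR",
        "amount": "DECIMAL",
    },
    "row_count": 1_250_000,           # approximate volume at certification
    "freshness_sla_hours": 24,        # expected update cadence
    "pii_columns": [],                # columns classified as PII at certification
}
```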
Checked on every execution
Before and after each pipeline run, the agent compares current state against the baseline. Deviations above configured thresholds raise drift events.
Visible in the Monitor tab
All drift events are surfaced in the BoltPipeline console under the Monitor tab. You can see which pipeline, which step, which column, and what changed.
Your data stays in your warehouse
Drift detection works with schema metadata and statistics — not your row-level data. BoltPipeline never reads your actual business data values (with the limited exception of error message collection).
What makes this different
Drift detection that understands your pipeline
Generic data observability tools monitor your warehouse tables — but they do not know which tables belong to which pipeline, what the expected lineage is, or what the certified baseline looks like.
Because BoltPipeline owns the entire lifecycle — from plan through certification to operation — its drift detection is context-aware. It knows the expected column lineage for every step, the certified baseline schema for every output table, and the dependency graph that determines which upstream change caused a downstream failure.
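The payoff of knowing column lineage can be sketched as a lookup: given a map from pipeline steps to the source columns they read, a missing or renamed column immediately identifies the affected steps. The step names, column identifiers, and `LINEAGE` structure below are invented for this example.

```python
# Hypothetical column lineage: step name -> source columns it reads.
LINEAGE = {
    "clean_customers": ["raw.customers.customer_id", "raw.customers.email"],
    "build_orders": ["raw.orders.order_id", "raw.customers.customer_id"],
    "daily_summary": ["raw.orders.order_id"],
}

def affected_steps(missing_column: str) -> list[str]:
    """Steps that will fail if `missing_column` disappears from the source."""
    return [step for step, cols in LINEAGE.items() if missing_column in cols]

# Renaming raw.customers.customer_id breaks exactly these steps:
print(affected_steps("raw.customers.customer_id"))
# ['clean_customers', 'build_orders']
```

A generic table monitor can only say the column vanished; with this map, the alert names the specific steps that will break before the next run.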
The difference in practice
A source column is renamed
Generic monitoring:
Table monitoring alerts that the column is missing
BoltPipeline:
BoltPipeline traces which pipeline steps read that column, predicts the execution failure, and surfaces the specific step and column in the console before the next run
Row count drops 80%
Generic monitoring:
Volume anomaly alert fires on the affected table
BoltPipeline:
BoltPipeline identifies which upstream load job feeds that table and whether the pipeline's own execution produced the drop or inherited it from a failed source