
Why We Designed Security In, Not Bolted It On

Most data tools ask for broad database access. We took the opposite approach: your data never leaves your environment. Here's why that architectural decision changes everything.

Jan 6, 2025 | BoltPipeline Team | 4 min read

When we started building BoltPipeline, we made a decision that shaped every API call, every data flow, and every deployment: raw data and database credentials would never leave the customer's environment.

This wasn't a security feature we added later. It was the first architectural constraint we imposed — and everything else was built around it.

The Problem with "Send Us Your Data"

Most data tools in the modern stack assume they need access to your raw data. Observability tools query your tables directly. Profiling tools pull samples into their SaaS environment. Transformation tools store your connection strings in their cloud.

For many teams, this is acceptable. But for regulated industries — healthcare, banking, government, utilities — it's a non-starter. Broad data access creates compliance risk, security liability, and audit complexity. Every vendor with access to your data becomes a point of exposure.

The Metadata-Only Approach

BoltPipeline works differently. The platform operates on metadata and execution signals only: table names, column names, schema structure, aggregate profiling metrics (null rates, cardinality counts, uniqueness scores), validation results, and health scores. We never see your data: no row values, no PII content, no query results, no data previews. The same metadata that brings clarity to your data model is what powers AI to build better analytics at scale.
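To make "metadata-only" concrete, here is a sketch of the rough shape of such a payload. Every field name below is illustrative, not BoltPipeline's actual wire format:

```python
# Illustrative shape of a metadata-only payload: structure and statistics,
# never row values. All field names here are hypothetical.
payload = {
    "table": "analytics.customers",
    "columns": [
        {
            "name": "customer_email",
            "type": "varchar",
            "null_rate": 0.021,    # fraction of NULLs, not the NULL rows
            "cardinality": 48210,  # distinct count, not the distinct entries
            "uniqueness": 0.98,
        },
    ],
    "health_score": 0.94,
}

# Nothing in the payload carries row-level data.
assert all("values" not in col for col in payload["columns"])
```

The point of the sketch is what is absent: there is no field anywhere in the payload that could hold a row value.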

Let's be precise: metadata has some sensitivity. Table names can reveal business context. Column names can reveal data categories. We don't pretend metadata is zero-risk. But there's a fundamental difference between a platform that sees "customer_email has 2.1% nulls and 98% uniqueness" and a platform that sees "john.doe@example.com, jane.smith@company.co." One is structure and statistics. The other is actual personal data. Compliance regulations — HIPAA, GDPR, PCI — protect the latter, not the former.

An agent runs inside your infrastructure — your VPC, your warehouse, your controlled environment. It executes validations, profiling, and pipeline logic close to the data. Credentials are managed entirely in your environment, stored in your secret manager, fetched at runtime.

What This Means in Practice

Push-down profiling. When BoltPipeline profiles your data, the SQL queries execute inside your database. We receive aggregate metrics — null percentages, cardinality counts, min/max values. Never individual rows, never PII content, never business data.
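A push-down profiling query of this kind might look like the following sketch: the SQL executes inside the customer database, and only the single aggregate row comes back. Table and column names are hypothetical:

```python
# Sketch of a push-down profiling query. The SQL runs in the customer's
# database; only aggregate metrics cross the boundary. Names are illustrative.
def profiling_sql(table: str, column: str) -> str:
    return f"""
        SELECT
            COUNT(*) AS row_count,
            AVG(CASE WHEN {column} IS NULL THEN 1.0 ELSE 0.0 END) AS null_rate,
            COUNT(DISTINCT {column}) AS cardinality,
            MIN({column}) AS min_value,
            MAX({column}) AS max_value
        FROM {table}
    """

sql = profiling_sql("analytics.customers", "customer_email")
```

The query returns exactly one row of statistics, so even a fully compromised control plane could only ever observe aggregates.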

Credential isolation. Database passwords are stored in your secret manager. The agent fetches them at runtime — we never collect, store, or transmit database passwords or connection strings.
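A minimal sketch of runtime credential resolution, assuming a generic secret-manager client. The `FakeSecrets` class is a stand-in for illustration; in practice this would be Vault, AWS Secrets Manager, or similar:

```python
# Sketch of runtime credential resolution inside the customer environment.
# Credentials are fetched at runtime, held only in agent memory, and never
# sent to the control plane. The client interface here is hypothetical.
def resolve_credentials(secret_name: str, secrets_client) -> dict:
    raw = secrets_client.get_secret(secret_name)
    return {"user": raw["user"], "password": raw["password"]}

class FakeSecrets:
    """Stand-in secret manager, for illustration only."""
    def get_secret(self, name):
        return {"user": "agent", "password": "s3cret"}

creds = resolve_credentials("warehouse/main", FakeSecrets())
```

The design point is that the secret name is the only thing the platform ever needs to know; the secret value stays inside the customer's boundary.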

Per-agent identity. Every agent gets its own certificate for mutual TLS authentication. Each agent has a unique cryptographic identity bound to its tenant. If one agent is compromised, only that agent's certificates are affected.
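Per-agent mutual TLS can be sketched with Python's standard `ssl` module. The certificate paths below are placeholders, and this illustrates the general pattern rather than BoltPipeline's actual implementation:

```python
import ssl

# Sketch of per-agent mutual TLS: each agent presents its own client
# certificate, so a compromise affects only that one identity.
# All file paths are illustrative placeholders.
def agent_tls_context(cert_file: str, key_file: str, ca_file: str) -> ssl.SSLContext:
    # Verify the control plane's server certificate against a pinned CA.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    # Present this agent's unique certificate and private key to the server.
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

Because each agent loads its own cert chain, revoking one agent's certificate cuts off that agent alone, with no blast radius across tenants.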

Scratch schema execution. Profiling and validation run in a dedicated scratch schema. Production tables are never modified by the platform.
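The scratch-schema pattern can be illustrated as follows; the schema name and check are hypothetical:

```python
# Sketch of scratch-schema execution: all platform-generated tables land in a
# dedicated schema, while production tables are only ever read. Names are
# illustrative, not BoltPipeline's actual conventions.
SCRATCH_SCHEMA = "boltpipeline_scratch"

def validation_sql(check_name: str, source_table: str) -> str:
    # The result table is created in the scratch schema; the source table
    # appears only in a read-only SELECT.
    return (
        f"CREATE TABLE {SCRATCH_SCHEMA}.{check_name} AS "
        f"SELECT COUNT(*) AS failing_rows "
        f"FROM {source_table} WHERE customer_email IS NULL"
    )

sql = validation_sql("null_email_check", "analytics.customers")
```

Granting the agent write access only on the scratch schema makes the "never modifies production" guarantee enforceable at the database permission level, not just by convention.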

Why "Agent in Your VPC" Isn't Enough

Many platforms now run agents in the customer's environment. That's a good start — but it's not the whole picture. The question isn't where the agent runs. It's what the agent sends back.

Most agents move data across the boundary. They extract rows for profiling in their SaaS environment. They show data previews in their UI. They store query results for dashboard rendering. The agent runs in your VPC, but your data still crosses the wire.

BoltPipeline's agent sends structure and statistics — never values. This isn't a configuration option. It's baked into the architecture. There's no "data preview" feature to disable, no "sample rows" toggle to turn off. The platform was designed from day one to operate on metadata alone.

Retrofitting this onto a product designed around data access would require rewriting core data flows, removing data preview features, changing pricing models (which often assume access to raw data), and recertifying compliance boundaries.

The Compliance Advantage

When your platform never sees raw data, compliance alignment becomes architectural rather than procedural:

  • HIPAA: No PHI leaves the customer database
  • GDPR/CCPA: No personal data processed by the control plane
  • PCI: No cardholder data in transit to SaaS
  • Data residency: Metadata-only communication is inherently compliant

This isn't "we have a policy." It's "the architecture makes it impossible."

The Trade-Off

The metadata-only approach limits what the control plane can do. We can't show you a preview of your data in the UI. We can't run ad-hoc queries against your tables. We can't cache results for faster dashboard loading.

We think that's the right trade-off. Your data stays where it belongs — in your environment, under your control. The platform is smart enough to compile, validate, and govern your pipelines using metadata alone.

Learn more about BoltPipeline's security architecture →
