What is Change Data Capture (CDC)? Methods Explained

Feb 3, 2026

Utku Zihnioglu

CEO & Co-founder

Every guide to change data capture starts the same way: here's how to read a database transaction log and replicate rows into a warehouse. That's accurate. It's also half the story. The concept behind CDC is simple: detect what changed, sync only the diff. That principle applies to PostgreSQL transaction logs. It also applies to the Stripe API, the HubSpot contact endpoint, and every other SaaS tool your team uses daily. Most CDC guides ignore the second half because they're written by companies that sell database connectors.

This guide covers both.

What change data capture is and why it matters for keeping data fresh

Change data capture is a technique for identifying which records changed in a source system and propagating only those changes to a destination. Instead of copying an entire dataset every sync cycle, CDC detects inserts, updates, and deletes since the last run and moves just the diffs.

The value is straightforward. A nightly full-snapshot sync of 50,000 customer records processes all 50,000 rows even if only 12 changed. CDC processes 12. The result: lower latency, fewer API calls, less compute, and fresher data in the destination.

CDC matters most for operational tools where humans act on data in real time. A support rep opens a ticket and sees a customer's plan status from yesterday. A sales rep sends a renewal email to someone who already renewed. Stale data causes these failures, and CDC eliminates the root cause by keeping the gap between source and destination measured in minutes, not hours.

Four change data capture methods: log-based, trigger-based, timestamp-based, and API-polling

Every CDC implementation answers the same question differently: how do you detect that something changed?

| CDC Method | How It Detects Changes | Best For | Trade-off |
| --- | --- | --- | --- |
| Log-based | Reads the database transaction log | High-volume database replication | Requires log access and database-specific configuration |
| Trigger-based | Database triggers fire on insert/update/delete | Legacy systems without log access | Doubles write load on the source database |
| Timestamp-based | Queries rows where updated_at > last sync time | Simple setups with reliable timestamps | Misses deletes and records without timestamp columns |
| API-polling | Polls a SaaS API for records modified since last sync | SaaS-to-SaaS and SaaS-to-database sync | Bounded by API rate limits and polling interval |

Log-based CDC is the gold standard for databases. The transaction log already records every insert, update, and delete. A log reader (like PostgreSQL's logical replication or MySQL's binlog) parses these entries and streams them downstream. The source database barely notices because the reads happen asynchronously on the log, not on the tables. This is what tools like Debezium and Fivetran use for database-to-warehouse replication.
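To make the downstream side concrete, here is a minimal sketch of consuming one decoded transaction-log event and applying it to a replica. It assumes a wal2json-style JSON format (PostgreSQL's wal2json output plugin emits events shaped roughly like this); the in-memory dict stands in for the real destination.

```python
import json

def apply_wal_change(event_json: str, table_state: dict) -> dict:
    """Apply one decoded WAL change (wal2json-style) to an in-memory replica.

    `table_state` maps primary-key values to row dicts; it is a stand-in
    for the real downstream destination.
    """
    event = json.loads(event_json)
    for change in event["change"]:
        kind = change["kind"]  # "insert", "update", or "delete"
        if kind in ("insert", "update"):
            row = dict(zip(change["columnnames"], change["columnvalues"]))
            table_state[row["id"]] = row
        elif kind == "delete":
            # The old key columns identify the deleted row
            old = dict(zip(change["oldkeys"]["keynames"],
                           change["oldkeys"]["keyvalues"]))
            table_state.pop(old["id"], None)
    return table_state

# One decoded transaction: an insert followed by an update to the same row
event = json.dumps({"change": [
    {"kind": "insert", "table": "customers",
     "columnnames": ["id", "plan"], "columnvalues": [42, "free"]},
    {"kind": "update", "table": "customers",
     "columnnames": ["id", "plan"], "columnvalues": [42, "pro"]},
]})
state = apply_wal_change(event, {})
print(state)  # {42: {'id': 42, 'plan': 'pro'}}
```

Because the log records every change in commit order, replaying it like this reproduces the source table exactly, deletes included.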

Trigger-based CDC predates log-based approaches. A database trigger fires on every write and records the change in a shadow table. It works, but it doubles the write load. Modern teams avoid this method unless they're stuck on legacy databases without accessible transaction logs.

Timestamp-based CDC is the simplest approach. Query all rows where updated_at is newer than the last sync. It's easy to implement but blind to deletes (a deleted row has no timestamp to query) and useless for tables that don't track modification times.
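The whole method fits in one filter. A sketch, using an in-memory list where a real implementation would run `SELECT * FROM customers WHERE updated_at > :last_sync`:

```python
from datetime import datetime, timezone

def fetch_changed_rows(rows: list[dict], last_sync: datetime) -> list[dict]:
    """Timestamp-based CDC: return only rows modified after the last sync."""
    return [r for r in rows if r["updated_at"] > last_sync]

rows = [
    {"id": 1, "plan": "pro",
     "updated_at": datetime(2026, 2, 3, 9, 0, tzinfo=timezone.utc)},
    {"id": 2, "plan": "free",
     "updated_at": datetime(2026, 2, 1, 9, 0, tzinfo=timezone.utc)},
]
last_sync = datetime(2026, 2, 2, 0, 0, tzinfo=timezone.utc)
changed = fetch_changed_rows(rows, last_sync)
print([r["id"] for r in changed])  # [1]
# A row deleted from the table simply never appears here -- the blind spot.
```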

API-polling CDC is what most guides skip entirely. SaaS APIs expose endpoints like "get contacts modified since timestamp" or "list invoices updated after date." Polling these endpoints on a schedule, comparing field values against the last known state, and syncing only the diffs is CDC applied to APIs instead of databases. It reduces API calls by 90%+ compared to pulling every record on every run.
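The "compare field values against the last known state" step is a plain dictionary diff. A minimal sketch, with made-up record data standing in for an API response:

```python
def diff_record(last_known: dict, fetched: dict) -> dict:
    """Field-level diff: return {field: (old_value, new_value)} for
    every field whose value changed since the last sync."""
    return {
        field: (last_known.get(field), new_value)
        for field, new_value in fetched.items()
        if last_known.get(field) != new_value
    }

# Last state we synced for a customer vs. what this poll returned
last_known = {"email": "a@example.com", "plan": "free", "seats": 3}
fetched    = {"email": "a@example.com", "plan": "pro",  "seats": 3}
print(diff_record(last_known, fetched))  # {'plan': ('free', 'pro')}
```

Only the `plan` field crosses the wire; the unchanged `email` and `seats` fields generate no writes at all.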

Change data capture for databases vs. CDC for SaaS tools

Traditional CDC literature treats this technique as a database concept. The source is PostgreSQL, Oracle, or SQL Server. The destination is Snowflake, BigQuery, or Redshift. The mechanism is a transaction log reader. This framing is correct for analytical pipelines where the goal is warehouse loading.

But most teams under 200 people don't have a warehouse. They have Stripe, HubSpot, Intercom, Mailchimp, and a PostgreSQL database their app writes to. The data freshness problem is the same: when a customer upgrades in Stripe, the CRM should reflect that within minutes, not hours. The fix is the same principle: detect the change, sync the diff.

The difference is the mechanism. Database CDC reads transaction logs. SaaS CDC polls APIs with modified-since filters and compares field-level values against the last known state. Both produce the same output: a set of changes (field X went from value A to value B) that get applied to the destination.

This distinction matters because the tooling diverges. Database CDC tools (Debezium, Fivetran, AWS DMS) are built for warehouse loading. They assume a database source and a warehouse destination. SaaS CDC tools need to handle API authentication, rate limiting, pagination, and field mapping across systems with different schemas. A tool that does log-based CDC from PostgreSQL to Snowflake won't help you sync Stripe subscription changes to HubSpot.

Why most change data capture guides ignore the tools your team actually uses

Search for "change data capture" and you'll find guides that explain log-based replication from Oracle to a data warehouse. They cover trigger-based CDC, timestamp-based CDC, and snapshot comparisons. All four methods assume a relational database as the source.

This is a blind spot, not a minor omission. For a 30-person SaaS company, the data freshness problems are between SaaS tools, not between databases and warehouses. The customer upgraded in Stripe but the CRM still shows "free plan." The support ticket in Intercom doesn't show the customer's latest billing status. Marketing is sending onboarding emails to customers who already completed onboarding.

These are CDC problems. The data changed in one system and didn't propagate to the others. The fix is the same principle: track what changed, sync the diff, keep every tool current. But the traditional CDC stack (transaction log reader, streaming platform, warehouse destination) is the wrong architecture for this use case. You don't need Debezium and Kafka to sync 200 Stripe subscription changes to HubSpot every 15 minutes.

What you need is API-based change detection with field-level diffing: poll the source for modified records, compare each field against the last known value, and write only the fields that actually changed to the destination. This is CDC without the database infrastructure.
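The three steps above can be sketched as one loop. The `fetch_modified` and `update_record` callables are hypothetical stubs standing in for real API clients (e.g. a modified-since list endpoint and a PATCH endpoint); only their shape matters here:

```python
def sync_changes(fetch_modified, update_record, state: dict,
                 last_sync: str) -> int:
    """Poll the source for modified records, diff each one against the
    last known state, and push only the fields that actually changed."""
    pushed = 0
    for record in fetch_modified(since=last_sync):
        rid = record["id"]
        known = state.get(rid, {})
        patch = {f: v for f, v in record.items() if known.get(f) != v}
        patch.pop("id", None)          # never rewrite the identifier
        if patch:
            update_record(rid, patch)  # write only the diff
            pushed += 1
        state[rid] = record            # remember the new state
    return pushed

# In-memory stand-ins for the source API and the destination
source = [{"id": "cus_1", "plan": "pro"}, {"id": "cus_2", "plan": "free"}]
writes = []
state = {"cus_2": {"id": "cus_2", "plan": "free"}}  # cus_2 is unchanged
n = sync_changes(lambda since: source,
                 lambda rid, patch: writes.append((rid, patch)),
                 state, "2026-02-03T00:00:00Z")
print(n, writes)  # 1 [('cus_1', {'plan': 'pro'})]
```

Even when the API's modified-since filter over-returns records, the field-level diff ensures the destination only sees genuine changes.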

How to get change data capture between SaaS tools without building a pipeline

The traditional CDC pipeline looks like this: database transaction log, a streaming platform (Kafka or equivalent), a transformation layer, and a warehouse destination. Building this requires a data engineer, infrastructure, and ongoing maintenance. For analytical use cases at scale, this architecture makes sense.

For syncing Stripe to HubSpot, it's overhead that creates more problems than it solves. You'd need to: set up log-based CDC from Stripe (impossible, since Stripe isn't a database), route data through a warehouse, write SQL models to transform it, and then run reverse ETL to push it back out to HubSpot. Four systems and a data engineer to solve what should be a direct connection.

Oneprofile applies CDC principles to SaaS APIs directly. Property-level change tracking detects which fields changed (with old and new values), syncs only the diff, and reduces API calls compared to full-snapshot sync. Your database is the source of truth: changes in PostgreSQL automatically propagate to every connected tool without custom code, without a warehouse, and without a streaming platform in between.

The setup takes minutes. Connect two tools, map the fields, set a sync schedule. Every run processes only records that changed since the last run. Failed records go to a dead letter queue for investigation instead of being silently dropped. The result is CDC-grade data freshness between your operational tools, without the CDC-grade infrastructure investment.

Is change data capture the same as data replication?

No. Data replication copies entire datasets. Change data capture tracks only what changed since the last sync and moves just those diffs. CDC is a technique you can use within a replication pipeline, and it's far more efficient than full-snapshot approaches.

Do I need a data warehouse to use change data capture?

Not for operational sync. Traditional CDC loads changes into a warehouse, but API-based CDC can sync changes directly between SaaS tools. No warehouse, no staging area, no SQL models required.

What is log-based change data capture?

Log-based CDC reads a database's transaction log to detect inserts, updates, and deletes. It has near-zero performance impact on the source database and captures every change in order. It's the gold standard for database-to-warehouse replication.

Can change data capture work with SaaS APIs?

Yes. API-based CDC polls SaaS tools for records modified since the last sync, compares field values, and syncs only the diffs. This reduces API calls by 90%+ compared to full-snapshot sync and keeps tools current without database infrastructure.


© 2026 Oneprofile Software

455 Market Street, San Francisco, CA 94105