Customer Data Unification via Direct Sync

Utku Zihnioglu

CEO & Co-founder

Your CRM says the customer is on the free plan. Stripe says they upgraded last Tuesday. Intercom has their company name as "Acme" but HubSpot has "Acme Inc." Three tools, three versions of the same customer, and nobody in your company knows which one is right. This is the customer data unification problem, and the conventional CDP approach routes everything through a data warehouse to solve it.

That works for teams with data engineers and analytical workloads. But for the vast majority of teams that encounter this problem, a warehouse is optional.

What customer data unification actually means (and what it doesn't)

The pitch from enterprise CDPs goes something like this: ingest all your customer data into a central store, build an identity graph that resolves every anonymous cookie and device ID into a single person, then push that unified view back out to your tools. It sounds comprehensive. It is also wildly over-engineered for what most teams actually need.

Customer data unification, at its core, means getting a consistent, accurate view of each customer across the tools your team uses every day. McKinsey's research on customer analytics shows that companies with unified customer views significantly outperform those operating on fragmented data. When your support rep opens a ticket, they see the customer's current plan, last payment date, and open deals. When your marketing team builds a campaign, they're segmenting on real data, not a stale export from two weeks ago.

That's it. The job isn't resolving anonymous browsing sessions across devices. The job is making sure your CRM, your billing system, and your support platform agree on who a customer is and what their current status looks like.

Most teams under 200 people don't have a probabilistic matching problem. They have a "my tools disagree" problem. The customer has the same email address in every tool. The data is just stuck in each one, never flowing between them.

How most CDPs build unified profiles and why they require a warehouse

The warehouse-first architecture has become the default in the CDP world, and it's worth understanding why before deciding whether you need it.

The pattern works like this: you pipe all your tool data into a warehouse (Snowflake, BigQuery, Redshift), write dbt models to clean and transform it, build identity resolution logic in SQL, and then use a reverse ETL tool to push the merged data back to your operational tools. Some platforms call this a "composable CDP." Others just call it a data stack.

This approach makes sense for companies with data teams. If you already have analysts writing SQL, a warehouse with years of historical data, and transformation pipelines running on a schedule, adding identity resolution as another dbt model is a natural extension.

The problem is that this architecture has been sold as the only way to get unified customer profiles. It isn't. The warehouse adds something specific: a queryable historical archive for analytical workloads. It does not add anything to the actual matching and merging logic that your CRM and billing data need.

A 30-person SaaS company that uses HubSpot, Stripe, and Intercom does not need Snowflake sitting in the middle to figure out that jane@acme.com in HubSpot is the same person as jane@acme.com in Stripe. That matching is trivial. What they need is for the match to happen automatically and for the resulting profile to stay current as data changes in each tool.

Customer data unification with configurable merge strategies

Matching records is the first half of the problem. The second half, and the one that enterprise CDPs gloss over, is deciding what happens when matched records disagree.

Your customer's plan name in Stripe is "Team" but their CRM record still says "Free." Their phone number was updated in HubSpot last week but Intercom still has the old one. When you merge these records into a single customer view, which value wins?

The lazy approach is last-write-wins globally: whatever was updated most recently overwrites everything else. This works until a marketing automation tool updates a timestamp on a record and suddenly its stale company name overwrites the accurate one your sales rep just entered.
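A minimal sketch of that failure mode, using made-up records and timestamps (the field names and the `last_write_wins` helper are illustrative assumptions, not any tool's actual API):

```python
from datetime import datetime

# Hypothetical records for the same customer, as two tools might report them.
crm_record = {
    "source": "crm",
    "updated_at": datetime(2024, 5, 1, 9, 0),
    "company_name": "Acme Inc.",  # entered by a sales rep
}
marketing_record = {
    "source": "marketing_automation",
    "updated_at": datetime(2024, 5, 1, 9, 5),  # bumped by an automated touch
    "company_name": "Acme",  # stale value
}


def last_write_wins(records):
    """Global rule: take every field from the most recently updated record."""
    newest = max(records, key=lambda r: r["updated_at"])
    return {k: v for k, v in newest.items() if k not in ("source", "updated_at")}


merged = last_write_wins([crm_record, marketing_record])
print(merged["company_name"])  # the stale "Acme" overwrites "Acme Inc."
```

The newer timestamp says nothing about which field changed, so the whole record wins, stale company name included.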

What you actually want is per-field control. Something like:

| Field | Strategy | Rationale |
| --- | --- | --- |
| Plan/subscription data | Source priority: billing tool | Stripe is the authority on billing state |
| Company name | Source priority: CRM | Sales reps maintain this manually |
| Last activity date | Latest value wins | The most recent timestamp is correct by definition |
| Lifetime revenue | Largest value wins | Avoids partial data from a tool that only sees recent transactions |

This is what configurable merge strategies look like in practice. Instead of a single global rule, you set a resolution strategy per field and per source. Billing fields come from Stripe. Contact details come from the CRM. Activity data takes the most recent value. The unified profile assembles itself from the best source for each piece of information. Good unified profile management means you can browse, search, and inspect these merged profiles to verify the rules are working as expected.
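As a rough illustration, the per-field rules listed above could be expressed like this. It is a sketch under assumptions: the rule names, the `merge_field` and `merge_profile` helpers, and the sample values are all hypothetical and do not describe how Oneprofile implements merging.

```python
from datetime import date

# Hypothetical per-field rules mirroring the field rules listed above.
SOURCE_PRIORITY = {
    "plan": ["stripe", "hubspot", "intercom"],          # billing tool first
    "company_name": ["hubspot", "stripe", "intercom"],  # CRM first
}
LATEST_WINS = {"last_activity"}
LARGEST_WINS = {"lifetime_revenue"}


def merge_field(field, values):
    """Resolve one field. `values` maps source name -> that source's value."""
    present = {s: v for s, v in values.items() if v is not None}
    if not present:
        return None
    if field in SOURCE_PRIORITY:
        for source in SOURCE_PRIORITY[field]:
            if source in present:
                return present[source]
    if field in LATEST_WINS or field in LARGEST_WINS:
        # Dates and numbers both order naturally, so max() covers both rules.
        return max(present.values())
    # No rule configured: fall back to the first source that has a value.
    return next(iter(present.values()))


def merge_profile(records):
    """Merge per-tool records (source name -> field dict) into one profile."""
    fields = sorted({f for rec in records.values() for f in rec})
    return {
        f: merge_field(f, {s: rec.get(f) for s, rec in records.items()})
        for f in fields
    }


records = {
    "stripe": {"plan": "Team", "lifetime_revenue": 1200,
               "last_activity": date(2024, 5, 1)},
    "hubspot": {"plan": "Free", "company_name": "Acme Inc.",
                "lifetime_revenue": 300, "last_activity": date(2024, 4, 20)},
    "intercom": {"company_name": "Acme", "last_activity": date(2024, 5, 3)},
}
profile = merge_profile(records)
print(profile)
```

With these sample records, the plan comes from Stripe, the company name from HubSpot, the activity date from Intercom (the most recent), and lifetime revenue from whichever tool reports the largest figure.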

We built this into Oneprofile because we kept hearing the same complaint from teams trying to merge customer records across tools: "the wrong tool won." Merge without strategy is just overwriting with extra steps.

Identity resolution with deterministic matching across tools

Identity resolution sounds intimidating because enterprise CDPs have made it intimidating. They talk about identity graphs, probabilistic models, device fingerprinting, and ML-powered clustering. For most B2B SaaS companies and many B2C companies with logged-in users, none of that applies.

Deterministic matching means: if two records in different tools share the same email address (or phone number, or user ID, or any other unique identifier), they're the same person. Period. No confidence scores, no tuning thresholds, no false positive review queues.
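In code, deterministic matching on email can be as small as a group-by. The sketch below assumes each record carries an `email` field and that lowercasing and trimming are acceptable normalization for your data; both the helper names and the sample records are illustrative.

```python
from collections import defaultdict


def normalize_email(email):
    """Trim whitespace and lowercase (an assumption about acceptable cleanup)."""
    return email.strip().lower()


def match_by_email(records):
    """Group records from any tool by their normalized email address."""
    groups = defaultdict(list)
    for record in records:
        groups[normalize_email(record["email"])].append(record)
    return dict(groups)


records = [
    {"tool": "hubspot", "email": "jane@acme.com", "company": "Acme Inc."},
    {"tool": "stripe", "email": "Jane@Acme.com ", "plan": "Team"},
    {"tool": "intercom", "email": "jane@acme.com", "company": "Acme"},
    {"tool": "hubspot", "email": "bob@example.com", "company": "Example"},
]
profiles = match_by_email(records)
for email, matched in profiles.items():
    print(email, [r["tool"] for r in matched])
```

There is no score to tune here: two records either share the key or they do not.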

The accuracy difference is stark. Deterministic matching on a shared key is 99%+ accurate. Probabilistic matching, depending on the signal quality, typically falls somewhere between 70% and 85%. Maintaining a probabilistic model, tuning its thresholds, and reviewing its output costs real engineering time on an ongoing basis.

Here's a question worth asking: do your tools actually lack shared identifiers? If your customers sign up with an email, and that email exists in your CRM, your billing tool, your support platform, and your analytics product, you already have a deterministic key in every system. When your records share a known key, identity resolution works directly between tools, with no warehouse step required. The reason your data isn't unified isn't that matching is hard. It's that nothing is doing the matching automatically.

When you do need to match on multiple identifiers, combining them with OR logic handles most cases. Match on email OR external_id OR phone number, configurable per integration. The CRM matches on email, the billing tool matches on customer ID, the support platform matches on email. All three resolve to the same unified profile.
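One way to sketch OR-matching with per-integration keys is a small union-find over the records: any two records that share a value for a key their tool is configured to match on end up in the same cluster. The `MATCH_KEYS` mapping, the field names, and the sample data are assumptions made for illustration. Here the CRM is assumed to store the billing customer ID as `external_id`, which is what lets the Stripe record link up.

```python
# Hypothetical per-integration configuration: which fields each tool matches on.
MATCH_KEYS = {
    "hubspot": ["email", "external_id"],
    "stripe": ["external_id"],
    "intercom": ["email"],
}


def resolve_identities(records, match_keys=MATCH_KEYS):
    """Cluster records that share a value for ANY of their tool's match keys."""
    parent = list(range(len(records)))

    def find(i):
        # Walk to the cluster root, shortening the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_seen = {}  # (key, value) -> index of the first record carrying it
    for i, record in enumerate(records):
        for key in match_keys.get(record["tool"], []):
            value = record.get(key)
            if value is None:
                continue
            union(first_seen.setdefault((key, value), i), i)

    clusters = {}
    for i, record in enumerate(records):
        clusters.setdefault(find(i), []).append(record)
    return list(clusters.values())


records = [
    {"tool": "hubspot", "email": "jane@acme.com", "external_id": "cus_123"},
    {"tool": "stripe", "external_id": "cus_123", "email": "billing@acme.com"},
    {"tool": "intercom", "email": "jane@acme.com"},
    {"tool": "intercom", "email": "bob@example.com"},
]
clusters = resolve_identities(records)
for cluster in clusters:
    print([r["tool"] for r in cluster])
```

The Stripe record joins the profile through the customer ID even though its billing email differs, and the Intercom record joins through the email. Each link is still an exact match on a known key, so the result stays deterministic.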

When you need a warehouse for customer data unification and when you don't

I don't want to overstate the case. Warehouses serve a real purpose. If you need historical analytics on customer behavior over months or years, a warehouse is the right tool. If you have a data team running complex transformations and building ML features, they need a warehouse to work in. If your identity resolution problem genuinely requires probabilistic matching across anonymous sessions, you probably need the warehouse-based approach.

But here's the test: if you removed the warehouse from your current architecture, would your customer-facing teams lose anything? If they'd lose analytics dashboards but their CRM would still show the right data, the warehouse is serving analytics, not unification. Those are different problems.

For the team that just needs their tools to agree on customer data, the path is shorter than the industry suggests:

  • Connect the tools that hold customer data

  • Set a matching key (usually email) per tool

  • Configure merge strategies so conflicts resolve predictably

  • Let the sync engine keep everything current
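To make the first three steps concrete, here is a hypothetical configuration object with a basic sanity check. The class names, fields, and strategy labels are invented for this sketch and are not Oneprofile's actual configuration format; the fourth step, the sync engine itself, is out of scope here.

```python
from dataclasses import dataclass, field


@dataclass
class ToolConnection:
    name: str
    match_key: str = "email"  # step 2: matching key per tool


@dataclass
class SyncConfig:
    tools: list  # step 1: the connected tools
    merge_strategies: dict = field(default_factory=dict)  # step 3

    def validate(self):
        """Return a list of problems; an empty list means the config is usable."""
        names = {tool.name for tool in self.tools}
        problems = []
        for field_name, rule in self.merge_strategies.items():
            source = rule.get("source")
            if rule["strategy"] == "source_priority" and source not in names:
                problems.append(
                    f"{field_name}: source {source!r} is not a connected tool"
                )
        return problems


config = SyncConfig(
    tools=[
        ToolConnection("hubspot"),
        ToolConnection("stripe", match_key="customer_id"),
        ToolConnection("intercom"),
    ],
    merge_strategies={
        "plan": {"strategy": "source_priority", "source": "stripe"},
        "company_name": {"strategy": "source_priority", "source": "hubspot"},
        "last_activity": {"strategy": "latest"},
    },
)
print(config.validate())
```

Validating up front catches the most common misconfiguration in this sketch: a merge rule that points at a tool that was never connected.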

Warehouse optional. No dbt models to write. No reverse ETL pipeline to maintain. The unified profile lives where it's useful: inside a platform that's connected to all your tools and keeps them in sync. If you already have a warehouse, Oneprofile works alongside it.

The industry has a habit of selling infrastructure to teams that need outcomes. A 50-person company asking "how do I get a single customer view?" doesn't need a $50,000/year data stack and a six-month implementation. They need their tools talking to each other with clear rules about which data wins.

You can solve it today, with or without a warehouse.
