Your customer upgrades in Stripe. Your CRM still says "free plan." Your support tool shows a different email than the one in your marketing platform. Three tools, three versions of the same person, and nobody agrees on who they are or what they're paying. This is the identity resolution problem.
And every guide you'll find frames it as a data engineering project that requires an identity graph, a warehouse, and a six-figure CDP contract. For a 30-person company with five SaaS tools, that framing is wrong.
What identity resolution is and why your tools disagree about who your customers are
Identity resolution is the process of linking customer records across different systems to build a single, unified profile per person. When a customer signs up in your app, makes a payment in Stripe, opens a support ticket in Intercom, and receives emails through Mailchimp, each tool creates its own record. Linking those records ensures every tool agrees on who the customer is.
The disagreement happens because each tool stores its own version of customer data independently. Stripe knows the billing email. HubSpot knows the marketing email. Intercom knows the email they used to start a chat. If those emails differ (and they often do: work email for billing, personal email for marketing), you now have three "different" customers who are actually the same person.
Enterprise companies solve this with identity graphs: centralized databases that map every identifier (email, phone, device ID, cookie, customer ID) to a single persistent profile. The identity graph updates continuously as new identifiers come in, linking anonymous browsing sessions to known accounts and resolving conflicts between systems.
For a team running Stripe, HubSpot, and Intercom, that architecture is overkill. The question isn't "how do we build an identity graph?" It's "do our tools share a common identifier we can match on?"
How identity resolution works: deterministic matching vs. probabilistic matching
There are two approaches to linking customer records, and they solve fundamentally different problems.
Deterministic matching uses exact identifiers to connect records with certainty. If the email jane@acme.com exists in both Stripe and HubSpot, those records belong to the same person. No algorithm required. Deterministic matching works with any unique identifier: email, phone number, customer ID, or any key your systems share.
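To make that concrete, here is a minimal sketch of deterministic matching on email. The record shapes and field names (`id`, `email`, `plan`, `lifecycle`) are assumptions made up for illustration, not the actual export format of Stripe or HubSpot. The only logic is normalizing the email and looking it up.

```python
def normalize_email(email: str) -> str:
    """Trim and lowercase so 'Jane@Acme.com ' and 'jane@acme.com' compare equal."""
    return email.strip().lower()


def match_on_email(left: list[dict], right: list[dict]) -> list[tuple[dict, dict]]:
    """Return (left, right) pairs whose normalized emails are identical."""
    index = {normalize_email(r["email"]): r for r in right}
    pairs = []
    for record in left:
        other = index.get(normalize_email(record["email"]))
        if other is not None:
            pairs.append((record, other))
    return pairs


# Hypothetical exports; real field names differ per tool.
stripe_customers = [
    {"id": "cus_001", "email": "jane@acme.com", "plan": "pro"},
    {"id": "cus_002", "email": "sam@example.org", "plan": "free"},
]
hubspot_contacts = [
    {"id": "501", "email": "Jane@Acme.com ", "lifecycle": "customer"},
    {"id": "502", "email": "lee@example.net", "lifecycle": "lead"},
]

matches = match_on_email(stripe_customers, hubspot_contacts)
for stripe_rec, hubspot_rec in matches:
    print(stripe_rec["id"], "<->", hubspot_rec["id"])  # cus_001 <-> 501
```

Normalization matters more than it looks: without it, a stray capital letter or trailing space is enough to make an exact match fail.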
Probabilistic matching uses statistical models to infer that two records likely belong to the same person, even without a shared identifier. It analyzes signals like IP address, device type, browser fingerprint, and behavioral patterns to estimate match probability. If one record shows j.smith@gmail.com and another shows john.smith@gmail.com from the same IP address, probabilistic matching flags them as a likely match.
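A rough sketch of the idea, using the example above. The weights, the threshold, and the record fields (`ip`, `device`) are hand-picked assumptions for illustration; a real probabilistic matcher would fit them to labeled data and use far more signals.

```python
from difflib import SequenceMatcher

# Illustrative weights and threshold, chosen by hand for this sketch.
# A production model would be fitted to labeled match/non-match pairs.
WEIGHTS = {"same_ip": 0.4, "same_device": 0.2, "similar_email": 0.4}
THRESHOLD = 0.7


def email_similarity(email_a: str, email_b: str) -> float:
    """Similarity (0..1) of the parts before '@', counted only if domains agree."""
    local_a, _, domain_a = email_a.lower().partition("@")
    local_b, _, domain_b = email_b.lower().partition("@")
    if domain_a != domain_b:
        return 0.0
    return SequenceMatcher(None, local_a, local_b).ratio()


def match_score(a: dict, b: dict) -> float:
    """Weighted sum of signals; higher means more likely the same person."""
    score = 0.0
    if a["ip"] == b["ip"]:
        score += WEIGHTS["same_ip"]
    if a["device"] == b["device"]:
        score += WEIGHTS["same_device"]
    score += WEIGHTS["similar_email"] * email_similarity(a["email"], b["email"])
    return score


def is_likely_match(a: dict, b: dict) -> bool:
    return match_score(a, b) >= THRESHOLD


record_a = {"email": "j.smith@gmail.com", "ip": "203.0.113.7", "device": "macOS/Chrome"}
record_b = {"email": "john.smith@gmail.com", "ip": "203.0.113.7", "device": "macOS/Chrome"}
record_c = {"email": "maria@example.com", "ip": "198.51.100.24", "device": "iOS/Safari"}

print(is_likely_match(record_a, record_b))  # True
print(is_likely_match(record_a, record_c))  # False
```

Notice that the result is a score compared against a threshold, not a yes-or-no fact. Move the threshold and the answer changes, which is exactly why this approach needs tuning.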
The difference matters because it determines what infrastructure you need.
| Approach | How it works | Accuracy | When to use it |
|---|---|---|---|
| Deterministic | Matches on exact identifiers (email, phone, ID) | 99%+ | Your tools share a common key |
| Probabilistic | Infers matches from behavioral signals | 70-85% | Anonymous visitors, cross-device tracking |
Enterprise CDPs sell probabilistic matching as the core of record linking because it solves their customers' hardest problem: linking anonymous website visitors across devices before they log in. That's a real challenge for a retailer with 10 million monthly visitors and a 2% login rate.
But most B2B SaaS teams don't have that problem. Your customers log in. They give you an email. Your tools already have the identifier you need. The gap isn't matching algorithms. The gap is that your tools don't share data with each other.
Why enterprise identity resolution requires a warehouse, identity graph, and data engineer
The enterprise approach to identity resolution looks like this:
1. Collect data from every touchpoint into a data warehouse (Snowflake, BigQuery, Redshift).
2. Build an identity graph table that maps all known identifiers to a single persistent ID per customer.
3. Configure matching rules: deterministic for exact matches, probabilistic for fuzzy matches.
4. Resolve conflicts when two identity clusters should merge or when a bad match creates a false link.
5. Activate the resolved profiles by pushing them back out to marketing tools, ad platforms, and CRMs.
Each step requires engineering resources. The identity graph needs SQL models or a dedicated tool to maintain. Probabilistic matching needs tuning: too aggressive and you merge different people; too conservative and you miss real matches. The entire pipeline runs on warehouse compute, which means ongoing infrastructure cost.
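To give a feel for what the identity graph step involves, here is a toy version built on a union-find structure. This is one common way to cluster identifiers, not a description of how any particular CDP does it. The `IdentityGraph` class and the `type:value` identifier strings are assumptions for this sketch.

```python
from collections import defaultdict


class IdentityGraph:
    """Toy identity graph: union-find over identifier strings."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}

    def _find(self, identifier: str) -> str:
        """Return the cluster root for an identifier, registering it if new."""
        self._parent.setdefault(identifier, identifier)
        root = identifier
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point every node on the path straight at the root.
        while self._parent[identifier] != root:
            next_id = self._parent[identifier]
            self._parent[identifier] = root
            identifier = next_id
        return root

    def link(self, a: str, b: str) -> None:
        """Record that two identifiers belong to the same person."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_person(self, a: str, b: str) -> bool:
        return self._find(a) == self._find(b)

    def clusters(self) -> list[set[str]]:
        groups: dict[str, set[str]] = defaultdict(set)
        for identifier in list(self._parent):
            groups[self._find(identifier)].add(identifier)
        return list(groups.values())


graph = IdentityGraph()
graph.link("email:jane@acme.com", "stripe:cus_001")   # deterministic link
graph.link("cookie:abc123", "email:jane@acme.com")    # cookie seen at login
graph.link("email:sam@example.org", "hubspot:502")

print(graph.same_person("cookie:abc123", "stripe:cus_001"))  # True
print(len(graph.clusters()))  # 2
```

Linking is easy. Unlinking is not: plain union-find has no way to undo a merge, so one bad match fuses two people's clusters until someone rebuilds them. That is a large part of why the conflict-resolution step needs ongoing engineering attention.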
This architecture makes sense for companies with millions of anonymous touchpoints, cross-device tracking requirements, and teams of data engineers to maintain the system. It does not make sense for a 20-person company where every customer has an email address in every tool.
The infrastructure cost is real. Warehouse compute for identity resolution runs $500-$5,000/month depending on data volume. A CDP with built-in matching starts at $50,000/year. A data engineer to maintain the pipeline costs $150,000+ annually. For a small team, the total cost of this enterprise approach can exceed the revenue from the customers it helps you identify.
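Adding up the figures above gives a sense of scale. This is back-of-the-envelope arithmetic using only the numbers quoted in this section; the CDP and engineer figures are starting prices, so the real total could be higher.

```python
# Figures quoted above, all in USD.
warehouse_monthly_low, warehouse_monthly_high = 500, 5_000
cdp_annual_minimum = 50_000
engineer_annual_minimum = 150_000

annual_low = warehouse_monthly_low * 12 + cdp_annual_minimum + engineer_annual_minimum
annual_high = warehouse_monthly_high * 12 + cdp_annual_minimum + engineer_annual_minimum

print(f"${annual_low:,} to ${annual_high:,} per year")  # $206,000 to $260,000 per year
```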
Identity resolution without an identity graph: how matching keys solve the same problem
For teams where customers have a known identifier across tools, the challenge simplifies to a matching key problem.
Here's what that looks like in practice. Your customer signs up with jane@acme.com. That email exists in Stripe (billing), HubSpot (CRM), Intercom (support), and Mailchimp (marketing). The matching challenge isn't "which Jane is this?" It's "why don't these tools share data about the same Jane?"
Matching on a shared key gives you the core benefit of record unification: every tool agrees on who the customer is and what their current data says. When Jane upgrades her plan in Stripe, her CRM record reflects the change. When she opens a support ticket, the agent sees her billing status. When marketing segments by plan tier, Jane lands in the right segment.
This approach covers the vast majority of record-matching use cases for teams under 200 people:
- Record deduplication: Match on email to merge duplicate contacts that entered different tools through different channels.
- Data enrichment: Pull billing data from Stripe into the CRM so sales sees plan status and revenue without switching tabs.
- Consistent segmentation: Ensure your marketing platform uses the same customer attributes as your CRM, not a stale copy from last month's CSV export.
- Support context: Give agents full customer context (plan, recent activity, billing status) by syncing data from the tools that own each data point.
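Here is a small sketch of how those use cases fall out of a single matching key. The `sources` data, the field names, and the `FIELD_OWNERS` mapping (which tool is treated as the source of truth for which field) are all assumptions for illustration.

```python
def normalize_email(email: str) -> str:
    return email.strip().lower()


# Hypothetical records per tool; note the duplicate Jane in the CRM.
sources = {
    "stripe": [
        {"email": "jane@acme.com", "plan": "pro", "mrr": 99},
        {"email": "sam@example.org", "plan": "free", "mrr": 0},
    ],
    "hubspot": [
        {"email": "Jane@Acme.com", "lifecycle": "lead"},
        {"email": "jane@acme.com ", "lifecycle": "customer"},
        {"email": "sam@example.org", "lifecycle": "lead"},
    ],
    "intercom": [
        {"email": "jane@acme.com", "open_tickets": 1},
    ],
}

# Which tool owns each field (an assumption; adjust to your own stack).
FIELD_OWNERS = {
    "plan": "stripe",
    "mrr": "stripe",
    "lifecycle": "hubspot",
    "open_tickets": "intercom",
}


def build_profiles(sources: dict, field_owners: dict) -> dict[str, dict]:
    """Merge records into one profile per normalized email.

    Duplicates collapse onto the same key. Later records overwrite earlier
    ones here; a real merge would pick a winner by update timestamp.
    """
    profiles: dict[str, dict] = {}
    for field, tool in field_owners.items():
        for record in sources[tool]:
            if field in record:
                key = normalize_email(record["email"])
                profiles.setdefault(key, {"email": key})[field] = record[field]
    return profiles


profiles = build_profiles(sources, FIELD_OWNERS)

# Consistent segmentation: everyone on the pro plan, from current billing data.
pro_segment = sorted(e for e, p in profiles.items() if p.get("plan") == "pro")

print(profiles["jane@acme.com"])
print(pro_segment)  # ['jane@acme.com']
```

The two HubSpot entries for Jane collapse into one profile (deduplication), that profile carries Stripe's plan and revenue (enrichment), and a support agent looking at it sees billing status next to open tickets (support context).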
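The limitation is clear: key-based matching doesn't work for anonymous visitors, cross-device tracking before login, or records that genuinely have no shared identifier. That last case includes the customer who uses a work email for billing and a personal email for marketing: unless the tools also share a customer ID or another common field, an email match will miss them. If you're an e-commerce company tracking anonymous browsing behavior across mobile and desktop, you need probabilistic matching. If you're a B2B SaaS company whose customers log in and provide an email, you don't.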
How to match customer records across tools in minutes, not months
The gap between the enterprise approach and what most teams actually need is this: enterprise solutions build a centralized identity graph and push resolved profiles outward. Small teams need the opposite: connect the tools directly and let data flow between them using the identifiers that already exist.
With Oneprofile, you connect two tools, pick a matching key (email, customer ID, or any field both tools share), map the fields you want to sync, and data flows. Stripe subscription status appears in your CRM within 15 minutes. Support agents see billing data without opening Stripe. Marketing segments by plan tier using current data, not a weekly export.
No warehouse. No identity graph. No SQL models to maintain. No data engineer required. The matching key does the linking: if the email in Stripe matches the email in HubSpot, the records link. Field-level change tracking means only changed fields sync, so your tools stay current without overwriting local edits.
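The idea behind field-level change tracking can be sketched in a few lines. This is a generic illustration of the technique, not Oneprofile's actual implementation; the function names and record fields are assumptions.

```python
_MISSING = object()  # sentinel so a field that was absent counts as changed


def changed_fields(last_synced: dict, source_now: dict) -> dict:
    """Fields whose source value differs from what was synced last time."""
    return {
        field: value
        for field, value in source_now.items()
        if last_synced.get(field, _MISSING) != value
    }


def sync_record(source_now: dict, last_synced: dict, destination: dict) -> tuple[dict, dict]:
    """Apply only the changed source fields on top of the destination record."""
    delta = changed_fields(last_synced, source_now)
    return {**destination, **delta}, delta


# Snapshot of the Stripe record at the previous sync, and its state now.
last_synced = {"plan": "free", "company": "Acme"}
stripe_now = {"plan": "pro", "company": "Acme"}

# The CRM record, where a sales rep has since edited the company name by hand.
crm_record = {"plan": "free", "company": "Acme Inc.", "owner": "alex"}

updated, delta = sync_record(stripe_now, last_synced, crm_record)
print(delta)    # {'plan': 'pro'}
print(updated)  # {'plan': 'pro', 'company': 'Acme Inc.', 'owner': 'alex'}
```

Because `company` did not change in Stripe, it is not part of the delta, and the rep's local edit survives. A sync that copied the whole record would have reverted it to "Acme".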
This works because the record-linking problem for small teams isn't a matching algorithm problem. It's a data connectivity problem. Your tools already know who the customer is. They just don't talk to each other.
For teams that outgrow key-based matching (high anonymous traffic, cross-device tracking, millions of unidentified visitors), warehouse-native matching with probabilistic algorithms is the right next step. But for the 30-person company that needs Stripe, HubSpot, and Intercom to agree on who their customers are, the answer is simpler than an identity graph. It's connecting the tools and letting the email address do the work.
What is identity resolution?
Identity resolution is the process of linking customer records across different tools and systems to build a single profile per person. It uses identifiers like email, phone, or customer ID to connect fragmented data.
Do I need an identity graph for identity resolution?
Only if your records lack a shared identifier. If your tools all store email addresses, matching on email covers the vast majority of use cases. Identity graphs solve the harder problem of linking anonymous device IDs.
What is the difference between deterministic and probabilistic matching?
Deterministic matching links records using exact identifiers like email or phone number. Probabilistic matching uses statistical signals like IP address or device type to infer a match. Deterministic is more accurate; probabilistic has broader reach.
Can I do identity resolution without a data warehouse?
Yes. If your tools share a common key like email, you can match and sync records directly between tools without routing data through a warehouse first.
How is identity resolution different from deduplication?
Deduplication removes duplicate records within a single system. Identity resolution links records across multiple systems to recognize the same person in different tools.
