
Identity Resolution Explained: Turning Anonymous Visitors Into Known Customers

Identity resolution is the process of matching records and signals from different sources to one persistent profile per real person. It is the hinge that turns scattered first-party data into a usable customer view, because the same person arrives as a cookie on Monday, an email click on Tuesday, and a purchase on Friday, and nothing labels them as one.

How-To | By RAEK Editorial Team | Updated | 12 min read

What identity resolution is

Identity resolution is the process of matching records and signals from different sources to a single, persistent profile for each real person. It answers a deceptively hard question: is this the same customer I have seen before, and which existing profile do they belong to? Get it right and every downstream use of data improves. Get it wrong and you either split one person into many or merge two people into one.

Without identity resolution, a customer can exist as five disconnected records: a web visitor, an email subscriber, a support ticket, an order, and a loyalty member, with no system aware they are the same person.

Why it matters

Almost every valuable use of first-party data assumes you can tie data to a person. Personalization, churn prediction, lead scoring, suppression, attribution, and AI all break when the same customer is fragmented across tools. Resolution is the unglamorous middle step that makes everything else possible, which is exactly why it is so often skipped and so often the reason a data program underperforms.

Consider what fragmentation costs in practice. You email a discount to a loyal customer who just bought at full price because the order and the email list never connected. You count one person as three in your analytics. You train a model on partial histories and wonder why its predictions are weak. None of these are data-collection problems. They are resolution problems.

Deterministic vs probabilistic matching

There are two broad ways to decide whether two records are the same person, and mature systems use both.

Deterministic matching

Deterministic matching joins records that share a strong, exact identifier, most often a hashed email or phone number. When two records carry the same hashed email, they are almost certainly the same person. Deterministic matches are high-confidence and explainable, which makes them the backbone of any trustworthy identity graph. The limit is coverage: not every record carries a shared strong identifier.
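As a minimal sketch of the idea, the snippet below groups records by a hashed email. The record shape, the helper names, and the choice of SHA-256 over a trimmed, lowercased address are assumptions made for illustration, not a description of any particular product.

```python
import hashlib
from collections import defaultdict


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so trivially different spellings compare equal."""
    return email.strip().lower()


def hash_email(email: str) -> str:
    """SHA-256 of the normalized email, used here as the exact match key (an assumption)."""
    return hashlib.sha256(normalize_email(email).encode("utf-8")).hexdigest()


def deterministic_groups(records):
    """Return lists of record ids that share the same hashed email."""
    groups = defaultdict(list)
    for rec in records:
        if rec.get("email"):  # records without an email cannot match this way
            groups[hash_email(rec["email"])].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical sample records from different tools.
records = [
    {"id": "web-1", "email": "Ana@Example.com "},
    {"id": "order-7", "email": "ana@example.com"},
    {"id": "ticket-3", "email": "bo@example.com"},
    {"id": "visit-9", "email": None},
]
matches = deterministic_groups(records)
```

Here `web-1` and `order-7` fall into one group because their emails normalize to the same value, while `visit-9` illustrates the coverage limit: with no strong identifier, it stays unmatched.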

Probabilistic matching

Probabilistic matching estimates whether two records belong to the same person from weaker, combined signals such as device, approximate location, and behavior patterns. It extends reach into cases deterministic matching cannot cover, but each match is a likelihood, not a certainty. The discipline is to use it to widen coverage while keeping a confidence threshold high enough that you do not merge two different people.

Rule of thumb: anchor on deterministic matches for the records you will act on with confidence, and use probabilistic matching to extend reach where you can tolerate some uncertainty. Keep the two clearly labeled so you always know how sure you are.
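One way to picture the rule of thumb is a pair classifier that tries the deterministic check first and falls back to a weighted score over weaker signals. The weights, the threshold, and the field names below are invented for illustration; a real system would calibrate them against labeled data.

```python
# Illustrative weights in points out of 100. These are assumptions, not calibrated values.
WEIGHTS = {"device_id": 50, "last_name": 30, "postal_code": 20}
THRESHOLD = 70  # assumed minimum score for accepting a probabilistic match


def weak_signal_score(a, b):
    """Sum the weights of every weak signal present on both records and equal."""
    return sum(w for f, w in WEIGHTS.items() if a.get(f) and a.get(f) == b.get(f))


def classify_pair(a, b):
    """Label a pair so downstream users always know how the match was made."""
    if a.get("email_hash") and a["email_hash"] == b.get("email_hash"):
        return {"match": True, "method": "deterministic", "score": 100}
    score = weak_signal_score(a, b)
    if score >= THRESHOLD:
        return {"match": True, "method": "probabilistic", "score": score}
    return {"match": False, "method": "none", "score": score}


# Hypothetical records.
known = {"email_hash": "abc", "device_id": "d-1", "last_name": "lee", "postal_code": "94110"}
same_email = {"email_hash": "abc", "device_id": "d-9"}
same_device = {"device_id": "d-1", "last_name": "lee", "postal_code": "10001"}
neighbor = {"device_id": "d-2", "last_name": "ng", "postal_code": "94110"}

by_email = classify_pair(known, same_email)
by_signals = classify_pair(known, same_device)
rejected = classify_pair(known, neighbor)
```

Keeping the `method` label on every result is the point of the design: anything tagged `probabilistic` can be excluded from actions that need certainty.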

How it works, in plain terms

  1. Standardize identifiers: normalize emails, phone numbers, names, and IDs so they can be compared cleanly.
  2. Match deterministically: join records that share a strong identifier, like the same hashed email.
  3. Extend probabilistically: link likely matches from weaker signals, above a confidence threshold.
  4. Stitch sessions: connect anonymous activity to a known profile once a person identifies themselves.
  5. Build the golden record: reconcile conflicting fields into one trusted profile per person.
  6. Maintain it: keep that record durable and current as new signals arrive over time.
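The first two steps can be sketched as a small clustering routine. It standardizes identifiers, then uses a union-find structure so that records linked through any shared strong identifier end up in the same cluster, even when no single identifier spans all of them. The record fields and the digits-only phone normalization are simplifying assumptions; real phone handling needs country codes.

```python
import re


def standardize(rec):
    """Return a copy with the email trimmed and lowercased and the phone reduced to digits."""
    out = dict(rec)
    if out.get("email"):
        out["email"] = out["email"].strip().lower()
    if out.get("phone"):
        out["phone"] = re.sub(r"\D", "", out["phone"])
    return out


class UnionFind:
    """Tracks which record ids belong to the same cluster."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def resolve(records, strong_keys=("email", "phone")):
    """Cluster record ids that are connected through any shared strong identifier."""
    uf = UnionFind()
    first_seen = {}  # (key, value) -> id of the first record carrying that value
    for rec in map(standardize, records):
        uf.find(rec["id"])  # register the record even if it never matches
        for key in strong_keys:
            value = rec.get(key)
            if value:
                owner = first_seen.setdefault((key, value), rec["id"])
                uf.union(owner, rec["id"])
    clusters = {}
    for rec in records:
        clusters.setdefault(uf.find(rec["id"]), []).append(rec["id"])
    return sorted(sorted(ids) for ids in clusters.values())


# Hypothetical records: the web visit and loyalty account share no identifier directly.
records = [
    {"id": "web-1", "email": " Ana@Example.com"},
    {"id": "order-7", "email": "ana@example.com", "phone": "(415) 555-0100"},
    {"id": "loyalty-2", "phone": "415-555-0100"},
    {"id": "ticket-3", "email": "bo@example.com"},
]
clusters = resolve(records)
```

The order record bridges the other two: it shares an email with `web-1` and a phone with `loyalty-2`, so all three resolve to one person while `ticket-3` stays separate.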

The golden record and householding

When records merge, their fields can disagree: two addresses, two phone numbers, a maiden name and a married name. The reconciled, best-version profile is called the golden record, and the rules for choosing which value wins (most recent, most verified, most complete) are part of the system, not an afterthought.
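A survivorship rule can be as simple as a sort key. The sketch below, under assumed field names (`updated`, `verified`), lets a verified value beat an unverified one and otherwise prefers the most recent; it does not attempt the "most complete" rule.

```python
from datetime import date


def golden_record(records, fields):
    """Pick one winning value per field: verified first, then most recently updated."""
    golden = {}
    for field in fields:
        candidates = [r for r in records if r.get(field)]
        if not candidates:
            golden[field] = None
            continue
        winner = max(
            candidates,
            key=lambda r: (field in r.get("verified", ()), r["updated"]),
        )
        golden[field] = winner[field]
    return golden


# Hypothetical source records already resolved to the same person.
sources = [
    {"email": "ana@old.example", "phone": "4155550100",
     "updated": date(2023, 1, 5), "verified": {"phone"}},
    {"email": "ana@example.com", "phone": "4155550199",
     "updated": date(2024, 6, 1), "verified": set()},
    {"address": "12 Oak St", "updated": date(2024, 7, 1)},
]
profile = golden_record(sources, ["email", "phone", "address", "loyalty_tier"])
```

The older phone number wins because it is verified, while the newer email wins on recency alone. Fields that no source supplies, such as the hypothetical `loyalty_tier`, stay empty rather than being guessed.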

A related concept is householding: grouping individuals who share an address or account into a household while keeping each person distinct. It matters whenever the buying unit is a family or a company rather than a single individual, and it guards against both over-merging people and missing the relationships between them.
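A rough sketch of householding groups person ids under a normalized address key, without merging the people themselves. The address normalization here is deliberately naive and the profile fields are assumptions; production systems usually rely on a proper address-standardization service.

```python
from collections import defaultdict


def household_key(address: str) -> str:
    """Lowercase, drop commas and periods, and collapse whitespace."""
    cleaned = address.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def build_households(profiles):
    """Map each normalized address to the distinct person ids living there."""
    households = defaultdict(list)
    for p in profiles:
        if p.get("address"):
            households[household_key(p["address"])].append(p["person_id"])
    return dict(households)


# Hypothetical resolved profiles: each person_id is already one golden record.
profiles = [
    {"person_id": "p1", "address": "12 Oak St., Springfield"},
    {"person_id": "p2", "address": "12 oak st springfield"},
    {"person_id": "p3", "address": "9 Elm Ave, Springfield"},
]
households = build_households(profiles)
```

Note that `p1` and `p2` keep their own ids. The household is a grouping layered on top of individual profiles, not a merge.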

Turning anonymous into known

A large share of website traffic is anonymous. Identity resolution lets you connect that anonymous behavior to a real profile once a person gives a signal you can match, so the browsing they did before they identified themselves is not lost. The session where they compared three products becomes part of their history the moment they log in or check out. Done with consent and owned data, this is how you recover value that would otherwise vanish.
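Session stitching can be pictured as a two-pass job over an event log: first learn which anonymous ids were later identified, then attribute every event from those ids to the known customer. The event shape (`anonymous_id`, `type`, `customer_id`) is an assumption, and consent checks are left out of this sketch.

```python
def stitch(events):
    """Attribute every event to a customer once its anonymous id has been identified."""
    # Pass 1: learn which anonymous ids map to a known customer.
    anon_to_customer = {}
    for e in events:
        if e["type"] == "identify":
            anon_to_customer[e["anonymous_id"]] = e["customer_id"]
    # Pass 2: backfill the customer id onto earlier (and later) events.
    return [
        {**e, "customer_id": anon_to_customer.get(e["anonymous_id"])}
        for e in events
    ]


# Hypothetical event log: visitor a1 browses, then logs in; visitor a2 never does.
events = [
    {"anonymous_id": "a1", "type": "view", "product": "kettle"},
    {"anonymous_id": "a1", "type": "view", "product": "toaster"},
    {"anonymous_id": "a2", "type": "view", "product": "kettle"},
    {"anonymous_id": "a1", "type": "identify", "customer_id": "c42"},
]
stitched = stitch(events)
```

The two product views by `a1` happened before the login, yet both end up on customer `c42`, which is the recovery of pre-identification history described above. Visitor `a2` remains anonymous.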

Match rate, the share of activity you can tie to a known person, is the metric to watch here. It is never one hundred percent, and chasing the last few points rarely pays. The goal is enough coverage to act confidently, with steady improvement as you collect more identifiers through the value-for-value exchanges that good collection is built on.
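As a worked example, match rate is simply matched activity divided by total activity. The sketch assumes each event carries a `customer_id` that is empty when the event could not be tied to a known person.

```python
def match_rate(events):
    """Share of events that carry a known customer id; 0.0 for an empty log."""
    if not events:
        return 0.0
    known = sum(1 for e in events if e.get("customer_id"))
    return known / len(events)


# Hypothetical log: three of four events are tied to a known customer.
events = [
    {"customer_id": "c42"},
    {"customer_id": "c42"},
    {"customer_id": None},
    {"customer_id": "c7"},
]
rate = match_rate(events)
```

Tracking this number over time, rather than as a one-off, shows whether new identifiers are actually widening coverage.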

Resolution vs enrichment

Resolution is often confused with enrichment, but they do different jobs in a fixed order. Resolution connects the records you already have into one profile. Enrichment then adds missing attributes to that profile from additional sources. You cannot enrich reliably until you know which records belong to the same person, so resolution always comes first.

Where it sits in your stack

Resolution runs on top of centralized storage and feeds directly into activation. It is the middle step that connects collection to value, and it is core to what RAEK Data does: take the data you already own and resolve it to real people you can act on.

It is also the property that makes data AI-ready: unified to one profile per customer, so models learn from a whole person instead of fragments. If you want to know your current match rate and where resolution would unlock the most value, a Readiness Review is the fastest way to find out.

Frequently asked questions

What is the difference between identity resolution and enrichment?
Identity resolution connects records you already have to one profile per person. Enrichment adds missing attributes to that profile from additional sources. Resolution comes first: you cannot enrich a profile reliably until you know which records belong to the same person.

What is the difference between deterministic and probabilistic matching?
Deterministic matching joins records that share an exact strong identifier like a hashed email, so it is high-confidence and explainable. Probabilistic matching estimates whether records belong to the same person from weaker combined signals, extending reach at the cost of certainty. Mature systems use deterministic as the anchor and probabilistic to widen coverage.

What is a golden record in identity resolution?
A golden record is the single, reconciled profile that results when multiple records for the same person are merged. When fields disagree, rules decide which value wins, such as most recent, most verified, or most complete. It is the trusted version of the customer that downstream systems read from.

How does identity resolution identify anonymous visitors?
It links anonymous activity to a known profile once the visitor provides a signal that can be matched, such as logging in or checking out. The earlier browsing is then attributed to that person. With consent and owned data, this recovers value from sessions that would otherwise stay anonymous and unusable.
