First-Party Data Glossary
Plain-language definitions for the data, identity, privacy, and AI terms behind a modern first-party data strategy. Each entry opens with a direct answer, then links to the guides that go deeper.
Data Fundamentals
Behavioral Data: Behavioral data is information about what people actually do: pages viewed, products browsed, features used, videos watched, and clicks. It reveals interest and intent in real time and is one of the richest categories of first-party data a business collects.
Demographic Data: Demographic data describes the characteristics of individuals, such as age, gender, income, education, and household composition. It adds context to a customer profile and supports segmentation, though it should be collected and used with consent and care.
Firmographic Data: Firmographic data describes the attributes of a company rather than an individual: industry, size, revenue, location, and structure. It is the B2B counterpart to demographic data and is used to segment, score, and target business accounts.
First-Party Data: First-party data is information a business collects directly from its own customers and audience through its own channels, such as a website, app, purchases, emails, and support. Because the company gathers it firsthand with consent, it is accurate, owned, and not shared with competitors.
Personally Identifiable Information (PII): Personally identifiable information (PII) is any data that can identify a specific individual on its own or combined with other data, such as name, email address, phone number, or government ID. Handling PII responsibly is central to privacy law and data security.
Second-Party Data: Second-party data is another company's first-party data, shared or sold directly to you through a trusted partnership rather than an anonymous marketplace. Because it started as someone's firsthand data, it is usually more accurate and transparent than pooled third-party data.
Third-Party Data: Third-party data is information collected by an organization that has no direct relationship with the people it describes, then aggregated and sold to others. Because it is pooled from many sources and resold widely, it tends to be less accurate, less fresh, and harder to verify than first-party data.
Transactional Data: Transactional data is the record of what customers actually buy: orders, purchases, refunds, subscriptions, and payment history. Because it reflects real spending rather than stated intent, it is typically the highest-signal first-party data a business owns.
Zero-Party Data: Zero-party data is information a customer intentionally and proactively shares with a business, such as preferences, intentions, and goals. It is a high-value subset of first-party data because the customer declares it directly, removing the guesswork of inferring intent from behavior.
Identity & Resolution
Deterministic Matching: Deterministic matching links records to the same person using shared, exact identifiers such as a login, email address, or phone number. Because the match is based on a definite identifier, it is highly accurate and is the preferred method in identity resolution.
Golden Record: A golden record is the single, authoritative profile for a customer, created by resolving and merging all known data about that person into one trusted view. It is the canonical record that analytics, personalization, and activation systems read from.
Hashed Email (HEM): A hashed email (HEM) is an email address transformed by a one-way cryptographic function into a fixed-length string that cannot practically be reversed to the original address. It lets systems match people on email without exposing the raw address, supporting privacy-safe identity resolution.
Householding: Householding is the practice of grouping individual customer profiles that belong to the same household, based on shared address or family relationships. It helps businesses understand and serve customers at the household level, not just as isolated individuals.
Identity Graph: An identity graph is a database that maps the many identifiers belonging to a single person, such as emails, device IDs, and cookies, into one connected profile. It is the data structure that makes identity resolution possible across channels and devices.
Identity Resolution: Identity resolution is the process of matching and merging data from different sources and devices into a single profile for each real person. It turns scattered, duplicate records into one unified customer view, which is the foundation for accurate analytics, personalization, and AI.
Match Rate: Match rate is the share of records or visitors that a system can successfully resolve to a known person or profile. In identity resolution and audience activation, a higher match rate means more of your data or traffic can be recognized and acted upon.
Persistent Identifier: A persistent identifier is a stable value that recognizes the same user or device across sessions over time, such as a logged-in account ID or a first-party identifier. It enables continuity in personalization and measurement without relying on third-party cookies.
Probabilistic Matching: Probabilistic matching links records to the same person by estimating likelihood from indirect signals such as device type, IP address, location, and behavior patterns. It extends reach where exact identifiers are missing, but it is less certain than deterministic matching.
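As a rough illustration of the hashed email, deterministic matching, and match rate entries above, the sketch below hashes normalized email addresses and counts how many incoming addresses match a known set. It assumes SHA-256 as the hash function and lowercase-plus-trim as the normalization rule; the function names and sample addresses are hypothetical, and real systems may normalize differently.

```python
import hashlib


def normalize_email(raw: str) -> str:
    """Trim whitespace and lowercase so equivalent addresses hash identically."""
    return raw.strip().lower()


def hash_email(raw: str) -> str:
    """Return the SHA-256 hex digest of a normalized email (assumed hash choice)."""
    return hashlib.sha256(normalize_email(raw).encode("utf-8")).hexdigest()


def match_rate(incoming: list[str], known_hashes: set[str]) -> float:
    """Share of incoming emails whose hash appears in the known set."""
    if not incoming:
        return 0.0
    # Deterministic match: the hashes must be exactly equal.
    matched = sum(1 for email in incoming if hash_email(email) in known_hashes)
    return matched / len(incoming)


# Hypothetical sample data, for illustration only.
known = {hash_email("ana@example.com"), hash_email("bo@example.com")}
visitors = [" Ana@Example.com ", "bo@example.com", "cy@example.com", "dee@example.com"]

rate = match_rate(visitors, known)
print(f"match rate: {rate:.0%}")  # prints "match rate: 50%"
```

Two of the four visitors match because normalization removes the case and whitespace differences before hashing. Without that step, " Ana@Example.com " would produce a different digest and the match would be missed.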
Data Infrastructure
Customer Data Platform (CDP): A customer data platform (CDP) is software that collects customer data from many sources, unifies it into persistent profiles, and makes those profiles available to other tools. It gives marketing and data teams a central, owned source of truth for first-party data.
Data Activation: Data activation is the process of putting unified customer data to work to drive action: personalizing experiences, targeting audiences, prioritizing leads, and powering AI. It is the stage where collected and resolved first-party data finally creates value.
Data Clean Room: A data clean room is a secure, privacy-protected environment where two or more parties can match and analyze combined data without exposing the underlying records to each other. It enables collaboration on audience and measurement data while keeping raw PII private.
Data Enrichment: Data enrichment is the practice of adding accurate, missing attributes to existing customer profiles, such as firmographics, demographics, or contact details. Done on resolved profiles with quality sources, it sharpens segmentation and targeting without compromising trust.
Data Governance: Data governance is the framework of policies, roles, and controls that ensure data is accurate, secure, compliant, and used responsibly. For first-party data, governance covers access control, lineage, retention, and consent, making data trustworthy enough to power AI.
Data Lake: A data lake is a storage system that holds large volumes of raw data in its native format, structured or unstructured, until it is needed. It complements a data warehouse by retaining flexible, low-cost raw data for analytics, machine learning, and AI.
Data Normalization: Data normalization is the process of standardizing data into consistent formats and values, such as making phone numbers, emails, and addresses uniform. It is essential for matching records accurately during identity resolution and for clean, reliable analytics.
Data Warehouse: A data warehouse is a central repository that stores structured data from across a business for analysis and reporting. In a modern first-party data stack, the warehouse often serves as the owned source of truth that other tools read from and write to.
Reverse ETL: Reverse ETL is the process of moving data out of a central data warehouse and into operational tools like CRMs, ad platforms, and email systems. It lets teams activate the unified profiles and segments built in the warehouse across the channels where work happens.
Single Customer View (SCV): A single customer view is a complete, unified profile of a customer that brings together every interaction across channels and systems into one record. It is the goal of identity resolution and the prerequisite for consistent personalization, service, and measurement.
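To make the data normalization entry above concrete, here is a minimal sketch that standardizes emails and phone numbers so that two differently formatted records resolve to the same key. The rules are deliberate simplifications and assumptions: emails are trimmed and lowercased, and a 10-digit phone number is treated as a national number that needs a default country code of 1. The function names and sample records are hypothetical.

```python
import re


def normalize_email(raw: str) -> str:
    """Trim surrounding whitespace and lowercase the address."""
    return raw.strip().lower()


def normalize_phone(raw: str, default_country_code: str = "1") -> str:
    """Reduce a phone number to '+' followed by digits only.

    Assumption: a 10-digit result is a national number that is missing
    its country code, so default_country_code is prepended.
    """
    digits = re.sub(r"\D", "", raw)  # drop spaces, dots, dashes, parentheses
    if len(digits) == 10:
        digits = default_country_code + digits
    return "+" + digits


# Hypothetical records describing the same person in two formats.
records = [
    {"email": " Ana@Example.com", "phone": "(415) 555-0100"},
    {"email": "ana@example.com ", "phone": "+1 415.555.0100"},
]

keys = {(normalize_email(r["email"]), normalize_phone(r["phone"])) for r in records}
print(len(keys))  # prints 1: both rows normalize to the same key
```

Production pipelines usually rely on a dedicated phone-parsing library and country-aware rules; the point here is only that matching happens on the normalized values, not the raw input.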
Privacy & Tracking
App Tracking Transparency (ATT): App Tracking Transparency (ATT) is Apple's framework requiring iOS apps to ask users for permission before tracking them across other companies' apps and websites. When users decline, apps lose access to the device's advertising identifier (IDFA) for cross-app tracking.
CCPA / CPRA (California Privacy Law): The California Consumer Privacy Act (CCPA), as amended by the CPRA, is a state privacy law giving California residents rights over their personal information, including the right to know, delete, and opt out of its sale or sharing. It is a leading model for US state privacy regulation.
Consent Management Platform (CMP): A consent management platform (CMP) is software that collects, records, and enforces users' choices about data collection and tracking, typically through a cookie or privacy banner. It helps businesses honor privacy laws and respect user preferences across their sites.
Cookie Consent: Cookie consent is a user's explicit permission for a website to use cookies and similar technologies, especially for non-essential purposes like advertising and analytics. Under laws such as the GDPR, businesses must obtain valid consent before setting many cookies.
Cookieless: Cookieless describes marketing and measurement approaches that do not rely on third-party cookies, using instead first-party data, contextual signals, authenticated identifiers, and privacy-preserving techniques. It reflects the industry's shift toward durable, consent-based recognition.
Data Minimization: Data minimization is the principle of collecting and retaining only the personal data you actually need for a specific, stated purpose. It reduces privacy risk and compliance burden and is a core requirement of regulations like the GDPR.
First-Party Cookie: A first-party cookie is a small file set by the website a user is directly visiting, used to remember things like login state, cart contents, and preferences. Because it belongs to the site itself, it is broadly supported and central to a normal user experience.
GDPR (General Data Protection Regulation): The General Data Protection Regulation (GDPR) is the European Union's comprehensive data privacy law governing how organizations collect, use, and protect the personal data of people in the EU. It sets requirements for consent, transparency, data rights, and security, with significant penalties for violations.
Opt-In and Opt-Out: Opt-in means a user must actively give permission before data is collected or used; opt-out means collection is allowed until the user declines. Privacy regimes differ on which applies, with stricter laws generally requiring opt-in for sensitive uses.
Privacy Sandbox: Privacy Sandbox is Google's set of web standards intended to support advertising use cases like targeting and measurement without third-party cookies or cross-site tracking of individuals. It aims to replace some cookie-based functions with more privacy-preserving APIs in Chrome.
Server-Side Tracking: Server-side tracking collects and sends analytics and marketing data from your own server rather than directly from the user's browser. It gives a business more control over data accuracy, security, and what is shared, and is more resilient to browser restrictions and ad blockers.
Third-Party Cookie: A third-party cookie is set by a domain other than the one a user is visiting, typically by ad and tracking services embedded in a page. It enabled cross-site tracking and targeting, and it is being restricted or blocked by major browsers over privacy concerns.
Tracking Pixel: A tracking pixel is a tiny, often invisible image or snippet of code embedded in a web page or email that records when it loads, signaling an action like a page view or open. Pixels are widely used for analytics and ad measurement and are subject to consent rules.
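The difference between opt-in and opt-out described above comes down to what happens when a user has made no choice yet. The sketch below models that default under a simplified assumption of one yes/no choice per user; the names ConsentModel and may_collect are hypothetical and do not come from any particular consent management platform.

```python
from enum import Enum
from typing import Optional


class ConsentModel(Enum):
    OPT_IN = "opt_in"    # collection requires an active yes
    OPT_OUT = "opt_out"  # collection is allowed until the user says no


def may_collect(model: ConsentModel, user_choice: Optional[bool]) -> bool:
    """Decide whether collection is allowed.

    user_choice: True = user agreed, False = user declined,
    None = user has not made a choice yet.
    """
    if user_choice is not None:
        # An explicit choice always wins, under either model.
        return user_choice
    # No choice yet: only the opt-out model permits collection by default.
    return model is ConsentModel.OPT_OUT


print(may_collect(ConsentModel.OPT_IN, None))   # prints False
print(may_collect(ConsentModel.OPT_OUT, None))  # prints True
```

Real consent records are usually scoped per purpose (for example analytics versus advertising) rather than a single flag, so this only captures the default behavior.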
Marketing & Measurement
Audience Segmentation: Audience segmentation is dividing customers and prospects into groups based on shared attributes or behavior, such as high-value buyers or lapsed customers, so you can target each with relevant messaging. Built on unified first-party data, segments map directly to action.
Churn: Churn is the rate at which customers stop buying, subscribing, or engaging over a given period. Reducing churn raises lifetime value, and first-party behavioral and transactional data is what lets businesses predict and prevent it before customers leave.
Customer Acquisition Cost (CAC): Customer acquisition cost (CAC) is the total sales and marketing spend required to acquire one new customer, calculated by dividing acquisition costs by the number of customers gained. Lowering CAC is a primary reason businesses invest in first-party data and suppression.
Customer Lifetime Value (LTV): Customer lifetime value (LTV) is the total revenue or profit a business expects from a customer over the entire relationship. It guides how much you can afford to spend on acquisition and retention and is sharpened by unified first-party transactional data.
Lead Scoring: Lead scoring is the practice of ranking prospects by their likelihood to convert, using signals like engagement, behavior, and fit. Powered by first-party data, it helps sales and marketing prioritize the warmest leads instead of treating every lead the same.
Lookalike Audience: A lookalike audience is a new advertising audience built to resemble an existing group of customers, usually your best ones, by finding people with similar attributes and behavior. Built from first-party seed data, lookalikes are a powerful way to expand reach efficiently.
Suppression List: A suppression list is a set of people you deliberately exclude from a campaign, most often existing customers excluded from acquisition ads. Suppression prevents wasted spend re-acquiring people you already have and is one of the fastest wins from first-party data.
Visitor Identification: Visitor identification is the practice of recognizing who is visiting a website, connecting anonymous traffic to known profiles or companies using first-party signals and identity data, with consent. It helps businesses turn otherwise anonymous sessions into contactable, actionable leads.
Website Visitor De-Anonymization: Website visitor de-anonymization is the process of identifying previously anonymous website visitors by resolving their first-party signals to a known person or company profile. With consent, it lets businesses follow up with engaged visitors who did not fill out a form.
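The CAC, churn, and suppression list entries above each reduce to a small calculation. The sketch below works through them with made-up numbers. CAC follows the division stated in its definition; the churn formula (customers lost divided by customers at the start of the period) is one common convention assumed here, not something the glossary specifies. All names and figures are hypothetical.

```python
def customer_acquisition_cost(acquisition_spend: float, new_customers: int) -> float:
    """CAC = acquisition costs divided by the number of customers gained."""
    if new_customers <= 0:
        raise ValueError("new_customers must be positive")
    return acquisition_spend / new_customers


def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Assumed convention: customers lost in the period / customers at its start."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start


def apply_suppression(audience: list[str], suppression_list: set[str]) -> list[str]:
    """Drop everyone on the suppression list, preserving audience order."""
    return [person for person in audience if person not in suppression_list]


# Hypothetical figures, for illustration only.
cac = customer_acquisition_cost(50_000.0, 400)
churn = churn_rate(2_000, 90)
targets = apply_suppression(
    ["ana@example.com", "bo@example.com", "cy@example.com"],
    {"bo@example.com"},  # existing customer excluded from acquisition ads
)

print(cac)                # prints 125.0
print(f"{churn:.1%}")     # prints 4.5%
print(targets)            # prints ['ana@example.com', 'cy@example.com']
```

In practice the suppression match would typically run on normalized or hashed identifiers rather than raw email strings.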
AI, Search & Answer Engines
AI-Ready Data: AI-ready data is data that is clean and owned enough to safely power models, agents, and automation. In practice it meets five properties: owned, unified, accurate, consented, and governed. First-party data is the natural starting point because it clears most of those by definition.
Answer Engine Optimization (AEO): Answer engine optimization (AEO) is the practice of structuring content so that AI answer engines and featured snippets can extract and cite it directly. It emphasizes clear, direct answers, structured data, and authoritative sourcing rather than only ranking links.
Generative Engine Optimization (GEO): Generative engine optimization (GEO) is the practice of optimizing content to be surfaced, summarized, and cited by generative AI tools like ChatGPT, Perplexity, and Google's AI overviews. It aims to make a brand the trusted source generative engines draw on when answering questions.
Large Language Model (LLM): A large language model (LLM) is an AI system trained on vast amounts of text to understand and generate human-like language. LLMs power chatbots, assistants, and answer engines, but they know nothing about your customers unless connected to your own first-party data.
Retrieval-Augmented Generation (RAG): Retrieval-augmented generation (RAG) is a technique that lets an AI model pull in relevant information from an external, trusted data source at query time, then generate an answer grounded in that data. It is how businesses make general models specific to their own knowledge.
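The retrieval step in the RAG entry above can be sketched without any model at all. The toy example below ranks a few documents by keyword overlap with the question and assembles a grounded prompt. Keyword overlap is a stand-in chosen for simplicity (production systems typically use embedding-based search), the generation step is omitted, and the documents, function names, and prompt wording are all hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Return the top_k documents sharing the most tokens with the question."""
    question_tokens = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble a prompt that asks the model to answer from the context only."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{joined}\n\n"
        f"Question: {question}"
    )


# Hypothetical knowledge base, for illustration only.
docs = [
    "Returns are accepted within 30 days of delivery.",
    "Standard shipping takes 3 to 5 business days.",
    "Loyalty members earn 2 points per dollar spent.",
]

question = "How many days does standard shipping take?"
context = retrieve(question, docs)
prompt = build_prompt(question, context)
print(prompt)
```

The resulting prompt would then be sent to a language model, which is the "generation" half of the technique; grounding the answer in the retrieved passage is what makes a general model specific to the business's own knowledge.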