Data Cloud Architecture

Salesforce Data Cloud (rebranded to Data 360 in October 2025) is a native lakehouse platform that ingests, harmonizes, unifies, and activates customer data at scale. For CTAs, Data Cloud represents the platform answer to Customer 360 — replacing traditional ETL/MDM approaches with a metadata-driven, consumption-based data platform. This page goes deep on architecture; for basic comparisons with external data options, see External Data.

Architecture Pipeline

Data Cloud processes data through six stages. Each stage transforms raw source data into actionable, unified customer insights.

```mermaid
graph LR
    subgraph Ingest["1 - Ingest"]
        direction TB
        DS[Data Streams]
        DSO[Data Stream<br/>Objects]
        DS --> DSO
    end

    subgraph Prepare["2 - Prepare"]
        direction TB
        DLO[Data Lake<br/>Objects]
        XFORM[Transforms<br/>batch + streaming]
        DLO --> XFORM
    end

    subgraph Model["3 - Model"]
        direction TB
        DMO[Data Model<br/>Objects]
        DSPACE[Data Spaces]
        DMO --> DSPACE
    end

    subgraph Unify["4 - Unify"]
        direction TB
        IR[Identity<br/>Resolution]
        UP[Unified<br/>Profiles]
        IR --> UP
    end

    subgraph Analyze["5 - Analyze"]
        direction TB
        CI[Calculated<br/>Insights]
        SEG[Segments]
        DG[Data Graphs]
    end

    subgraph Act["6 - Activate"]
        direction TB
        DA[Data Actions]
        AT[Activation<br/>Targets]
        FL[Flows]
    end

    Ingest --> Prepare --> Model --> Unify --> Analyze --> Act

    style Ingest fill:#4c6ef5,color:#fff
    style Prepare fill:#339af0,color:#fff
    style Model fill:#20c997,color:#fff
    style Unify fill:#51cf66,color:#fff
    style Analyze fill:#ffd43b,color:#333
    style Act fill:#ff6b6b,color:#fff
```

Reactive processing — Data Cloud does not poll for changes. Storage Native Change Events (SNCE) detect every write operation via atomic metadata pointer swaps, and Change Data Feed (CDF) identifies exactly which records changed, enabling incremental downstream processing.
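The effect of a change feed is that downstream jobs touch only the rows that changed instead of rescanning whole tables. A toy sketch of that pattern (names and structures are illustrative assumptions, not Data Cloud's internal API):

```python
# Toy sketch of change-feed-driven incremental processing. A downstream
# materialized view consumes only the changed rows, mirroring the CDF idea.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChangeEvent:
    op: str                 # "insert", "update", or "delete"
    key: str                # primary key of the affected row
    row: Optional[dict]     # new row state (None for deletes)

def apply_change_feed(table, feed):
    """Incrementally apply a batch of change events to a materialized view."""
    for event in feed:
        if event.op == "delete":
            table.pop(event.key, None)
        else:                       # insert and update both upsert latest state
            table[event.key] = event.row
    return table

view = {"c1": {"email": "a@example.com"}}
feed = [
    ChangeEvent("update", "c1", {"email": "b@example.com"}),
    ChangeEvent("insert", "c2", {"email": "c@example.com"}),
]
apply_change_feed(view, feed)  # only two rows touched, not the whole table
```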


Data Object Hierarchy

Understanding the DSO-DLO-DMO progression is foundational. Each layer serves a different purpose.

| Layer | Object | Storage | Purpose |
|---|---|---|---|
| Raw | Data Stream Object (DSO) | Materialized (Parquet/Iceberg) | Raw ingested data, schema as-is from source |
| Prepared | Data Lake Object (DLO) | Materialized (Parquet/Iceberg) | Cleaned, transformed data in the lakehouse |
| Modeled | Data Model Object (DMO) | Physical and virtual views | Harmonized canonical model mapped to Customer 360 schema |
```mermaid
graph TD
    SRC1[Salesforce CRM] --> DS1[Data Stream]
    SRC2[Marketing Cloud] --> DS2[Data Stream]
    SRC3[External DB] --> DS3[Data Stream]
    SRC4[Web/Mobile SDK] --> DS4[Data Stream]

    DS1 --> DSO1[DSO<br/>raw schema]
    DS2 --> DSO2[DSO<br/>raw schema]
    DS3 --> DSO3[DSO<br/>raw schema]
    DS4 --> DSO4[DSO<br/>raw schema]

    DSO1 --> DLO1[DLO<br/>cleaned + typed]
    DSO2 --> DLO2[DLO<br/>cleaned + typed]
    DSO3 --> DLO3[DLO<br/>cleaned + typed]
    DSO4 --> DLO4[DLO<br/>cleaned + typed]

    DLO1 --> DMO[DMO<br/>Unified Individual<br/>canonical model]
    DLO2 --> DMO
    DLO3 --> DMO
    DLO4 --> DMO

    style DSO1 fill:#868e96,color:#fff
    style DSO2 fill:#868e96,color:#fff
    style DSO3 fill:#868e96,color:#fff
    style DSO4 fill:#868e96,color:#fff
    style DLO1 fill:#4c6ef5,color:#fff
    style DLO2 fill:#4c6ef5,color:#fff
    style DLO3 fill:#4c6ef5,color:#fff
    style DLO4 fill:#4c6ef5,color:#fff
    style DMO fill:#51cf66,color:#fff
```

DMOs are physical and virtual views

DMOs are physical and virtual views of data lake objects — standard DMOs typically behave as virtual views, but the architecture supports materialized representations. Queries against a virtual DMO always read the latest data from underlying DLOs. Standard DMOs map to the Salesforce Customer 360 Data Model; custom DMOs can also be created.
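One way to picture the virtual-view behavior: the DMO holds only field-mapping metadata and projects DLO rows into the canonical schema at read time, so no data is copied and queries always see the latest lake state. A minimal sketch with hypothetical column names (not actual Data Cloud metadata):

```python
# Hypothetical sketch: a "virtual DMO" as a query-time projection over a DLO.
# The mapping is metadata; nothing is copied, so a later write to the DLO
# is visible on the very next read without any sync step.

dlo_contacts = [  # cleaned, typed rows in a Data Lake Object
    {"eml_addr": "pat@example.com", "fname": "Pat", "src": "CRM"},
]

# Mapping from DLO columns to canonical DMO fields (names are illustrative)
FIELD_MAP = {"eml_addr": "Email", "fname": "FirstName"}

def query_dmo(dlo_rows, field_map):
    """Project DLO rows into the canonical model at read time."""
    return [{dmo: row[dlo] for dlo, dmo in field_map.items()} for row in dlo_rows]

print(query_dmo(dlo_contacts, FIELD_MAP))
# → [{'Email': 'pat@example.com', 'FirstName': 'Pat'}]
```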

Data Spaces

Data Spaces provide logical partitions within a single Data Cloud instance — separating data by brand, region, department, or SDLC stage without provisioning separate orgs.

  • Data Sources, Data Streams, and DLOs can be shared across Data Spaces
  • DMOs and platform features (segments, activations) are isolated per Data Space
  • Permission Sets control read/write/admin access per Data Space
  • All operations are audit-logged for compliance

Identity Resolution

Identity resolution is the process that links records about the same entity across sources into a single Unified Profile. It uses a ruleset containing match rules and reconciliation rules.

Three-Stage Process

```mermaid
graph LR
    A[Source Profiles<br/>from multiple DLOs] --> B[Matching<br/>blocking keys +<br/>fuzzy/exact rules]
    B --> C[Clustering<br/>transitive match<br/>resolution]
    C --> D[Reconciliation<br/>field-level winner<br/>selection]
    D --> E[Unified Profile<br/>+ Individual ID Graph]

    style A fill:#868e96,color:#fff
    style B fill:#4c6ef5,color:#fff
    style C fill:#ffd43b,color:#333
    style D fill:#51cf66,color:#fff
    style E fill:#ff6b6b,color:#fff
```

| Stage | What Happens | Key Details |
|---|---|---|
| Matching | Candidate pairs identified | Blocking keys narrow the search space; Locality Sensitive Hashing (LSH) finds candidates; exact and fuzzy (probabilistic) match rules score similarity |
| Clustering | Related matches grouped | Transitive matching: if A=B and B=C, then A=B=C form one cluster |
| Reconciliation | Winner selected per field | When multiple sources provide the same field (e.g., email), reconciliation rules pick the winner based on source priority, recency, or most complete value |
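The clustering stage's transitive matching is essentially connected components over the matched pairs. A union-find sketch of the idea (illustrative, not Data Cloud's actual implementation):

```python
# Union-find over matched pairs: if A=B and B=C, all three land in one cluster.

def cluster(profile_ids, match_pairs):
    parent = {p: p for p in profile_ids}

    def find(x):  # root lookup with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in match_pairs:          # union each matched pair
        parent[find(a)] = find(b)

    clusters = {}
    for p in profile_ids:             # group profiles by their root
        clusters.setdefault(find(p), set()).add(p)
    return list(clusters.values())

# A matched to B, B matched to C -> one cluster of three; D stands alone.
print(cluster(["A", "B", "C", "D"], [("A", "B"), ("B", "C")]))
```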

Match Rule Types

| Type | How It Works | Example |
|---|---|---|
| Exact | Field values must be identical | Email = Email |
| Fuzzy | Probabilistic similarity scoring | "Jon Smith" matches "John Smith" using phonetic/semantic algorithms |
| Normalized | Pre-processing before comparison | Strip whitespace, lowercase, remove special characters |
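The normalized rule's pre-processing step can be sketched in a few lines (the specific normalization choices here are illustrative assumptions):

```python
import re

def normalize(value: str) -> str:
    """Strip whitespace, lowercase, and drop special characters before comparing."""
    value = value.strip().lower()
    return re.sub(r"[^a-z0-9@. ]", "", value)

# Raw values an exact rule would reject now compare equal:
assert normalize("  O'Brien ") == normalize("OBRIEN") == "obrien"
```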

Individual ID Graph

The ID graph connects all known identifiers for a person: email addresses, phone numbers, device IDs, loyalty IDs, and account usernames. Each Unified Profile has its own subgraph.

Processing cadence

Batch identity resolution runs periodically (exact cadence varies by configuration). Real-time identity resolution uses exact-match rules only and processes in milliseconds — useful for instantly recognizing a returning website visitor.

Entity Types

Identity resolution supports multiple entity types beyond individuals:

  • Individual — B2C customer profiles
  • Account — B2B company profiles
  • Household — Grouped individuals sharing an address or relationship
  • Cross-entity — Linking individuals to accounts and households

Calculated Insights

Calculated insights are derived metrics computed via ANSI SQL or a visual declarative builder. They surface aggregated intelligence on unified data.

| Aspect | Detail |
|---|---|
| Language | ANSI SQL or visual builder (no-code) |
| Input | DMOs, other calculated insights |
| Output | Metrics attached to profiles (e.g., Lifetime Value, RFM score, Engagement Score) |
| Materialization | Batch (periodic refresh) or streaming (continuous) |
| Surfacing | Available on CRM records, in segments, in flows, and via API |

Common calculated insight patterns:

  • Customer Lifetime Value (CLV) — sum of historical purchases
  • Recency-Frequency-Monetary (RFM) scoring
  • Engagement score — weighted sum of interactions across channels
  • Product affinity — category preferences from purchase/browse history
  • Churn risk — days since last interaction thresholds
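In production these metrics are expressed in ANSI SQL or the visual builder; the underlying logic of an RFM-style insight reduces to simple aggregation plus banding, sketched here in Python (the scoring bands are illustrative assumptions):

```python
# RFM sketch for one unified profile: aggregate purchase history, then band
# each dimension into a 1-3 score. Band thresholds are made up for illustration.
from datetime import date

purchases = [  # (purchase_date, amount)
    (date(2025, 9, 1), 120.0),
    (date(2025, 11, 20), 80.0),
    (date(2025, 12, 5), 40.0),
]
today = date(2025, 12, 31)

recency_days = (today - max(d for d, _ in purchases)).days
frequency = len(purchases)
monetary = sum(amt for _, amt in purchases)   # doubles as a naive CLV

r_score = 3 if recency_days <= 30 else 2 if recency_days <= 90 else 1
f_score = 3 if frequency >= 10 else 2 if frequency >= 3 else 1
m_score = 3 if monetary >= 500 else 2 if monetary >= 100 else 1

print(recency_days, frequency, monetary, (r_score, f_score, m_score))
# → 26 3 240.0 (3, 2, 2)
```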

Credit consumption

Calculated insights consume credits at very different rates depending on mode. Batch: 15 credits per million rows. Streaming: 800 credits per million rows. Choose batch unless sub-minute latency is a genuine requirement.


Segmentation and Activation

Segments

Segments are audiences built from unified profiles and calculated insights. They define “who” to target.

  • Built using a drag-and-drop segment builder or SQL
  • Can reference DMO attributes, calculated insight values, and engagement data
  • Support nested logic (AND/OR), exclusions, and time-based filters
  • Segments are recalculated on a schedule or near-real-time

Activation Targets

Activation pushes segments to downstream systems for action.

| Target Category | Examples |
|---|---|
| Marketing | Marketing Cloud Engagement, Google Ads, Meta Ads, Amazon Ads |
| CRM | Data actions to Salesforce flows, platform events, record updates |
| Commerce | B2C Commerce Cloud for personalized storefronts |
| Analytics | Tableau, CRM Analytics for segment analysis |
| External | Any system via webhook, API, or data action |

Data Actions and CRM Integration

Data actions bridge Data Cloud and CRM. They fire when conditions on a DMO or calculated insight are met.

  • Same-org: Data Cloud-Triggered Flows respond to DMO changes directly
  • Cross-org: Data actions generate Platform Events consumed by flows or Apex in another org
  • Related lists: Data Cloud Related Lists surface DMO data on Contact, Account, and Lead records without replicating data into CRM objects
  • Field enrichment: Calculated insight values can be written back to CRM fields

Zero-Copy Partner Network

Zero-copy eliminates data duplication by querying external data warehouses in place, using Apache Iceberg table format and Parquet files.

How It Works

```mermaid
graph LR
    subgraph DataCloud["Data Cloud"]
        QE[Query Engine<br/>SQL translation +<br/>pushdown]
        ACC[Acceleration<br/>Layer cache]
    end

    subgraph Partners["External Systems"]
        SF_SNO[Snowflake]
        SF_DBR[Databricks]
        SF_GBQ[Google BigQuery]
        SF_RED[Amazon Redshift]
    end

    QE <-->|"Iceberg REST<br/>Catalog"| SF_SNO
    QE <-->|"Iceberg REST<br/>Catalog"| SF_DBR
    QE <-->|"Iceberg REST<br/>Catalog"| SF_GBQ
    QE <-->|"Iceberg REST<br/>Catalog"| SF_RED

    QE --> ACC

    style QE fill:#4c6ef5,color:#fff
    style ACC fill:#ffd43b,color:#333
```

Bidirectional Access

| Direction | Mechanism |
|---|---|
| Data Cloud queries external | SQL federation with intelligent pushdown to external engines |
| External queries Data Cloud | JDBC driver or Data-as-a-Service (DaaS) API for file-based sharing |

When to Use vs Avoid Zero-Copy

| Use Zero-Copy When | Avoid Zero-Copy When |
|---|---|
| Enterprise data lake is actively managed and governed | Source data is poorly structured or undocumented |
| Data is already curated and complete in the warehouse | Frequent complex transformations are needed |
| Avoiding duplicate pipelines across business units | Identity resolution is needed (lakes lack this) |
| Cost optimization — 70 credits/M rows vs 2,000 for external batch ingestion | Compliance requires full data lineage within Salesforce |

Dual billing

Zero-copy means two bills: Salesforce credits for federation queries plus your data warehouse provider’s compute costs (Snowflake credits, BigQuery slots, etc.). Model both cost streams when presenting to stakeholders.


Storage and Compute Architecture

Data Cloud runs on a lakehouse architecture built on Apache Iceberg (table format) and Apache Parquet (file format), deployed on Hyperforce (AWS).

Tiered Storage

| Tier | Latency | Use Case |
|---|---|---|
| Main memory | Milliseconds | Real-time event processing, in-session personalization |
| Low Latency Store (LLS) | Sub-second | NVMe-backed durable cache for hot data |
| Lakehouse (S3) | Seconds | Long-term storage for DLOs, historical data, bulk queries |

Real-Time Layer

The real-time layer enables sub-second personalization:

  • Real-time data graphs — denormalized Customer 360 profiles with pre-joined objects
  • Real-time ingest — millisecond-level event capture from Web/Mobile SDKs
  • Real-time identity resolution — exact-match only, instant unification
  • Real-time calculated insights — metrics computed in milliseconds
  • Real-time segmentation — on-the-fly audience evaluation
  • Real-time actions — immediate flow triggers or external channel activation

Credit Consumption Model

Data Cloud uses consumption-based pricing. All actions consume credits from a unified credit pool.

| Action | Credits per Million Rows | Notes |
|---|---|---|
| Data ingestion (batch, external) | 2,000 | Salesforce-to-Salesforce ingestion is free as of August 2025 |
| Data ingestion (streaming) | 5,000 | 2.5x batch cost — use only when latency demands it |
| Identity resolution | 100,000 | Most expensive operation by far |
| Calculated insights (batch) | 15 | Very efficient for periodic metrics |
| Calculated insights (streaming) | 800 | 53x batch cost |
| Data queries | 2 | Cheapest operation |
| Segmentation | 20 | Per million rows evaluated |
| Activation (batch) | 10 | Pushing segments to targets |
| Activation (streaming DMO) | 1,600 | Real-time activation is expensive |
| Zero-copy federation | 70 | ~29x cheaper than batch ingestion |

Credit multipliers are recurring

Credits are consumed every time rows are processed — not just on first configuration. A calculated insight that runs hourly over 10M rows will consume credits every hour. Monitor usage via the Digital Wallet and set alerts.

Credit rate disclaimer

Credit multipliers change frequently — verify current rates at salesforce.com/data/rates/multipliers/

Pricing reference: 100,000 credits cost $500. Sandbox environments get a 20% discount on credit multipliers.
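Combining the table above with the $500-per-100,000-credits rate gives a small cost model for sanity-checking designs (rates are hard-coded from the table; verify current multipliers before quoting to stakeholders):

```python
# Credit rates per million rows, taken from the consumption table above.
CREDITS_PER_M = {
    "ingest_batch": 2_000,
    "ingest_streaming": 5_000,
    "identity_resolution": 100_000,
    "ci_batch": 15,
    "ci_streaming": 800,
    "query": 2,
    "segmentation": 20,
    "activation_batch": 10,
    "activation_streaming": 1_600,
    "zero_copy": 70,
}
DOLLARS_PER_CREDIT = 500 / 100_000   # $500 buys 100,000 credits

def cost_usd(action: str, million_rows: float) -> float:
    """Dollar cost of processing `million_rows` million rows for one action."""
    return CREDITS_PER_M[action] * million_rows * DOLLARS_PER_CREDIT

# An hourly batch insight over 10M rows, 24 runs/day:
daily = cost_usd("ci_batch", 10) * 24
print(f"${daily:.2f}/day")   # → $18.00/day; streaming would be 53x more
```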


CTA Scenario Patterns

Scenario 1: Unified Customer 360 for Omnichannel Retail

Situation: Retailer with separate systems for e-commerce (Commerce Cloud), in-store POS, loyalty program, and customer service (Service Cloud). No unified view of the customer.

Data Cloud solution: Ingest from all four sources via data streams. Map to standard Individual DMO. Run identity resolution to merge the same customer across channels (email from loyalty, phone from POS, cookie ID from web). Create calculated insights for CLV, channel preference, and churn risk. Activate segments to Marketing Cloud for personalized campaigns and to Service Cloud for proactive case routing.

Why not traditional MDM: Traditional MDM would require an external platform (Informatica, Reltio), ETL pipelines to Salesforce, and ongoing sync maintenance. Data Cloud provides native identity resolution and activation without middleware.

Scenario 2: B2B Account Intelligence with Zero-Copy

Situation: Enterprise SaaS company with product usage data in Snowflake (billions of rows), CRM data in Sales Cloud, and support data in Service Cloud. Leadership wants account health scores visible to sales reps.

Data Cloud solution: Zero-copy federation to Snowflake for product usage data (avoids ingesting billions of rows). Ingest CRM and Service Cloud data natively (free Salesforce-to-Salesforce ingestion). Create calculated insights for account health score combining usage, support ticket trends, and renewal dates. Surface scores on Account records via Data Cloud Related Lists. Trigger flows when health score drops below threshold.

Why not ETL: Ingesting billions of usage rows would consume massive credits (2,000 per million = 2M credits for 1B rows). Zero-copy queries the data in place for 70 credits per million.
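The trade-off worked through at 1B rows, using the rates from the consumption table (remember the real federation cost also includes Snowflake-side compute):

```python
# Worked comparison for 1B rows: batch external ingestion vs zero-copy scan.
rows_millions = 1_000                   # 1 billion rows
dollars_per_credit = 500 / 100_000      # $500 per 100,000 credits

ingest_credits = 2_000 * rows_millions      # batch external ingestion
federation_credits = 70 * rows_millions     # zero-copy full scan

print(ingest_credits, round(ingest_credits * dollars_per_credit, 2))
# → 2000000 10000.0   (2M credits, $10,000)
print(federation_credits, round(federation_credits * dollars_per_credit, 2))
# → 70000 350.0       (70k credits, $350)
```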

Scenario 3: Real-Time Personalization for Financial Services

Situation: Bank wants to personalize digital experiences in real-time — showing relevant product offers when a customer logs into the mobile app based on transaction history, life events, and segment membership.

Data Cloud solution: Ingest transaction data via streaming. Real-time identity resolution matches the authenticated user to their unified profile. Real-time calculated insights compute product eligibility scores. Real-time segmentation evaluates membership in offer audiences. Data actions push personalized offer payload to the mobile app in milliseconds.

Why Data Cloud over custom build: Building real-time identity resolution, segmentation, and activation from scratch requires significant engineering. Data Cloud provides this declaratively with sub-second latency.


Decision Guide: Data Cloud vs Alternatives

| Factor | Data Cloud | Traditional ETL/MDM | Salesforce Connect |
|---|---|---|---|
| Best for | Unified customer view, analytics, AI | Transactional data sync | Read-only external data access |
| Data volume | Billions of records | Millions of records | Any (query-time) |
| Identity resolution | Native (match + reconcile) | Requires separate MDM tool | Not available |
| Latency | Near-real-time to real-time | Real-time to batch | Real-time (per query) |
| Query model | SQL on lakehouse | SOQL on platform objects | Limited SOQL subset |
| Cost model | Credits (consumption) | Integration tool license | Salesforce Connect license |
| AI/ML readiness | Native (Einstein, Agentforce) | Requires separate data pipeline | Not applicable |
| Activation | Native to Marketing, CRM, Commerce | Custom integration needed | Not applicable |

CTA positioning

Data Cloud is not a replacement for MuleSoft or traditional integration. It complements integration by providing the unified data layer that integration patterns feed into. Present Data Cloud as the “customer data brain” and integration middleware as the “nervous system” connecting operational systems.


Industry-Specific Patterns

| Industry | Data Cloud Pattern | Key Data Sources |
|---|---|---|
| Healthcare | Patient 360 — unified patient journey across EMR, appointments, wearables | EMR systems, IoT health trackers, scheduling platforms |
| Financial Services | Client 360 — transaction history, credit data, relationship insights | Core banking, credit bureaus, wealth platforms |
| Retail | Customer 360 — omnichannel purchase, loyalty, browsing behavior | POS, e-commerce, loyalty, clickstream |
| Manufacturing | Asset 360 — IoT sensor data, service history, predictive maintenance | IoT platforms, ERP, field service |

Each industry leverages pre-built Data Cloud templates with industry-specific DMOs, reducing implementation time.


Gotchas and Anti-Patterns

Identity resolution is the most expensive operation

At 100,000 credits per million rows, identity resolution can dominate your credit budget. Optimize by reducing source profile volume (deduplicate before ingestion), using precise blocking keys, and running full resolution less frequently.

Data model mapping complexity

Mapping source fields to standard DMOs requires deep understanding of both the source schema and the Customer 360 Data Model. Poor mapping leads to incomplete unified profiles, broken segments, and wasted credits on re-processing.

Streaming vs batch cost gap

Streaming ingestion costs 2.5x batch. Streaming calculated insights cost 53x batch. Streaming activation costs 160x batch. Default to batch unless sub-minute latency is a proven business requirement — not a “nice to have.”

Data Cloud is not a transactional database

Data Cloud is optimized for analytical and profile-unification workloads. It is not designed to replace CRM objects for transactional CRUD operations. Do not use it as a general-purpose data store.


Cross-Domain Impact

  • Data Modeling — DMO design follows many of the same principles as CRM data modeling but with a canonical schema approach
  • External Data — Data Cloud is an alternative to Salesforce Connect for accessing external data; zero-copy extends this further
  • Large Data Volumes — Data Cloud handles billions of records without LDV governor limit concerns
  • Data Quality & Governance — Identity resolution and data spaces are governance mechanisms
  • Integration — Data Cloud complements (not replaces) integration middleware; data streams are an ingestion pattern
  • Security — Data spaces, ABAC policies, and field-level masking enforce data governance
