Data Cloud Architecture
Salesforce Data Cloud (rebranded to Data 360 in October 2025) is a native lakehouse platform that ingests, harmonizes, unifies, and activates customer data at scale. For CTAs, Data Cloud represents the platform answer to Customer 360 — replacing traditional ETL/MDM approaches with a metadata-driven, consumption-based data platform. This page goes deep on architecture; for basic comparisons with external data options, see External Data.
Architecture Pipeline
Data Cloud processes data through six stages. Each stage transforms raw source data into actionable, unified customer insights.
```mermaid
graph LR
subgraph Ingest["1 - Ingest"]
direction TB
DS[Data Streams]
DSO[Data Stream<br/>Objects]
DS --> DSO
end
subgraph Prepare["2 - Prepare"]
direction TB
DLO[Data Lake<br/>Objects]
XFORM[Transforms<br/>batch + streaming]
DLO --> XFORM
end
subgraph Model["3 - Model"]
direction TB
DMO[Data Model<br/>Objects]
DSPACE[Data Spaces]
DMO --> DSPACE
end
subgraph Unify["4 - Unify"]
direction TB
IR[Identity<br/>Resolution]
UP[Unified<br/>Profiles]
IR --> UP
end
subgraph Analyze["5 - Analyze"]
direction TB
CI[Calculated<br/>Insights]
SEG[Segments]
DG[Data Graphs]
end
subgraph Act["6 - Activate"]
direction TB
DA[Data Actions]
AT[Activation<br/>Targets]
FL[Flows]
end
Ingest --> Prepare --> Model --> Unify --> Analyze --> Act
style Ingest fill:#4c6ef5,color:#fff
style Prepare fill:#339af0,color:#fff
style Model fill:#20c997,color:#fff
style Unify fill:#51cf66,color:#fff
style Analyze fill:#ffd43b,color:#333
style Act fill:#ff6b6b,color:#fff
```
Reactive processing — Data Cloud does not poll for changes. Storage Native Change Events (SNCE) detect every write operation via atomic metadata pointer swaps, and Change Data Feed (CDF) identifies exactly which records changed, enabling incremental downstream processing.
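The incremental pattern that a change feed enables can be sketched in a few lines. This is an illustrative sketch only, not a Data Cloud API; the `_version` field and `changed_rows` helper are invented to show the idea of processing only rows written since the last checkpoint instead of rescanning the whole table.

```python
# Hypothetical sketch of change-feed-driven incremental processing.
# Field names (_version) and the checkpoint mechanism are illustrative.

def changed_rows(table, last_version):
    """Yield only rows written after the last processed snapshot version."""
    return [row for row in table if row["_version"] > last_version]

table = [
    {"id": 1, "email": "a@example.com", "_version": 3},
    {"id": 2, "email": "b@example.com", "_version": 7},  # updated recently
    {"id": 3, "email": "c@example.com", "_version": 7},  # inserted recently
]

last_checkpoint = 5
delta = changed_rows(table, last_checkpoint)
print([r["id"] for r in delta])  # [2, 3] — only changed rows are reprocessed
```

Downstream stages (transforms, identity resolution, insights) then operate on `delta` rather than all of `table`, which is what makes per-row credit rates tolerable at scale.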
Data Object Hierarchy
Understanding the DSO-DLO-DMO progression is foundational. Each layer serves a different purpose.
| Layer | Object | Storage | Purpose |
|---|---|---|---|
| Raw | Data Stream Object (DSO) | Materialized (Parquet/Iceberg) | Raw ingested data, schema as-is from source |
| Prepared | Data Lake Object (DLO) | Materialized (Parquet/Iceberg) | Cleaned, transformed data in the lakehouse |
| Modeled | Data Model Object (DMO) | Physical and virtual views | Harmonized canonical model mapped to Customer 360 schema |
```mermaid
graph TD
SRC1[Salesforce CRM] --> DS1[Data Stream]
SRC2[Marketing Cloud] --> DS2[Data Stream]
SRC3[External DB] --> DS3[Data Stream]
SRC4[Web/Mobile SDK] --> DS4[Data Stream]
DS1 --> DSO1[DSO<br/>raw schema]
DS2 --> DSO2[DSO<br/>raw schema]
DS3 --> DSO3[DSO<br/>raw schema]
DS4 --> DSO4[DSO<br/>raw schema]
DSO1 --> DLO1[DLO<br/>cleaned + typed]
DSO2 --> DLO2[DLO<br/>cleaned + typed]
DSO3 --> DLO3[DLO<br/>cleaned + typed]
DSO4 --> DLO4[DLO<br/>cleaned + typed]
DLO1 --> DMO[DMO<br/>Unified Individual<br/>canonical model]
DLO2 --> DMO
DLO3 --> DMO
DLO4 --> DMO
style DSO1 fill:#868e96,color:#fff
style DSO2 fill:#868e96,color:#fff
style DSO3 fill:#868e96,color:#fff
style DSO4 fill:#868e96,color:#fff
style DLO1 fill:#4c6ef5,color:#fff
style DLO2 fill:#4c6ef5,color:#fff
style DLO3 fill:#4c6ef5,color:#fff
style DLO4 fill:#4c6ef5,color:#fff
style DMO fill:#51cf66,color:#fff
```
DMOs are physical and virtual views
DMOs can be physical or virtual views over data lake objects. Standard DMOs typically behave as virtual views, although the architecture also supports materialized representations. Queries against a virtual DMO always read the latest data from the underlying DLOs. Standard DMOs map to the Salesforce Customer 360 Data Model; custom DMOs can also be created.
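The three layers can be made concrete with a tiny worked example. All field names and the mapping below are invented for illustration; in Data Cloud the DMO mapping is configured declaratively, not coded.

```python
# Illustrative DSO -> DLO -> DMO progression (names are hypothetical).

# DSO: raw record, schema exactly as the source sent it
dso_record = {"CUST_EMAIL": " Ada@Example.COM ", "CUST_NM": "Ada Lovelace", "LTV": "1234.50"}

# DLO: cleaned and typed for the lakehouse
dlo_record = {
    "email": dso_record["CUST_EMAIL"].strip().lower(),
    "name": dso_record["CUST_NM"],
    "lifetime_value": float(dso_record["LTV"]),  # string -> numeric type
}

# DMO: source fields mapped onto a canonical, Customer 360-style shape
dmo_individual = {
    "Individual.FirstName": dlo_record["name"].split()[0],
    "Individual.LastName": dlo_record["name"].split()[-1],
    "ContactPointEmail.EmailAddress": dlo_record["email"],
}
print(dmo_individual["ContactPointEmail.EmailAddress"])  # ada@example.com
```

The point of the progression: the DSO preserves the source as-is for lineage, the DLO fixes types and hygiene once, and every consumer downstream sees only the harmonized DMO shape.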
Data Spaces
Data Spaces provide logical partitions within a single Data Cloud instance — separating data by brand, region, department, or SDLC stage without provisioning separate orgs.
- Data Sources, Data Streams, and DLOs can be shared across Data Spaces
- DMOs and platform features (segments, activations) are isolated per Data Space
- Permission Sets control read/write/admin access per Data Space
- All operations are audit-logged for compliance
Identity Resolution
Identity resolution is the process that links records about the same entity across sources into a single Unified Profile. It uses a ruleset containing match rules and reconciliation rules.
Three-Stage Process
```mermaid
graph LR
A[Source Profiles<br/>from multiple DLOs] --> B[Matching<br/>blocking keys +<br/>fuzzy/exact rules]
B --> C[Clustering<br/>transitive match<br/>resolution]
C --> D[Reconciliation<br/>field-level winner<br/>selection]
D --> E[Unified Profile<br/>+ Individual ID Graph]
style A fill:#868e96,color:#fff
style B fill:#4c6ef5,color:#fff
style C fill:#ffd43b,color:#333
style D fill:#51cf66,color:#fff
style E fill:#ff6b6b,color:#fff
```
| Stage | What Happens | Key Details |
|---|---|---|
| Matching | Candidate pairs identified | Blocking keys narrow the search space; Locality Sensitive Hashing (LSH) finds candidates; exact and fuzzy (probabilistic) match rules score similarity |
| Clustering | Related matches grouped | Transitive matching: if A=B and B=C, then A=B=C form one cluster |
| Reconciliation | Winner selected per field | When multiple sources provide the same field (e.g., email), reconciliation rules pick the winner based on source priority, recency, or most complete |
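The three stages can be sketched in miniature. This is a deliberately simplified model, assuming a single exact-match rule on a normalized email and a source-priority reconciliation rule; Data Cloud's real matching uses blocking keys, LSH, and fuzzy scoring on many fields.

```python
# match -> cluster -> reconcile, in miniature (all data is invented).

def find(parent, x):                      # union-find gives transitive clustering:
    while parent[x] != x:                 # if A=B and B=C, all three share a root
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(parent, a, b):
    parent[find(parent, a)] = find(parent, b)

profiles = [
    {"id": "crm-1", "email": "ada@example.com", "phone": None,       "source_rank": 1},
    {"id": "pos-9", "email": "ADA@example.com", "phone": "555-0101", "source_rank": 2},
    {"id": "web-4", "email": "grace@example.com", "phone": None,     "source_rank": 3},
]

# 1. Matching: exact rule on a normalized email
parent = {p["id"]: p["id"] for p in profiles}
seen = {}
for p in profiles:
    key = p["email"].strip().lower()
    if key in seen:
        union(parent, p["id"], seen[key])  # 2. Clustering happens transitively
    else:
        seen[key] = p["id"]

clusters = {}
for p in profiles:
    clusters.setdefault(find(parent, p["id"]), []).append(p)

# 3. Reconciliation: per field, take the highest-priority source that has a value
unified = []
for members in clusters.values():
    members.sort(key=lambda p: p["source_rank"])
    unified.append({f: next((m[f] for m in members if m[f]), None)
                    for f in ("email", "phone")})

print(len(unified))  # 2 unified profiles from 3 source records
```

Note how reconciliation works per field, not per record: the winning profile takes its email from the CRM source but its phone from the POS source, because CRM had no phone value.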
Match Rule Types
| Type | How It Works | Example |
|---|---|---|
| Exact | Field values must be identical | Email = Email |
| Fuzzy | Probabilistic similarity scoring | “Jon Smith” matches “John Smith” using phonetic/semantic algorithms |
| Normalized | Pre-processing before comparison | Strip whitespace, lowercase, remove special characters |
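The three rule types can be illustrated with stdlib tools. This is only an analogy: Data Cloud's fuzzy matching uses its own phonetic and probabilistic algorithms, not `difflib`, and its normalization is configured declaratively.

```python
# Illustrative analogues of the three match rule types (not Data Cloud's
# actual algorithms).
import re
from difflib import SequenceMatcher

def normalize(value):
    """Normalized rule: lowercase, strip whitespace and special characters."""
    return re.sub(r"[^a-z0-9]", "", value.lower())

def exact_match(a, b):
    """Exact rule: values must be identical."""
    return a == b

def fuzzy_score(a, b):
    """Fuzzy rule analogue: similarity in [0, 1] instead of a yes/no."""
    return SequenceMatcher(None, a, b).ratio()

print(exact_match("ada@example.com", "ada@example.com"))   # True
print(normalize(" Jon-Smith ") == normalize("jonsmith"))   # True after normalization
print(round(fuzzy_score("Jon Smith", "John Smith"), 2))    # 0.95
```

The practical difference: exact rules are cheap and safe but miss variants; fuzzy rules catch variants at the cost of tuning a similarity threshold; normalization is usually applied before either.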
Individual ID Graph
The ID graph connects all known identifiers for a person: email addresses, phone numbers, device IDs, loyalty IDs, and account usernames. Each Unified Profile has its own subgraph.
Processing cadence
Batch identity resolution runs periodically (exact cadence varies by configuration). Real-time identity resolution uses exact-match rules only and processes in milliseconds — useful for instantly recognizing a returning website visitor.
Entity Types
Identity resolution supports multiple entity types beyond individuals:
- Individual — B2C customer profiles
- Account — B2B company profiles
- Household — Grouped individuals sharing an address or relationship
- Cross-entity — Linking individuals to accounts and households
Calculated Insights
Calculated insights are derived metrics computed via ANSI SQL or a visual declarative builder. They surface aggregated intelligence on unified data.
| Aspect | Detail |
|---|---|
| Language | ANSI SQL or visual builder (no-code) |
| Input | DMOs, other calculated insights |
| Output | Metrics attached to profiles (e.g., Lifetime Value, RFM score, Engagement Score) |
| Materialization | Batch (periodic refresh) or streaming (continuous) |
| Surfacing | Available on CRM records, in segments, in flows, and via API |
Common calculated insight patterns:
- Customer Lifetime Value (CLV) — sum of historical purchases
- Recency-Frequency-Monetary (RFM) scoring
- Engagement score — weighted sum of interactions across channels
- Product affinity — category preferences from purchase/browse history
- Churn risk — days since last interaction thresholds
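Since calculated insights are written in ANSI SQL, a CLV/RFM-style aggregation can be sketched in plain SQL. The sketch below runs in SQLite purely for illustration; the table and column names are invented, whereas a real calculated insight would query DMOs and attach its output to unified profiles.

```python
# A calculated-insight-style aggregation expressed in plain SQL
# (hypothetical table sales_order; run in SQLite for illustration).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_order (individual_id TEXT, amount REAL, order_date TEXT);
INSERT INTO sales_order VALUES
  ('u1', 120.0, '2025-01-10'),
  ('u1',  80.0, '2025-03-02'),
  ('u2',  40.0, '2025-02-14');
""")

# CLV as the sum of historical purchases, plus simple frequency/recency
rows = conn.execute("""
SELECT individual_id,
       SUM(amount)      AS lifetime_value,
       COUNT(*)         AS frequency,
       MAX(order_date)  AS last_order_date
FROM sales_order
GROUP BY individual_id
ORDER BY individual_id
""").fetchall()

for r in rows:
    print(r)  # ('u1', 200.0, 2, '2025-03-02') then ('u2', 40.0, 1, '2025-02-14')
```

In batch mode this query would rerun on a schedule over the full (or incrementally changed) data; in streaming mode the metric updates continuously, at the much higher credit rate noted below.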
Credit consumption
Calculated insights consume credits at very different rates depending on mode. Batch: 15 credits per million rows. Streaming: 800 credits per million rows. Choose batch unless sub-minute latency is a genuine requirement.
Segmentation and Activation
Segments
Segments are audiences built from unified profiles and calculated insights. They define “who” to target.
- Built using a drag-and-drop segment builder or SQL
- Can reference DMO attributes, calculated insight values, and engagement data
- Support nested logic (AND/OR), exclusions, and time-based filters
- Segments are recalculated on a schedule or near-real-time
Activation Targets
Activation pushes segments to downstream systems for action.
| Target Category | Examples |
|---|---|
| Marketing | Marketing Cloud Engagement, Google Ads, Meta Ads, Amazon Ads |
| CRM | Data actions to Salesforce flows, platform events, record updates |
| Commerce | B2C Commerce Cloud for personalized storefronts |
| Analytics | Tableau, CRM Analytics for segment analysis |
| External | Any system via webhook, API, or data action |
Data Actions and CRM Integration
Data actions bridge Data Cloud and CRM. They fire when conditions on a DMO or calculated insight are met.
- Same-org: Data Cloud-Triggered Flows respond to DMO changes directly
- Cross-org: Data actions generate Platform Events consumed by flows or Apex in another org
- Related lists: Data Cloud Related Lists surface DMO data on Contact, Account, and Lead records without replicating data into CRM objects
- Field enrichment: Calculated insight values can be written back to CRM fields
Zero-Copy Partner Network
Zero-copy eliminates data duplication by querying external data warehouses in place, using Apache Iceberg table format and Parquet files.
How It Works
```mermaid
graph LR
subgraph DataCloud["Data Cloud"]
QE[Query Engine<br/>SQL translation +<br/>pushdown]
ACC[Acceleration<br/>Layer cache]
end
subgraph Partners["External Systems"]
SF_SNO[Snowflake]
SF_DBR[Databricks]
SF_GBQ[Google BigQuery]
SF_RED[Amazon Redshift]
end
QE <-->|"Iceberg REST<br/>Catalog"| SF_SNO
QE <-->|"Iceberg REST<br/>Catalog"| SF_DBR
QE <-->|"Iceberg REST<br/>Catalog"| SF_GBQ
QE <-->|"Iceberg REST<br/>Catalog"| SF_RED
QE --> ACC
style QE fill:#4c6ef5,color:#fff
style ACC fill:#ffd43b,color:#333
```
Bidirectional Access
| Direction | Mechanism |
|---|---|
| Data Cloud queries external | SQL federation with intelligent pushdown to external engines |
| External queries Data Cloud | JDBC driver or Data-as-a-Service (DaaS) API for file-based sharing |
When to Use vs Avoid Zero-Copy
| Use Zero-Copy When | Avoid Zero-Copy When |
|---|---|
| Enterprise data lake is actively managed and governed | Source data is poorly structured or undocumented |
| Data is already curated and complete in the warehouse | Frequent complex transformations are needed |
| Avoiding duplicate pipelines across business units | Identity resolution is needed (lakes lack this) |
| Cost optimization — 70 credits/M rows vs 2,000 credits/M for batch ingestion | Compliance requires full data lineage within Salesforce |
Dual billing
Zero-copy means two bills: Salesforce credits for federation queries plus your data warehouse provider’s compute costs (Snowflake credits, BigQuery slots, etc.). Model both cost streams when presenting to stakeholders.
Storage and Compute Architecture
Data Cloud runs on a lakehouse architecture built on Apache Iceberg (table format) and Apache Parquet (file format), deployed on Hyperforce (AWS).
Tiered Storage
| Tier | Latency | Use Case |
|---|---|---|
| Main memory | Milliseconds | Real-time event processing, in-session personalization |
| Low Latency Store (LLS) | Sub-second | NVMe-backed durable cache for hot data |
| Lakehouse (S3) | Seconds | Long-term storage for DLOs, historical data, bulk queries |
Real-Time Layer
The real-time layer enables sub-second personalization:
- Real-time data graphs — denormalized Customer 360 profiles with pre-joined objects
- Real-time ingest — millisecond-level event capture from Web/Mobile SDKs
- Real-time identity resolution — exact-match only, instant unification
- Real-time calculated insights — metrics computed in milliseconds
- Real-time segmentation — on-the-fly audience evaluation
- Real-time actions — immediate flow triggers or external channel activation
Credit Consumption Model
Data Cloud uses consumption-based pricing. All actions consume credits from a unified credit pool.
| Action | Credits per Million Rows | Notes |
|---|---|---|
| Data ingestion (batch, external) | 2,000 | Salesforce-to-Salesforce ingestion is free as of August 2025 |
| Data ingestion (streaming) | 5,000 | 2.5x batch cost — use only when latency demands it |
| Identity resolution | 100,000 | Most expensive operation by far |
| Calculated insights (batch) | 15 | Very efficient for periodic metrics |
| Calculated insights (streaming) | 800 | 53x batch cost |
| Data queries | 2 | Cheapest operation |
| Segmentation | 20 | Per million rows evaluated |
| Activation (batch) | 10 | Pushing segments to targets |
| Activation (streaming DMO) | 1,600 | Real-time activation is expensive |
| Zero-copy federation | 70 | 35x cheaper than batch ingest |
Credit multipliers are recurring
Credits are consumed every time rows are processed — not just on first configuration. A calculated insight that runs hourly over 10M rows will consume credits every hour. Monitor usage via the Digital Wallet and set alerts.
Credit rate disclaimer
Credit multipliers change frequently — verify current rates at salesforce.com/data/rates/multipliers/
Pricing reference: 100,000 credits cost $500. Sandbox environments get a 20% discount on credit multipliers.
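The table above plus the $500-per-100,000-credit price lets you estimate dollar costs directly. The sketch below works through Scenario 2's numbers (1 billion rows); rates change frequently, so treat this as arithmetic on the figures quoted here, not current pricing.

```python
# Worked cost comparison using the credit rates from the table above
# and the stated price of $500 per 100,000 credits. Verify current
# multipliers before using these numbers in a real estimate.
RATES = {                       # credits per million rows
    "batch_ingest": 2_000,
    "streaming_ingest": 5_000,
    "identity_resolution": 100_000,
    "zero_copy_federation": 70,
}
PRICE_PER_CREDIT = 500 / 100_000   # $0.005 per credit

def cost_usd(action, rows):
    credits = RATES[action] * rows / 1_000_000
    return credits, credits * PRICE_PER_CREDIT

billion = 1_000_000_000
for action in ("batch_ingest", "zero_copy_federation", "identity_resolution"):
    credits, usd = cost_usd(action, billion)
    print(f"{action}: {credits:,.0f} credits ≈ ${usd:,.0f}")
# batch_ingest: 2,000,000 credits ≈ $10,000
# zero_copy_federation: 70,000 credits ≈ $350
# identity_resolution: 100,000,000 credits ≈ $500,000
```

The identity resolution line is the one that surprises stakeholders: running full resolution over a billion source profiles would cost three orders of magnitude more than federating the same rows, which is why deduplication before ingestion and careful ruleset scoping matter.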
CTA Scenario Patterns
Scenario 1: Unified Customer 360 for Omnichannel Retail
Situation: Retailer with separate systems for e-commerce (Commerce Cloud), in-store POS, loyalty program, and customer service (Service Cloud). No unified view of the customer.
Data Cloud solution: Ingest from all four sources via data streams. Map to standard Individual DMO. Run identity resolution to merge the same customer across channels (email from loyalty, phone from POS, cookie ID from web). Create calculated insights for CLV, channel preference, and churn risk. Activate segments to Marketing Cloud for personalized campaigns and to Service Cloud for proactive case routing.
Why not traditional MDM: Traditional MDM would require an external platform (Informatica, Reltio), ETL pipelines to Salesforce, and ongoing sync maintenance. Data Cloud provides native identity resolution and activation without middleware.
Scenario 2: B2B Account Intelligence with Zero-Copy
Situation: Enterprise SaaS company with product usage data in Snowflake (billions of rows), CRM data in Sales Cloud, and support data in Service Cloud. Leadership wants account health scores visible to sales reps.
Data Cloud solution: Zero-copy federation to Snowflake for product usage data (avoids ingesting billions of rows). Ingest CRM and Service Cloud data natively (free Salesforce-to-Salesforce ingestion). Create calculated insights for account health score combining usage, support ticket trends, and renewal dates. Surface scores on Account records via Data Cloud Related Lists. Trigger flows when health score drops below threshold.
Why not ETL: Ingesting billions of usage rows would consume massive credits (2,000 per million = 2M credits for 1B rows). Zero-copy queries the data in place for 70 credits per million.
Scenario 3: Real-Time Personalization for Financial Services
Situation: Bank wants to personalize digital experiences in real-time — showing relevant product offers when a customer logs into the mobile app based on transaction history, life events, and segment membership.
Data Cloud solution: Ingest transaction data via streaming. Real-time identity resolution matches the authenticated user to their unified profile. Real-time calculated insights compute product eligibility scores. Real-time segmentation evaluates membership in offer audiences. Data actions push personalized offer payload to the mobile app in milliseconds.
Why Data Cloud over custom build: Building real-time identity resolution, segmentation, and activation from scratch requires significant engineering. Data Cloud provides this declaratively with sub-second latency.
Decision Guide: Data Cloud vs Alternatives
| Factor | Data Cloud | Traditional ETL/MDM | Salesforce Connect |
|---|---|---|---|
| Best for | Unified customer view, analytics, AI | Transactional data sync | Read-only external data access |
| Data volume | Billions of records | Millions of records | Any (query-time) |
| Identity resolution | Native (match + reconcile) | Requires separate MDM tool | Not available |
| Latency | Near-real-time to real-time | Real-time to batch | Real-time (per query) |
| Query model | SQL on lakehouse | SOQL on platform objects | Limited SOQL subset |
| Cost model | Credits (consumption) | Integration tool license | Salesforce Connect license |
| AI/ML readiness | Native (Einstein, Agentforce) | Requires separate data pipeline | Not applicable |
| Activation | Native to Marketing, CRM, Commerce | Custom integration needed | Not applicable |
CTA positioning
Data Cloud is not a replacement for MuleSoft or traditional integration. It complements integration by providing the unified data layer that integration patterns feed into. Present Data Cloud as the “customer data brain” and integration middleware as the “nervous system” connecting operational systems.
Industry-Specific Patterns
| Industry | Data Cloud Pattern | Key Data Sources |
|---|---|---|
| Healthcare | Patient 360 — unified patient journey across EMR, appointments, wearables | EMR systems, IoT health trackers, scheduling platforms |
| Financial Services | Client 360 — transaction history, credit data, relationship insights | Core banking, credit bureaus, wealth platforms |
| Retail | Customer 360 — omnichannel purchase, loyalty, browsing behavior | POS, e-commerce, loyalty, clickstream |
| Manufacturing | Asset 360 — IoT sensor data, service history, predictive maintenance | IoT platforms, ERP, field service |
Each industry leverages pre-built Data Cloud templates with industry-specific DMOs, reducing implementation time.
Gotchas and Anti-Patterns
Identity resolution is the most expensive operation
At 100,000 credits per million rows, identity resolution can dominate your credit budget. Optimize by reducing source profile volume (deduplicate before ingestion), using precise blocking keys, and running full resolution less frequently.
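Why precise blocking keys help can be shown with a small counting exercise (the key choice below is illustrative). Without blocking, matching must consider every pair of profiles; with a blocking key, only pairs inside the same block are compared.

```python
# Candidate-pair counts with and without a blocking key (synthetic data).
from collections import defaultdict
from itertools import combinations

# 1,000 profiles, 100 distinct email local parts (10 profiles per block)
profiles = [{"id": i, "email": f"user{i % 100}@example.com"} for i in range(1_000)]

# No blocking: every pair is a candidate, n*(n-1)/2
naive_pairs = sum(1 for _ in combinations(profiles, 2))

# Blocking on the email local part: compare only within each block
blocks = defaultdict(list)
for p in profiles:
    blocks[p["email"].split("@")[0]].append(p)
blocked_pairs = sum(len(b) * (len(b) - 1) // 2 for b in blocks.values())

print(naive_pairs, blocked_pairs)  # 499500 vs 4500 candidate comparisons
```

The same principle scales: a key that cuts average block size by 10x cuts pairwise work by roughly 100x, which translates directly into fewer rows processed and fewer credits consumed.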
Data model mapping complexity
Mapping source fields to standard DMOs requires deep understanding of both the source schema and the Customer 360 Data Model. Poor mapping leads to incomplete unified profiles, broken segments, and wasted credits on re-processing.
Streaming vs batch cost gap
Streaming ingestion costs 2.5x batch. Streaming calculated insights cost 53x batch. Streaming activation costs 160x batch. Default to batch unless sub-minute latency is a proven business requirement — not a “nice to have.”
Data Cloud is not a transactional database
Data Cloud is optimized for analytical and profile-unification workloads. It is not designed to replace CRM objects for transactional CRUD operations. Do not use it as a general-purpose data store.
Cross-Domain Impact
- Data Modeling — DMO design follows many of the same principles as CRM data modeling but with a canonical schema approach
- External Data — Data Cloud is an alternative to Salesforce Connect for accessing external data; zero-copy extends this further
- Large Data Volumes — Data Cloud handles billions of records without LDV governor limit concerns
- Data Quality & Governance — Identity resolution and data spaces are governance mechanisms
- Integration — Data Cloud complements (not replaces) integration middleware; data streams are an ingestion pattern
- Security — Data spaces, ABAC policies, and field-level masking enforce data governance
Sources
- Salesforce Architects: Data 360 Architecture
- Salesforce Architects: Data 360 Integration Patterns
- Salesforce Architects: Data 360 Interoperability Decision Guide
- Salesforce Developers: Model Data in Data Cloud (DMO Mapping Guide)
- Salesforce Help: Calculated Insights
- Salesforce Help: Data Actions in Data Cloud
- Salesforce Press Release: Zero Copy Partner Network (April 2024)
- Salesforce Engineering: Zero Copy Real-Time Analysis
- Salesforce Blog: Real-Time Identity Resolution
- Salesforce Blog: Data Cloud Pricing Updates (Aug 2025)
- Salesforce Developers Blog: Data Cloud and Identity Resolution (Oct 2024)
- Salesforce Ben: Data Cloud Zero Copy — When and When Not to Use It
- Salesforce Ben: Data Cloud Match Rules vs Duplicate Rules
- Trailhead: Identity Resolution Rulesets
- Trailhead: Data Cloud Insights and Use Cases
- David Palencia: Data Cloud Pricing and Credit Consumption