Data Cloud Architecture

Salesforce Data Cloud (rebranded to Data 360 in October 2025) is a native lakehouse platform that ingests, harmonizes, unifies, and activates customer data at scale. For CTAs, Data Cloud represents the platform answer to Customer 360 — replacing traditional ETL/MDM approaches with a metadata-driven, consumption-based data platform. This page goes deep on architecture; for basic comparisons with external data options, see External Data.

Architecture Pipeline

Data Cloud processes data through six stages. Each stage transforms raw source data into actionable, unified customer insights.

```mermaid
graph LR
    subgraph Ingest["1 - Ingest"]
        direction TB
        DS[Data Streams]
        DSO[Data Stream<br/>Objects]
        DS --> DSO
    end

    subgraph Prepare["2 - Prepare"]
        direction TB
        DLO[Data Lake<br/>Objects]
        XFORM[Transforms<br/>batch + streaming]
        DLO --> XFORM
    end

    subgraph Model["3 - Model"]
        direction TB
        DMO[Data Model<br/>Objects]
        DSPACE[Data Spaces]
        DMO --> DSPACE
    end

    subgraph Unify["4 - Unify"]
        direction TB
        IR[Identity<br/>Resolution]
        UP[Unified<br/>Profiles]
        IR --> UP
    end

    subgraph Analyze["5 - Analyze"]
        direction TB
        CI[Calculated<br/>Insights]
        SEG[Segments]
        DG[Data Graphs]
    end

    subgraph Act["6 - Activate"]
        direction TB
        DA[Data Actions]
        AT[Activation<br/>Targets]
        FL[Flows]
    end

    Ingest --> Prepare --> Model --> Unify --> Analyze --> Act

    style Ingest fill:#4c6ef5,color:#fff
    style Prepare fill:#339af0,color:#fff
    style Model fill:#20c997,color:#fff
    style Unify fill:#51cf66,color:#fff
    style Analyze fill:#ffd43b,color:#333
    style Act fill:#ff6b6b,color:#fff
```

Reactive processing — Data Cloud does not poll for changes. Storage Native Change Events (SNCE) detect every write operation via atomic metadata pointer swaps, and Change Data Feed (CDF) identifies exactly which records changed, enabling incremental downstream processing.
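The effect of a change feed is that downstream jobs touch only the rows that changed instead of rescanning whole tables. A toy sketch of that pattern (names and structures are illustrative assumptions, not Data Cloud's internal API):

```python
# Toy sketch of change-feed-driven incremental processing. A downstream
# materialized view consumes only the changed rows, mirroring the CDF idea.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ChangeEvent:
    op: str                 # "insert", "update", or "delete"
    key: str                # primary key of the affected row
    row: Optional[dict]     # new row state (None for deletes)

def apply_change_feed(table, feed):
    """Incrementally apply a batch of change events to a materialized view."""
    for event in feed:
        if event.op == "delete":
            table.pop(event.key, None)
        else:                       # insert and update both upsert latest state
            table[event.key] = event.row
    return table

view = {"c1": {"email": "a@example.com"}}
feed = [
    ChangeEvent("update", "c1", {"email": "b@example.com"}),
    ChangeEvent("insert", "c2", {"email": "c@example.com"}),
]
apply_change_feed(view, feed)  # only two rows touched, not the whole table
```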


Data Object Hierarchy

Understanding the DSO-DLO-DMO progression is foundational. Each layer serves a different purpose.

| Layer | Object | Storage | Purpose |
|---|---|---|---|
| Raw | Data Stream Object (DSO) | Materialized (Parquet/Iceberg) | Raw ingested data, schema as-is from source |
| Prepared | Data Lake Object (DLO) | Materialized (Parquet/Iceberg) | Cleaned, transformed data in the lakehouse |
| Modeled | Data Model Object (DMO) | Physical and virtual views | Harmonized canonical model mapped to Customer 360 schema |
```mermaid
graph TD
    SRC1[Salesforce CRM] --> DS1[Data Stream]
    SRC2[Marketing Cloud] --> DS2[Data Stream]
    SRC3[External DB] --> DS3[Data Stream]
    SRC4[Web/Mobile SDK] --> DS4[Data Stream]

    DS1 --> DSO1[DSO<br/>raw schema]
    DS2 --> DSO2[DSO<br/>raw schema]
    DS3 --> DSO3[DSO<br/>raw schema]
    DS4 --> DSO4[DSO<br/>raw schema]

    DSO1 --> DLO1[DLO<br/>cleaned + typed]
    DSO2 --> DLO2[DLO<br/>cleaned + typed]
    DSO3 --> DLO3[DLO<br/>cleaned + typed]
    DSO4 --> DLO4[DLO<br/>cleaned + typed]

    DLO1 --> DMO[DMO<br/>Unified Individual<br/>canonical model]
    DLO2 --> DMO
    DLO3 --> DMO
    DLO4 --> DMO

    style DSO1 fill:#868e96,color:#fff
    style DSO2 fill:#868e96,color:#fff
    style DSO3 fill:#868e96,color:#fff
    style DSO4 fill:#868e96,color:#fff
    style DLO1 fill:#4c6ef5,color:#fff
    style DLO2 fill:#4c6ef5,color:#fff
    style DLO3 fill:#4c6ef5,color:#fff
    style DLO4 fill:#4c6ef5,color:#fff
    style DMO fill:#51cf66,color:#fff
```

DMOs are physical and virtual views

DMOs are physical and virtual views of data lake objects — standard DMOs typically behave as virtual views, but the architecture supports materialized representations. Queries against a virtual DMO always read the latest data from underlying DLOs. Standard DMOs map to the Salesforce Customer 360 Data Model; custom DMOs can also be created.
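One way to picture the virtual-view behavior: the DMO holds only field-mapping metadata and projects DLO rows into the canonical schema at read time, so no data is copied and queries always see the latest lake state. A minimal sketch with hypothetical column names (not actual Data Cloud metadata):

```python
# Hypothetical sketch: a "virtual DMO" as a query-time projection over a DLO.
# The mapping is metadata; nothing is copied, so a later write to the DLO
# is visible on the very next read without any sync step.

dlo_contacts = [  # cleaned, typed rows in a Data Lake Object
    {"eml_addr": "pat@example.com", "fname": "Pat", "src": "CRM"},
]

# Mapping from DLO columns to canonical DMO fields (names are illustrative)
FIELD_MAP = {"eml_addr": "Email", "fname": "FirstName"}

def query_dmo(dlo_rows, field_map):
    """Project DLO rows into the canonical model at read time."""
    return [{dmo: row[dlo] for dlo, dmo in field_map.items()} for row in dlo_rows]

print(query_dmo(dlo_contacts, FIELD_MAP))
# → [{'Email': 'pat@example.com', 'FirstName': 'Pat'}]
```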

Data Spaces

Data Spaces provide logical partitions within a single Data Cloud instance — separating data by brand, region, department, or SDLC stage without provisioning separate orgs.

  • Data Sources, Data Streams, and DLOs can be shared across Data Spaces
  • DMOs and platform features (segments, activations) are isolated per Data Space
  • Permission Sets control read/write/admin access per Data Space
  • All operations are audit-logged for compliance

Identity Resolution

Identity resolution is the process that links records about the same entity across sources into a single Unified Profile. It uses a ruleset containing match rules and reconciliation rules.

Three-Stage Process

```mermaid
graph LR
    A[Source Profiles<br/>from multiple DLOs] --> B[Matching<br/>blocking keys +<br/>fuzzy/exact rules]
    B --> C[Clustering<br/>transitive match<br/>resolution]
    C --> D[Reconciliation<br/>field-level winner<br/>selection]
    D --> E[Unified Profile<br/>+ Individual ID Graph]

    style A fill:#868e96,color:#fff
    style B fill:#4c6ef5,color:#fff
    style C fill:#ffd43b,color:#333
    style D fill:#51cf66,color:#fff
    style E fill:#ff6b6b,color:#fff
```

| Stage | What Happens | Key Details |
|---|---|---|
| Matching | Candidate pairs identified | Blocking keys narrow the search space; Locality Sensitive Hashing (LSH) finds candidates; exact and fuzzy (probabilistic) match rules score similarity |
| Clustering | Related matches grouped | Transitive matching: if A=B and B=C, then A=B=C form one cluster |
| Reconciliation | Winner selected per field | When multiple sources provide the same field (e.g., email), reconciliation rules pick the winner based on source priority, recency, or most complete value |
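The clustering stage's transitive matching is essentially connected components over the matched pairs. A union-find sketch of the idea (illustrative, not Data Cloud's actual implementation):

```python
# Union-find over matched pairs: if A=B and B=C, all three land in one cluster.

def cluster(profile_ids, match_pairs):
    parent = {p: p for p in profile_ids}

    def find(x):  # root lookup with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in match_pairs:          # union each matched pair
        parent[find(a)] = find(b)

    clusters = {}
    for p in profile_ids:             # group profiles by their root
        clusters.setdefault(find(p), set()).add(p)
    return list(clusters.values())

# A matched to B, B matched to C -> one cluster of three; D stands alone.
print(cluster(["A", "B", "C", "D"], [("A", "B"), ("B", "C")]))
```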

Match Rule Types

| Type | How It Works | Example |
|---|---|---|
| Exact | Field values must be identical | Email = Email |
| Fuzzy | Probabilistic similarity scoring | "Jon Smith" matches "John Smith" using phonetic/semantic algorithms |
| Normalized | Pre-processing before comparison | Strip whitespace, lowercase, remove special characters |
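The normalized rule's pre-processing step can be sketched in a few lines (the specific normalization choices here are illustrative assumptions):

```python
import re

def normalize(value: str) -> str:
    """Strip whitespace, lowercase, and drop special characters before comparing."""
    value = value.strip().lower()
    return re.sub(r"[^a-z0-9@. ]", "", value)

# Raw values an exact rule would reject now compare equal:
assert normalize("  O'Brien ") == normalize("OBRIEN") == "obrien"
```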

Individual ID Graph

The ID graph connects all known identifiers for a person: email addresses, phone numbers, device IDs, loyalty IDs, and account usernames. Each Unified Profile has its own subgraph.

Processing cadence

Batch identity resolution runs periodically (exact cadence varies by configuration). Real-time identity resolution uses exact-match rules only and processes in milliseconds — useful for instantly recognizing a returning website visitor.

Entity Types

Identity resolution supports multiple entity types beyond individuals:

  • Individual — B2C customer profiles
  • Account — B2B company profiles
  • Household — Grouped individuals sharing an address or relationship
  • Cross-entity — Linking individuals to accounts and households

Calculated Insights

Calculated insights are derived metrics computed via ANSI SQL or a visual declarative builder. They surface aggregated intelligence on unified data.

| Aspect | Detail |
|---|---|
| Language | ANSI SQL or visual builder (no-code) |
| Input | DMOs, other calculated insights |
| Output | Metrics attached to profiles (e.g., Lifetime Value, RFM score, Engagement Score) |
| Materialization | Batch (periodic refresh) or streaming (continuous) |
| Surfacing | Available on CRM records, in segments, in flows, and via API |

Common calculated insight patterns:

  • Customer Lifetime Value (CLV) — sum of historical purchases
  • Recency-Frequency-Monetary (RFM) scoring
  • Engagement score — weighted sum of interactions across channels
  • Product affinity — category preferences from purchase/browse history
  • Churn risk — days since last interaction thresholds
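In production these metrics are expressed in ANSI SQL or the visual builder; the underlying logic of an RFM-style insight reduces to simple aggregation plus banding, sketched here in Python (the scoring bands are illustrative assumptions):

```python
# RFM sketch for one unified profile: aggregate purchase history, then band
# each dimension into a 1-3 score. Band thresholds are made up for illustration.
from datetime import date

purchases = [  # (purchase_date, amount)
    (date(2025, 9, 1), 120.0),
    (date(2025, 11, 20), 80.0),
    (date(2025, 12, 5), 40.0),
]
today = date(2025, 12, 31)

recency_days = (today - max(d for d, _ in purchases)).days
frequency = len(purchases)
monetary = sum(amt for _, amt in purchases)   # doubles as a naive CLV

r_score = 3 if recency_days <= 30 else 2 if recency_days <= 90 else 1
f_score = 3 if frequency >= 10 else 2 if frequency >= 3 else 1
m_score = 3 if monetary >= 500 else 2 if monetary >= 100 else 1

print(recency_days, frequency, monetary, (r_score, f_score, m_score))
# → 26 3 240.0 (3, 2, 2)
```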

Credit consumption

Calculated insights consume credits at very different rates depending on mode. Batch: 15 credits per million rows. Streaming: 800 credits per million rows. Choose batch unless sub-minute latency is a genuine requirement.


Segmentation and Activation

Segments

Segments are audiences built from unified profiles and calculated insights. They define “who” to target.

  • Built using a drag-and-drop segment builder or SQL
  • Can reference DMO attributes, calculated insight values, and engagement data
  • Support nested logic (AND/OR), exclusions, and time-based filters
  • Segments are recalculated on a schedule or near-real-time

Activation Targets

Activation pushes segments to downstream systems for action.

| Target Category | Examples |
|---|---|
| Marketing | Marketing Cloud Engagement, Google Ads, Meta Ads, Amazon Ads |
| CRM | Data actions to Salesforce flows, platform events, record updates |
| Commerce | B2C Commerce Cloud for personalized storefronts |
| Analytics | Tableau, CRM Analytics for segment analysis |
| External | Any system via webhook, API, or data action |

Data Actions and CRM Integration

Data actions bridge Data Cloud and CRM. They fire when conditions on a DMO or calculated insight are met.

  • Same-org: Data Cloud-Triggered Flows respond to DMO changes directly
  • Cross-org: Data actions generate Platform Events consumed by flows or Apex in another org
  • Related lists: Data Cloud Related Lists surface DMO data on Contact, Account, and Lead records without replicating data into CRM objects
  • Field enrichment: Calculated insight values can be written back to CRM fields

Zero-Copy Partner Network

Zero-copy eliminates data duplication by querying external data warehouses in place, using Apache Iceberg table format and Parquet files.

How It Works

```mermaid
graph LR
    subgraph DataCloud["Data Cloud"]
        QE[Query Engine<br/>SQL translation +<br/>pushdown]
        ACC[Acceleration<br/>Layer cache]
    end

    subgraph Partners["External Systems"]
        SF_SNO[Snowflake]
        SF_DBR[Databricks]
        SF_GBQ[Google BigQuery]
        SF_RED[Amazon Redshift]
    end

    QE <-->|"Iceberg REST<br/>Catalog"| SF_SNO
    QE <-->|"Iceberg REST<br/>Catalog"| SF_DBR
    QE <-->|"Iceberg REST<br/>Catalog"| SF_GBQ
    QE <-->|"Iceberg REST<br/>Catalog"| SF_RED

    QE --> ACC

    style QE fill:#4c6ef5,color:#fff
    style ACC fill:#ffd43b,color:#333
```

Bidirectional Access

| Direction | Mechanism |
|---|---|
| Data Cloud queries external | SQL federation with intelligent pushdown to external engines |
| External queries Data Cloud | JDBC driver or Data-as-a-Service (DaaS) API for file-based sharing |

When to Use vs Avoid Zero-Copy

| Use Zero-Copy When | Avoid Zero-Copy When |
|---|---|
| Enterprise data lake is actively managed and governed | Source data is poorly structured or undocumented |
| Data is already curated and complete in the warehouse | Frequent complex transformations are needed |
| Avoiding duplicate pipelines across business units | Identity resolution is needed (lakes lack this) |
| Cost optimization — 70 credits/M rows vs 2,000 for external batch ingestion | Compliance requires full data lineage within Salesforce |

Dual billing

Zero-copy means two bills: Salesforce credits for federation queries plus your data warehouse provider’s compute costs (Snowflake credits, BigQuery slots, etc.). Model both cost streams when presenting to stakeholders.


Storage and Compute Architecture

Data Cloud runs on a lakehouse architecture built on Apache Iceberg (table format) and Apache Parquet (file format), deployed on Hyperforce (AWS).

Tiered Storage

| Tier | Latency | Use Case |
|---|---|---|
| Main memory | Milliseconds | Real-time event processing, in-session personalization |
| Low Latency Store (LLS) | Sub-second | NVMe-backed durable cache for hot data |
| Lakehouse (S3) | Seconds | Long-term storage for DLOs, historical data, bulk queries |

Real-Time Layer

The real-time layer enables sub-second personalization:

  • Real-time data graphs — denormalized Customer 360 profiles with pre-joined objects
  • Real-time ingest — millisecond-level event capture from Web/Mobile SDKs
  • Real-time identity resolution — exact-match only, instant unification
  • Real-time calculated insights — metrics computed in milliseconds
  • Real-time segmentation — on-the-fly audience evaluation
  • Real-time actions — immediate flow triggers or external channel activation

Credit Consumption Model

Data Cloud uses consumption-based pricing. All actions consume credits from a unified credit pool.

| Action | Credits per Million Rows | Notes |
|---|---|---|
| Data ingestion (batch, external) | 2,000 | Salesforce-to-Salesforce ingestion is free as of August 2025 |
| Data ingestion (streaming) | 5,000 | 2.5x batch cost — use only when latency demands it |
| Identity resolution | 100,000 | Most expensive operation by far |
| Calculated insights (batch) | 15 | Very efficient for periodic metrics |
| Calculated insights (streaming) | 800 | 53x batch cost |
| Data queries | 2 | Cheapest operation |
| Segmentation | 20 | Per million rows evaluated |
| Activation (batch) | 10 | Pushing segments to targets |
| Activation (streaming DMO) | 1,600 | Real-time activation is expensive |
| Zero-copy federation | 70 | ~29x cheaper than batch ingestion |

Credit multipliers are recurring

Credits are consumed every time rows are processed — not just on first configuration. A calculated insight that runs hourly over 10M rows will consume credits every hour. Monitor usage via the Digital Wallet and set alerts.

Credit rate disclaimer

Credit multipliers change frequently — verify current rates at salesforce.com/data/rates/multipliers/

Pricing reference: 100,000 credits cost $500. Sandbox environments get a 20% discount on credit multipliers.
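Combining the table above with the $500-per-100,000-credits rate gives a small cost model for sanity-checking designs (rates are hard-coded from the table; verify current multipliers before quoting to stakeholders):

```python
# Credit rates per million rows, taken from the consumption table above.
CREDITS_PER_M = {
    "ingest_batch": 2_000,
    "ingest_streaming": 5_000,
    "identity_resolution": 100_000,
    "ci_batch": 15,
    "ci_streaming": 800,
    "query": 2,
    "segmentation": 20,
    "activation_batch": 10,
    "activation_streaming": 1_600,
    "zero_copy": 70,
}
DOLLARS_PER_CREDIT = 500 / 100_000   # $500 buys 100,000 credits

def cost_usd(action: str, million_rows: float) -> float:
    """Dollar cost of processing `million_rows` million rows for one action."""
    return CREDITS_PER_M[action] * million_rows * DOLLARS_PER_CREDIT

# An hourly batch insight over 10M rows, 24 runs/day:
daily = cost_usd("ci_batch", 10) * 24
print(f"${daily:.2f}/day")   # → $18.00/day; streaming would be 53x more
```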


CTA Scenario Patterns

Scenario 1: Unified Customer 360 for Omnichannel Retail

Situation: Retailer with separate systems for e-commerce (Commerce Cloud), in-store POS, loyalty program, and customer service (Service Cloud). No unified view of the customer.

Data Cloud solution: Ingest from all four sources via data streams. Map to standard Individual DMO. Run identity resolution to merge the same customer across channels (email from loyalty, phone from POS, cookie ID from web). Create calculated insights for CLV, channel preference, and churn risk. Activate segments to Marketing Cloud for personalized campaigns and to Service Cloud for proactive case routing.

Why not traditional MDM: Traditional MDM would require an external platform (Informatica, Reltio), ETL pipelines to Salesforce, and ongoing sync maintenance. Data Cloud provides native identity resolution and activation without middleware.

Scenario 2: B2B Account Intelligence with Zero-Copy

Situation: Enterprise SaaS company with product usage data in Snowflake (billions of rows), CRM data in Sales Cloud, and support data in Service Cloud. Leadership wants account health scores visible to sales reps.

Data Cloud solution: Zero-copy federation to Snowflake for product usage data (avoids ingesting billions of rows). Ingest CRM and Service Cloud data natively (free Salesforce-to-Salesforce ingestion). Create calculated insights for account health score combining usage, support ticket trends, and renewal dates. Surface scores on Account records via Data Cloud Related Lists. Trigger flows when health score drops below threshold.

Why not ETL: Ingesting billions of usage rows would consume massive credits (2,000 per million = 2M credits for 1B rows). Zero-copy queries the data in place for 70 credits per million.
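The trade-off worked through at 1B rows, using the rates from the consumption table (remember the real federation cost also includes Snowflake-side compute):

```python
# Worked comparison for 1B rows: batch external ingestion vs zero-copy scan.
rows_millions = 1_000                   # 1 billion rows
dollars_per_credit = 500 / 100_000      # $500 per 100,000 credits

ingest_credits = 2_000 * rows_millions      # batch external ingestion
federation_credits = 70 * rows_millions     # zero-copy full scan

print(ingest_credits, round(ingest_credits * dollars_per_credit, 2))
# → 2000000 10000.0   (2M credits, $10,000)
print(federation_credits, round(federation_credits * dollars_per_credit, 2))
# → 70000 350.0       (70k credits, $350)
```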

Scenario 3: Real-Time Personalization for Financial Services

Situation: Bank wants to personalize digital experiences in real-time — showing relevant product offers when a customer logs into the mobile app based on transaction history, life events, and segment membership.

Data Cloud solution: Ingest transaction data via streaming. Real-time identity resolution matches the authenticated user to their unified profile. Real-time calculated insights compute product eligibility scores. Real-time segmentation evaluates membership in offer audiences. Data actions push personalized offer payload to the mobile app in milliseconds.

Why Data Cloud over custom build: Building real-time identity resolution, segmentation, and activation from scratch requires significant engineering. Data Cloud provides this declaratively with sub-second latency.


Decision Guide: Data Cloud vs Alternatives

| Factor | Data Cloud | Traditional ETL/MDM | Salesforce Connect |
|---|---|---|---|
| Best for | Unified customer view, analytics, AI | Transactional data sync | Read-only external data access |
| Data volume | Billions of records | Millions of records | Any (query-time) |
| Identity resolution | Native (match + reconcile) | Requires separate MDM tool | Not available |
| Latency | Near-real-time to real-time | Real-time to batch | Real-time (per query) |
| Query model | SQL on lakehouse | SOQL on platform objects | Limited SOQL subset |
| Cost model | Credits (consumption) | Integration tool license | Salesforce Connect license |
| AI/ML readiness | Native (Einstein, Agentforce) | Requires separate data pipeline | Not applicable |
| Activation | Native to Marketing, CRM, Commerce | Custom integration needed | Not applicable |

CTA positioning

Data Cloud is not a replacement for MuleSoft or traditional integration. It complements integration by providing the unified data layer that integration patterns feed into. Present Data Cloud as the “customer data brain” and integration middleware as the “nervous system” connecting operational systems.


Industry-Specific Patterns

| Industry | Data Cloud Pattern | Key Data Sources |
|---|---|---|
| Healthcare | Patient 360 — unified patient journey across EMR, appointments, wearables | EMR systems, IoT health trackers, scheduling platforms |
| Financial Services | Client 360 — transaction history, credit data, relationship insights | Core banking, credit bureaus, wealth platforms |
| Retail | Customer 360 — omnichannel purchase, loyalty, browsing behavior | POS, e-commerce, loyalty, clickstream |
| Manufacturing | Asset 360 — IoT sensor data, service history, predictive maintenance | IoT platforms, ERP, field service |

Each industry leverages pre-built Data Cloud templates with industry-specific DMOs, reducing implementation time.


Gotchas and Anti-Patterns

Identity resolution is the most expensive operation

At 100,000 credits per million rows, identity resolution can dominate your credit budget. Optimize by reducing source profile volume (deduplicate before ingestion), using precise blocking keys, and running full resolution less frequently.

Data model mapping complexity

Mapping source fields to standard DMOs requires deep understanding of both the source schema and the Customer 360 Data Model. Poor mapping leads to incomplete unified profiles, broken segments, and wasted credits on re-processing.

Streaming vs batch cost gap

Streaming ingestion costs 2.5x batch. Streaming calculated insights cost 53x batch. Streaming activation costs 160x batch. Default to batch unless sub-minute latency is a proven business requirement — not a “nice to have.”

Data Cloud is not a transactional database

Data Cloud is optimized for analytical and profile-unification workloads. It is not designed to replace CRM objects for transactional CRUD operations. Do not use it as a general-purpose data store.


Cross-Domain Impact

  • Data Modeling — DMO design follows many of the same principles as CRM data modeling but with a canonical schema approach
  • External Data — Data Cloud is an alternative to Salesforce Connect for accessing external data; zero-copy extends this further
  • Large Data Volumes — Data Cloud handles billions of records without LDV governor limit concerns
  • Data Quality & Governance — Identity resolution and data spaces are governance mechanisms
  • Integration — Data Cloud complements (not replaces) integration middleware; data streams are an ingestion pattern
  • Security — Data spaces, ABAC policies, and field-level masking enforce data governance
