External Data

Not all data belongs inside Salesforce. A CTA must know when to bring data onto the platform, when to leave it external, and how to bridge the two. This decision affects storage costs, performance, integration complexity, and user experience.

The Fundamental Question

Before designing any data architecture, ask: Should this data live in Salesforce?

graph TD
    A[Data Source Identified] --> B{Do Salesforce users<br/>need this data?}
    B -->|No| C[Keep external.<br/>No integration needed.]
    B -->|Yes| D{How do they<br/>need it?}
    D -->|Read-only reference| E{Data volume?}
    D -->|Read-write,<br/>interactive| F{Is Salesforce the<br/>system of record?}
    E -->|Small < 100K| G[Replicate via ETL<br/>into custom objects]
    E -->|Large > 100K| H{Real-time needed?}
    H -->|Yes| I[Salesforce Connect<br/>External Objects]
    H -->|No| J[Nightly ETL sync<br/>or Data Cloud]
    F -->|Yes| K[Store in Salesforce<br/>Standard/Custom objects]
    F -->|No| L{Latency tolerance?}
    L -->|Some latency OK| I
    L -->|Must be instant| K

    style C fill:#868e96,color:#fff
    style K fill:#51cf66,color:#fff
    style I fill:#4c6ef5,color:#fff
    style G fill:#ffd43b,color:#333
    style J fill:#ffd43b,color:#333
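
The decision tree above can be sketched as a small function. This is an illustration only: the `DataSource` fields and the returned labels are hypothetical names, and the diagram does not say which branch exactly 100K records falls into, so the sketch assumes anything at or above 100K counts as large.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    """Hypothetical description of a candidate data source."""
    needed_by_users: bool            # do Salesforce users need this data?
    read_only: bool                  # read-only reference vs. interactive read-write
    record_count: int = 0
    real_time: bool = False          # must reads reflect the live external value?
    salesforce_is_sor: bool = False  # is Salesforce the system of record?
    tolerates_latency: bool = True   # can users accept a callout delay?


def placement(src: DataSource) -> str:
    """Walk the decision tree and return the suggested placement."""
    if not src.needed_by_users:
        return "keep external"
    if src.read_only:
        # Assumption: the 100K boundary itself is treated as large.
        if src.record_count < 100_000:
            return "replicate via ETL"
        return "Salesforce Connect" if src.real_time else "nightly ETL or Data Cloud"
    if src.salesforce_is_sor:
        return "store in Salesforce"
    return "Salesforce Connect" if src.tolerates_latency else "store in Salesforce"
```

For example, a 5-million-row read-only ERP table that must always be current resolves to `"Salesforce Connect"`, while the same table without the real-time requirement resolves to `"nightly ETL or Data Cloud"`.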

Salesforce Connect

Salesforce Connect enables real-time access to external data through External Objects, without copying data into Salesforce.

How It Works

sequenceDiagram
    participant User
    participant Salesforce
    participant Adapter
    participant ExternalSystem

    User->>Salesforce: View related list / run report
    Salesforce->>Adapter: OData query
    Adapter->>ExternalSystem: Translate to native query
    ExternalSystem->>Adapter: Return results
    Adapter->>Salesforce: OData response
    Salesforce->>User: Display external data
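
To make the "OData query" step more concrete, here is a minimal sketch of what such a request URL can look like. It uses the standard OData system query options (`$filter`, `$select`, `$top`, `$skip`); the endpoint URL, entity set, and function name are hypothetical, and the exact requests Salesforce generates are not shown in this document.

```python
from urllib.parse import quote, urlencode


def build_odata_url(base_url, entity_set, filter_expr=None, select=None,
                    top=None, skip=None):
    """Build an OData query URL from standard system query options."""
    params = {}
    if filter_expr:
        params["$filter"] = filter_expr
    if select:
        params["$select"] = ",".join(select)
    if top is not None:
        params["$top"] = str(top)
    if skip is not None:
        params["$skip"] = str(skip)
    # Keep "$", "," and "'" readable; percent-encode everything else.
    query = urlencode(params, quote_via=quote, safe="$,'")
    url = f"{base_url.rstrip('/')}/{entity_set}"
    return f"{url}?{query}" if query else url


# Hypothetical endpoint: third page of 50 open orders.
url = build_odata_url("https://erp.example.com/odata", "Orders",
                      filter_expr="Status eq 'Open'", top=50, skip=100)
```

The adapter on the external side is then responsible for turning that filter and paging window into whatever query language the source system speaks.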

Adapters

Adapter	Connects To	Best For
OData 2.0	Any OData 2.0 endpoint	SAP, Microsoft Dynamics, custom APIs
OData 4.0	Any OData 4.0 endpoint	Modern REST APIs with OData support
Cross-org	Another Salesforce org	Multi-org architectures
Custom (Apex)	Any system via Apex	Systems without OData support

Adapter Architecture Comparison

Each adapter type has a different data flow path. Understanding these paths is critical for latency analysis and troubleshooting.

graph TD
    subgraph ODataFlow["OData Adapter Flow"]
        U1[User Action] --> SF1[Salesforce Platform]
        SF1 --> OA[OData Adapter<br/>translates to OData request]
        OA --> MW[OData Endpoint<br/>middleware or direct]
        MW --> EXT1[External Database<br/>SAP, Dynamics, etc.]
    end

    subgraph CrossOrgFlow["Cross-Org Adapter Flow"]
        U2[User Action] --> SF2[Salesforce Org A]
        SF2 --> XO[Cross-Org Adapter<br/>uses REST API]
        XO --> SF3[Salesforce Org B<br/>direct API call]
    end

    subgraph CustomFlow["Custom Apex Adapter Flow"]
        U3[User Action] --> SF4[Salesforce Platform]
        SF4 --> CA[Custom Apex Class<br/>DataSource.Connection]
        CA --> ANY[Any External System<br/>REST, SOAP, GraphQL, etc.]
    end

    style OA fill:#4c6ef5,color:#fff
    style XO fill:#51cf66,color:#fff
    style CA fill:#ffd43b,color:#333

Latency differences

OData adds translation overhead (Salesforce-to-OData-to-native query). Cross-org is faster for Salesforce-to-Salesforce because it uses the native REST API with no translation layer. Custom adapters have the most flexibility but require Apex development and testing for each external system.

Cross-Org Adapter

The cross-org adapter is particularly important for CTA scenarios involving multi-org strategies:

Connects two Salesforce orgs without middleware
Uses standard Salesforce APIs under the hood
Supports SOQL-like queries across orgs
Subject to API limits on both orgs
Useful for franchise models, acquisitions, or multi-cloud architectures

Salesforce Connect Limits

Limit	Value
External objects per org	200
Rows returned per query	2,000 (page-based)
Named credentials	50 per org
Callout time limit	120 seconds
Monthly callout limit	Based on license type
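
The page-based row limit matters for capacity planning. As a rough sketch, assuming each page of results costs one callout and that per-callout latency is roughly constant (both are simplifying assumptions, and the function names are hypothetical), the cost of reading a large result set can be estimated like this:

```python
import math

PAGE_SIZE = 2000  # rows returned per query, from the limits table above


def callouts_needed(total_rows, page_size=PAGE_SIZE):
    """Number of pages (and, by assumption, callouts) to read total_rows."""
    return math.ceil(total_rows / page_size) if total_rows > 0 else 0


def estimated_seconds(total_rows, seconds_per_callout, page_size=PAGE_SIZE):
    """Sequential read time under the constant-latency assumption."""
    return callouts_needed(total_rows, page_size) * seconds_per_callout
```

Under these assumptions, reading 4,500 rows takes 3 callouts, which is worth checking against the license-based monthly callout allowance before committing to virtualization for bulk reads.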

User experience impact

External objects add latency to every page load and related list render. Users accustomed to sub-second Salesforce response times will notice 1-3 second delays for external data. Set expectations and consider caching strategies.
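
One possible caching strategy is a time-to-live (TTL) cache in front of the external call. The sketch below is a generic, language-neutral illustration written in Python, not a Salesforce API: `TTLCache`, `fetch`, and the default TTL are hypothetical, and on the platform a comparable role would be played by a platform-side cache or middleware.

```python
import time


class TTLCache:
    """Serve repeated reads from memory until an entry is older than the TTL."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch    # callable that performs the slow external read
        self._ttl = ttl_seconds
        self._clock = clock    # injectable so tests can control time
        self._entries = {}     # key -> (stored_at, value)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]      # fresh enough: no external call
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value
```

The trade-off is the same one that separates virtualization from replication: a longer TTL means fewer slow callouts but staler data.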

External Objects Deep Dive

External objects (__x suffix) represent data stored outside Salesforce.

Capabilities and Limitations

Feature	Supported?	Notes
Related lists	Yes	Appear on parent records
List views	Yes	With filter limitations
SOQL	Partial	Subset of operators, no aggregate queries
SOSL (search)	No	External objects are not searchable via global search
Triggers	Limited	After-insert only, asynchronous
Flows	Limited	Some actions supported
Reports	Limited	Can be included in custom report types
Validation rules	No	Validation must happen in external system
Workflow rules	No	Use triggers or flows instead
Approval processes	No	Not supported

Relationship Types for External Objects

Relationship	Description
External lookup	Standard/custom object looks up to external object (by External ID)
Lookup	External object looks up to standard/custom object (by 18-char Salesforce ID)
Indirect lookup	External object looks up to standard/custom object (by unique external ID field)

Indirect lookups are key

Indirect lookups let you relate external objects to Salesforce objects without the external system knowing Salesforce IDs. The external system uses its own ID, and Salesforce matches it against a unique external ID field on the parent object. This is the recommended approach for most scenarios.
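
The matching that an indirect lookup performs can be pictured as a join on the external system's own key. The following sketch only illustrates the idea; the field names (`customer_no`, `Customer_Number__c`) and the function are hypothetical, and the platform does this matching for you.

```python
def resolve_indirect_lookups(external_rows, parents, external_key, parent_key):
    """Attach the parent's Salesforce Id to each external row by shared key."""
    index = {}
    for parent in parents:
        key = parent[parent_key]
        if key in index:
            # The parent field must be unique for the match to be unambiguous.
            raise ValueError(f"duplicate external ID on parent: {key}")
        index[key] = parent["Id"]
    # Rows with no matching parent get parent_id None rather than failing.
    return [{**row, "parent_id": index.get(row[external_key])}
            for row in external_rows]


accounts = [{"Id": "001A", "Customer_Number__c": "C-100"},
            {"Id": "001B", "Customer_Number__c": "C-200"}]
orders = [{"order_no": "SO-1", "customer_no": "C-200"},
          {"order_no": "SO-2", "customer_no": "C-999"}]
linked = resolve_indirect_lookups(orders, accounts,
                                  external_key="customer_no",
                                  parent_key="Customer_Number__c")
```

Note that the ERP rows never carry a Salesforce ID; only the shared customer number links the two sides.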

Big Objects

Big Objects (__b suffix) store massive volumes of data on the Salesforce platform itself. They complement external storage by keeping data within the platform’s trust boundary.

When to Use Big Objects

Scenario	Why Big Objects
Audit trail archival	Store field history beyond 18-month limit
Historical transactions	Transaction logs, payment history, event logs
IoT telemetry	Sensor data, device events
Regulatory compliance	Long-term record retention on-platform
LDV archival	Move aged records from standard objects

Big Object Constraints

Constraint	Detail
Query	Standard SOQL on indexed fields (Async SOQL retired as of Summer ’25)
DML	`Database.insertImmediate()` — no standard insert/update
Index	Defined at creation, immutable after
Triggers	Not supported
Reports	Not supported directly (query results into custom objects)
Relationships	Can have lookups but cannot be child in master-detail
Storage	Counts toward Big Object storage, not data storage

Big Objects vs Custom Objects

Understanding when to use Big Objects versus standard Custom Objects is a critical architectural decision. The diagram below highlights the key differentiators.

graph TD
    A[Need to store<br/>structured data] --> B{Expected record<br/>volume?}
    B -->|"< 10M records"| C{Need triggers,<br/>reporting, SOQL?}
    B -->|"10M - 1B records"| D{Need real-time<br/>query access?}
    B -->|"> 1B records"| E[Big Objects<br/>or External Storage]
    C -->|Yes| F[Custom Object]
    C -->|No| G{Audit / compliance<br/>archival use case?}
    G -->|Yes| E
    G -->|No| F
    D -->|Yes| H{Can you use<br/>indexed fields only?}
    D -->|No - batch OK| E
    H -->|Yes| E
    H -->|No - complex queries| I[Keep in Custom Object<br/>+ optimize with LDV<br/>strategies]

    style F fill:#51cf66,color:#fff
    style E fill:#4c6ef5,color:#fff
    style I fill:#ffd43b,color:#333

Capability	Custom Object	Big Object
Record scale	Millions (with LDV tuning)	Billions
SOQL	Full SOQL support	Standard SOQL on indexed fields (Async SOQL retired as of Summer ’25)
DML	Standard insert/update/delete	`Database.insertImmediate()` only
Triggers	Full support	Not supported
Reporting	Full support	Not supported (query into custom objects)
Index changes	Configurable anytime	Immutable after creation
Storage type	Counts toward data storage	Separate Big Object storage
Use case	Operational data	Archival, audit, telemetry, historical

Big Object Index Design

Big Object indexes are defined at creation and cannot be changed. This makes upfront design critical.

Index fields define query capability (you can only query by index fields)
First index field is the most significant (leftmost in the composite key)
Index determines sort order of results
Maximum 5 fields in the index

Immutable indexes

If you get the Big Object index wrong, you must delete the Big Object and recreate it. All data is lost. Design the index based on how you will query the data, not how you will insert it.
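
A quick way to test an index design is to check planned queries against it before the Big Object is created. The sketch below assumes the usual composite-index rules as I understand them: filters must start at the first index field and leave no gaps, every filtered field except the last must use `=`, and only the last may use a range operator or `IN`. The field names and the `check_filter` function are hypothetical.

```python
LAST_FIELD_OPS = {"=", "<", ">", "<=", ">=", "IN"}
MAX_INDEX_FIELDS = 5  # from the list above


def check_filter(index_fields, filters):
    """Return (ok, reason) for a planned query.

    index_fields: ordered list of index field names.
    filters: dict mapping field name -> operator used in the WHERE clause.
    """
    if len(index_fields) > MAX_INDEX_FIELDS:
        return False, "index has more than 5 fields"
    if not filters:
        return False, "query must filter on the first index field"
    for name in filters:
        if name not in index_fields:
            return False, f"{name} is not an index field"
    used = [f for f in index_fields if f in filters]
    if used != index_fields[:len(used)]:
        return False, "filters must start at the first index field with no gaps"
    for name in used[:-1]:
        if filters[name] != "=":
            return False, f"{name} must use '=' because later fields are filtered"
    if filters[used[-1]] not in LAST_FIELD_OPS:
        return False, f"unsupported operator on {used[-1]}"
    return True, "ok"


INDEX = ["Account__c", "Event_Date__c", "Event_Id__c"]
```

With this index, "all events for one account since a given date" passes, while "all events on a given date across accounts" fails because it skips the leading field. If the second query is the important one, the index order is wrong.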

Data Cloud (now Data 360)

Salesforce Data Cloud (rebranded as Data 360 in October 2025; previously Salesforce CDP and, before that, Customer 360 Audiences) is the platform’s answer to unified data management at scale.

Data Cloud Capabilities

Capability	Description
Data ingestion	Ingest from Salesforce CRM, external databases, cloud storage, streaming
Identity resolution	Unify customer profiles across sources
Segmentation	Create audiences based on unified data
Activation	Push segments to marketing channels, CRM, or external systems
Analytics	Query large datasets without platform limits
Calculated insights	Computed metrics available in Salesforce records

Data Cloud Architecture Pipeline

Data Cloud processes data through a defined pipeline: ingest, unify, analyze, and activate. Each stage transforms raw data into actionable customer insights.

graph LR
    subgraph Ingest["1. Ingest"]
        S1[Salesforce CRM]
        S2[Marketing Cloud]
        S3[External DBs<br/>hundreds of connectors]
        S4[Streaming APIs<br/>Web SDK, Mobile]
    end

    subgraph Unify["2. Unify"]
        DM[Data Model<br/>Objects / DMOs]
        IR[Identity Resolution<br/>Match + Reconcile]
        UP[Unified Profile]
    end

    subgraph Analyze["3. Analyze"]
        CI[Calculated Insights<br/>Aggregated metrics]
        SEG[Segmentation<br/>Audience creation]
        DG[Data Graphs<br/>Related DMOs]
    end

    subgraph Activate["4. Activate"]
        CRM[CRM Actions<br/>Flows, Apex]
        MKT[Marketing<br/>Journeys, Ads]
        EXT[External Systems<br/>Data Actions]
    end

    Ingest --> Unify --> Analyze --> Activate

    style Ingest fill:#4c6ef5,color:#fff
    style Unify fill:#51cf66,color:#fff
    style Analyze fill:#ffd43b,color:#333
    style Activate fill:#ff6b6b,color:#fff

Key architectural details:

Storage Native Change Events (SNCE) notify when data changes; Change Data Feed (CDF) identifies what changed — making the platform reactive rather than polling-based
Identity resolution uses matching rules (exact and fuzzy) and reconciliation rules to merge duplicate profiles. Near-real-time pipelines target sub-five-minute turnaround
Calculated insights define aggregated metrics (e.g., lifetime value, engagement score) that can be surfaced on CRM records
Activation pushes segments and insights to any downstream channel — Marketing Cloud journeys, ad platforms, CRM flows, or external systems
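
The match-then-reconcile idea behind identity resolution can be illustrated with a toy example. This is not how Data Cloud implements it; the rules, the 0.85 threshold, the field names, and the "most recent non-empty value wins" policy are all assumptions chosen to show one exact rule, one fuzzy rule, and one reconciliation rule.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so comparisons ignore formatting."""
    return " ".join(text.lower().split())


def is_match(a, b, threshold=0.85):
    """Exact rule on email, else fuzzy rule on name plus exact postal code."""
    if a.get("email") and normalize(a["email"]) == normalize(b.get("email", "")):
        return True
    similarity = SequenceMatcher(None, normalize(a["name"]),
                                 normalize(b["name"])).ratio()
    return similarity >= threshold and a.get("postal_code") == b.get("postal_code")


def reconcile(profiles):
    """Merge matched profiles: the most recently updated non-empty value wins."""
    merged = {}
    for profile in sorted(profiles, key=lambda p: p["updated"]):
        for field, value in profile.items():
            if value not in (None, ""):
                merged[field] = value
    return merged


crm = {"name": "Jon Smith", "email": "", "postal_code": "94105",
       "phone": "555-0100", "updated": 1}
web = {"name": "John Smith", "email": "john@example.com", "postal_code": "94105",
       "phone": "", "updated": 2}
unified = reconcile([crm, web]) if is_match(crm, web) else None
```

Here the two records share no email, so the exact rule does not fire; the fuzzy name rule plus the matching postal code links them, and the unified profile keeps the newer name and email alongside the older phone number.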

When Data Cloud vs Traditional Integration

Factor	Data Cloud	Traditional ETL/Integration
Primary goal	Unified customer view, analytics	Transactional data sync
Data volume	Billions of records	Millions of records
Latency	Near real-time (minutes)	Real-time (seconds) to batch
Query model	SQL-like on data lake	SOQL on platform objects
Cost	Separate Data Cloud license	Integration tool license
Complexity	Data model mapping, identity rules	Field mapping, error handling

Data Virtualization vs Replication

This is a core architectural decision that a CTA must articulate clearly.

Comparison

Dimension	Virtualization (Salesforce Connect)	Replication (ETL/API Sync)
Data freshness	Real-time (live query)	Depends on sync frequency
Performance	Slower (external callout per query)	Faster (local data)
Storage cost	No Salesforce storage consumed	Consumes Salesforce data storage
Offline access	Not available	Available
SOQL support	Limited subset	Full SOQL
Reporting	Limited	Full reporting
Trigger/Flow support	Minimal	Full support
Complexity	Lower (no sync logic)	Higher (sync, conflict resolution)
Availability	Dependent on external system uptime	Independent of external system

Decision Flowchart

graph TD
    A[External data needed<br/>in Salesforce] --> B{Write access<br/>needed?}
    B -->|Yes| C[Replicate into<br/>Salesforce objects]
    B -->|No| D{Users need it in<br/>reports/dashboards?}
    D -->|Yes| E{Volume > 100K<br/>records?}
    D -->|No| F{Real-time freshness<br/>critical?}
    E -->|Yes| G[Data Cloud or<br/>external reporting tool]
    E -->|No| C
    F -->|Yes| H[Salesforce Connect<br/>External Objects]
    F -->|No| I{Sync frequency<br/>tolerance?}
    I -->|Hourly OK| J[Scheduled ETL sync]
    I -->|Daily OK| J
    I -->|Must be live| H

Hybrid Patterns

Most enterprise architectures use a combination of approaches:

Pattern 1: Core + Extended

Core data (Accounts, Contacts, Opportunities) — replicated in Salesforce
Extended data (transaction history, audit logs) — external via Connect or Big Objects
Analytics data (clickstream, IoT) — Data Cloud or external data warehouse

Pattern 2: Warm + Cold Storage

Warm data (recent, actively accessed) — Salesforce standard objects
Cold data (aged, infrequently accessed) — Big Objects or external storage
Archival — External data lake with Salesforce Connect for occasional access
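
A tiering rule for this pattern can be as simple as classifying records by age. The thresholds below (one year for warm, seven years for cold) are placeholders, not recommendations; real values come from access patterns and retention requirements, and the function name is hypothetical.

```python
from datetime import date


def storage_tier(last_accessed, today, warm_days=365, cold_days=365 * 7):
    """Map a record's age since last access to one of the three tiers."""
    age_days = (today - last_accessed).days
    if age_days <= warm_days:
        return "warm: Salesforce standard object"
    if age_days <= cold_days:
        return "cold: Big Object or external storage"
    return "archive: external data lake via Salesforce Connect"


today = date(2025, 6, 1)
recent = storage_tier(date(2025, 1, 1), today)   # 151 days old
aged = storage_tier(date(2022, 6, 1), today)     # about 3 years old
ancient = storage_tier(date(2010, 1, 1), today)  # about 15 years old
```

In practice a scheduled archival job would apply a rule like this and move records between tiers in batches.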

Pattern 3: System of Record + Reference

System of record data — mastered and stored in Salesforce
Reference data (product catalogs, pricing from ERP) — virtualized via Connect
Enrichment data (firmographics, credit scores) — periodic batch sync

CTA Scenario Considerations

When evaluating external data in a CTA scenario, address these questions:

Who is the system of record? — Where is each entity mastered?
What is the access pattern? — Read-only reference or interactive read-write?
What is the acceptable latency? — Sub-second or minutes OK?
What is the data volume? — Hundreds, thousands, or millions of records?
What reporting is needed? — Standard reports or just inline visibility?
What is the availability requirement? — Can the solution tolerate external system downtime?
What are the licensing implications? — Salesforce Connect licenses, Data Cloud licenses?

Cross-Domain Impact

Integration — External data access is an integration pattern (Integration)
Security — External data must respect org sharing model (Security)
LDV — External storage is an LDV archival strategy (Large Data Volumes)
Data Modeling — External objects have different relationship types (Data Modeling)
System Architecture — Multi-org + external data affects org strategy (System Architecture)