
External Data

Not all data belongs inside Salesforce. A CTA must know when to bring data onto the platform, when to leave it external, and how to bridge the two. This decision affects storage costs, performance, integration complexity, and user experience.

The Fundamental Question

Before designing any data architecture, ask: Should this data live in Salesforce?

graph TD
    A[Data Source Identified] --> B{Do Salesforce users<br/>need this data?}
    B -->|No| C[Keep external.<br/>No integration needed.]
    B -->|Yes| D{How do they<br/>need it?}
    D -->|Read-only reference| E{Data volume?}
    D -->|Read-write,<br/>interactive| F{Is Salesforce the<br/>system of record?}
    E -->|Small < 100K| G[Replicate via ETL<br/>into custom objects]
    E -->|Large > 100K| H{Real-time needed?}
    H -->|Yes| I[Salesforce Connect<br/>External Objects]
    H -->|No| J[Nightly ETL sync<br/>or Data Cloud]
    F -->|Yes| K[Store in Salesforce<br/>Standard/Custom objects]
    F -->|No| L{Latency tolerance?}
    L -->|Some latency OK| I
    L -->|Must be instant| K

    style C fill:#868e96,color:#fff
    style K fill:#51cf66,color:#fff
    style I fill:#4c6ef5,color:#fff
    style G fill:#ffd43b,color:#333
    style J fill:#ffd43b,color:#333
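
The decision tree above can also be read as a small function. The following is a minimal sketch in Python, not a Salesforce API: the function name, parameter names, and the `SMALL_VOLUME_LIMIT` constant are illustrative assumptions taken from the diagram's labels. The diagram does not say what happens at exactly 100K records, so the sketch treats that value as large.

```python
from enum import Enum


class Placement(Enum):
    KEEP_EXTERNAL = "Keep external, no integration needed"
    REPLICATE_ETL = "Replicate via ETL into custom objects"
    SALESFORCE_CONNECT = "Salesforce Connect external objects"
    NIGHTLY_SYNC = "Nightly ETL sync or Data Cloud"
    STORE_IN_SALESFORCE = "Store in Salesforce standard/custom objects"


# Volume threshold copied from the diagram; treat it as a rule of thumb.
SMALL_VOLUME_LIMIT = 100_000


def place_data(users_need_it, read_write, record_count=0,
               real_time=False, salesforce_is_sor=False,
               latency_tolerated=True):
    """Walk the diagram's branches top to bottom and return a Placement."""
    if not users_need_it:
        return Placement.KEEP_EXTERNAL
    if not read_write:
        # Read-only reference data: decide on volume, then on freshness.
        if record_count < SMALL_VOLUME_LIMIT:
            return Placement.REPLICATE_ETL
        if real_time:
            return Placement.SALESFORCE_CONNECT
        return Placement.NIGHTLY_SYNC
    # Interactive read-write data: decide on ownership, then on latency.
    if salesforce_is_sor:
        return Placement.STORE_IN_SALESFORCE
    if latency_tolerated:
        return Placement.SALESFORCE_CONNECT
    return Placement.STORE_IN_SALESFORCE
```

For example, `place_data(True, False, record_count=5_000_000, real_time=True)` follows the read-only, large-volume, real-time path and lands on Salesforce Connect.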

Salesforce Connect

Salesforce Connect enables real-time access to external data through External Objects, without copying data into Salesforce.

How It Works

sequenceDiagram
    participant User
    participant Salesforce
    participant Adapter
    participant ExternalSystem

    User->>Salesforce: View related list / run report
    Salesforce->>Adapter: OData query
    Adapter->>ExternalSystem: Translate to native query
    ExternalSystem->>Adapter: Return results
    Adapter->>Salesforce: OData response
    Salesforce->>User: Display external data

Adapters

| Adapter | Connects To | Best For |
| --- | --- | --- |
| OData 2.0 | Any OData 2.0 endpoint | SAP, Microsoft Dynamics, custom APIs |
| OData 4.0 | Any OData 4.0 endpoint | Modern REST APIs with OData support |
| Cross-org | Another Salesforce org | Multi-org architectures |
| Custom (Apex) | Any system via Apex | Systems without OData support |

Adapter Architecture Comparison

Each adapter type has a different data flow path. Understanding these paths is critical for latency analysis and troubleshooting.

graph TD
    subgraph ODataFlow["OData Adapter Flow"]
        U1[User Action] --> SF1[Salesforce Platform]
        SF1 --> OA[OData Adapter<br/>translates to OData request]
        OA --> MW[OData Endpoint<br/>middleware or direct]
        MW --> EXT1[External Database<br/>SAP, Dynamics, etc.]
    end

    subgraph CrossOrgFlow["Cross-Org Adapter Flow"]
        U2[User Action] --> SF2[Salesforce Org A]
        SF2 --> XO[Cross-Org Adapter<br/>uses REST API]
        XO --> SF3[Salesforce Org B<br/>direct API call]
    end

    subgraph CustomFlow["Custom Apex Adapter Flow"]
        U3[User Action] --> SF4[Salesforce Platform]
        SF4 --> CA[Custom Apex Class<br/>DataSource.Connection]
        CA --> ANY[Any External System<br/>REST, SOAP, GraphQL, etc.]
    end

    style OA fill:#4c6ef5,color:#fff
    style XO fill:#51cf66,color:#fff
    style CA fill:#ffd43b,color:#333

Latency differences

The OData adapters add translation overhead at each hop (Salesforce query to OData request to native query). The cross-org adapter is typically faster for Salesforce-to-Salesforce access because it calls the other org's REST API directly, with no translation layer. Custom adapters offer the most flexibility but require Apex development and testing for each external system.

Cross-Org Adapter

The cross-org adapter is particularly important for CTA scenarios involving multi-org strategies:

  • Connects two Salesforce orgs without middleware
  • Uses standard Salesforce APIs under the hood
  • Supports SOQL-like queries across orgs
  • Subject to API limits on both orgs
  • Useful for franchise models, acquisitions, or multi-cloud architectures

Salesforce Connect Limits

| Limit | Value |
| --- | --- |
| External objects per org | 100 |
| Rows returned per query | 2,000 (page-based) |
| Named credentials | 50 per org |
| Callout time limit | 120 seconds |
| Monthly callout limit | Based on license type |

User experience impact

External objects add latency to every page load and related list render. Users accustomed to sub-second Salesforce response times will notice 1-3 second delays for external data. Set expectations and consider caching strategies.
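
One caching strategy is a short-lived, time-based cache in front of the slow external call. The sketch below shows the general pattern in Python and is not a Salesforce feature; the `TTLCache` class, its parameters, and the loader callback are hypothetical names chosen for illustration.

```python
import time


class TTLCache:
    """Tiny time-based cache for rows fetched from an external system."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so tests can control time
        self._entries = {}       # key -> (expires_at, value)

    def get(self, key, loader):
        """Return the cached value; call loader(key) only on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = loader(key)      # the slow external call
        self._entries[key] = (now + self._ttl, value)
        return value
```

The trade-off is the one discussed throughout this page: a longer TTL hides more latency but serves staler data, so the TTL should follow the freshness requirement of the data, not the other way around.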


External Objects Deep Dive

External objects (__x suffix) represent data stored outside Salesforce.

Capabilities and Limitations

| Feature | Supported? | Notes |
| --- | --- | --- |
| Related lists | Yes | Appear on parent records |
| List views | Yes | With filter limitations |
| SOQL | Partial | Subset of operators, no aggregate queries |
| SOSL (search) | No | External objects are not searchable via global search |
| Triggers | Limited | After-insert only, asynchronous |
| Flows | Limited | Some actions supported |
| Reports | Limited | Can be included in custom report types |
| Validation rules | No | Validation must happen in external system |
| Workflow rules | No | Use triggers or flows instead |
| Approval processes | No | Not supported |

Relationship Types for External Objects

| Relationship | Description |
| --- | --- |
| External lookup | Standard/custom object looks up to external object (by External ID) |
| Lookup | External object looks up to standard/custom object (by 18-char Salesforce ID) |
| Indirect lookup | External object looks up to standard/custom object (by unique external ID field) |

Indirect lookups are key

Indirect lookups let you relate external objects to Salesforce objects without the external system knowing Salesforce IDs. The external system uses its own ID, and Salesforce matches it against a unique external ID field on the parent object. This is the recommended approach for most scenarios.
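
The matching step can be pictured as a dictionary lookup on the parent's unique external ID field. The sketch below models it in Python with plain dictionaries; it does not call any Salesforce API, and the function name and the field names (`ERP_Customer_Number__c`, `customer_no`) are hypothetical.

```python
def resolve_indirect_lookup(external_rows, parents, key_field, external_id_field):
    """Pair each external row with the parent whose unique external ID matches."""
    index = {}
    for parent in parents:
        key = parent[external_id_field]
        if key in index:
            # The parent field must be unique for the match to be unambiguous.
            raise ValueError(f"external ID {key!r} is not unique")
        index[key] = parent
    # Rows with no matching parent are paired with None (orphans).
    return [(row, index.get(row[key_field])) for row in external_rows]


accounts = [
    {"Id": "001A", "ERP_Customer_Number__c": "C-100"},
    {"Id": "001B", "ERP_Customer_Number__c": "C-200"},
]
orders = [
    {"order_no": "SO-1", "customer_no": "C-100"},
    {"order_no": "SO-2", "customer_no": "C-999"},
]
pairs = resolve_indirect_lookup(orders, accounts, "customer_no", "ERP_Customer_Number__c")
```

Here `SO-1` resolves to account `001A` even though the external system only knows its own customer number, while `SO-2` has no parent and stays unmatched.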


Big Objects

Big Objects (__b suffix) store massive volumes of data on the Salesforce platform itself. They complement external storage by keeping data within the platform’s trust boundary.

When to Use Big Objects

| Scenario | Why Big Objects |
| --- | --- |
| Audit trail archival | Store field history beyond 18-month limit |
| Historical transactions | Transaction logs, payment history, event logs |
| IoT telemetry | Sensor data, device events |
| Regulatory compliance | Long-term record retention on-platform |
| LDV archival | Move aged records from standard objects |

Big Object Constraints

| Constraint | Detail |
| --- | --- |
| Query | Standard SOQL on indexed fields (Async SOQL retired as of Summer ’25) |
| DML | Database.insertImmediate(); no standard insert/update |
| Index | Defined at creation, immutable after |
| Triggers | Not supported |
| Reports | Not supported directly (query results into custom objects) |
| Relationships | Can have lookups but cannot be child in master-detail |
| Storage | Counts toward Big Object storage, not data storage |

Big Objects vs Custom Objects

Understanding when to use Big Objects versus standard Custom Objects is a critical architectural decision. The diagram below highlights the key differentiators.

graph TD
    A[Need to store<br/>structured data] --> B{Expected record<br/>volume?}
    B -->|"< 10M records"| C{Need triggers,<br/>reporting, SOQL?}
    B -->|"10M - 1B records"| D{Need real-time<br/>query access?}
    B -->|"> 1B records"| E[Big Objects<br/>or External Storage]
    C -->|Yes| F[Custom Object]
    C -->|No| G{Audit / compliance<br/>archival use case?}
    G -->|Yes| E
    G -->|No| F
    D -->|Yes| H{Can you use<br/>indexed fields only?}
    D -->|No - batch OK| E
    H -->|Yes| E
    H -->|No - complex queries| I[Keep in Custom Object<br/>+ optimize with LDV<br/>strategies]

    style F fill:#51cf66,color:#fff
    style E fill:#4c6ef5,color:#fff
    style I fill:#ffd43b,color:#333

| Capability | Custom Object | Big Object |
| --- | --- | --- |
| Record scale | Millions (with LDV tuning) | Billions |
| SOQL | Full SOQL support | Standard SOQL on indexed fields (Async SOQL retired as of Summer ’25) |
| DML | Standard insert/update/delete | Database.insertImmediate() only |
| Triggers | Full support | Not supported |
| Reporting | Full support | Not supported (query into custom objects) |
| Index changes | Configurable anytime | Immutable after creation |
| Storage type | Counts toward data storage | Separate Big Object storage |
| Use case | Operational data | Archival, audit, telemetry, historical |

Big Object Index Design

Big Object indexes are defined at creation and cannot be changed. This makes upfront design critical.

  • Index fields define query capability (you can only query by index fields)
  • First index field is the most significant (leftmost in the composite key)
  • Index determines sort order of results
  • Maximum 5 fields in the index

Immutable indexes

If you get the Big Object index wrong, you must delete the Big Object and recreate it. All data is lost. Design the index based on how you will query the data, not how you will insert it.
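
One way to sanity-check an index design before it becomes immutable is to test the planned queries against it. The sketch below is a Python model, not platform code. It assumes the commonly documented Big Object rule that filters must start at the first index field, follow index order without gaps, and use a range or `IN` operator only on the last filtered field. The function name and the field names are hypothetical.

```python
MAX_INDEX_FIELDS = 5                         # limit stated in the list above
LAST_FIELD_OPS = {"<", ">", "<=", ">=", "IN"}


def is_valid_index_query(index_fields, filters):
    """Check an ordered list of (field, operator) filters against a composite index."""
    if len(index_fields) > MAX_INDEX_FIELDS:
        raise ValueError("an index can hold at most 5 fields")
    if not filters or len(filters) > len(index_fields):
        return False
    for position, (field, op) in enumerate(filters):
        # Filters must follow the index order, starting at the first field.
        if field != index_fields[position]:
            return False
        if op == "=":
            continue
        # Assumed rule: non-equality operators only on the final filter.
        is_last = position == len(filters) - 1
        if not (op in LAST_FIELD_OPS and is_last):
            return False
    return True


index = ["Account__c", "Event_Date__c", "Event_Type__c"]
by_account_and_date = [("Account__c", "="), ("Event_Date__c", ">=")]
by_date_only = [("Event_Date__c", "=")]
```

With this index, `by_account_and_date` passes the check and `by_date_only` fails because it skips the leading field. If date-only queries are a real requirement, the date field would have to lead the index.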


Data Cloud (now Data 360)

Salesforce Data Cloud (rebranded as Data 360 in October 2025; formerly Salesforce CDP, and before that Customer 360 Audiences) is the platform’s answer to unified data management at scale.

Data Cloud Capabilities

| Capability | Description |
| --- | --- |
| Data ingestion | Ingest from Salesforce CRM, external databases, cloud storage, streaming |
| Identity resolution | Unify customer profiles across sources |
| Segmentation | Create audiences based on unified data |
| Activation | Push segments to marketing channels, CRM, or external systems |
| Analytics | Query large datasets without platform limits |
| Calculated insights | Computed metrics available in Salesforce records |

Data Cloud Architecture Pipeline

Data Cloud processes data through a defined pipeline: ingest, unify, analyze, and activate. Each stage transforms raw data into actionable customer insights.

graph LR
    subgraph Ingest["1. Ingest"]
        S1[Salesforce CRM]
        S2[Marketing Cloud]
        S3[External DBs<br/>hundreds of connectors]
        S4[Streaming APIs<br/>Web SDK, Mobile]
    end

    subgraph Unify["2. Unify"]
        DM[Data Model<br/>Objects / DMOs]
        IR[Identity Resolution<br/>Match + Reconcile]
        UP[Unified Profile]
    end

    subgraph Analyze["3. Analyze"]
        CI[Calculated Insights<br/>Aggregated metrics]
        SEG[Segmentation<br/>Audience creation]
        DG[Data Graphs<br/>Related DMOs]
    end

    subgraph Activate["4. Activate"]
        CRM[CRM Actions<br/>Flows, Apex]
        MKT[Marketing<br/>Journeys, Ads]
        EXT[External Systems<br/>Data Actions]
    end

    Ingest --> Unify --> Analyze --> Activate

    style Ingest fill:#4c6ef5,color:#fff
    style Unify fill:#51cf66,color:#fff
    style Analyze fill:#ffd43b,color:#333
    style Activate fill:#ff6b6b,color:#fff

Key architectural details:

  • Storage Native Change Events (SNCE) notify when data changes; Change Data Feed (CDF) identifies what changed — making the platform reactive rather than polling-based
  • Identity resolution uses matching rules (exact and fuzzy) and reconciliation rules to merge duplicate profiles. Near-real-time pipelines target sub-five-minute turnaround
  • Calculated insights define aggregated metrics (e.g., lifetime value, engagement score) that can be surfaced on CRM records
  • Activation pushes segments and insights to any downstream channel — Marketing Cloud journeys, ad platforms, CRM flows, or external systems
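
The matching and reconciliation step can be pictured with a toy example. This is not how Data Cloud implements identity resolution. It is a minimal Python sketch that assumes one exact rule (same email), one fuzzy rule (similar name plus same postal code), and a reconciliation rule where the most recent non-empty value wins. All field names and the 0.85 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def is_match(a, b, fuzzy_threshold=0.85):
    """Return True if two source records look like the same person."""
    # Exact rule: identical email, ignoring case.
    if a["email"] and a["email"].lower() == b["email"].lower():
        return True
    # Fuzzy rule: similar full name and identical postal code.
    similarity = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return similarity >= fuzzy_threshold and a["postal_code"] == b["postal_code"]


def reconcile(records):
    """Merge matched records; newer non-empty values overwrite older ones."""
    unified = {}
    for record in sorted(records, key=lambda r: r["updated"]):
        for field, value in record.items():
            if value not in ("", None):
                unified[field] = value
    return unified


crm = {"name": "Jon Smith", "email": "", "postal_code": "94105", "updated": 1}
web = {"name": "John Smith", "email": "john@example.com", "postal_code": "94105", "updated": 2}
other = {"name": "Jane Doe", "email": "jane@example.com", "postal_code": "10001", "updated": 3}

profile = reconcile([crm, web]) if is_match(crm, web) else None
```

The two Smith records match on the fuzzy rule and merge into one profile that carries the newer name and the only known email, while the third record stays separate.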

When Data Cloud vs Traditional Integration

| Factor | Data Cloud | Traditional ETL/Integration |
| --- | --- | --- |
| Primary goal | Unified customer view, analytics | Transactional data sync |
| Data volume | Billions of records | Millions of records |
| Latency | Near real-time (minutes) | Real-time (seconds) to batch |
| Query model | SQL-like on data lake | SOQL on platform objects |
| Cost | Separate Data Cloud license | Integration tool license |
| Complexity | Data model mapping, identity rules | Field mapping, error handling |

Data Virtualization vs Replication

This is a core architectural decision that a CTA must articulate clearly.

Comparison

| Dimension | Virtualization (Salesforce Connect) | Replication (ETL/API Sync) |
| --- | --- | --- |
| Data freshness | Real-time (live query) | Depends on sync frequency |
| Performance | Slower (external callout per query) | Faster (local data) |
| Storage cost | No Salesforce storage consumed | Consumes Salesforce data storage |
| Offline access | Not available | Available |
| SOQL support | Limited subset | Full SOQL |
| Reporting | Limited | Full reporting |
| Trigger/Flow support | Minimal | Full support |
| Complexity | Lower (no sync logic) | Higher (sync, conflict resolution) |
| Availability | Dependent on external system uptime | Independent of external system |

Decision Flowchart

graph TD
    A[External data needed<br/>in Salesforce] --> B{Write access<br/>needed?}
    B -->|Yes| C[Replicate into<br/>Salesforce objects]
    B -->|No| D{Users need it in<br/>reports/dashboards?}
    D -->|Yes| E{Volume > 100K<br/>records?}
    D -->|No| F{Real-time freshness<br/>critical?}
    E -->|Yes| G[Data Cloud or<br/>external reporting tool]
    E -->|No| C
    F -->|Yes| H[Salesforce Connect<br/>External Objects]
    F -->|No| I{Sync frequency<br/>tolerance?}
    I -->|Hourly OK| J[Scheduled ETL sync]
    I -->|Daily OK| J
    I -->|Must be live| H

Hybrid Patterns

Most enterprise architectures use a combination of approaches:

Pattern 1: Core + Extended

  • Core data (Accounts, Contacts, Opportunities): replicated in Salesforce
  • Extended data (transaction history, audit logs): kept out of the core objects, either external via Salesforce Connect or on-platform in Big Objects
  • Analytics data (clickstream, IoT): Data Cloud or an external data warehouse

Pattern 2: Warm + Cold Storage

  • Warm data (recent, actively accessed) — Salesforce standard objects
  • Cold data (aged, infrequently accessed) — Big Objects or external storage
  • Archival — External data lake with Salesforce Connect for occasional access
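
A tiering rule like this usually reduces to an age check run by an archival job. The sketch below is one possible classification in Python; the function name and both thresholds (one year for warm, three years for cold) are hypothetical and would come from the retention requirements of the scenario.

```python
from datetime import date


def storage_tier(last_accessed, today, warm_days=365, cold_days=1095):
    """Classify a record as 'warm', 'cold', or 'archive' by days since last access."""
    age_days = (today - last_accessed).days
    if age_days <= warm_days:
        return "warm"       # stays in Salesforce standard objects
    if age_days <= cold_days:
        return "cold"       # candidate for Big Objects or external storage
    return "archive"        # candidate for an external data lake


today = date(2025, 1, 1)
tiers = [storage_tier(d, today) for d in (date(2024, 6, 1), date(2023, 1, 1), date(2020, 1, 1))]
```

Keeping the thresholds as parameters makes it easy to tune them per object, since an aged Case and an aged Opportunity rarely share the same access pattern.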

Pattern 3: System of Record + Reference

  • System of record data — mastered and stored in Salesforce
  • Reference data (product catalogs, pricing from ERP) — virtualized via Connect
  • Enrichment data (firmographics, credit scores) — periodic batch sync

CTA Scenario Considerations

When evaluating external data in a CTA scenario, address these questions:

  1. Who is the system of record? — Where is each entity mastered?
  2. What is the access pattern? — Read-only reference or interactive read-write?
  3. What is the acceptable latency? — Sub-second or minutes OK?
  4. What is the data volume? — Hundreds, thousands, or millions of records?
  5. What reporting is needed? — Standard reports or just inline visibility?
  6. What is the availability requirement? — Can the solution tolerate external system downtime?
  7. What are the licensing implications? — Salesforce Connect licenses, Data Cloud licenses?

Cross-Domain Impact

  • Integration — External data access is an integration pattern (Integration)
  • Security — External data must respect org sharing model (Security)
  • LDV — External storage is an LDV archival strategy (Large Data Volumes)
  • Data Modeling — External objects have different relationship types (Data Modeling)
  • System Architecture — Multi-org + external data affects org strategy (System Architecture)
