Data Migration
Data migration is one of the most underestimated workstreams in any Salesforce implementation. A CTA must treat migration as a first-class architectural concern — it touches data modeling, security, integration, and governance simultaneously. Getting it wrong means launch delays, data quality issues, and eroded stakeholder confidence.
Migration Phases
Every migration follows a lifecycle regardless of scale. Skipping phases is the primary cause of migration failures.
graph LR
A[1. Planning] --> B[2. Profiling]
B --> C[3. Design]
C --> D[4. Build]
D --> E[5. Test]
E --> F[6. Execute]
F --> G[7. Validate]
G -->|Issues found| E
E -->|Redesign needed| C
style A fill:#4c6ef5,color:#fff
style G fill:#51cf66,color:#fff
Phase 1: Planning
Define scope, timeline, and success criteria before touching any data.
Key planning activities:
- Inventory all source systems and data stores
- Identify which data migrates (not everything should)
- Define data ownership and migration team roles
- Establish cutover window and downtime tolerance
- Set success criteria: record counts, field completeness, relationship integrity
- Plan rollback strategy
The “not everything” conversation
CTAs should challenge the assumption that all historical data must migrate. Question: “Do you need 10 years of closed-lost opportunities in Salesforce, or would 2 years of active data suffice with the rest accessible via an archive?” This single conversation can reduce migration scope by 60-80%.
Phase 2: Data Profiling
Analyze source data to understand quality, completeness, and structure before designing mappings.
Profiling checklist:
- Record counts per entity and per source system
- Field completeness rates (% populated)
- Data type mismatches between source and target
- Duplicate detection rates
- Referential integrity (orphan records, broken relationships)
- Character encoding issues (UTF-8, special characters)
- Date format inconsistencies
- Picklist value mapping requirements
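Much of this checklist can be automated against a source extract. A minimal sketch in Python (the field names and sample rows are hypothetical), computing per-field completeness and a duplicate rate over exported records:

```python
from collections import Counter

def profile(records, key_field):
    """Compute per-field completeness and a duplicate rate for a record set."""
    total = len(records)
    completeness = {
        f: sum(1 for r in records if (r.get(f) or "").strip()) / total
        for f in records[0].keys()
    }
    key_counts = Counter(r[key_field] for r in records)
    duplicates = sum(c - 1 for c in key_counts.values() if c > 1)
    return completeness, duplicates / total

# Hypothetical rows exported from a legacy CRM
rows = [
    {"Email": "a@x.com", "Phone": "555-0100"},
    {"Email": "b@x.com", "Phone": ""},
    {"Email": "a@x.com", "Phone": "555-0102"},  # duplicate email
]
completeness, dup_rate = profile(rows, "Email")
```

In practice the same pass would also flag type mismatches, encoding issues, and date formats; the point is to produce hard numbers before mapping design starts.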
Phase 3: Design
Create the migration architecture: field mappings, transformation rules, load sequence, and error handling.
Design deliverables:
- Field mapping documents (source field to target field)
- Transformation rules (data cleansing, format conversion, value mapping)
- Load sequence diagram (parent objects before child objects)
- External ID strategy
- Error handling and retry logic
- Validation rules to temporarily disable during load
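Field mappings and transformation rules are easiest to review when expressed as data alongside the mapping document. A sketch of that idea (all field names and value mappings here are hypothetical):

```python
# Mapping: source field -> (target field, transformation rule)
MAPPING = {
    "acct_name":  ("Name",  str.strip),
    "acct_phone": ("Phone", lambda v: v.replace(".", "-")),
    "acct_type":  ("Type",  lambda v: {"C": "Customer", "P": "Prospect"}.get(v, "Other")),
}

def transform(source_row):
    """Apply the mapping to one source record, producing a target record."""
    return {tgt: fn(source_row[src]) for src, (tgt, fn) in MAPPING.items()}

row = transform({"acct_name": " Acme ", "acct_phone": "555.0100", "acct_type": "C"})
```

Keeping the rules declarative means the same table drives the ETL build, the unit tests, and the mapping document reviewed with the business.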
Phase 4: Build
Develop migration scripts, ETL jobs, and automation.
Build considerations:
- Configure ETL tool connections and credentials
- Build and unit test transformation logic
- Create External ID fields where needed
- Prepare pre-migration scripts (disable triggers, workflows, validation rules)
- Build post-migration scripts (re-enable automations, run sharing recalculation)
Phase 5: Test (Trial Migrations)
Run the migration in a sandbox — ideally multiple times.
Trial migration objectives:
- Validate record counts match expectations
- Verify relationship integrity (lookups resolve correctly)
- Test data transformations produce correct results
- Measure timing (will it fit in the cutover window?)
- Identify and fix errors before production run
- Train the migration team on the execution procedure
The three-run rule
Run at least three full trial migrations. The first finds structural issues. The second tests fixes. The third proves repeatability. Do not go to production with fewer than three clean runs.
Phase 6: Execute (Cutover)
Run the production migration with the full team mobilized.
Cutover activities:
- Freeze source systems (or capture delta changes)
- Disable automations (triggers, workflows, flows, validation rules)
- Run migration in defined sequence
- Monitor progress and error logs in real-time
- Execute delta migration for changes during cutover window
- Re-enable automations
- Run sharing recalculation if needed
Phase 7: Validate
Post-migration validation confirms data integrity and completeness.
Validation checks:
- Record count reconciliation (source vs target)
- Spot-check random records for field accuracy
- Verify all relationships resolve (no orphan records)
- Run standard reports and compare to source system reports
- Test business processes with migrated data (create records, run automation)
- User acceptance testing with business stakeholders
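The first check, record count reconciliation, is simple to script. A sketch (the per-object counts are hypothetical; in practice they come from source extracts and Salesforce reports):

```python
def reconcile(source_counts, target_counts, tolerance=0.0):
    """Compare per-object record counts; return objects whose delta exceeds tolerance."""
    issues = {}
    for obj, src in source_counts.items():
        tgt = target_counts.get(obj, 0)
        if src == 0:
            continue
        delta = abs(src - tgt) / src
        if delta > tolerance:
            issues[obj] = {"source": src, "target": tgt, "delta_pct": round(delta * 100, 2)}
    return issues

# Hypothetical counts
src = {"Account": 120_000, "Contact": 450_000, "Case": 300_000}
tgt = {"Account": 120_000, "Contact": 449_100, "Case": 300_000}
problems = reconcile(src, tgt)
```

A non-empty result becomes the punch list for the validation phase; a tolerance above zero is only defensible if the gap is explained (e.g. records intentionally excluded by scope rules).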
Migration Tools
Tool Comparison
| Tool | Best For | Volume | Complexity | Cost |
|---|---|---|---|---|
| Data Loader | Simple loads, ad-hoc | Up to 5 million per operation | Low | Free |
| Bulk API 2.0 | High-volume programmatic | Millions+ | Medium | Platform included |
| Informatica Cloud | Complex ETL, multiple sources | Unlimited | High | Licensed |
| MuleSoft | API-led integration + migration | Unlimited | High | Licensed |
| Jitterbit | Mid-complexity ETL | Millions | Medium | Licensed |
| Talend | Open-source ETL | Millions | Medium | Free/Licensed |
| Import Wizard | Small volumes, simple | < 50K | Very Low | Free |
Data Loader Deep Dive
Salesforce Data Loader is the standard tool for most migrations:
- Insert — Creates new records
- Update — Updates existing records (requires Salesforce ID or External ID)
- Upsert — Insert or update based on External ID match
- Delete — Soft delete records
- Hard Delete — Permanently delete (requires Bulk API enabled)
- Export / Export All — Extract data; Export All also includes soft-deleted records
Command-line mode enables scripting and scheduling:
# Example: command-line Data Loader for automated loads
process.bat <config-directory> <operation>
Bulk API 2.0
For high-volume migrations, Bulk API 2.0 is the workhorse:
| Feature | Bulk API 2.0 |
|---|---|
| Record limit | 100 million records per 24-hour period |
| File format | CSV |
| Processing | Asynchronous |
| PK Chunking | Supported (for queries) |
| Serial mode | Supported (avoids lock contention) |
| Parallelism | Automatic |
Serial vs parallel mode
Use serial mode when migrating data with potential lock contention (e.g., many child records pointing to the same parent). Parallel mode is faster but can cause UNABLE_TO_LOCK_ROW errors on skewed data.
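One way to make the serial-vs-parallel call objectively is to check the child file for parent skew before loading. A sketch (the parent field name and threshold are illustrative; tune the threshold to your data):

```python
from collections import Counter

def skewed_parents(child_rows, parent_field="AccountId", threshold=10_000):
    """Return parent IDs owning enough children to risk lock contention in parallel mode."""
    counts = Counter(r[parent_field] for r in child_rows)
    return [pid for pid, n in counts.items() if n >= threshold]

# Hypothetical check with a deliberately low threshold
rows = [{"AccountId": "A1"}] * 5 + [{"AccountId": "A2"}]
hot = skewed_parents(rows, threshold=5)
```

If the result is non-empty, either switch to serial mode or split those parents' children into a separate serial load while the rest runs in parallel.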
Migration Sequence
The load sequence is critical. Parent records must exist before child records can reference them. External IDs enable upserts that handle this gracefully.
graph TD
A[Phase 1: Reference Data] --> B[Phase 2: Accounts]
B --> C[Phase 3: Contacts]
C --> D[Phase 4: Opportunities]
D --> E[Phase 5: Products &<br/>Price Books]
E --> F[Phase 6: Opportunity<br/>Line Items]
F --> G[Phase 7: Cases]
G --> H[Phase 8: Activities]
H --> I[Phase 9: Junction Objects<br/>& Relationships]
I --> J[Phase 10: Files &<br/>Attachments]
style A fill:#4c6ef5,color:#fff
style J fill:#51cf66,color:#fff
Sequence Rules
- Users and roles first — OwnerId and sharing depend on users existing
- Reference data — Picklist values, record types, products, price books
- Parent objects before children — Account before Contact, Contact before Case
- Master-detail parents before children — Cannot create detail without master
- Junction objects last — Both parents must exist first
- Files and attachments last — ContentVersion records reference parent IDs
- Activities (Tasks/Events) last — WhoId and WhatId reference multiple object types
Object Dependency Map
For complex migrations, map the full dependency chain to visualize load order constraints. The diagram below shows a typical CRM migration with dependencies.
graph TD
subgraph Phase0["Phase 0: Foundation"]
USERS[Users & Roles]
RT[Record Types]
PB[Price Books]
end
subgraph Phase1["Phase 1: Independent Parents"]
ACC[Accounts]
PROD[Products]
CAMP[Campaigns]
end
subgraph Phase2["Phase 2: Dependent Parents"]
CON[Contacts]
PBE[Price Book Entries]
end
subgraph Phase3["Phase 3: Transactional"]
OPP[Opportunities]
CASE[Cases]
CONTR[Contracts]
end
subgraph Phase4["Phase 4: Line Items"]
OLI[Opportunity Line Items]
ORD[Orders & Order Items]
end
subgraph Phase5["Phase 5: Relationships & Activities"]
JCT[Junction Objects]
TASK[Tasks & Events]
FILES[Files & Attachments]
end
USERS --> ACC
USERS --> CON
RT --> ACC
RT --> OPP
PB --> PBE
PROD --> PBE
ACC --> CON
ACC --> OPP
ACC --> CASE
ACC --> CONTR
CON --> CASE
OPP --> OLI
PBE --> OLI
CONTR --> ORD
CAMP --> OPP
OPP --> JCT
CON --> TASK
ACC --> TASK
OPP --> FILES
CASE --> FILES
style Phase0 fill:#e7f5ff,color:#333
style Phase1 fill:#d0ebff,color:#333
style Phase2 fill:#a5d8ff,color:#333
style Phase3 fill:#74c0fc,color:#333
style Phase4 fill:#4dabf7,color:#fff
style Phase5 fill:#339af0,color:#fff
Lock contention prevention
When loading child records, presort them by parent ID so that records sharing the same parent land in the same batch. This reduces lock contention across parallel batches. For example, sort all Contacts by AccountId before loading so that contacts for the same account are processed together.
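The presort step above can be sketched in a few lines (the parent field name and batch size are illustrative; 200 matches a common API batch size but should follow your tool's configuration):

```python
def batches_by_parent(child_rows, parent_field="AccountId", batch_size=200):
    """Sort children by parent ID, then chunk, so same-parent rows share a batch."""
    ordered = sorted(child_rows, key=lambda r: r[parent_field])
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

# Hypothetical contacts in arbitrary order
rows = [{"AccountId": p} for p in ["B", "A", "B", "A"]]
batches = batches_by_parent(rows, batch_size=2)
```

After sorting, each account's contacts land in contiguous batches, so parallel workers rarely contend for the same parent row lock.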
External ID Strategy
External IDs are the key to clean, repeatable migrations. They enable upserts and relationship resolution without Salesforce IDs.
Why External IDs Matter
| Benefit | Explanation |
|---|---|
| Upsert capability | Insert new records, update existing ones in a single operation |
| Relationship resolution | Reference parent records by External ID instead of Salesforce ID |
| Idempotent loads | Re-running a load does not create duplicates |
| Source system traceability | Map back to the original system’s record ID |
| Delta migration | Easily identify records that changed since last load |
External ID Design
- Create an External ID field on every object that will be migrated
- Use the source system’s primary key as the External ID value
- Mark the field as unique and external ID (indexed automatically)
- For multi-source migrations, prefix with a source system identifier (e.g., SAP-12345, LEGACY-67890)
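The prefixing convention above is trivial to standardize in the transformation layer. A sketch (the helper name is my own; the SAP/LEGACY prefixes follow the example in the list):

```python
def external_id(source_system: str, source_pk: str) -> str:
    """Composite External ID: unique across source systems, traceable to the source PK."""
    return f"{source_system.upper()}-{source_pk}"

# Two systems can share a primary key without colliding in Salesforce
ids = [external_id("sap", "12345"), external_id("legacy", "12345")]
```

Because the External ID field is marked unique, any collision the convention fails to prevent surfaces as a load error rather than a silent overwrite.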
Cutover Strategies
The cutover approach is a major architectural decision that affects risk, downtime, and complexity.
Strategy Comparison
graph TD
A[Cutover Strategy<br/>Decision] --> B{Downtime tolerance?}
B -->|High - weekend<br/>cutover acceptable| C[Big Bang]
B -->|Low - minimal<br/>disruption| D{Data complexity?}
D -->|Simple - few objects<br/>few sources| E[Phased Migration]
D -->|Complex - many systems<br/>interdependencies| F{Risk tolerance?}
F -->|Low risk| G[Parallel Run]
F -->|Medium risk| E
| Strategy | Description | Pros | Cons |
|---|---|---|---|
| Big Bang | All data migrated in a single cutover event | Simple to plan, clean cut, no dual maintenance | High risk, requires downtime, no fallback to old data |
| Phased | Migrate in stages (by object, by business unit, by geography) | Lower risk per phase, progressive learning | Longer timeline, data split across systems temporarily |
| Parallel Run | Both old and new systems run simultaneously | Lowest risk, side-by-side validation | Highest cost, dual data entry, reconciliation burden |
Cutover Timeline Comparison
gantt
title Cutover Strategy Timelines
dateFormat YYYY-MM-DD
axisFormat %b %d
section Big Bang
Preparation & freeze :a1, 2025-03-01, 1d
Full data migration :crit, a2, after a1, 2d
Validation & go-live :a3, after a2, 1d
section Phased
Phase 1 - Accounts :b1, 2025-03-01, 3d
Phase 2 - Contacts/Opps :b2, after b1, 3d
Phase 3 - Cases/History :b3, after b2, 3d
Phase 4 - Files/Activities:b4, after b3, 3d
Final validation :b5, after b4, 1d
section Parallel Run
Legacy system active :c1, 2025-03-01, 21d
New system active :c2, 2025-03-01, 21d
Dual data entry period :crit, c3, 2025-03-01, 14d
Reconciliation checks :c4, 2025-03-08, 14d
Legacy decommission :c5, after c4, 3d
CTA presentation advice
When presenting your migration strategy, state the cutover approach explicitly and explain why you chose it. “I recommend a phased migration because the 200M record volume exceeds what can be loaded in a single weekend window, and the business cannot tolerate a week of downtime.”
Trial Migrations
Trial migrations are rehearsals that validate every aspect of the production cutover.
What to Measure
| Metric | Why It Matters |
|---|---|
| Total load time | Must fit within the cutover window |
| Records per hour | Throughput rate for capacity planning |
| Error rate | Target < 1% — investigate any errors |
| Relationship success | % of lookups that resolved correctly |
| Data accuracy | Spot-check sample vs source system |
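Load time and throughput feed directly into the cutover decision. A back-of-the-envelope feasibility check (the volumes, throughput, and safety factor are hypothetical; throughput must come from your own trial runs):

```python
def fits_window(record_count, records_per_hour, window_hours, safety_factor=1.5):
    """Estimate load time with headroom and check it fits the cutover window."""
    estimated_hours = record_count / records_per_hour * safety_factor
    return estimated_hours, estimated_hours <= window_hours

# Hypothetical: 10M records at 500k/hour measured in trials, 36-hour weekend window
hours, ok = fits_window(10_000_000, 500_000, 36)
```

The safety factor matters: production orgs carry triggers, indexes, and API traffic that sandboxes understate, so a plan with no headroom is a plan to miss the window.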
Trial Migration Best Practices
- Use a full-copy sandbox for realistic volume testing
- Run trials with production-equivalent data volumes (not subsets)
- Time every phase — you need accurate estimates for the cutover plan
- Document all manual steps — they become the cutover runbook
- Run at least three full trials before production cutover
- Include rollback testing — verify you can restore if needed
Post-Migration Validation
Automated Validation
- Record count reconciliation: Source system count vs Salesforce count per object
- Checksum validation: Hash comparison on critical fields
- Relationship integrity: Query for orphan records (child records with null lookup to expected parent)
- Automation verification: Test that re-enabled triggers, flows, and validation rules fire correctly
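The checksum check from the list above compares a stable hash of critical fields between source and target. A sketch (the field list and sample rows are hypothetical):

```python
import hashlib

def row_checksum(row, critical_fields):
    """Stable hash over critical fields, for source-vs-target comparison."""
    payload = "|".join(str(row.get(f, "")) for f in critical_fields)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

FIELDS = ["Name", "AnnualRevenue", "BillingCountry"]  # hypothetical critical fields
src = {"Name": "Acme", "AnnualRevenue": "1000000", "BillingCountry": "US"}
tgt = {"Name": "Acme", "AnnualRevenue": "1000000", "BillingCountry": "US"}
match = row_checksum(src, FIELDS) == row_checksum(tgt, FIELDS)
```

Hashing avoids field-by-field comparison across millions of rows: join source and target checksums on External ID and investigate only the mismatches.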
Manual Validation
- Business stakeholder spot-checks: Have business users verify their own data
- Report comparison: Run key business reports and compare to source system reports
- Process walkthroughs: Execute end-to-end business processes using migrated data
- Edge case verification: Check records with special characters, large text fields, attachments
Pre-Migration Preparation
What to Disable Before Loading
| Item | Why Disable | How to Re-enable |
|---|---|---|
| Validation rules | Migrated data may not meet current rules | Re-enable after load, backfill violations |
| Triggers | Avoid unintended automation during load | Re-enable, consider running trigger logic post-load |
| Workflows | Prevent email alerts and field updates | Re-enable after validation |
| Process Builder (deprecated) / Flows | Prevent automation side effects | Re-enable, test with sample records |
| Duplicate rules | Legacy data may have intended duplicates | Re-enable, run dedup after migration |
| Assignment rules | Prevent reassignment of migrated records | Re-enable for new records |
| Sharing recalculation | Defer until all data is loaded | Trigger manually after migration |
Re-enabling automations
The most common post-migration failure is forgetting to re-enable something. Maintain a checklist of every disabled item with a responsible team member assigned to re-enable and verify each one.
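That checklist is worth keeping machine-readable so nothing slips through. A minimal sketch (item names and owners are hypothetical):

```python
# Hypothetical checklist of everything disabled for the load, with an owner per item
DISABLED = [
    {"item": "Account validation rules", "owner": "j.doe", "re_enabled": True},
    {"item": "Case assignment rules",    "owner": "a.kim", "re_enabled": False},
]

def outstanding(checklist):
    """Items still disabled after cutover, with the owner responsible for each."""
    return [(e["item"], e["owner"]) for e in checklist if not e["re_enabled"]]

gaps = outstanding(DISABLED)
```

Go-live sign-off should require this list to be empty, with each owner having verified their item fires correctly on a test record, not just that the toggle is back on.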
Migration Anti-Patterns
1. No Trial Migrations
Going directly to production cutover without rehearsal. Every migration has surprises — discover them in sandbox.
2. Ignoring Data Quality
Loading dirty data and planning to “clean it up later.” Later never comes. Profile and cleanse before migration.
3. Wrong Sequence
Loading child records before parents, then trying to fix relationships afterward. Use External IDs and correct sequencing.
4. No Rollback Plan
Assuming the migration will succeed. Always plan for how to restore the system if migration fails.
5. Underestimating Volume
Testing with 10K records and discovering the production load of 10M records takes 40 hours instead of the planned 8-hour window.
Cross-Domain Impact
- Data Modeling — Migration sequence depends on relationship types (Data Modeling)
- Integration — Migration tools are integration tools (Bulk API, middleware) (Integration)
- Security — Migrated data must respect sharing model; disable/re-enable carefully (Security)
- Data Quality — Pre-migration profiling and cleansing (Data Quality & Governance)
- Dev Lifecycle — Migration is a deployable workstream requiring environment strategy (Development Lifecycle)
Sources
- Salesforce Architects: Data 360 Architecture
- Salesforce Developer Blog: Extreme Data Loading Part 4 — Sequencing Load Operations
- Salesforce Help: Object Update Order for Large Data Loads or Data Migrations
- Salesforce Help: Best Practices When You Migrate Data
- Salesforce Help: Data Loader
- Salesforce Developer: Bulk API 2.0 Developer Guide
- Salesforce Help: External ID Fields
- Trailhead: Improve Data Load Strategy in Salesforce
- CTA Study Guide: Data Domain — Migration
- Trailhead: Large Data Volumes Module