Data Migration
Data migration is one of the most underestimated workstreams in any Salesforce implementation. A CTA must treat migration as a first-class architectural concern — it touches data modeling, security, integration, and governance simultaneously. Getting it wrong means launch delays, data quality issues, and eroded stakeholder confidence.
Migration Phases
Every migration follows a lifecycle regardless of scale. Skipping phases is the primary cause of migration failures.
graph LR
A[1. Planning] --> B[2. Profiling]
B --> C[3. Design]
C --> D[4. Build]
D --> E[5. Test]
E --> F[6. Execute]
F --> G[7. Validate]
G -->|Issues found| E
E -->|Redesign needed| C
style A fill:#4c6ef5,color:#fff
style G fill:#51cf66,color:#fff
Phase 1: Planning
Define scope, timeline, and success criteria before touching any data.
Key planning activities:
- Inventory all source systems and data stores
- Identify which data migrates (not everything should)
- Define data ownership and migration team roles
- Establish cutover window and downtime tolerance
- Set success criteria: record counts, field completeness, relationship integrity
- Plan rollback strategy
The “not everything” conversation
CTAs should challenge the assumption that all historical data must migrate. Question: “Do you need 10 years of closed-lost opportunities in Salesforce, or would 2 years of active data suffice with the rest accessible via an archive?” This single conversation can reduce migration scope by 60-80%.
Phase 2: Data Profiling
Analyze source data to understand quality, completeness, and structure before designing mappings.
Profiling checklist:
- Record counts per entity and per source system
- Field completeness rates (% populated)
- Data type mismatches between source and target
- Duplicate detection rates
- Referential integrity (orphan records, broken relationships)
- Character encoding issues (UTF-8, special characters)
- Date format inconsistencies
- Picklist value mapping requirements
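Much of this checklist can be automated against a source extract. A minimal sketch in Python (the field names and sample rows are hypothetical), computing per-field completeness and a duplicate rate over exported records:

```python
from collections import Counter

def profile(records, key_field):
    """Compute per-field completeness and a duplicate rate for a record set."""
    total = len(records)
    completeness = {
        f: sum(1 for r in records if (r.get(f) or "").strip()) / total
        for f in records[0].keys()
    }
    key_counts = Counter(r[key_field] for r in records)
    duplicates = sum(c - 1 for c in key_counts.values() if c > 1)
    return completeness, duplicates / total

# Hypothetical rows exported from a legacy CRM
rows = [
    {"Email": "a@x.com", "Phone": "555-0100"},
    {"Email": "b@x.com", "Phone": ""},
    {"Email": "a@x.com", "Phone": "555-0102"},  # duplicate email
]
completeness, dup_rate = profile(rows, "Email")
```

In practice the same pass would also flag type mismatches, encoding issues, and date formats; the point is to produce hard numbers before mapping design starts.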
Phase 3: Design
Create the migration architecture: field mappings, transformation rules, load sequence, and error handling.
Design deliverables:
- Field mapping documents (source field to target field)
- Transformation rules (data cleansing, format conversion, value mapping)
- Load sequence diagram (parent objects before child objects)
- External ID strategy
- Error handling and retry logic
- Validation rules to temporarily disable during load
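Field mappings and transformation rules are easiest to review when expressed as data alongside the mapping document. A sketch of that idea (all field names and value mappings here are hypothetical):

```python
# Mapping: source field -> (target field, transformation rule)
MAPPING = {
    "acct_name":  ("Name",  str.strip),
    "acct_phone": ("Phone", lambda v: v.replace(".", "-")),
    "acct_type":  ("Type",  lambda v: {"C": "Customer", "P": "Prospect"}.get(v, "Other")),
}

def transform(source_row):
    """Apply the mapping to one source record, producing a target record."""
    return {tgt: fn(source_row[src]) for src, (tgt, fn) in MAPPING.items()}

row = transform({"acct_name": " Acme ", "acct_phone": "555.0100", "acct_type": "C"})
```

Keeping the rules declarative means the same table drives the ETL build, the unit tests, and the mapping document reviewed with the business.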
Phase 4: Build
Develop migration scripts, ETL jobs, and automation.
Build considerations:
- Configure ETL tool connections and credentials
- Build and unit test transformation logic
- Create External ID fields where needed
- Prepare pre-migration scripts (disable triggers, workflows, validation rules)
- Build post-migration scripts (re-enable automations, run sharing recalculation)
Phase 5: Test (Trial Migrations)
Run the migration in a sandbox — ideally multiple times.
Trial migration objectives:
- Validate record counts match expectations
- Verify relationship integrity (lookups resolve correctly)
- Test data transformations produce correct results
- Measure timing (will it fit in the cutover window?)
- Identify and fix errors before production run
- Train the migration team on the execution procedure
The three-run rule
Run at least three full trial migrations. The first finds structural issues. The second tests fixes. The third proves repeatability. Do not go to production with fewer than three clean runs.
Phase 6: Execute (Cutover)
Run the production migration with the full team mobilized.
Cutover activities:
- Freeze source systems (or capture delta changes)
- Disable automations (triggers, workflows, flows, validation rules)
- Run migration in defined sequence
- Monitor progress and error logs in real-time
- Execute delta migration for changes during cutover window
- Re-enable automations
- Run sharing recalculation if needed
Phase 7: Validate
Post-migration validation confirms data integrity and completeness.
Validation checks:
- Record count reconciliation (source vs target)
- Spot-check random records for field accuracy
- Verify all relationships resolve (no orphan records)
- Run standard reports and compare to source system reports
- Test business processes with migrated data (create records, run automation)
- User acceptance testing with business stakeholders
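The first check, record count reconciliation, is simple to script. A sketch (the per-object counts are hypothetical; in practice they come from source extracts and Salesforce reports):

```python
def reconcile(source_counts, target_counts, tolerance=0.0):
    """Compare per-object record counts; return objects whose delta exceeds tolerance."""
    issues = {}
    for obj, src in source_counts.items():
        tgt = target_counts.get(obj, 0)
        if src == 0:
            continue
        delta = abs(src - tgt) / src
        if delta > tolerance:
            issues[obj] = {"source": src, "target": tgt, "delta_pct": round(delta * 100, 2)}
    return issues

# Hypothetical counts
src = {"Account": 120_000, "Contact": 450_000, "Case": 300_000}
tgt = {"Account": 120_000, "Contact": 449_100, "Case": 300_000}
problems = reconcile(src, tgt)
```

A non-empty result becomes the punch list for the validation phase; a tolerance above zero is only defensible if the gap is explained (e.g. records intentionally excluded by scope rules).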
Migration Tools
Tool Comparison
| Tool | Best For | Volume | Complexity | Cost |
|---|---|---|---|---|
| Data Loader | Simple loads, ad-hoc | Up to 5 million per operation | Low | Free |
| Bulk API 2.0 | High-volume programmatic | Millions+ | Medium | Platform included |
| Informatica Cloud | Complex ETL, multiple sources | Unlimited | High | Licensed |
| MuleSoft | API-led integration + migration | Unlimited | High | Licensed |
| Jitterbit | Mid-complexity ETL | Millions | Medium | Licensed |
| Talend | Open-source ETL | Millions | Medium | Free/Licensed |
| Import Wizard | Small volumes, simple | < 50K | Very Low | Free |
Data Loader Deep Dive
Salesforce Data Loader is the standard tool for most migrations:
- Insert — Creates new records
- Update — Updates existing records (requires Salesforce ID or External ID)
- Upsert — Insert or update based on External ID match
- Delete — Soft delete records
- Hard Delete — Permanently delete (requires Bulk API enabled)
- Export / Export All — Extract data; Export All also includes soft-deleted records
Command-line mode enables scripting and scheduling:
# Example: command-line Data Loader for automated loads
process.bat <config-directory> <operation>
Bulk API 2.0
For high-volume migrations, Bulk API 2.0 is the workhorse:
| Feature | Bulk API 2.0 |
|---|---|
| Record limit | 100 million records per 24-hour period |
| File format | CSV |
| Processing | Asynchronous |
| PK Chunking | Supported (for queries) |
| Serial mode | Supported (avoids lock contention) |
| Parallelism | Automatic |
Serial vs parallel mode
Use serial mode when migrating data with potential lock contention (e.g., many child records pointing to the same parent). Parallel mode is faster but can cause UNABLE_TO_LOCK_ROW errors on skewed data.
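One way to make the serial-vs-parallel call objectively is to check the child file for parent skew before loading. A sketch (the parent field name and threshold are illustrative; tune the threshold to your data):

```python
from collections import Counter

def skewed_parents(child_rows, parent_field="AccountId", threshold=10_000):
    """Return parent IDs owning enough children to risk lock contention in parallel mode."""
    counts = Counter(r[parent_field] for r in child_rows)
    return [pid for pid, n in counts.items() if n >= threshold]

# Hypothetical check with a deliberately low threshold
rows = [{"AccountId": "A1"}] * 5 + [{"AccountId": "A2"}]
hot = skewed_parents(rows, threshold=5)
```

If the result is non-empty, either switch to serial mode or split those parents' children into a separate serial load while the rest runs in parallel.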
Migration Sequence
The load sequence is critical. Parent records must exist before child records can reference them. External IDs enable upserts that handle this gracefully.
graph TD
A[Phase 1: Reference Data] --> B[Phase 2: Accounts]
B --> C[Phase 3: Contacts]
C --> D[Phase 4: Opportunities]
D --> E[Phase 5: Products &<br/>Price Books]
E --> F[Phase 6: Opportunity<br/>Line Items]
F --> G[Phase 7: Cases]
G --> H[Phase 8: Activities]
H --> I[Phase 9: Junction Objects<br/>& Relationships]
I --> J[Phase 10: Files &<br/>Attachments]
style A fill:#4c6ef5,color:#fff
style J fill:#51cf66,color:#fff
Sequence Rules
- Users and roles first — OwnerId and sharing depend on users existing
- Reference data — Picklist values, record types, products, price books
- Parent objects before children — Account before Contact, Contact before Case
- Master-detail parents before children — Cannot create detail without master
- Junction objects last — Both parents must exist first
- Files and attachments last — ContentVersion records reference parent IDs
- Activities (Tasks/Events) last — WhoId and WhatId reference multiple object types
Object Dependency Map
For complex migrations, map the full dependency chain to visualize load order constraints. The diagram below shows a typical CRM migration with dependencies.
graph TD
subgraph Phase0["Phase 0: Foundation"]
USERS[Users & Roles]
RT[Record Types]
PB[Price Books]
end
subgraph Phase1["Phase 1: Independent Parents"]
ACC[Accounts]
PROD[Products]
CAMP[Campaigns]
end
subgraph Phase2["Phase 2: Dependent Parents"]
CON[Contacts]
PBE[Price Book Entries]
end
subgraph Phase3["Phase 3: Transactional"]
OPP[Opportunities]
CASE[Cases]
CONTR[Contracts]
end
subgraph Phase4["Phase 4: Line Items"]
OLI[Opportunity Line Items]
ORD[Orders & Order Items]
end
subgraph Phase5["Phase 5: Relationships & Activities"]
JCT[Junction Objects]
TASK[Tasks & Events]
FILES[Files & Attachments]
end
USERS --> ACC
USERS --> CON
RT --> ACC
RT --> OPP
PB --> PBE
PROD --> PBE
ACC --> CON
ACC --> OPP
ACC --> CASE
ACC --> CONTR
CON --> CASE
OPP --> OLI
PBE --> OLI
CONTR --> ORD
CAMP --> OPP
OPP --> JCT
CON --> TASK
ACC --> TASK
OPP --> FILES
CASE --> FILES
style Phase0 fill:#e7f5ff,color:#333
style Phase1 fill:#d0ebff,color:#333
style Phase2 fill:#a5d8ff,color:#333
style Phase3 fill:#74c0fc,color:#333
style Phase4 fill:#4dabf7,color:#fff
style Phase5 fill:#339af0,color:#fff
Lock contention prevention
When loading child records, presort them by parent ID so that records sharing the same parent land in the same batch. This reduces lock contention across parallel batches. For example, sort all Contacts by AccountId before loading so that contacts for the same account are processed together.
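The presort step above can be sketched in a few lines (the parent field name and batch size are illustrative; 200 matches a common API batch size but should follow your tool's configuration):

```python
def batches_by_parent(child_rows, parent_field="AccountId", batch_size=200):
    """Sort children by parent ID, then chunk, so same-parent rows share a batch."""
    ordered = sorted(child_rows, key=lambda r: r[parent_field])
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]

# Hypothetical contacts in arbitrary order
rows = [{"AccountId": p} for p in ["B", "A", "B", "A"]]
batches = batches_by_parent(rows, batch_size=2)
```

After sorting, each account's contacts land in contiguous batches, so parallel workers rarely contend for the same parent row lock.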
External ID Strategy
External IDs are the key to clean, repeatable migrations. They enable upserts and relationship resolution without Salesforce IDs.
Why External IDs Matter
| Benefit | Explanation |
|---|---|
| Upsert capability | Insert new records, update existing ones in a single operation |
| Relationship resolution | Reference parent records by External ID instead of Salesforce ID |
| Idempotent loads | Re-running a load does not create duplicates |
| Source system traceability | Map back to the original system’s record ID |
| Delta migration | Easily identify records that changed since last load |
External ID Design
- Create an External ID field on every object that will be migrated
- Use the source system’s primary key as the External ID value
- Mark the field as unique and external ID (indexed automatically)
- For multi-source migrations, prefix with a source system identifier (e.g., SAP-12345, LEGACY-67890)
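The prefixing convention above is trivial to standardize in the transformation layer. A sketch (the helper name is my own; the SAP/LEGACY prefixes follow the example in the list):

```python
def external_id(source_system: str, source_pk: str) -> str:
    """Composite External ID: unique across source systems, traceable to the source PK."""
    return f"{source_system.upper()}-{source_pk}"

# Two systems can share a primary key without colliding in Salesforce
ids = [external_id("sap", "12345"), external_id("legacy", "12345")]
```

Because the External ID field is marked unique, any collision the convention fails to prevent surfaces as a load error rather than a silent overwrite.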
Cutover Strategies
The cutover approach is a major architectural decision that affects risk, downtime, and complexity.
Strategy Comparison
graph TD
A[Cutover Strategy<br/>Decision] --> B{Downtime tolerance?}
B -->|High - weekend<br/>cutover acceptable| C[Big Bang]
B -->|Low - minimal<br/>disruption| D{Data complexity?}
D -->|Simple - few objects<br/>few sources| E[Phased Migration]
D -->|Complex - many systems<br/>interdependencies| F{Risk tolerance?}
F -->|Low risk| G[Parallel Run]
F -->|Medium risk| E
| Strategy | Description | Pros | Cons |
|---|---|---|---|
| Big Bang | All data migrated in a single cutover event | Simple to plan, clean cut, no dual maintenance | High risk, requires downtime, no fallback to old data |
| Phased | Migrate in stages (by object, by business unit, by geography) | Lower risk per phase, progressive learning | Longer timeline, data split across systems temporarily |
| Parallel Run | Both old and new systems run simultaneously | Lowest risk, side-by-side validation | Highest cost, dual data entry, reconciliation burden |
Cutover Timeline Comparison
gantt
title Cutover Strategy Timelines
dateFormat YYYY-MM-DD
axisFormat %b %d
section Big Bang
Preparation & freeze :a1, 2025-03-01, 1d
Full data migration :crit, a2, after a1, 2d
Validation & go-live :a3, after a2, 1d
section Phased
Phase 1 - Accounts :b1, 2025-03-01, 3d
Phase 2 - Contacts/Opps :b2, after b1, 3d
Phase 3 - Cases/History :b3, after b2, 3d
Phase 4 - Files/Activities:b4, after b3, 3d
Final validation :b5, after b4, 1d
section Parallel Run
Legacy system active :c1, 2025-03-01, 21d
New system active :c2, 2025-03-01, 21d
Dual data entry period :crit, c3, 2025-03-01, 14d
Reconciliation checks :c4, 2025-03-08, 14d
Legacy decommission :c5, after c4, 3d
CTA presentation advice
When presenting your migration strategy, state the cutover approach explicitly and explain why you chose it. “I recommend a phased migration because the 200M record volume exceeds what can be loaded in a single weekend window, and the business cannot tolerate a week of downtime.”
Trial Migrations
Trial migrations are rehearsals that validate every aspect of the production cutover.
What to Measure
| Metric | Why It Matters |
|---|---|
| Total load time | Must fit within the cutover window |
| Records per hour | Throughput rate for capacity planning |
| Error rate | Target < 1% — investigate any errors |
| Relationship success | % of lookups that resolved correctly |
| Data accuracy | Spot-check sample vs source system |
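Load time and throughput feed directly into the cutover decision. A back-of-the-envelope feasibility check (the volumes, throughput, and safety factor are hypothetical; throughput must come from your own trial runs):

```python
def fits_window(record_count, records_per_hour, window_hours, safety_factor=1.5):
    """Estimate load time with headroom and check it fits the cutover window."""
    estimated_hours = record_count / records_per_hour * safety_factor
    return estimated_hours, estimated_hours <= window_hours

# Hypothetical: 10M records at 500k/hour measured in trials, 36-hour weekend window
hours, ok = fits_window(10_000_000, 500_000, 36)
```

The safety factor matters: production orgs carry triggers, indexes, and API traffic that sandboxes understate, so a plan with no headroom is a plan to miss the window.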
Trial Migration Best Practices
- Use a full-copy sandbox for realistic volume testing
- Run trials with production-equivalent data volumes (not subsets)
- Time every phase — you need accurate estimates for the cutover plan
- Document all manual steps — they become the cutover runbook
- Run at least three full trials before production cutover
- Include rollback testing — verify you can restore if needed
Post-Migration Validation
Automated Validation
- Record count reconciliation: Source system count vs Salesforce count per object
- Checksum validation: Hash comparison on critical fields
- Relationship integrity: Query for orphan records (child records with null lookup to expected parent)
- Automation verification: Test that re-enabled triggers, flows, and validation rules fire correctly
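The checksum check from the list above compares a stable hash of critical fields between source and target. A sketch (the field list and sample rows are hypothetical):

```python
import hashlib

def row_checksum(row, critical_fields):
    """Stable hash over critical fields, for source-vs-target comparison."""
    payload = "|".join(str(row.get(f, "")) for f in critical_fields)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

FIELDS = ["Name", "AnnualRevenue", "BillingCountry"]  # hypothetical critical fields
src = {"Name": "Acme", "AnnualRevenue": "1000000", "BillingCountry": "US"}
tgt = {"Name": "Acme", "AnnualRevenue": "1000000", "BillingCountry": "US"}
match = row_checksum(src, FIELDS) == row_checksum(tgt, FIELDS)
```

Hashing avoids field-by-field comparison across millions of rows: join source and target checksums on External ID and investigate only the mismatches.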
Manual Validation
- Business stakeholder spot-checks: Have business users verify their own data
- Report comparison: Run key business reports and compare to source system reports
- Process walkthroughs: Execute end-to-end business processes using migrated data
- Edge case verification: Check records with special characters, large text fields, attachments
Pre-Migration Preparation
What to Disable Before Loading
| Item | Why Disable | How to Re-enable |
|---|---|---|
| Validation rules | Migrated data may not meet current rules | Re-enable after load, backfill violations |
| Triggers | Avoid unintended automation during load | Re-enable, consider running trigger logic post-load |
| Workflows | Prevent email alerts and field updates | Re-enable after validation |
| Process Builder (deprecated) / Flows | Prevent automation side effects | Re-enable, test with sample records |
| Duplicate rules | Legacy data may have intended duplicates | Re-enable, run dedup after migration |
| Assignment rules | Prevent reassignment of migrated records | Re-enable for new records |
| Sharing recalculation | Defer until all data is loaded | Trigger manually after migration |
Re-enabling automations
The most common post-migration failure is forgetting to re-enable something. Maintain a checklist of every disabled item with a responsible team member assigned to re-enable and verify each one.
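That checklist is worth keeping machine-readable so nothing slips through. A minimal sketch (item names and owners are hypothetical):

```python
# Hypothetical checklist of everything disabled for the load, with an owner per item
DISABLED = [
    {"item": "Account validation rules", "owner": "j.doe", "re_enabled": True},
    {"item": "Case assignment rules",    "owner": "a.kim", "re_enabled": False},
]

def outstanding(checklist):
    """Items still disabled after cutover, with the owner responsible for each."""
    return [(e["item"], e["owner"]) for e in checklist if not e["re_enabled"]]

gaps = outstanding(DISABLED)
```

Go-live sign-off should require this list to be empty, with each owner having verified their item fires correctly on a test record, not just that the toggle is back on.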
Migration Anti-Patterns
1. No Trial Migrations
Going directly to production cutover without rehearsal. Every migration has surprises — discover them in sandbox.
2. Ignoring Data Quality
Loading dirty data and planning to “clean it up later.” Later never comes. Profile and cleanse before migration.
3. Wrong Sequence
Loading child records before parents, then trying to fix relationships afterward. Use External IDs and correct sequencing.
4. No Rollback Plan
Assuming the migration will succeed. Always plan for how to restore the system if migration fails.
5. Underestimating Volume
Testing with 10K records and discovering the production load of 10M records takes 40 hours instead of the planned 8-hour window.
Cross-Domain Impact
- Data Modeling — Migration sequence depends on relationship types (Data Modeling)
- Integration — Migration tools are integration tools (Bulk API, middleware) (Integration)
- Security — Migrated data must respect sharing model; disable/re-enable carefully (Security)
- Data Quality — Pre-migration profiling and cleansing (Data Quality & Governance)
- Dev Lifecycle — Migration is a deployable workstream requiring environment strategy (Development Lifecycle)
Sources
- Salesforce Architects: Data 360 Architecture
- Salesforce Developer Blog: Extreme Data Loading Part 4 — Sequencing Load Operations
- Salesforce Help: Object Update Order for Large Data Loads or Data Migrations
- Salesforce Help: Best Practices When You Migrate Data
- Salesforce Help: Data Loader
- Salesforce Developer: Bulk API 2.0 Developer Guide
- Salesforce Help: External ID Fields
- Trailhead: Improve Data Load Strategy in Salesforce
- CTA Study Guide: Data Domain — Migration
- Trailhead: Large Data Volumes Module