
Migration: Quick Reference

Cheat sheet for data migration strategy at the CTA review board. For the full deep dive, see Data Migration and Data Quality & Governance.

Migration Phases at a Glance

```mermaid
graph LR
    A[1. Plan] --> B[2. Profile]
    B --> C[3. Design]
    C --> D[4. Build]
    D --> E[5. Test]
    E --> F[6. Execute]
    F --> G[7. Validate]
    G -->|Issues| E
    E -->|Redesign| C

    style A fill:#4c6ef5,color:#fff
    style G fill:#51cf66,color:#fff
```
| Phase | Key Deliverable | Critical Activity |
|---|---|---|
| 1. Plan | Scope document, success criteria | Inventory sources, define cutover window, plan rollback |
| 2. Profile | Data quality assessment | Record counts, completeness rates, duplicate detection, encoding issues |
| 3. Design | Field mappings, transformation rules | Load sequence, External ID strategy, error handling design |
| 4. Build | ETL jobs, migration scripts | Pre-migration scripts (disable automations), post-migration scripts |
| 5. Test | Trial migration results | 3+ full trial runs with production-equivalent volumes |
| 6. Execute | Cutover runbook | Freeze source, disable automations, load, delta migration, re-enable |
| 7. Validate | Reconciliation report | Record counts, relationship integrity, spot-checks, UAT |

The three-run rule

Run at least three full trial migrations: the first finds structural issues, the second validates fixes, the third proves repeatability. Never go to production with fewer than three clean runs.

Tool Selection Matrix

| Tool | Volume | Complexity | Cost | Best For |
|---|---|---|---|---|
| Import Wizard | < 50K | Very Low | Free | One-time simple imports for admins |
| Data Loader | Millions | Low-Med | Free | Standard migrations, ad-hoc loads, CLI scripting |
| Bulk API 2.0 | 100M+ | Medium | Included | High-volume programmatic loads |
| Informatica Cloud | Unlimited | High | Licensed | Complex ETL, multiple sources, data governance |
| MuleSoft | Unlimited | High | Licensed | API-led integration + migration, ongoing sync |
| Jitterbit | Millions | Medium | Licensed | Mid-complexity ETL, good UI |
| Talend | Millions | Medium | Free/Licensed | Open-source ETL, budget-conscious |
| Dataloader.io | Millions | Low | Freemium | Cloud-native, drag-and-drop, scheduled loads |

Tool Decision Quick Test

```mermaid
graph TD
    A[Migration Needed] --> B{Volume?}
    B -->|< 50K records| C[Import Wizard]
    B -->|50K - 5M| D{Complexity?}
    B -->|5M+| E{Ongoing sync<br/>needed?}
    D -->|Simple field mapping| F[Data Loader]
    D -->|Complex transforms,<br/>multiple sources| G{Budget?}
    E -->|Yes| H[MuleSoft or<br/>Informatica]
    E -->|No - one time| I[Bulk API 2.0 +<br/>Data Loader CLI]
    G -->|Licensed tool OK| H
    G -->|Free/minimal| F

    style C fill:#51cf66,color:#fff
    style F fill:#4c6ef5,color:#fff
    style H fill:#ffd43b,color:#333
    style I fill:#4c6ef5,color:#fff
```

Bulk API 2.0 Limits

| Limit | Value |
|---|---|
| Records per 24-hour period | 100 million |
| Records per job | ~2.5 million (internal batching at 10K) |
| Concurrent jobs | 25 (shared with Bulk API 1.0) |
| Batches per 24-hour period | 15,000 (shared with Bulk API 1.0) |
| File format | CSV |
| PK chunking (queries) | Default 100K, max 250K per chunk |
| Processing mode | Parallel (default) or Serial |
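The limits above translate directly into capacity planning. A minimal sketch, using the figures from the table (the `plan_load` helper and its thresholds are illustrative, not a Salesforce API):

```python
import math

# Limits from the table above; adjust if your org's limits differ.
DAILY_RECORD_LIMIT = 100_000_000   # records per rolling 24-hour period
INTERNAL_BATCH_SIZE = 10_000       # Bulk API 2.0 batches ingest internally at 10K
DAILY_BATCH_LIMIT = 15_000         # shared with Bulk API 1.0

def plan_load(total_records: int) -> dict:
    """Estimate batches consumed and whether a load fits in one 24-hour window."""
    batches = math.ceil(total_records / INTERNAL_BATCH_SIZE)
    return {
        "batches": batches,
        "fits_daily_records": total_records <= DAILY_RECORD_LIMIT,
        "fits_daily_batches": batches <= DAILY_BATCH_LIMIT,
    }

print(plan_load(8_000_000))  # e.g. the Contact volume from Scenario 2
```

If either `fits_*` flag is false, the load must be split across days or phases.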

Serial vs Parallel

Use serial mode when loading child records that share parent records (avoids UNABLE_TO_LOCK_ROW). Use parallel mode for independent records with no shared parents. Default is parallel.
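A common mitigation before falling back to serial mode is to sort the input file by parent ID so that rows sharing a parent land in the same batch. A minimal sketch (field names are illustrative); this reduces, but does not eliminate, `UNABLE_TO_LOCK_ROW` errors:

```python
from itertools import groupby
from operator import itemgetter

# Child records (e.g. Opportunities) referencing shared parent Accounts.
rows = [
    {"Name": "Opp-3", "AccountId": "001B"},
    {"Name": "Opp-1", "AccountId": "001A"},
    {"Name": "Opp-2", "AccountId": "001A"},
]

# Sort by parent so siblings are co-located in the same batch.
rows.sort(key=itemgetter("AccountId"))

for account_id, group in groupby(rows, key=itemgetter("AccountId")):
    print(account_id, [r["Name"] for r in group])
```

When many children still share the same few parents across batches, switch to serial mode.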

Load Sequence — Mandatory Order

```mermaid
graph TD
    A["1. Users & Roles"] --> B["2. Reference Data<br/>(picklists, record types,<br/>products, price books)"]
    B --> C["3. Accounts"]
    C --> D["4. Contacts"]
    D --> E["5. Opportunities"]
    E --> F["6. Line Items<br/>(OLI, QLI)"]
    F --> G["7. Cases"]
    G --> H["8. Activities<br/>(Tasks, Events)"]
    H --> I["9. Junction Objects"]
    I --> J["10. Files &<br/>Attachments"]

    style A fill:#4c6ef5,color:#fff
    style J fill:#51cf66,color:#fff
```

The rule: Parents before children. Masters before details. Both sides of junction before junction. Files last.

Sequence Cheat Sheet

| Order | Object Type | Why This Order |
|---|---|---|
| 1 | Users, Roles, Profiles | OwnerId depends on User records existing |
| 2 | Reference data | Products, Price Books, Record Types used by other objects |
| 3 | Accounts | Parent of Contacts, Opportunities, Cases |
| 4 | Contacts | Parent of Cases, Activities; child of Account |
| 5 | Opportunities | Child of Account; parent of Line Items |
| 6 | Line Items | Child of Opportunity + Product (both must exist) |
| 7 | Cases | Child of Account + Contact |
| 8 | Activities | WhoId/WhatId polymorphic — reference multiple object types |
| 9 | Junction objects | Both parent records must exist first |
| 10 | Files/Attachments | ContentVersion references parent record IDs |
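The sequence above is a topological sort of the object dependency graph. Encoding the dependencies explicitly — a sketch using Python's standard-library `graphlib`, with a subset of the objects — lets you recompute the order when custom objects enter the picture:

```python
from graphlib import TopologicalSorter

# Each object maps to the objects that must be loaded before it.
depends_on = {
    "User": set(),
    "Product2": set(),
    "Account": {"User"},
    "Contact": {"Account"},
    "Opportunity": {"Account"},
    "OpportunityLineItem": {"Opportunity", "Product2"},
    "Case": {"Account", "Contact"},
}

order = list(TopologicalSorter(depends_on).static_order())
print(order)  # parents always precede children
```

Any valid ordering it produces satisfies the rule: parents before children, both sides of a junction before the junction.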

External ID Strategy

| Benefit | How It Helps |
|---|---|
| Upsert capability | Insert or update in one operation — no duplicates |
| Relationship resolution | Reference parents by External ID, not Salesforce ID |
| Idempotent loads | Re-run a load safely without creating duplicates |
| Source traceability | Map back to original system record |
| Delta migration | Identify what changed since last load |

Design rules:

  • Create External ID field on every migrated object
  • Use source system primary key as the value
  • Mark as unique + External ID (auto-indexed)
  • Multi-source: prefix with source identifier (SAP-12345, LEGACY-67890)
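The combination of source-prefixed keys and upsert semantics is what makes loads idempotent. A minimal sketch, assuming a field named `External_Id__c` and using an in-memory dict as a stand-in for the target object (not a real Salesforce API):

```python
def external_id(source: str, pk: str) -> str:
    """Source-prefixed External ID, e.g. SAP-12345."""
    return f"{source}-{pk}"

# Stand-in for the target object, keyed by External ID.
org: dict[str, dict] = {}

def upsert(record: dict) -> None:
    """Insert or update on the External ID — never a duplicate."""
    key = record["External_Id__c"]
    org[key] = {**org.get(key, {}), **record}

row = {"External_Id__c": external_id("SAP", "12345"), "Name": "Acme"}
upsert(row)
upsert(row)       # re-running the load is safe
print(len(org))   # still one record
```

The same key also resolves parent lookups: child rows carry the parent's External ID instead of a Salesforce ID.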

Cutover Strategies

| Strategy | Best When | Risk | Cost | Downtime |
|---|---|---|---|---|
| Big Bang | < 10M records, clear weekend window | High | Low | Single window |
| Phased | Large volume, multiple BUs, or high risk | Medium | Medium | Multiple small windows |
| Parallel Run | Regulatory, financial, zero-error tolerance | Low | High | None (dual entry) |

Decision Quick Test

| Factor | Big Bang | Phased | Parallel |
|---|---|---|---|
| Volume fits in one cutover window | X | | |
| Business cannot tolerate extended downtime | | | X |
| Multiple source systems with dependencies | | X | |
| Data errors have legal consequences | | | X |
| Budget-constrained | X | | |
| Multiple BUs with different readiness | | X | |

Pre-Migration Checklist

What to Disable Before Loading

| Item | Why | Re-enable Action |
|---|---|---|
| Validation rules | Migrated data may violate current rules | Re-enable, backfill violations |
| Triggers | Prevent unintended automation | Re-enable, run trigger logic post-load if needed |
| Flows / Process Builder | Prevent side effects (emails, updates) | Re-enable, test with sample records |
| Duplicate rules | Legacy data may have intended duplicates | Re-enable, run dedup scan post-migration |
| Assignment rules | Prevent reassignment of migrated records | Re-enable for new records only |
| Sharing recalculation | Defer until all data is loaded | Trigger manually after all loads complete |

Re-enable everything

The most common post-migration failure is forgetting to re-enable automations. Maintain a checklist with a named owner for each item. Verify each one post-migration.
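Such a checklist can be as simple as a small script run after cutover. A sketch with illustrative items and owners:

```python
# Each disabled automation gets a named owner; verify none is left off.
checklist = [
    {"item": "Validation rules", "owner": "A. Admin", "re_enabled": True},
    {"item": "Triggers",         "owner": "B. Dev",   "re_enabled": True},
    {"item": "Flows",            "owner": "C. Dev",   "re_enabled": False},
]

outstanding = [(c["item"], c["owner"]) for c in checklist if not c["re_enabled"]]
for item, owner in outstanding:
    print(f"STILL DISABLED: {item} (owner: {owner})")
```

An empty `outstanding` list is the exit criterion for the migration.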

Post-Migration Validation

| Check | Method | Target |
|---|---|---|
| Record count reconciliation | Source count vs SF count per object | 100% match |
| Relationship integrity | Query for orphan records (null lookups) | 0 orphans |
| Field accuracy | Spot-check random sample (50-100 records) | 100% field match |
| Report comparison | Run key reports, compare to source | Numbers match |
| Process walkthrough | Execute business processes with migrated data | All flows work |
| Error rate | Errors during load / total records | < 1% |
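The first check, count reconciliation, is mechanical. A minimal sketch with hard-coded counts; in practice the source side comes from legacy-system SQL and the Salesforce side from per-object SOQL `COUNT()` queries:

```python
# Per-object record counts from each side of the migration (illustrative).
source_counts = {"Account": 3_000_000, "Contact": 8_000_000}
sf_counts = {"Account": 3_000_000, "Contact": 7_999_950}

# Objects where the counts disagree, with the shortfall.
mismatches = {
    obj: expected - sf_counts.get(obj, 0)
    for obj, expected in source_counts.items()
    if sf_counts.get(obj, 0) != expected
}

for obj, missing in mismatches.items():
    print(f"MISMATCH {obj}: {missing:,} records missing")
```

Any mismatch feeds back into the Execute phase as a delta load or error-file replay.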

Reverse-Engineered Use Cases

Scenario 1: Healthcare — Multi-System Consolidation

Situation: Hospital consolidating 3 legacy EMR systems into Salesforce Health Cloud. 25M patient records across systems with 30% estimated duplicate rate. Strict HIPAA compliance. 48-hour max downtime window.

What you’d do: Phased migration by source system. Phase 1: Largest EMR (15M records) as the foundation. Phase 2: Second EMR (7M) with dedup against Phase 1 data. Phase 3: Third EMR (3M) with dedup against combined data. Use Informatica for complex transforms and built-in data quality/masking. External IDs prefixed by source (EMR1-, EMR2-, EMR3-). 3 full trial runs in a full-copy sandbox. Disable all triggers, validation rules, and flows during each phase.

Why: Phased approach because 25M records will not load in 48 hours as a big bang, and dedup accuracy improves when you establish a baseline first. Informatica handles the complex cross-system field mappings and HIPAA-compliant data masking. Source-prefixed External IDs enable traceability and re-runs.

Scenario 2: Financial Services — Weekend Big Bang

Situation: Mid-size bank migrating from a legacy CRM. 3M Account records, 8M Contacts, 5M Opportunities. Clear Friday-Sunday cutover window with legacy system freeze. Single source system.

What you’d do: Big bang migration. Friday evening: freeze legacy, export final delta. Saturday: full load using Data Loader CLI in sequence (Accounts > Contacts > Opportunities > Activities > Files). Sunday: validation, UAT, re-enable automations. Rollback plan: restore Friday backup if validation fails by Sunday noon. Use Bulk API 2.0 in parallel mode for Accounts (no skew risk), serial mode for Opportunities (shared Account parents).

Why: Volume fits in a weekend window. Single source system means no cross-system dedup needed. Big bang avoids the complexity of maintaining data consistency across two systems during a phased approach. Serial mode for Opportunities prevents lock contention on shared Account records.

Scenario 3: Manufacturing — Parallel Run for ERP Data

Situation: Manufacturer integrating SAP ERP with Salesforce. Product catalog (50K), pricing (200K entries), orders (10M historical, 5K daily new). Finance team requires 100% accuracy — no tolerance for pricing errors.

What you’d do: Parallel run for 4 weeks. Initial load of product catalog and pricing via MuleSoft (ongoing bidirectional sync). Historical orders: phased load over 2 weeks using Bulk API 2.0 (not needed for daily operations, so no urgency). New orders: entered in both systems during parallel period. Daily reconciliation report comparing order totals between SAP and Salesforce. Cut over to Salesforce-only after 4 weeks of matched reconciliation.

Why: Financial data with zero error tolerance demands parallel validation. MuleSoft justified because the ongoing bidirectional sync will be needed permanently, not just for migration. Historical order load is phased because it does not block the parallel run validation.
