Migration: Quick Reference

Cheat sheet for data migration strategy at the CTA review board. For the full deep dive, see Data Migration and Data Quality & Governance.

Migration Phases at a Glance

Left-to-right flow of the seven migration phases from Plan through Validate, with feedback loops from Validate back to Test and from Test back to Design for iterative correction.
Figure 1. The feedback loops are intentional: validation failures loop back to Test for re-runs, and structural issues discovered in testing loop back to Design. Running at least three full trial migrations in the Test phase before Execute is the standard CTA guidance for proving repeatability.
| Phase | Key Deliverable | Critical Activity |
| --- | --- | --- |
| 1. Plan | Scope document, success criteria | Inventory sources, define cutover window, plan rollback |
| 2. Profile | Data quality assessment | Record counts, completeness rates, duplicate detection, encoding issues |
| 3. Design | Field mappings, transformation rules | Load sequence, External ID strategy, error handling design |
| 4. Build | ETL jobs, migration scripts | Pre-migration scripts (disable automations), post-migration scripts |
| 5. Test | Trial migration results | 3+ full trial runs with production-equivalent volumes |
| 6. Execute | Cutover runbook | Freeze source, disable automations, load, delta migration, re-enable |
| 7. Validate | Reconciliation report | Record counts, relationship integrity, spot-checks, UAT |

The three-run rule

Run at least three full trial migrations. The first finds structural issues, the second validates the fixes, and the third proves repeatability. Never go to production after fewer than three trial runs, and only once the final run is clean.

Tool Selection Matrix

| Tool | Volume | Complexity | Cost | Best For |
| --- | --- | --- | --- | --- |
| Import Wizard | < 50K | Very Low | Free | One-time simple imports for admins |
| Data Loader | Millions | Low-Med | Free | Standard migrations, ad-hoc loads, CLI scripting |
| Bulk API 2.0 | 100M+ | Medium | Included | High-volume programmatic loads |
| Informatica Cloud | Unlimited | High | Licensed | Complex ETL, multiple sources, data governance |
| MuleSoft | Unlimited | High | Licensed | API-led integration + migration, ongoing sync |
| Jitterbit | Millions | Medium | Licensed | Mid-complexity ETL, good UI |
| Talend | Millions | Medium | Free/Licensed | Open-source ETL, budget-conscious |
| Dataloader.io | Millions | Low | Freemium | Cloud-native, drag-and-drop, scheduled loads |

Tool Decision Quick Test

Decision tree selecting the right migration tool based on record volume, transformation complexity, and whether ongoing sync is required after the initial load.
Figure 2. Volume sets the first boundary, but complexity and ongoing sync needs are often the decisive factors. If MuleSoft is already in the architecture for ongoing integration, it is the natural migration tool choice because the investment is amortized across both migration and steady-state operations.

Bulk API 2.0 Limits

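| Limit | Value |
| --- | --- |
| Records per 24-hour period | 150 million (15,000 batches x 10K records) |
| Records per job | ~2.5 million (internal batching at 10K) |
| Concurrent jobs | 25 (shared with Bulk API 1.0) |
| Batches per 24-hour period | 15,000 (shared with Bulk API 1.0) |
| File format | CSV |
| PK chunking (queries) | Default 100K, max 250K per chunk (a Bulk API 1.0 header; Bulk API 2.0 query jobs chunk automatically) |
| Processing mode | Parallel only in Bulk API 2.0; serial mode is a Bulk API 1.0 job option |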
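
As a worked example of the batch arithmetic behind these limits, the sketch below estimates how many internal 10K-record batches a load consumes and what share of the 15,000-batch daily allocation that represents. The function names are illustrative, and the volumes are borrowed from Scenario 2 later in this sheet; treat the result as a planning estimate, not a guarantee of how the platform will batch a given file.

```python
import math

RECORDS_PER_BATCH = 10_000   # internal batching size quoted in the limits table
DAILY_BATCH_LIMIT = 15_000   # batches per 24-hour period quoted in the limits table


def batches_needed(record_count: int) -> int:
    """Number of internal batches a load of record_count records consumes."""
    return math.ceil(record_count / RECORDS_PER_BATCH)


def daily_batch_share(record_counts: dict[str, int]) -> float:
    """Fraction of the daily batch allocation used by loading every object once."""
    total = sum(batches_needed(n) for n in record_counts.values())
    return total / DAILY_BATCH_LIMIT


# Illustrative volumes taken from Scenario 2 (weekend big bang).
volumes = {"Account": 3_000_000, "Contact": 8_000_000, "Opportunity": 5_000_000}

for name, count in volumes.items():
    print(f"{name}: {batches_needed(count)} batches")
print(f"{daily_batch_share(volumes):.1%} of the daily batch allocation")
```

Each trial run consumes the same allocation again, so several trial runs on the same day against the same org add up quickly.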

Serial vs Parallel

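Use serial mode when loading child records that share parent records: processing one batch at a time avoids the UNABLE_TO_LOCK_ROW errors that occur when concurrent batches lock the same parent. Use parallel mode for independent records with no shared parents. Parallel is the default. Note that serial mode is a Bulk API 1.0 job setting (Data Loader exposes it as an option); Bulk API 2.0 jobs always run in parallel.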
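
The rule above can be applied mechanically to a load file before choosing a mode. The following is a minimal sketch, assuming the parent reference for each child row has already been extracted into a list; `recommend_mode` and `most_shared_parents` are hypothetical helper names, not part of any Salesforce tool.

```python
from collections import Counter
from typing import Optional


def recommend_mode(parent_ids: list[Optional[str]]) -> str:
    """Return "serial" if any parent is shared by two or more child rows, else "parallel"."""
    counts = Counter(pid for pid in parent_ids if pid is not None)
    return "serial" if any(n > 1 for n in counts.values()) else "parallel"


def most_shared_parents(parent_ids: list[Optional[str]], top: int = 3) -> list[tuple[str, int]]:
    """List the parents with the most children, the likeliest lock hot spots."""
    counts = Counter(pid for pid in parent_ids if pid is not None)
    return counts.most_common(top)


# Hypothetical parent references for five Opportunity rows.
opportunity_parents = ["ACC-1", "ACC-1", "ACC-2", "ACC-1", "ACC-3"]
print(recommend_mode(opportunity_parents))
print(most_shared_parents(opportunity_parents, top=1))
```

In practice most child loads share at least some parents, so the hot-spot list is often the more useful output: it shows which parents concentrate the contention.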

Load Sequence - Mandatory Order

Mandatory top-to-bottom load order for Salesforce data migration, placing parent objects before children and files last to prevent lookup resolution failures.
Figure 3. The parent-first rule is absolute: any attempt to load a child record before its parent exists will fail with a lookup resolution error. Files and attachments go last because ContentVersion references parent record IDs that must already exist in the target org.

The rule: Parents before children. Masters before details. Both sides of junction before junction. Files last.

Sequence Cheat Sheet

| Order | Object Type | Why This Order |
| --- | --- | --- |
| 1 | Users, Roles, Profiles | OwnerId depends on User records existing |
| 2 | Reference data | Products, Price Books, Record Types used by other objects |
| 3 | Accounts | Parent of Contacts, Opportunities, Cases |
| 4 | Contacts | Parent of Cases, Activities; child of Account |
| 5 | Opportunities | Child of Account; parent of Line Items |
| 6 | Line Items | Child of Opportunity + Product (both must exist) |
| 7 | Cases | Child of Account + Contact |
| 8 | Activities | WhoId/WhatId polymorphic - reference multiple object types |
| 9 | Junction objects | Both parent records must exist first |
| 10 | Files/Attachments | ContentVersion references parent record IDs |
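
The parent-first rule amounts to a topological sort of the object dependency graph. The sketch below derives a valid load order with Python's standard `graphlib` module (Python 3.9+) and checks any proposed order against the same dependencies. The dependency map is an illustrative reading of the table above; the parents listed for junction objects are an assumption, since they depend on the actual data model.

```python
from graphlib import TopologicalSorter

# Each object type maps to the types that must be loaded before it.
DEPENDS_ON: dict[str, set[str]] = {
    "Users": set(),
    "Reference data": set(),
    "Accounts": {"Users"},
    "Contacts": {"Accounts"},
    "Opportunities": {"Accounts"},
    "Line Items": {"Opportunities", "Reference data"},
    "Cases": {"Accounts", "Contacts"},
    "Activities": {"Contacts", "Opportunities", "Cases"},
    "Junction objects": {"Accounts", "Contacts"},  # assumed parents, for illustration
}
# Files go last: they may reference any record loaded before them.
DEPENDS_ON["Files"] = set(DEPENDS_ON)


def load_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return one valid parent-first order; raises graphlib.CycleError on circular dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


def violations(order: list[str], depends_on: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (child, parent) pairs where the child is scheduled before its parent."""
    position = {name: i for i, name in enumerate(order)}
    return [
        (child, parent)
        for child, parents in depends_on.items()
        for parent in parents
        if position[parent] > position[child]
    ]


order = load_order(DEPENDS_ON)
print(order)
print(violations(order, DEPENDS_ON))
```

A graph can have several valid orders, so the computed sequence may differ from the table while still satisfying every dependency; `violations` is the check that matters when reviewing a hand-written runbook.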

External ID Strategy

| Benefit | How It Helps |
| --- | --- |
| Upsert capability | Insert or update in one operation - no duplicates |
| Relationship resolution | Reference parents by External ID, not Salesforce ID |
| Idempotent loads | Re-run a load safely without creating duplicates |
| Source traceability | Map back to original system record |
| Delta migration | Identify what changed since last load |

Design rules:

  • Create External ID field on every migrated object
  • Use source system primary key as the value
  • Mark as unique + External ID (auto-indexed)
  • Multi-source: prefix with source identifier (SAP-12345, LEGACY-67890)
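
To illustrate why External IDs make loads idempotent, here is a small in-memory sketch. It builds source-prefixed keys as described in the design rules and simulates an upsert keyed on them. The field name `External_Id__c` and both helper functions are hypothetical; a real load would perform the upsert through Data Loader or an API rather than a Python dictionary.

```python
def external_id(source: str, primary_key: object) -> str:
    """Build a source-prefixed External ID such as SAP-12345."""
    return f"{source.upper()}-{primary_key}"


def upsert(target: dict[str, dict], rows: list[dict],
           key_field: str = "External_Id__c") -> tuple[int, int]:
    """Insert rows whose key is new, update rows whose key exists; return (inserted, updated)."""
    inserted = updated = 0
    for row in rows:
        key = row[key_field]
        if key in target:
            target[key].update(row)
            updated += 1
        else:
            target[key] = dict(row)
            inserted += 1
    return inserted, updated


# Hypothetical rows from two source systems that happen to share a primary key.
rows = [
    {"External_Id__c": external_id("sap", 12345), "Name": "Acme"},
    {"External_Id__c": external_id("legacy", 12345), "Name": "Acme Ltd"},
]
org: dict[str, dict] = {}
print(upsert(org, rows))  # first run inserts both rows
print(upsert(org, rows))  # re-run updates both rows and creates no duplicates
```

The prefix is what keeps the two source systems from colliding on the same numeric key, and the keyed upsert is what makes a re-run safe.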

Cutover Strategies

| Strategy | Best When | Risk | Cost | Downtime |
| --- | --- | --- | --- | --- |
| Big Bang | < 10M records, clear weekend window | High | Low | Single window |
| Phased | Large volume, multiple BUs, or high risk | Medium | Medium | Multiple small windows |
| Parallel Run | Regulatory, financial, zero-error tolerance | Low | High | None (dual entry) |

Decision Quick Test

FactorBig BangPhasedParallel
Volume fits in one cutover windowX
Business cannot tolerate extended downtimeX
Multiple source systems with dependenciesX
Data errors have legal consequencesX
Budget-constrainedX
Multiple BUs with different readinessX

Pre-Migration Checklist

What to Disable Before Loading

| Item | Why | Re-enable Action |
| --- | --- | --- |
| Validation rules | Migrated data may violate current rules | Re-enable, backfill violations |
| Triggers | Prevent unintended automation | Re-enable, run trigger logic post-load if needed |
| Flows | Prevent side effects (emails, updates) | Re-enable, test with sample records |
| Duplicate rules | Legacy data may have intended duplicates | Re-enable, run dedup scan post-migration |
| Assignment rules | Prevent reassignment of migrated records | Re-enable for new records only |
| Sharing recalculation | Defer until all data is loaded | Trigger manually after all loads complete |

Re-enable everything

The most common post-migration failure: forgetting to re-enable automations. Keep a checklist with a named owner for each item. Verify every one after migration.
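
One way to make the named-owner checklist verifiable is to track each item's state explicitly and list what is still outstanding after the load. This is a minimal sketch; the `AutomationItem` class, the owners, and the states are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class AutomationItem:
    name: str                  # e.g. "Validation rules"
    owner: str                 # named person responsible for re-enabling
    disabled: bool = False     # set when the item is switched off before the load
    re_enabled: bool = False   # set when the owner confirms it is back on


def outstanding(items: list[AutomationItem]) -> list[AutomationItem]:
    """Items that were disabled for the load but have not been re-enabled."""
    return [item for item in items if item.disabled and not item.re_enabled]


# Hypothetical checklist state after a load.
checklist = [
    AutomationItem("Validation rules", owner="Admin A", disabled=True, re_enabled=True),
    AutomationItem("Triggers", owner="Dev B", disabled=True),
    AutomationItem("Assignment rules", owner="Admin A"),
]
for item in outstanding(checklist):
    print(f"NOT RE-ENABLED: {item.name} (owner: {item.owner})")
```

Sign-off is only possible when `outstanding` returns an empty list.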

Post-Migration Validation

| Check | Method | Target |
| --- | --- | --- |
| Record count reconciliation | Source count vs SF count per object | 100% match |
| Relationship integrity | Query for orphan records (null lookups) | 0 orphans |
| Field accuracy | Spot-check random sample (50-100 records) | 100% field match |
| Report comparison | Run key reports, compare to source | Numbers match |
| Process walkthrough | Execute business processes with migrated data | All flows work |
| Error rate | Errors during load / total records | < 1% |
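
Three of these checks (count reconciliation, orphan detection, and error rate) reduce to simple comparisons once the counts and records have been exported. The sketch below shows one possible shape for them. All function names, field names, and sample values are illustrative assumptions, and the data would normally come from source extracts and Salesforce queries rather than literals.

```python
from typing import Optional


def count_mismatches(source: dict[str, int], target: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {object: (source_count, target_count)} for every object whose counts differ."""
    objects = sorted(source.keys() | target.keys())
    return {
        name: (source.get(name, 0), target.get(name, 0))
        for name in objects
        if source.get(name, 0) != target.get(name, 0)
    }


def find_orphans(children: list[dict[str, Optional[str]]], parent_field: str,
                 parent_ids: set[str]) -> list[dict[str, Optional[str]]]:
    """Return child rows whose lookup is null or points at a parent that was not loaded."""
    return [row for row in children if row.get(parent_field) not in parent_ids]


def error_rate(errors: int, total: int) -> float:
    """Errors during load divided by total records attempted."""
    if total <= 0:
        raise ValueError("total must be positive")
    return errors / total


# Hypothetical reconciliation inputs.
print(count_mismatches({"Account": 100, "Contact": 250}, {"Account": 100, "Contact": 248}))
contacts = [{"Id": "C1", "AccountId": "A1"}, {"Id": "C2", "AccountId": None}]
print(find_orphans(contacts, "AccountId", {"A1"}))
print(error_rate(errors=12, total=5_000) < 0.01)
```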

Reverse-Engineered Use Cases

Scenario 1: Healthcare - Multi-System Consolidation

Situation: Hospital consolidating 3 legacy EMR systems into Salesforce Health Cloud. 25M patient records across systems with 30% estimated duplicate rate. Strict HIPAA compliance. 48-hour max downtime window.

What you’d do: Phased migration by source system. Phase 1: Largest EMR (15M records) as the foundation. Phase 2: Second EMR (7M) with dedup against Phase 1 data. Phase 3: Third EMR (3M) with dedup against combined data. Use Informatica for complex transforms and built-in data quality/masking. External IDs prefixed by source (EMR1-, EMR2-, EMR3-). 3 full trial runs in a full-copy sandbox. Disable all triggers, validation rules, and flows during each phase.

Why: Phased approach because loading, deduplicating, and validating 25M records from three systems is unlikely to fit in a single 48-hour big bang window, and dedup accuracy improves when you establish a baseline first. Informatica handles the complex cross-system field mappings and HIPAA-compliant data masking. Source-prefixed External IDs enable traceability and re-runs.

Scenario 2: Financial Services - Weekend Big Bang

Situation: Mid-size bank migrating from a legacy CRM. 3M Account records, 8M Contacts, 5M Opportunities. Clear Friday-Sunday cutover window with legacy system freeze. Single source system.

What you’d do: Big bang migration. Friday evening: freeze legacy, export final delta. Saturday: full load using Data Loader CLI in sequence (Accounts > Contacts > Opportunities > Activities > Files). Sunday: validation, UAT, re-enable automations. Rollback plan: restore Friday backup if validation has not passed by Sunday noon. Run the Bulk API in parallel mode for Accounts (no skew risk) and in serial mode for Opportunities (shared Account parents); serial mode is a Bulk API 1.0 job setting, since Bulk API 2.0 jobs always run in parallel.

Why: Volume fits in a weekend window. Single source system means no cross-system dedup needed. Big bang avoids the complexity of maintaining data consistency across two systems during a phased approach. Serial mode for Opportunities prevents lock contention on shared Account records.

Scenario 3: Manufacturing - Parallel Run for ERP Data

Situation: Manufacturer integrating SAP ERP with Salesforce. Product catalog (50K), pricing (200K entries), orders (10M historical, 5K daily new). Finance team requires 100% accuracy - no tolerance for pricing errors.

What you’d do: Parallel run for 4 weeks. Initial load of product catalog and pricing via MuleSoft (ongoing bidirectional sync). Historical orders: phased load over 2 weeks using Bulk API 2.0 (not needed for daily operations, so no urgency). New orders: entered in both systems during parallel period. Daily reconciliation report comparing order totals between SAP and Salesforce. Cut over to Salesforce-only after 4 weeks of matched reconciliation.

Why: Financial data with zero error tolerance demands parallel validation. MuleSoft justified because the ongoing bidirectional sync will be needed permanently, not just for migration. Historical order load is phased because it does not block the parallel run validation.

Sources

Personal study notes for the Salesforce CTA exam. Content compiled from VJ's study notes, official Salesforce documentation, community sources, and online publicly available content, then organized and presented with AI assistance. Not affiliated with Salesforce. © 2025–2026 VJ Srivastava.