Migration: Quick Reference

Cheat sheet for data migration strategy at the CTA review board. For the full deep dive, see Data Migration and Data Quality & Governance.

Migration Phases at a Glance

Figure 1. The feedback loops are intentional: validation failures loop back to Test for re-runs, and structural issues discovered in testing loop back to Design. Running at least three full trial migrations in the Test phase before Execute is the standard CTA guidance for proving repeatability.

Phase	Key Deliverable	Critical Activity
1. Plan	Scope document, success criteria	Inventory sources, define cutover window, plan rollback
2. Profile	Data quality assessment	Record counts, completeness rates, duplicate detection, encoding issues
3. Design	Field mappings, transformation rules	Load sequence, External ID strategy, error handling design
4. Build	ETL jobs, migration scripts	Pre-migration scripts (disable automations), post-migration scripts
5. Test	Trial migration results	3+ full trial runs with production-equivalent volumes
6. Execute	Cutover runbook	Freeze source, disable automations, load, delta migration, re-enable
7. Validate	Reconciliation report	Record counts, relationship integrity, spot-checks, UAT
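
The Profile-phase checks in the table (record counts, completeness rates, duplicate detection) can be sketched in a few lines. This is a minimal illustration, assuming the source rows are already available as dictionaries; `profile_records` and the sample field names are hypothetical and not part of any migration tool.

```python
from collections import Counter

def profile_records(rows, key_field, required_fields):
    """Return record count, per-field completeness, and duplicated keys."""
    total = len(rows)
    completeness = {}
    for field in required_fields:
        # A value counts as filled only if it is non-empty after trimming.
        filled = sum(1 for r in rows if str(r.get(field) or "").strip())
        completeness[field] = filled / total if total else 0.0
    key_counts = Counter(r.get(key_field) for r in rows)
    duplicates = sorted(k for k, n in key_counts.items() if k is not None and n > 1)
    return {"count": total, "completeness": completeness, "duplicate_keys": duplicates}

# Illustrative sample data only.
sample = [
    {"id": "1001", "name": "Acme", "email": "ops@acme.example"},
    {"id": "1002", "name": "Globex", "email": ""},
    {"id": "1001", "name": "Acme Corp", "email": None},
    {"id": "1003", "name": "Initech", "email": "it@initech.example"},
]
report = profile_records(sample, key_field="id", required_fields=["name", "email"])
print(report)
```

Running the same profile against each source before Design gives the completeness and duplicate numbers that the field mappings and dedup rules have to account for.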

The three-run rule

Run at least 3 full trial migrations. The first finds structural issues, the second validates the fixes, and the third proves repeatability. Never go to production with fewer than 3 full runs, and add runs until the last one completes cleanly.

Tool Selection Matrix

Tool	Volume	Complexity	Cost	Best For
Import Wizard	< 50K	Very Low	Free	One-time simple imports for admins
Data Loader	Millions	Low-Med	Free	Standard migrations, ad-hoc loads, CLI scripting
Bulk API 2.0	100M+	Medium	Included	High-volume programmatic loads
Informatica Cloud	Unlimited	High	Licensed	Complex ETL, multiple sources, data governance
MuleSoft	Unlimited	High	Licensed	API-led integration + migration, ongoing sync
Jitterbit	Millions	Medium	Licensed	Mid-complexity ETL, good UI
Talend	Millions	Medium	Free/Licensed	Open-source ETL, budget-conscious
Dataloader.io	Millions	Low	Freemium	Cloud-native, drag-and-drop, scheduled loads

Tool Decision Quick Test

Figure 2. Volume sets the first boundary, but complexity and ongoing sync needs are often the decisive factors. If MuleSoft is already in the architecture for ongoing integration, it is the natural migration tool choice because the investment is amortized across both migration and steady-state operations.
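
One way to read the matrix and the caption together is as a short decision function. This is a rough sketch, not the decision tree from Figure 2: the function name `suggest_tool`, the order of the checks, and the 10M boundary between Data Loader and Bulk API 2.0 are all assumptions made for illustration.

```python
def suggest_tool(record_count, complex_transforms=False, ongoing_sync=False,
                 mulesoft_in_architecture=False):
    """Suggest a migration tool from volume, complexity, and sync needs."""
    # Ongoing sync, or an existing MuleSoft investment, outweighs volume.
    if ongoing_sync or mulesoft_in_architecture:
        return "MuleSoft"
    # Complex multi-source ETL points to a dedicated ETL platform.
    if complex_transforms:
        return "Informatica Cloud"
    if record_count < 50_000:
        return "Import Wizard"
    # Assumed boundary: "Millions" in the matrix is treated as under 10M.
    if record_count < 10_000_000:
        return "Data Loader"
    return "Bulk API 2.0"

print(suggest_tool(20_000))                      # Import Wizard
print(suggest_tool(3_000_000))                   # Data Loader
print(suggest_tool(50_000_000))                  # Bulk API 2.0
print(suggest_tool(3_000_000, ongoing_sync=True))  # MuleSoft
```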

Bulk API 2.0 Limits

Limit	Value
Records per 24-hour period	150 million
Records per job	~2.5 million (internal batching at 10K)
Concurrent jobs	25 (shared with Bulk API 1.0)
Batches per 24-hour period	15,000 (shared with Bulk API 1.0)
File format	CSV
PK chunking (Bulk API 1.0 queries)	Default 100K, max 250K per chunk
Processing mode	Parallel only (serial mode is a Bulk API 1.0 job option)
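
The batch figures in the table translate into a quick capacity check. The sketch below is a back-of-the-envelope estimate only: it assumes every internal batch is a full 10K records and that no other job is consuming the shared batch allocation. `batch_budget` is a hypothetical helper name.

```python
BATCH_SIZE = 10_000        # internal batching figure from the table
BATCHES_PER_DAY = 15_000   # 24-hour batch allocation from the table

def batch_budget(record_count):
    """Estimate batches used and minimum days for a load of record_count rows."""
    batches = -(-record_count // BATCH_SIZE)  # integer ceiling division
    return {
        "batches": batches,
        "share_of_daily_batches": batches / BATCHES_PER_DAY,
        "min_days": -(-batches // BATCHES_PER_DAY),
    }

print(batch_budget(16_000_000))
```

The estimate covers platform allocations only. Actual elapsed time depends on automation, sharing, and lock contention in the target org, which is what the trial runs measure.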

Serial vs Parallel

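Use serial mode when loading child records that share parent records; it avoids UNABLE_TO_LOCK_ROW errors caused by concurrent batches locking the same parent. Use parallel mode for independent records with no shared parents. Parallel is the default. Serial mode is a Bulk API 1.0 job setting (Data Loader exposes it as a Bulk API option); Bulk API 2.0 always processes in parallel.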
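
The rule can be expressed as a simple check over the rows about to be loaded. This is a sketch under the assumption that each row is a dictionary carrying its parent reference; `choose_mode` is a hypothetical helper, and a real decision would also weigh how many children each parent has.

```python
from collections import Counter

def choose_mode(child_rows, parent_field):
    """Return 'serial' if any parent is shared by several rows, else 'parallel'."""
    children_per_parent = Counter(
        row[parent_field] for row in child_rows if row.get(parent_field)
    )
    shares_parent = any(n > 1 for n in children_per_parent.values())
    return "serial" if shares_parent else "parallel"

opportunities = [
    {"name": "Renewal", "AccountId": "A-1"},
    {"name": "Upsell", "AccountId": "A-1"},
    {"name": "New logo", "AccountId": "A-2"},
]
accounts = [{"name": "Acme"}, {"name": "Globex"}]

print(choose_mode(opportunities, "AccountId"))  # serial
print(choose_mode(accounts, "ParentId"))        # parallel
```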

Load Sequence - Mandatory Order

Figure 3. The parent-first rule is absolute: any attempt to load a child record before its parent exists will fail with a lookup resolution error. Files and attachments go last because ContentVersion references parent record IDs that must already exist in the target org.

The rule: Parents before children. Masters before details. Both parents of a junction object before the junction records. Files last.

Sequence Cheat Sheet

Order	Object Type	Why This Order
1	Users, Roles, Profiles	OwnerId depends on User records existing
2	Reference data	Products, Price Books, Record Types used by other objects
3	Accounts	Parent of Contacts, Opportunities, Cases
4	Contacts	Parent of Cases, Activities; child of Account
5	Opportunities	Child of Account; parent of Line Items
6	Line Items	Child of Opportunity + Product (both must exist)
7	Cases	Child of Account + Contact
8	Activities	WhoId/WhatId polymorphic - reference multiple object types
9	Junction objects	Both parent records must exist first
10	Files/Attachments	ContentVersion references parent record IDs
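
The parent-first rule is a topological ordering of the object dependency graph, so the sequence can be derived rather than memorized. The sketch below uses Python's standard `graphlib` module; the dependency map is a simplified, assumed subset of the table above, not a complete Salesforce schema.

```python
from graphlib import TopologicalSorter

# Each object maps to the set of objects that must be loaded before it.
DEPENDS_ON = {
    "User": set(),
    "Product2": {"User"},
    "Account": {"User"},
    "Contact": {"Account"},
    "Opportunity": {"Account"},
    "OpportunityLineItem": {"Opportunity", "Product2"},
    "Case": {"Account", "Contact"},
    "Task": {"Contact", "Opportunity", "Case"},
    "ContentVersion": {"Account", "Opportunity", "Case"},
}

# static_order() yields every object after all of its prerequisites.
load_order = list(TopologicalSorter(DEPENDS_ON).static_order())
print(load_order)
```

Objects with no dependency between them can come out in any relative order, so the printed list is one valid sequence rather than the only one. A circular dependency raises `graphlib.CycleError`, which is a useful signal that one lookup has to be populated in a second pass.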

External ID Strategy

Benefit	How It Helps
Upsert capability	Insert or update in one operation - no duplicates
Relationship resolution	Reference parents by External ID, not Salesforce ID
Idempotent loads	Re-run a load safely without creating duplicates
Source traceability	Map back to original system record
Delta migration	Identify what changed since last load

Design rules:

Create External ID field on every migrated object
Use source system primary key as the value
Mark as unique + External ID (auto-indexed)
Multi-source: prefix with source identifier (SAP-12345, LEGACY-67890)
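
The design rules and the idempotency benefit can be illustrated with an in-memory stand-in for the target org. This is a sketch only: the dictionary plays the role of the org, `External_Id__c` is an assumed field name, and `upsert` here is a local function, not a Salesforce API call.

```python
def external_id(source, primary_key):
    """Prefix the source primary key with a source identifier, e.g. SAP-12345."""
    return f"{source}-{primary_key}"

def upsert(target, rows, source):
    """Insert or update rows keyed by External ID; return (inserted, updated)."""
    inserted = updated = 0
    for row in rows:
        key = external_id(source, row["id"])
        if key in target:
            updated += 1
        else:
            inserted += 1
        target[key] = {**row, "External_Id__c": key}
    return inserted, updated

org = {}
sap_rows = [{"id": "12345", "name": "Acme"}, {"id": "12346", "name": "Globex"}]
legacy_rows = [{"id": "12345", "name": "Acme Legacy"}]

first = upsert(org, sap_rows, "SAP")         # initial load
second = upsert(org, sap_rows, "SAP")        # re-run of the same file
third = upsert(org, legacy_rows, "LEGACY")   # same key, different source
print(first, second, third, len(org))        # (2, 0) (0, 2) (1, 0) 3
```

The re-run updates the two existing records instead of duplicating them, and the source prefix keeps the LEGACY record with the same primary key distinct from the SAP one.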

Cutover Strategies

Strategy	Best When	Risk	Cost	Downtime
Big Bang	< 10M records, clear weekend window	High	Low	Single window
Phased	Large volume, multiple BUs, or high risk	Medium	Medium	Multiple small windows
Parallel Run	Regulatory, financial, zero-error tolerance	Low	High	None (dual entry)

Decision Quick Test

Factor	Big Bang	Phased	Parallel
Volume fits in one cutover window	X
Business cannot tolerate extended downtime		X
Multiple source systems with dependencies			X
Data errors have legal consequences			X
Budget-constrained	X
Multiple BUs with different readiness		X
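
The quick test can be read as a tally: each factor that applies counts toward the strategy marked in its row. The sketch below encodes the table rows as written; the factor keys and the `tally_strategies` helper are hypothetical names, and a tie or a close count still needs architectural judgment.

```python
from collections import Counter

# One entry per row of the quick-test table.
FACTOR_TO_STRATEGY = {
    "volume_fits_one_window": "Big Bang",
    "cannot_tolerate_extended_downtime": "Phased",
    "multiple_dependent_sources": "Parallel Run",
    "errors_have_legal_consequences": "Parallel Run",
    "budget_constrained": "Big Bang",
    "multiple_bus_different_readiness": "Phased",
}

def tally_strategies(factors):
    """Count how many of the applicable factors point to each strategy."""
    return Counter(FACTOR_TO_STRATEGY[f] for f in factors)

votes = tally_strategies([
    "volume_fits_one_window",
    "budget_constrained",
    "multiple_bus_different_readiness",
])
print(votes.most_common())  # [('Big Bang', 2), ('Phased', 1)]
```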

Pre-Migration Checklist

What to Disable Before Loading

Item	Why	Re-enable Action
Validation rules	Migrated data may violate current rules	Re-enable, backfill violations
Triggers	Prevent unintended automation	Re-enable, run trigger logic post-load if needed
Flows	Prevent side effects (emails, updates)	Re-enable, test with sample records
Duplicate rules	Legacy data may have intended duplicates	Re-enable, run dedup scan post-migration
Assignment rules	Prevent reassignment of migrated records	Re-enable for new records only
Sharing recalculation	Defer until all data is loaded	Trigger manually after all loads complete

Re-enable everything

The most common post-migration failure: forgetting to re-enable automations. Keep a checklist with a named owner for each item. Verify every one after migration.
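
A checklist with named owners is easy to keep as data, so the outstanding items can be listed at the end of cutover. A minimal sketch; the `ChecklistItem` structure and the owner names are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class ChecklistItem:
    name: str
    owner: str
    disabled: bool = False      # switched off for the load
    re_enabled: bool = False    # switched back on and verified

def outstanding(items):
    """Items that were disabled for the load but not yet re-enabled."""
    return [i for i in items if i.disabled and not i.re_enabled]

checklist = [
    ChecklistItem("Validation rules", owner="Priya", disabled=True, re_enabled=True),
    ChecklistItem("Triggers", owner="Marco", disabled=True),
    ChecklistItem("Flows", owner="Dana", disabled=True),
    ChecklistItem("Assignment rules", owner="Lee"),
]

for item in outstanding(checklist):
    print(f"STILL DISABLED: {item.name} (owner: {item.owner})")
```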

Post-Migration Validation

Check	Method	Target
Record count reconciliation	Source count vs SF count per object	100% match
Relationship integrity	Query for orphan records (null lookups)	0 orphans
Field accuracy	Spot-check random sample (50-100 records)	100% field match
Report comparison	Run key reports, compare to source	Numbers match
Process walkthrough	Execute business processes with migrated data	All flows work
Error rate	Errors during load / total records	< 1%
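
The first two checks and the error-rate target are mechanical and can be scripted once the counts and records have been exported. The sketch below assumes the exports are plain dictionaries and lists; all three helper names are hypothetical.

```python
def reconcile_counts(source_counts, target_counts):
    """Return {object: (source, target)} for every object whose counts differ."""
    return {
        obj: (n, target_counts.get(obj, 0))
        for obj, n in source_counts.items()
        if target_counts.get(obj, 0) != n
    }

def find_orphans(children, lookup_field, parent_ids):
    """Children whose lookup is null or points at a parent that was not loaded."""
    return [c for c in children if c.get(lookup_field) not in parent_ids]

def error_rate(errors, total):
    """Load errors as a fraction of total records attempted."""
    return errors / total if total else 0.0

source = {"Account": 3, "Contact": 5}
target = {"Account": 3, "Contact": 4}
contacts = [
    {"id": "C1", "AccountId": "A1"},
    {"id": "C2", "AccountId": None},
    {"id": "C3", "AccountId": "A9"},
]

print(reconcile_counts(source, target))                    # {'Contact': (5, 4)}
print(find_orphans(contacts, "AccountId", {"A1", "A2"}))   # C2 and C3
print(error_rate(1, 5) < 0.01)                             # False
```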

Reverse-Engineered Use Cases

Scenario 1: Healthcare - Multi-System Consolidation

Situation: Hospital consolidating 3 legacy EMR systems into Salesforce Health Cloud. 25M patient records across systems with 30% estimated duplicate rate. Strict HIPAA compliance. 48-hour max downtime window.

What you’d do: Phased migration by source system. Phase 1: Largest EMR (15M records) as the foundation. Phase 2: Second EMR (7M) with dedup against Phase 1 data. Phase 3: Third EMR (3M) with dedup against combined data. Use Informatica for complex transforms and built-in data quality/masking. External IDs prefixed by source (EMR1-, EMR2-, EMR3-). 3 full trial runs in a full-copy sandbox. Disable all triggers, validation rules, and flows during each phase.

Why: Phased approach because loading, deduplicating, and validating 25M records from three systems is unlikely to fit in a 48-hour window as a big bang, and dedup accuracy improves when you establish a baseline first. Informatica handles the complex cross-system field mappings and HIPAA-compliant data masking. Source-prefixed External IDs enable traceability and re-runs.
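
The baseline-first dedup can be sketched as follows. This is a toy illustration: the match rule (normalized last name plus date of birth) is an assumption chosen for brevity, real patient matching is considerably more involved, and the sample records are invented.

```python
def match_key(record):
    """Hypothetical match rule: normalized last name plus date of birth."""
    return (record["last_name"].strip().lower(), record["dob"])

def load_phase(baseline, records, source):
    """Merge one source into the baseline; return (new, matched) counts."""
    new = matched = 0
    for record in records:
        ext_id = f"{source}-{record['id']}"
        key = match_key(record)
        if key in baseline:
            baseline[key].append(ext_id)  # duplicate: link to the existing record
            matched += 1
        else:
            baseline[key] = [ext_id]      # first sighting: becomes the baseline
            new += 1
    return new, matched

baseline = {}
emr1 = [{"id": "1", "last_name": "Nguyen", "dob": "1980-02-14"},
        {"id": "2", "last_name": "Okafor", "dob": "1975-09-30"}]
emr2 = [{"id": "77", "last_name": " nguyen ", "dob": "1980-02-14"},
        {"id": "78", "last_name": "Silva", "dob": "1991-06-01"}]

phase1 = load_phase(baseline, emr1, "EMR1")
phase2 = load_phase(baseline, emr2, "EMR2")
print(phase1, phase2)  # (2, 0) (1, 1)
```

Each later phase is matched against everything loaded so far, and the source-prefixed IDs kept on the merged record preserve traceability back to every contributing system.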

Scenario 2: Financial Services - Weekend Big Bang

Situation: Mid-size bank migrating from a legacy CRM. 3M Account records, 8M Contacts, 5M Opportunities. Clear Friday-Sunday cutover window with legacy system freeze. Single source system.

What you’d do: Big bang migration. Friday evening: freeze legacy, export final delta. Saturday: full load using Data Loader CLI in sequence (Accounts > Contacts > Opportunities > Activities > Files). Sunday: validation, UAT, re-enable automations. Rollback plan: restore Friday backup if validation fails by Sunday noon. Load Accounts in parallel mode (no shared parents to lock) and Opportunities in serial mode (shared Account parents). Serial mode requires Data Loader's Bulk API (1.0) setting, because Bulk API 2.0 processes in parallel only.

Why: Volume fits in a weekend window. Single source system means no cross-system dedup needed. Big bang avoids the complexity of maintaining data consistency across two systems during a phased approach. Serial mode for Opportunities prevents lock contention on shared Account records.

Scenario 3: Manufacturing - Parallel Run for ERP Data

Situation: Manufacturer integrating SAP ERP with Salesforce. Product catalog (50K), pricing (200K entries), orders (10M historical, 5K daily new). Finance team requires 100% accuracy - no tolerance for pricing errors.

What you’d do: Parallel run for 4 weeks. Initial load of product catalog and pricing via MuleSoft (ongoing bidirectional sync). Historical orders: phased load over 2 weeks using Bulk API 2.0 (not needed for daily operations, so no urgency). New orders: entered in both systems during parallel period. Daily reconciliation report comparing order totals between SAP and Salesforce. Cut over to Salesforce-only after 4 weeks of matched reconciliation.

Why: Financial data with zero error tolerance demands parallel validation. MuleSoft justified because the ongoing bidirectional sync will be needed permanently, not just for migration. Historical order load is phased because it does not block the parallel run validation.
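
The daily reconciliation report reduces to comparing per-day totals from the two systems and listing any day that differs. A minimal sketch, assuming both sides have already been aggregated into per-day totals; the figures are invented, and `Decimal` is used so that currency comparisons are exact.

```python
from decimal import Decimal

def daily_mismatches(sap_totals, sf_totals):
    """Return (day, sap_total, sf_total) for each day where the totals differ."""
    days = sorted(set(sap_totals) | set(sf_totals))
    return [
        (day, sap_totals.get(day), sf_totals.get(day))
        for day in days
        if sap_totals.get(day) != sf_totals.get(day)
    ]

# Illustrative figures only.
sap = {"2025-03-01": Decimal("18250.40"), "2025-03-02": Decimal("9310.00")}
sf = {"2025-03-01": Decimal("18250.40"), "2025-03-02": Decimal("9309.99")}

mismatches = daily_mismatches(sap, sf)
print(mismatches)
```

A day missing from either system shows up as a mismatch with `None` on that side, so gaps in the sync are reported along with differences in value.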

Sources

Personal study notes for the Salesforce CTA exam. Content compiled from VJ's study notes, official Salesforce documentation, community sources, and online publicly available content, then organized and presented with AI assistance. Not affiliated with Salesforce. © 2025–2026 VJ Srivastava.