Migration: Quick Reference
Cheat sheet for data migration strategy at the CTA review board. For the full deep dive, see Data Migration and Data Quality & Governance.
Migration Phases at a Glance
```mermaid
graph LR
A[1. Plan] --> B[2. Profile]
B --> C[3. Design]
C --> D[4. Build]
D --> E[5. Test]
E --> F[6. Execute]
F --> G[7. Validate]
G -->|Issues| E
E -->|Redesign| C
style A fill:#4c6ef5,color:#fff
style G fill:#51cf66,color:#fff
```
| Phase | Key Deliverable | Critical Activity |
|---|---|---|
| 1. Plan | Scope document, success criteria | Inventory sources, define cutover window, plan rollback |
| 2. Profile | Data quality assessment | Record counts, completeness rates, duplicate detection, encoding issues |
| 3. Design | Field mappings, transformation rules | Load sequence, External ID strategy, error handling design |
| 4. Build | ETL jobs, migration scripts | Pre-migration scripts (disable automations), post-migration scripts |
| 5. Test | Trial migration results | 3+ full trial runs with production-equivalent volumes |
| 6. Execute | Cutover runbook | Freeze source, disable automations, load, delta migration, re-enable |
| 7. Validate | Reconciliation report | Record counts, relationship integrity, spot-checks, UAT |
The three-run rule
Run at least 3 full trial migrations. First finds structural issues. Second validates fixes. Third proves repeatability. Never go to production with fewer than 3 clean runs.
Tool Selection Matrix
| Tool | Volume | Complexity | Cost | Best For |
|---|---|---|---|---|
| Import Wizard | < 50K | Very Low | Free | One-time simple imports for admins |
| Data Loader | Millions | Low-Med | Free | Standard migrations, ad-hoc loads, CLI scripting |
| Bulk API 2.0 | 100M+ | Medium | Included | High-volume programmatic loads |
| Informatica Cloud | Unlimited | High | Licensed | Complex ETL, multiple sources, data governance |
| MuleSoft | Unlimited | High | Licensed | API-led integration + migration, ongoing sync |
| Jitterbit | Millions | Medium | Licensed | Mid-complexity ETL, good UI |
| Talend | Millions | Medium | Free/Licensed | Open-source ETL, budget-conscious |
| Dataloader.io | Millions | Low | Freemium | Cloud-native, drag-and-drop, scheduled loads |
Tool Decision Quick Test
```mermaid
graph TD
A[Migration Needed] --> B{Volume?}
B -->|< 50K records| C[Import Wizard]
B -->|50K - 5M| D{Complexity?}
B -->|5M+| E{Ongoing sync<br/>needed?}
D -->|Simple field mapping| F[Data Loader]
D -->|Complex transforms,<br/>multiple sources| G{Budget?}
E -->|Yes| H[MuleSoft or<br/>Informatica]
E -->|No - one time| I[Bulk API 2.0 +<br/>Data Loader CLI]
G -->|Licensed tool OK| H
G -->|Free/minimal| F
style C fill:#51cf66,color:#fff
style F fill:#4c6ef5,color:#fff
style H fill:#ffd43b,color:#333
style I fill:#4c6ef5,color:#fff
```
Bulk API 2.0 Limits
| Limit | Value |
|---|---|
| Records per 24-hour period | 100 million |
| Records per job | ~2.5 million (internal batching at 10K) |
| Concurrent jobs | 25 (shared with Bulk API 1.0) |
| Batches per 24-hour period | 15,000 (shared with Bulk API 1.0) |
| File format | CSV |
| PK chunking (queries) | Default 100K, max 250K per chunk |
| Processing mode | Parallel (default) or Serial |
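The Bulk API 2.0 ingest flow is three REST calls: create the job, upload the CSV, then mark the upload complete. As a minimal sketch, here is a payload builder for the job-creation call (the object and field names are illustrative; the body fields shown — `object`, `operation`, `contentType`, `lineEnding`, `externalIdFieldName` — are the standard ingest-job options):

```python
def build_ingest_job(sobject: str, operation: str,
                     external_id_field: str = "") -> dict:
    """Build the JSON body for creating a Bulk API 2.0 ingest job.

    Bulk API 2.0 batches internally (10K records per internal batch),
    so the caller only supplies job-level options — no batch sizing.
    """
    body = {
        "object": sobject,
        "operation": operation,   # insert | update | upsert | delete
        "contentType": "CSV",
        "lineEnding": "LF",
    }
    if operation == "upsert":
        if not external_id_field:
            raise ValueError("upsert requires an External ID field")
        body["externalIdFieldName"] = external_id_field
    return body


# Example: upsert Contacts keyed on a hypothetical External ID field.
job = build_ingest_job("Contact", "upsert", "Legacy_Id__c")
```

Keeping job creation as a pure function like this makes the cutover runbook scriptable: the same payloads can be replayed identically across all three trial runs and production.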
Serial vs Parallel
Use serial mode when loading child records that share parent records (avoids UNABLE_TO_LOCK_ROW). Use parallel mode for independent records with no shared parents. Default is parallel.
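A common mitigation when you must stay in parallel mode is to pre-sort the child CSV by parent ID, so records sharing a parent land in the same internal batch instead of contending across batches. A stdlib-only sketch (column names are illustrative):

```python
import csv
import io


def sort_children_by_parent(csv_text: str, parent_col: str) -> str:
    """Rewrite a child-object CSV so rows sharing a parent are contiguous.

    Contiguous rows fall into the same internal batch, so two parallel
    batches rarely contend for the same parent row lock — the usual
    cause of UNABLE_TO_LOCK_ROW errors.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda r: r[parent_col])  # stable sort preserves row order within a parent
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Sorting costs seconds; retrying lock failures mid-cutover costs the window.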
Load Sequence — Mandatory Order
```mermaid
graph TD
A["1. Users & Roles"] --> B["2. Reference Data<br/>(picklists, record types,<br/>products, price books)"]
B --> C["3. Accounts"]
C --> D["4. Contacts"]
D --> E["5. Opportunities"]
E --> F["6. Line Items<br/>(OLI, QLI)"]
F --> G["7. Cases"]
G --> H["8. Activities<br/>(Tasks, Events)"]
H --> I["9. Junction Objects"]
I --> J["10. Files &<br/>Attachments"]
style A fill:#4c6ef5,color:#fff
style J fill:#51cf66,color:#fff
```
The rule: Parents before children. Masters before details. Both sides of junction before junction. Files last.
Sequence Cheat Sheet
| Order | Object Type | Why This Order |
|---|---|---|
| 1 | Users, Roles, Profiles | OwnerId depends on User records existing |
| 2 | Reference data | Products, Price Books, Record Types used by other objects |
| 3 | Accounts | Parent of Contacts, Opportunities, Cases |
| 4 | Contacts | Parent of Cases, Activities; child of Account |
| 5 | Opportunities | Child of Account; parent of Line Items |
| 6 | Line Items | Child of Opportunity + Product (both must exist) |
| 7 | Cases | Child of Account + Contact |
| 8 | Activities | WhoId/WhatId polymorphic — reference multiple object types |
| 9 | Junction objects | Both parent records must exist first |
| 10 | Files/Attachments | ContentVersion references parent record IDs |
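The mandatory order is really just a topological sort of the object dependency graph, which means a script can derive (and verify) it rather than hard-code it. A sketch using the stdlib, with an illustrative, non-exhaustive dependency map:

```python
from graphlib import TopologicalSorter

# Each object maps to the objects that must be loaded before it.
# Illustrative subset — extend with your org's custom objects.
DEPENDS_ON = {
    "User": set(),
    "Product2": set(),                              # reference data
    "Account": {"User"},                            # OwnerId needs Users
    "Contact": {"Account"},
    "Opportunity": {"Account", "User"},
    "OpportunityLineItem": {"Opportunity", "Product2"},
    "Case": {"Account", "Contact"},
    "Task": {"Contact", "Opportunity"},             # WhoId / WhatId targets
}

# static_order() yields every object after all of its parents;
# it also raises CycleError if the map accidentally contains a loop.
load_order = list(TopologicalSorter(DEPENDS_ON).static_order())
```

Deriving the sequence this way keeps the runbook honest when a new custom object is added late in the design phase: update the map, and the load order updates with it.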
External ID Strategy
| Benefit | How It Helps |
|---|---|
| Upsert capability | Insert or update in one operation — no duplicates |
| Relationship resolution | Reference parents by External ID, not Salesforce ID |
| Idempotent loads | Re-run a load safely without creating duplicates |
| Source traceability | Map back to original system record |
| Delta migration | Identify what changed since last load |
Design rules:
- Create External ID field on every migrated object
- Use source system primary key as the value
- Mark as unique + External ID (auto-indexed)
- Multi-source: prefix with source identifier (`SAP-12345`, `LEGACY-67890`)
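The prefixing rule and the uniqueness guarantee are both easy to enforce in the transform layer before any load runs. A minimal sketch (the prefix format is the convention described above, not a platform requirement):

```python
def build_external_id(source: str, source_pk: str) -> str:
    """Prefix the source-system primary key so keys from different
    systems (e.g. SAP and a legacy CRM) cannot collide in one
    Salesforce External ID field."""
    return f"{source}-{source_pk}"


def assert_unique(external_ids: list) -> None:
    """Fail fast, before the load, if any External ID repeats.

    A duplicate External ID makes upserts non-deterministic: the same
    key would match two rows in the load file.
    """
    seen = set()
    for ext_id in external_ids:
        if ext_id in seen:
            raise ValueError(f"duplicate External ID: {ext_id}")
        seen.add(ext_id)
```

Running `assert_unique` against every trial-run extract is cheap insurance that the idempotent-load guarantee actually holds.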
Cutover Strategies
| Strategy | Best When | Risk | Cost | Downtime |
|---|---|---|---|---|
| Big Bang | < 10M records, clear weekend window | High | Low | Single window |
| Phased | Large volume, multiple BUs, or high risk | Medium | Medium | Multiple small windows |
| Parallel Run | Regulatory, financial, zero-error tolerance | Low | High | None (dual entry) |
Decision Quick Test
| Factor | Big Bang | Phased | Parallel |
|---|---|---|---|
| Volume fits in one cutover window | X | | |
| Business cannot tolerate extended downtime | | X | |
| Multiple source systems with dependencies | | X | |
| Data errors have legal consequences | | | X |
| Budget-constrained | X | | |
| Multiple BUs with different readiness | | X | |
Pre-Migration Checklist
What to Disable Before Loading
| Item | Why | Re-enable Action |
|---|---|---|
| Validation rules | Migrated data may violate current rules | Re-enable, backfill violations |
| Triggers | Prevent unintended automation | Re-enable, run trigger logic post-load if needed |
| Flows / Process Builder | Prevent side effects (emails, updates) | Re-enable, test with sample records |
| Duplicate rules | Legacy data may have intended duplicates | Re-enable, run dedup scan post-migration |
| Assignment rules | Prevent reassignment of migrated records | Re-enable for new records only |
| Sharing recalculation | Defer until all data is loaded | Trigger manually after all loads complete |
Re-enable everything
The most common post-migration failure is forgetting to re-enable automations. Maintain a checklist with a named owner for each item. Verify each one post-migration.
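That checklist can be made machine-checkable so the cutover call cannot close while anything is still disabled. A small sketch (item and owner names are placeholders):

```python
from dataclasses import dataclass


@dataclass
class AutomationItem:
    name: str                # e.g. "Case assignment rules"
    owner: str               # the named person accountable for re-enabling it
    reenabled: bool = False  # flipped only after post-migration verification


def outstanding(checklist: list) -> list:
    """Return 'item (owner)' for everything still disabled.

    Gate the end of the cutover window on this list being empty.
    """
    return [f"{i.name} ({i.owner})" for i in checklist if not i.reenabled]
```

Printing the outstanding list at the top of every validation standup keeps each disabled automation attached to a name until it is verified back on.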
Post-Migration Validation
| Check | Method | Target |
|---|---|---|
| Record count reconciliation | Source count vs SF count per object | 100% match |
| Relationship integrity | Query for orphan records (null lookups) | 0 orphans |
| Field accuracy | Spot-check random sample (50-100 records) | 100% field match |
| Report comparison | Run key reports, compare to source | Numbers match |
| Process walkthrough | Execute business processes with migrated data | All flows work |
| Error rate | Errors during load / total records | < 1% |
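The first two checks in the table reduce to simple set arithmetic once you have per-object counts and the child extracts. A stdlib-only sketch (field names are illustrative):

```python
def reconcile_counts(source: dict, target: dict) -> dict:
    """Per-object delta: source count minus Salesforce count.

    Target for a 100% match is a delta of 0 for every object;
    anything non-zero goes straight on the reconciliation report.
    """
    return {obj: source[obj] - target.get(obj, 0) for obj in source}


def orphan_rate(child_rows: list, lookup_field: str) -> float:
    """Fraction of child records with a null/blank lookup.

    Relationship-integrity target is 0.0 — any orphan means a parent
    failed to load or an External ID failed to resolve.
    """
    if not child_rows:
        return 0.0
    orphans = sum(1 for row in child_rows if not row.get(lookup_field))
    return orphans / len(child_rows)
```

In practice the target counts come from `SELECT COUNT()` queries per object and the child rows from the load files or an export, but the pass/fail logic stays this simple.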
Reverse-Engineered Use Cases
Scenario 1: Healthcare — Multi-System Consolidation
Situation: Hospital consolidating 3 legacy EMR systems into Salesforce Health Cloud. 25M patient records across systems with 30% estimated duplicate rate. Strict HIPAA compliance. 48-hour max downtime window.
What you’d do: Phased migration by source system. Phase 1: Largest EMR (15M records) as the foundation. Phase 2: Second EMR (7M) with dedup against Phase 1 data. Phase 3: Third EMR (3M) with dedup against combined data. Use Informatica for complex transforms and built-in data quality/masking. External IDs prefixed by source (EMR1-, EMR2-, EMR3-). 3 full trial runs in a full-copy sandbox. Disable all triggers, validation rules, and flows during each phase.
Why: Phased approach because 25M records will not load in 48 hours as a big bang, and dedup accuracy improves when you establish a baseline first. Informatica handles the complex cross-system field mappings and HIPAA-compliant data masking. Source-prefixed External IDs enable traceability and re-runs.
Scenario 2: Financial Services — Weekend Big Bang
Situation: Mid-size bank migrating from a legacy CRM. 3M Account records, 8M Contacts, 5M Opportunities. Clear Friday-Sunday cutover window with legacy system freeze. Single source system.
What you’d do: Big bang migration. Friday evening: freeze legacy, export final delta. Saturday: full load using Data Loader CLI in sequence (Accounts > Contacts > Opportunities > Activities > Files). Sunday: validation, UAT, re-enable automations. Rollback plan: restore Friday backup if validation fails by Sunday noon. Use Bulk API 2.0 in parallel mode for Accounts (no skew risk), serial mode for Opportunities (shared Account parents).
Why: Volume fits in a weekend window. Single source system means no cross-system dedup needed. Big bang avoids the complexity of maintaining data consistency across two systems during a phased approach. Serial mode for Opportunities prevents lock contention on shared Account records.
Scenario 3: Manufacturing — Parallel Run for ERP Data
Situation: Manufacturer integrating SAP ERP with Salesforce. Product catalog (50K), pricing (200K entries), orders (10M historical, 5K daily new). Finance team requires 100% accuracy — no tolerance for pricing errors.
What you’d do: Parallel run for 4 weeks. Initial load of product catalog and pricing via MuleSoft (ongoing bidirectional sync). Historical orders: phased load over 2 weeks using Bulk API 2.0 (not needed for daily operations, so no urgency). New orders: entered in both systems during parallel period. Daily reconciliation report comparing order totals between SAP and Salesforce. Cut over to Salesforce-only after 4 weeks of matched reconciliation.
Why: Financial data with zero error tolerance demands parallel validation. MuleSoft justified because the ongoing bidirectional sync will be needed permanently, not just for migration. Historical order load is phased because it does not block the parallel run validation.