
Best Practices & Anti-Patterns

This page consolidates data architecture best practices and anti-patterns organized by topic area. Each best practice is paired with the anti-pattern it prevents, making it useful as both a design guide and a review checklist.

Data Architecture Decision Flow

flowchart TD
    A[New Data Requirement] --> B{Standard object exists?}
    B -->|Yes| C{Meets 80%+ needs?}
    B -->|No| D[Design custom object]
    C -->|Yes| E[Use standard object]
    C -->|No| D
    D --> F{Relationship type?}
    E --> F
    F --> G{Child can exist independently?}
    G -->|No| H[Master-Detail]
    G -->|Yes| I{Need roll-ups or sharing inheritance?}
    I -->|Yes| H
    I -->|No| J[Lookup]
    H --> K{Data volume > 500K?}
    J --> K
    K -->|Yes| L[Apply LDV strategy]
    K -->|No| M[Standard indexing]
    L --> N[Define archival policy]
    M --> N
    N --> O[Document in ERD]

Data Modeling Best Practices

Best Practices

1. Start with standard objects, justify custom

Always evaluate whether a standard object can serve the need before creating a custom object. Standard objects come with features that take months to replicate. Document why you rejected the standard option.

2. Choose relationships deliberately

Never default to lookup because it is easier. Evaluate each relationship against the five criteria: child independence, roll-up needs, sharing inheritance, reparenting, and cascade delete. See Decision Guides for the full flowchart.

3. Design for sharing from the start

The relationship type determines the sharing model. If child records must inherit parent sharing, master-detail is required. Changing relationship types after data is populated is painful and sometimes impossible (converting a lookup to master-detail requires that every child record already has a parent populated — no nulls).

4. Use External IDs on every migrated object

External IDs enable upserts, prevent duplicates during re-runs, and provide traceability back to the source system. There is no reason not to create them.
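
As a sketch of why external-ID upserts make loads idempotent, the following Python models an upsert keyed on a hypothetical Legacy_Id__c field: re-running the same load updates existing records instead of inserting duplicates.

```python
def upsert_by_external_id(target, records, ext_id_field="Legacy_Id__c"):
    """Simulate an upsert keyed on an external ID field.

    `target` maps external ID -> record. Re-running the same load
    updates matched rows rather than creating duplicates.
    Field name Legacy_Id__c is illustrative.
    """
    created, updated = 0, 0
    for rec in records:
        key = rec[ext_id_field]
        if key in target:
            target[key].update(rec)
            updated += 1
        else:
            target[key] = dict(rec)
            created += 1
    return created, updated
```

Running the load twice against the same target yields one create on the first pass and one update on the second — the duplicate never materializes.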

5. Plan record types before data exists

Adding record types after hundreds of thousands of records exist requires backfilling the RecordTypeId on every record. Design record types during the data model phase, not during UAT.

6. Keep objects focused

Each object should represent one business concept. If an object has 400+ fields spanning 5 different business processes, it is a God Object. Split it.

7. Document your ERD

Maintain a current entity-relationship diagram. It is the single most referenced artifact in a CTA review board. Keep it updated as the model evolves.

Anti-Patterns

God Object

Stuffing unrelated data into Account (or any single object) because “it is related to the customer.” Result: 500+ fields, 12 record types, unmaintainable validation rules, deployment conflicts between teams, and page load times that make users switch to spreadsheets.

Lookup when Master-Detail needed

Using lookup because “we might need to reparent” when the business process never reparents. Result: No native roll-ups (building fragile trigger-based alternatives), no sharing inheritance (building manual sharing rules), no cascade delete (orphan records accumulating).

Over-engineering with junction objects

Creating junction objects for relationships that are actually one-to-many. Not every relationship is many-to-many. A junction object adds query complexity and an extra object to maintain.


LDV Best Practices

Best Practices

1. Monitor data growth proactively

Track record counts per object monthly. Set alerts at 500K, 1M, and 5M thresholds. By the time users complain about performance, you are already 6 months behind on optimization.
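
A minimal sketch of the threshold alerting described above (the tier names are illustrative):

```python
# Alert tiers keyed to the 500K / 1M / 5M thresholds above; tier names illustrative.
THRESHOLDS = [(5_000_000, "critical"), (1_000_000, "warning"), (500_000, "watch")]

def growth_alert(record_count):
    """Return the highest alert tier crossed, or None below all thresholds."""
    for threshold, tier in THRESHOLDS:
        if record_count >= threshold:
            return tier
    return None
```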

2. Design queries for selectivity

Every query on an LDV object should be selective. Use indexed fields in WHERE clauses. Check the Query Plan tool during development, not after deployment.
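
As a review-time aid, a helper can flag filters that touch no indexed field. The sketch below assumes a partial list of Salesforce's standard indexed fields; it is a coarse gate, not a substitute for the Query Plan tool.

```python
# Partial list of standard indexed fields (assumption for illustration);
# lookup, master-detail, and external ID fields are also indexed.
STANDARD_INDEXES = {"Id", "Name", "OwnerId", "CreatedDate",
                    "SystemModstamp", "RecordTypeId"}

def has_indexed_filter(where_fields, custom_indexes=frozenset()):
    """True if at least one WHERE-clause field can use an index."""
    return bool(set(where_fields) & (STANDARD_INDEXES | set(custom_indexes)))
```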

3. Request custom indexes early

Custom indexes require a Salesforce Support case and take time. Identify candidates during design and submit requests before the data reaches LDV thresholds.

4. Address data skew before it hurts

Identify potential skew during data modeling: Will one Account have 100K child records? Will a single queue own 500K Cases? Design mitigation before the skew manifests as record locking and sharing timeouts.
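
A simple skew check can be run against extracted child records; the 10,000-children-per-parent cutoff below reflects Salesforce's commonly cited parent-skew guidance.

```python
from collections import Counter

SKEW_THRESHOLD = 10_000  # children per parent; commonly cited skew guidance

def find_skewed_parents(child_parent_ids, threshold=SKEW_THRESHOLD):
    """Return parent IDs whose child count exceeds the skew threshold."""
    counts = Counter(child_parent_ids)
    return {parent: n for parent, n in counts.items() if n > threshold}
```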

5. Implement archival before reaching limits

Archival is a proactive strategy, not an emergency response. Define retention policies during design and implement archival processes before objects hit LDV thresholds.

6. Use PK chunking for bulk extracts

When extracting large datasets via Bulk API, enable PK chunking to avoid timeouts. This splits the query into smaller chunks based on record ID ranges.
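
The chunking idea can be sketched with integer IDs. Real Salesforce IDs are base-62 strings and the Bulk API performs the splitting itself when PK chunking is enabled (default chunk size 100,000, maximum 250,000); this is only a model of the mechanism.

```python
def pk_chunks(min_id, max_id, chunk_size=100_000):
    """Yield (lo, hi) ID ranges covering [min_id, max_id].

    Simplified integer-ID model of PK chunking: each chunk becomes
    `WHERE Id >= lo AND Id < hi`, which is always index-driven
    because Id is the primary key.
    """
    lo = min_id
    while lo <= max_id:
        hi = min(lo + chunk_size, max_id + 1)
        yield lo, hi
        lo = hi
```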

7. Right-size Batch Apex scope

Default scope of 200 is not always appropriate. Reduce scope for complex processing with many DML operations. Increase scope (up to 2,000) for simple, read-heavy jobs.

Anti-Patterns

Full table scans in production

Deploying list views, reports, or SOQL queries without indexed filters on objects with millions of records. The query works in the sandbox (10K records) but times out in production (5M records). Always test with production-representative data volumes.

Ignoring data skew

Allowing a single Account to accumulate 200K Contacts, or a single user to own 1M records, without mitigation. Result: Record locking, sharing recalculation timeouts, and degraded performance for the entire org (not just the skewed records).

No archival strategy

Keeping every record forever because “we might need it.” Data grows, performance degrades, storage costs increase, and users lose trust in the platform. Define retention policies and implement them.


Migration Best Practices

flowchart LR
    subgraph Preparation
        A[Profile Source Data] --> B[Design Field Mappings]
        B --> C[Disable Automations]
    end
    subgraph Trial Runs
        C --> D[Trial 1: Structure]
        D --> E[Trial 2: Validate Fixes]
        E --> F[Trial 3: Prove Repeatability]
    end
    subgraph Cutover
        F --> G[Freeze Source Systems]
        G --> H[Load Parents First]
        H --> I[Load Children]
        I --> J[Load Junctions]
        J --> K[Re-enable Automations]
    end
    subgraph Validation
        K --> L[Technical Checks]
        L --> M[Business Spot-Checks]
        M --> N{Pass?}
        N -->|No| O[Execute Rollback]
        N -->|Yes| P[Go Live]
    end

Best Practices

1. Run at least three trial migrations

The first trial finds structural issues. The second validates fixes. The third proves repeatability. Production cutover without three clean trials is reckless.

2. Profile data before mapping

Understand completeness, accuracy, duplicates, and format issues in source data before designing field mappings. Profiling reveals problems that would otherwise surface during cutover.
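
A first profiling pass can be as simple as the sketch below, run over extracted source rows (field names illustrative): per-field completeness plus duplicate counts on the intended external ID.

```python
def profile_rows(rows, key_field):
    """Profile completeness and duplicate keys in source rows before mapping."""
    total = len(rows)
    fields = {f for row in rows for f in row}
    completeness = {
        f: sum(1 for r in rows if r.get(f) not in (None, "")) / total
        for f in fields
    }
    seen, dupes = set(), 0
    for r in rows:
        k = r.get(key_field)
        if k in seen:
            dupes += 1
        seen.add(k)
    return {"total": total, "completeness": completeness, "duplicate_keys": dupes}
```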

3. Sequence loads correctly

Always load parent objects before child objects. Use External IDs for relationship resolution. Load junction objects last. See Data Migration for the full sequence.
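
The parent-before-child rule generalizes to a topological sort of the object graph; a sketch using Python's standard library (object names illustrative):

```python
from graphlib import TopologicalSorter

def load_order(dependencies):
    """Order objects so every parent loads before its children.

    `dependencies` maps object -> set of parent objects it looks up to.
    Junction objects naturally sort last because they depend on both parents.
    """
    return list(TopologicalSorter(dependencies).static_order())
```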

4. Disable automations during load

Triggers, workflows, flows, validation rules, and assignment rules should be disabled during migration loads. They were designed for interactive use, not bulk loading. Re-enable them with a checklist after migration.
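
The re-enable checklist can itself be made executable; a sketch with hypothetical item fields, where an empty result means the org is safe to hand back to users:

```python
def outstanding_automations(checklist):
    """Return automations still disabled after cutover, with their owners.

    `checklist` items (illustrative shape):
    {"name": ..., "owner": ..., "re_enabled": bool}
    """
    return [(i["name"], i["owner"]) for i in checklist if not i["re_enabled"]]
```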

5. Plan for rollback

Every migration should have a documented rollback procedure. What happens if the migration fails at step 7 of 12? Can you restore the system to its pre-migration state?

6. Validate with business stakeholders

Technical validation (record counts, relationship integrity) is necessary but not sufficient. Business users must spot-check their own data and confirm it is correct.

7. Freeze source systems during cutover

If users are still entering data in the legacy system during migration, you need a delta migration strategy. The cleanest approach is to freeze the source system during cutover.

Anti-Patterns

Testing with subsets

Loading 10K records in a trial migration when production has 10M. Everything works at 10K. The production load takes 40 hours instead of the planned 8. Use production-representative volumes for trials.

No rollback plan

“The migration will work because we tested it.” Until it does not. Network issues, API limits, data quality surprises — production always has surprises. Have a rollback procedure.

Forgetting to re-enable automations

Disabling triggers and validation rules for migration, then forgetting to re-enable them. Users create records without validation for days before someone notices. Use a checklist with assigned owners.


Data Quality Best Practices

Best Practices

1. Prevent duplicates at the point of entry

Configure matching rules and duplicate rules on key objects (Account, Contact, Lead). Alert users when creating potential duplicates. Gradually tighten from alert to block as matching rules prove accurate.

2. Define data entry standards

Required fields, picklist values, naming conventions, and format standards should be documented and enforced through validation rules, not just training.

3. Assign data stewards

Every business-critical object should have a named data steward responsible for monitoring quality, resolving issues, and approving changes to data standards.

4. Build quality dashboards

Create dashboards that show completeness rates, duplicate counts, stale record percentages, and orphan record counts. Review them monthly with data stewards.

5. Address quality at integration boundaries

Every integration point is a data quality risk. Inbound integrations should validate data before insert. Outbound integrations should handle dirty data gracefully.

6. Regular deduplication scans

Run batch deduplication scans weekly or monthly, even with real-time duplicate prevention. Duplicates still slip through (batch imports, API creates, edge cases).
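
A batch scan can be sketched as grouping on a normalized match key. The normalization below is deliberately crude — far simpler than Salesforce matching rules — and is only meant to show the shape of the job.

```python
import re
from collections import defaultdict

def normalize(name, email):
    """Build a fuzzy match key: lowercase, strip punctuation and whitespace."""
    return (re.sub(r"\W+", "", name).lower(), email.strip().lower())

def duplicate_groups(contacts):
    """Group contact records sharing a normalized (name, email) key."""
    groups = defaultdict(list)
    for c in contacts:
        groups[normalize(c["name"], c["email"])].append(c["id"])
    return [ids for ids in groups.values() if len(ids) > 1]
```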

7. Implement data lifecycle management

Define what happens to data as it ages. Active data is maintained. Stale data is reviewed. Aged data is archived. Expired data is deleted. See Data Quality & Governance.
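
The lifecycle tiers can be expressed as age thresholds; the cutoffs below are illustrative, not policy — actual values come from documented retention requirements.

```python
from datetime import date

# Illustrative retention tiers (days since last activity -> stage).
STAGES = [(365 * 7, "expired"), (365 * 2, "aged"), (365, "stale"), (0, "active")]

def lifecycle_stage(last_activity, today=None):
    """Classify a record into a lifecycle stage by age of last activity."""
    today = today or date.today()
    age_days = (today - last_activity).days
    for min_days, stage in STAGES:
        if age_days >= min_days:
            return stage
    return "active"
```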

Anti-Patterns

Clean up later

Loading dirty data with the plan to “clean it up after go-live.” Post-go-live teams are busy with support tickets, training, and enhancement requests. Data cleanup never gets prioritized. Clean before migration.

No data stewardship

Having no one accountable for data quality. Everyone assumes someone else is handling it. Quality degrades over time as users find workarounds and integrations inject bad data.

Over-trusting source systems

Assuming source system data is accurate because “it has been in production for 10 years.” Legacy systems accumulate technical debt in data just like in code. Profile everything.


Governance Best Practices

Best Practices

1. Classify data by sensitivity

Not all data is equal. Classify data into tiers (Public, Internal, Confidential, Restricted) and apply appropriate controls (encryption, FLS, sharing) based on classification.
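
A sketch of a classification-to-controls lookup — the control matrix below is illustrative, and a real one would be set by the org's security policy:

```python
# Illustrative baseline controls per classification tier (assumption).
CONTROLS = {
    "Public":       {"encrypt": False, "fls_restricted": False, "audit": False},
    "Internal":     {"encrypt": False, "fls_restricted": False, "audit": True},
    "Confidential": {"encrypt": False, "fls_restricted": True,  "audit": True},
    "Restricted":   {"encrypt": True,  "fls_restricted": True,  "audit": True},
}

def controls_for(classification):
    """Look up the baseline controls for a data classification tier."""
    return CONTROLS[classification]
```

Driving controls from a single matrix like this also guards against the "encrypt everything" anti-pattern below: encryption applies only where the classification demands it.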

2. Document retention policies

For every object, document how long records should be retained, where they should be archived, and when they should be deleted. Align with legal and compliance requirements.

3. Implement Field Audit Trail for compliance

Standard field history tracking retains data for 18-24 months. If compliance requires longer retention, use Salesforce Shield Field Audit Trail (up to 10 years) or export to an external audit system.

4. Design for GDPR from day one

If any data subjects are in the EU (or subject to similar privacy laws), design the ability to find, export, rectify, and delete a data subject’s data. Retrofitting GDPR compliance is far more expensive than building it in.

5. Separate data governance from development governance

Data governance (quality, retention, stewardship) is a business function. Development governance (CI/CD, deployment, testing) is a technical function. They need different processes, roles, and cadences.

6. Audit access patterns

Use Shield Event Monitoring to understand who accesses what data, how often, and through which channels. This informs both security design and data lifecycle decisions.

Anti-Patterns

Encrypt everything

Applying Shield Platform Encryption to every field “for security.” Encryption disables sorting, some filtering, formula references, and other features. Encrypt based on data classification, not paranoia.

No data lifecycle

Keeping all data forever with no archival or deletion strategy. Storage costs grow, query performance degrades, and compliance risk increases (you cannot comply with “right to erasure” if you do not know where data is).

Shadow IT data stores

Users maintaining spreadsheets, personal databases, or unauthorized cloud apps because the Salesforce data model does not meet their needs. The CTA solution must address these shadow systems proactively.


Checklist: Data Architecture Review

Use this checklist before presenting a data architecture at the CTA review board:

  • Every object justified: standard first, custom only when needed
  • Every relationship type chosen deliberately (not defaulted to lookup)
  • Sharing model implications of each relationship documented
  • External IDs on all migrated objects
  • LDV objects identified with growth projections
  • Index strategy defined for LDV objects
  • Data skew risks identified and mitigated
  • Archival strategy defined with retention policies
  • Migration sequence documented (parent before child)
  • Migration cutover strategy chosen with justification
  • Data quality controls designed (dedup, validation, stewardship)
  • Compliance requirements addressed (GDPR, data residency, encryption)
  • ERD diagram current and complete
