LDV: Quick Reference

Cheat sheet for Large Data Volume strategy at the CTA review board. For the full deep dive, see Large Data Volumes and External Data.

LDV Threshold Table

| Records per Object | Classification | Action Required |
|---|---|---|
| < 500K | Normal | Standard best practices |
| 500K-1M | Approaching LDV | Monitor query plans, plan indexes |
| 1M-10M | LDV | Custom indexes, selective queries mandatory |
| 10M-100M | High LDV | Skinny tables, archival strategy, batch optimization |
| 100M+ | Extreme LDV | Big Objects, external storage, or Data Cloud |

Volume is relative

A 500K-record object with unfiltered list views performs worse than a 5M-record object with highly selective indexed queries. Always evaluate volume alongside access patterns.
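To make the access-pattern point concrete, a hedged SOQL sketch (`Invoice__c` and its fields are hypothetical):

```apex
// Non-selective: a leading-wildcard LIKE cannot use an index,
// so on an LDV object this forces a full table scan.
List<Invoice__c> slow = [
    SELECT Id, Amount__c
    FROM Invoice__c
    WHERE Name LIKE '%2024'
];

// Selective: equality on an indexed External ID field plus a
// bounded range on CreatedDate (a standard-indexed field).
List<Invoice__c> fast = [
    SELECT Id, Amount__c
    FROM Invoice__c
    WHERE External_Key__c = 'INV-000123'
      AND CreatedDate = LAST_N_DAYS:90
];
```

The second query stays fast at 5M records; the first degrades at 500K.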

Indexing Cheat Sheet

Auto-Indexed Fields (Every Object)

Id, Name, OwnerId, CreatedDate, SystemModstamp, RecordTypeId, all lookup/master-detail fields, all unique/External ID fields. Note that LastModifiedDate is not indexed; filter on SystemModstamp instead.

Custom Index Types

| Type | Description | Request Via |
|---|---|---|
| Single-column | One custom field | Salesforce Support case |
| Two-column (composite) | Two fields together | Salesforce Support case |
| Skinny table | Denormalized subset of fields in a narrow table | Salesforce Support case |

Skinny Tables — Key Facts

  • Created by Salesforce Support — you cannot self-provision
  • Subset of fields from one object in a separate narrow table
  • Query optimizer uses them transparently (invisible to developer)
  • Exclude soft-deleted records (IsDeleted = true)
  • Must contact Support when adding/removing fields
  • Work on both standard and custom objects
  • Best ROI on objects with 100+ fields where queries use a small subset

Skinny table governance

Schema changes require a Support case. Factor this operational dependency into your governance model. Do not promise skinny tables unless you have a Support escalation path.

Query Selectivity Thresholds

```mermaid
graph TD
    A[SOQL Query] --> B{Filter field<br/>indexed?}
    B -->|No| C[Full Table Scan]
    B -->|Yes| D{Threshold check}
    D --> E{Standard Index}
    D --> F{Custom Index}
    E -->|Returns < 30% of first 1M,<br/>15% beyond. Max 1M rows| G[INDEX USED]
    F -->|Returns < 10% of first 1M,<br/>5% beyond. Max 333,333 rows| G
    E -->|Above threshold| C
    F -->|Above threshold| C
    C --> H{Object size?}
    H -->|Small < 500K| I[OK performance]
    H -->|Large LDV| J[SLOW / TIMEOUT]

    style G fill:#51cf66,color:#fff
    style J fill:#ff6b6b,color:#fff
```

Selectivity Rules Summary

| Condition | Threshold |
|---|---|
| Standard index selective | 30% of first 1M records, then 15% beyond 1M; max 1,000,000 rows |
| Custom index selective | 10% of first 1M records, then 5% beyond 1M; max 333,333 rows |
| AND filters | Each filter independently selective OR combined selective |
| OR filters | Every filter must be selective (one bad filter ruins it) |
| NOT / != operators | Generally non-selective; avoid on LDV |
| NULL checks | Non-selective; WHERE Field = null almost never uses an index |
| LIKE with leading wildcard | Non-selective; LIKE '%text' cannot use an index |
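The tiered thresholds above are simple arithmetic; a hypothetical helper (class and method names are my own) makes the calculation explicit:

```apex
// Sketch: compute the selectivity row limit for a filter, per the
// documented thresholds. Not an official API; for reasoning only.
public class SelectivityCalc {
    public static Integer rowLimit(Long totalRecords, Boolean isCustomIndex) {
        Decimal firstTierPct  = isCustomIndex ? 0.10 : 0.30; // first 1M rows
        Decimal secondTierPct = isCustomIndex ? 0.05 : 0.15; // beyond 1M
        Integer hardCap       = isCustomIndex ? 333333 : 1000000;

        Decimal limitRows;
        if (totalRecords <= 1000000) {
            limitRows = totalRecords * firstTierPct;
        } else {
            limitRows = 1000000 * firstTierPct
                      + (totalRecords - 1000000) * secondTierPct;
        }
        return Math.min(limitRows.intValue(), hardCap);
    }
}
// Example: 5M records with a custom index:
// 10% of 1M (100,000) + 5% of 4M (200,000) = 300,000, under the 333,333 cap.
```

This is the arithmetic behind the 300,000-row figure an architect would quote at the board for a 5M-record object.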

Board technique

State specific numbers: “The custom index on External_ID__c will be selective because it targets fewer than 10% of the first million records, well under the 333,333-row maximum.” This demonstrates practical depth.

Data Skew Quick Reference

Types and Thresholds

| Skew Type | Trigger | Caution Zone | Danger Zone |
|---|---|---|---|
| Account skew | Many children on one Account | > 10,000 children | > 100,000 children |
| Ownership skew | Many records owned by one user/queue | > 10,000 records | > 100,000 records |
| Lookup skew | Many records referencing same parent via lookup | > 10,000 references | > 100,000 references |

Skew Mitigation Decision Tree

```mermaid
graph TD
    A[Skew Detected] --> B{Type?}
    B --> C[Account Skew]
    B --> D[Ownership Skew]
    B --> E[Lookup Skew]

    C --> C1[Split mega-accounts<br/>by region/division]
    C --> C2[Switch to lookup to<br/>avoid sharing cascade]
    C --> C3[Archive old child records]

    D --> D1[Distribute across<br/>multiple users/queues]
    D --> D2[Ensure skewed owner<br/>has no role in hierarchy]
    D --> D3[Use system integration<br/>user for API records]

    E --> E1[Eliminate catch-all<br/>parent records]
    E --> E2[Remove sparse<br/>optional lookups]
    E --> E3[Use deferred sharing<br/>calculation for batch ops]
```

Skew affects the entire org

A single mega-account with 200K children causes sharing recalculation delays and record locking that degrades performance for ALL users, not just those accessing the skewed records.
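One common ownership-skew mitigation, distributing records across multiple queues, can be sketched in Apex (queue developer names are hypothetical):

```apex
// Sketch: round-robin Case assignment across regional queues so no
// single "Triage" queue accumulates skewed ownership.
public class RoundRobinAssigner {
    private static final List<String> QUEUE_DEV_NAMES = new List<String>{
        'Triage_East', 'Triage_West', 'Triage_Central'
    };

    public static void assign(List<Case> cases) {
        List<Group> queues = [
            SELECT Id FROM Group
            WHERE Type = 'Queue' AND DeveloperName IN :QUEUE_DEV_NAMES
        ];
        Integer i = 0;
        for (Case c : cases) {
            // Cycle through the queues in order
            c.OwnerId = queues[Math.mod(i++, queues.size())].Id;
        }
    }
}
```

In practice the same distribution is usually achieved declaratively with assignment rules; the sketch just shows the principle.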

Archival Strategy Quick Reference

| Storage Option | Best For | Query Method | Limits |
|---|---|---|---|
| Big Objects | Audit trails, historical transactions, IoT | Standard SOQL on indexed fields (Async SOQL retired Summer '25) | 100 per org, immutable indexes |
| External Storage + Connect | Massive datasets needing SF visibility | OData via Salesforce Connect | 2,000 rows per query, callout limits |
| Data Cloud | Unified analytics, identity resolution | SQL-like on data lake | Separate license |
| Hard delete + export | Data with no retention requirement | N/A (gone) | Irreversible |

Big Object Index Rules

  • Index defined at creation — immutable after
  • Max 5 fields in the index
  • First field is most significant (leftmost in composite key)
  • You can ONLY query by index fields
  • Wrong index = delete Big Object + recreate (all data lost)
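A hedged query sketch against a hypothetical Big Object (`Transaction_Archive__b` and its fields are illustrative) shows how the index rules constrain SOQL:

```apex
// Sketch: Big Object SOQL. Filters must start at the leftmost index
// field, may only reference index fields, and a range filter is
// allowed only on the last field in the filter chain.
Id accountId = '001000000000001AAA'; // hypothetical record Id
Date startDate = Date.today().addDays(-365);

List<Transaction_Archive__b> archived = [
    SELECT Account__c, Transaction_Date__c, Amount__c
    FROM Transaction_Archive__b
    WHERE Account__c = :accountId          // leftmost index field: equality
      AND Transaction_Date__c >= :startDate // last filtered field: range OK
];
```

Filtering on `Amount__c` alone would fail at runtime because it is not part of the index, which is why index design must precede data load.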

Batch Apex for LDV

| Parameter | Guidance |
|---|---|
| Default scope | 200 records per execute |
| Complex processing | Reduce to 50-100 |
| Simple operations | Increase to 1,000-2,000 |
| Max QueryLocator | 50 million records |
| Concurrent batch jobs | Up to 5 actively processing + up to 100 in flex queue |
| Chaining | Start next job in finish() method |
| Stateful | Use Database.Stateful for cross-batch state (increases memory) |
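The parameters above combine into a standard skeleton; this is a hedged sketch (object, field, and class names are hypothetical):

```apex
// Sketch: Batch Apex for LDV with reduced scope, Database.Stateful
// for a cross-batch counter, and chaining from finish().
public class ArchiveTransactionsBatch implements
        Database.Batchable<SObject>, Database.Stateful {

    private Integer recordsProcessed = 0; // retained across execute() calls

    public Database.QueryLocator start(Database.BatchableContext bc) {
        // A QueryLocator can stream up to 50M records
        return Database.getQueryLocator(
            'SELECT Id FROM Transaction__c WHERE Transaction_Date__c < LAST_N_DAYS:90'
        );
    }

    public void execute(Database.BatchableContext bc, List<SObject> scope) {
        recordsProcessed += scope.size();
        // ... archive or transform the scope records here ...
    }

    public void finish(Database.BatchableContext bc) {
        System.debug('Processed ' + recordsProcessed + ' records');
        // Chain the next job here, e.g.:
        // Database.executeBatch(new NextCleanupBatch(), 100);
    }
}
// Launch with a reduced scope for complex per-record processing:
// Database.executeBatch(new ArchiveTransactionsBatch(), 100);
```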

Platform Cache for LDV

| Cache Type | Scope | Max Size | Use For |
|---|---|---|---|
| Org Cache | All users | 30 MB default (EE/UE), purchasable up to 3 GB | Config data, reference data, exchange rates |
| Session Cache | Per user | Partition-based, drawn from org total | User prefs, wizard state, shopping cart |

Pattern: cache-aside: check the cache first; on a miss, query the database and populate the cache.
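A cache-aside sketch using Org Cache (the partition name `local.refData` is hypothetical, and `CurrencyType` assumes a multi-currency org):

```apex
// Sketch: cache-aside for reference data via Platform Cache (Org scope).
public class ExchangeRateService {
    public static Decimal getRate(String isoCode) {
        Cache.OrgPartition part = Cache.Org.getPartition('local.refData');
        Decimal rate = (Decimal) part.get(isoCode); // 1. check cache
        if (rate == null) {                         // 2. miss: query
            rate = [SELECT ConversionRate FROM CurrencyType
                    WHERE IsoCode = :isoCode LIMIT 1].ConversionRate;
            part.put(isoCode, rate, 3600);          // 3. populate, TTL 1 hour
        }
        return rate;
    }
}
```

On an LDV org this keeps hot reference data out of the SOQL path entirely.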

Reverse-Engineered Use Cases

Scenario 1: Financial Services — Transaction History

Situation: Bank with 50M+ transaction records growing by 2M/month. Users need to see recent transactions (last 90 days) on Account pages. Compliance requires 7-year retention.

What you’d do: Keep the last 90 days in a custom Transaction__c object with a two-column composite index on Account__c (lookup) + Transaction_Date__c. Archive records older than 90 days to a Big Object (Transaction_Archive__b) with an index on Account_ID, Transaction_Date. Set up a nightly Batch Apex job to move aged records. Surface archived data through a custom LWC that queries the Big Object with standard SOQL on the indexed fields.

Why: 90 days of data stays under 6M records (manageable LDV). Big Object handles the 400M+ historical records without storage cost. Two-column index ensures selective queries by account + date range.

Scenario 2: Telecom — Case Volume with Ownership Skew

Situation: Telecom company with 15M Cases. All inbound cases initially assigned to a single “Triage” queue, causing ownership skew and record locking during batch assignment.

What you’d do: Replace the single Triage queue with 5 regional triage queues. Use round-robin assignment rules to distribute incoming cases. For batch reassignment, use the Bulk API in serial mode with a batch size of 100 to avoid lock contention. Add a custom index on Case Status + Region to keep list views selective.

Why: Splitting the queue eliminates ownership skew. Serial mode prevents lock contention during batch operations. Regional index enables selective queries for agents filtering by their region.

Scenario 3: Retail — Product Catalog with LDV Reporting

Situation: Retailer with 8M Product records and 200M Order Line Items. Reports timing out when aggregating sales by product category.

What you’d do: Request a skinny table on Order Line Item containing only the fields used in reports (Product_Category__c, Amount__c, Order_Date__c, Status__c). Add a custom index on Product_Category__c. For historical analytics beyond 2 years, push data to Data Cloud for SQL-based analytics. Use Platform Cache for product category reference data to avoid repeated SOQL.

Why: The skinny table reduces I/O by eliminating the 60+ unused fields from each query. The custom index ensures selective filtering by category. Data Cloud absorbs the analytical workload that would overwhelm on-platform reporting.

Sources