LDV: Quick Reference
Cheat sheet for Large Data Volume strategy at the CTA review board. For the full deep dive, see Large Data Volumes and External Data.
LDV Threshold Table
| Records per Object | Classification | Action Required |
|---|---|---|
| < 500K | Normal | Standard best practices |
| 500K — 1M | Approaching LDV | Monitor query plans, plan indexes |
| 1M — 10M | LDV | Custom indexes, selective queries mandatory |
| 10M — 100M | High LDV | Skinny tables, archival strategy, batch optimization |
| 100M+ | Extreme LDV | Big Objects, external storage, or Data Cloud |
Volume is relative
A 500K-record object with unfiltered list views performs worse than a 5M-record object with highly selective indexed queries. Always evaluate volume alongside access patterns.
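The threshold table above is easy to encode. A minimal sketch (Python used here as a language-neutral illustration; the bands are exactly those in the table):

```python
def classify_ldv(record_count: int) -> str:
    """Map an object's record count to the LDV classification table."""
    if record_count < 500_000:
        return "Normal"
    if record_count < 1_000_000:
        return "Approaching LDV"
    if record_count < 10_000_000:
        return "LDV"
    if record_count < 100_000_000:
        return "High LDV"
    return "Extreme LDV"
```

Remember the caveat above: the band is a starting point, not a verdict — access patterns can make a "Normal" object slow and an "LDV" object fast.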
Indexing Cheat Sheet
Auto-Indexed Fields (Every Object)
Id, Name, OwnerId, CreatedDate, LastModifiedDate, SystemModstamp, RecordTypeId, all lookup/master-detail relationship fields, and all unique/External ID fields.
Custom Index Types
| Type | Description | Request Via |
|---|---|---|
| Single-column | One custom field | Salesforce Support case |
| Two-column (composite) | Two fields together | Salesforce Support case |
| Skinny table | Denormalized subset of fields in a narrow table | Salesforce Support case |
Skinny Tables — Key Facts
- Created by Salesforce Support — you cannot self-provision
- Subset of fields from one object in a separate narrow table
- Query optimizer uses them transparently (invisible to developer)
- Exclude soft-deleted records (IsDeleted = true)
- Must contact Support when adding/removing fields
- Work on both standard and custom objects
- Best ROI on objects with 100+ fields where queries use a small subset
Skinny table governance
Schema changes require a Support case. Factor this operational dependency into your governance model. Do not promise skinny tables unless you have a Support escalation path.
Query Selectivity Thresholds
```mermaid
graph TD
A[SOQL Query] --> B{Filter field<br/>indexed?}
B -->|No| C[Full Table Scan]
B -->|Yes| D{Threshold check}
D --> E{Standard Index}
D --> F{Custom Index}
E -->|Returns < 30% of first 1M,<br/>15% beyond. Max 1M rows| G[INDEX USED]
F -->|Returns < 10% of first 1M,<br/>5% beyond. Max 333,333 rows| G
E -->|Above threshold| C
F -->|Above threshold| C
C --> H{Object size?}
H -->|Small < 500K| I[OK performance]
H -->|Large LDV| J[SLOW / TIMEOUT]
style G fill:#51cf66,color:#fff
style J fill:#ff6b6b,color:#fff
```
Selectivity Rules Summary
| Condition | Threshold |
|---|---|
| Standard index selective | 30% of first 1M records, then 15% beyond 1M. Max 1,000,000 rows |
| Custom index selective | 10% of first 1M records, then 5% beyond 1M. Max 333,333 rows |
| AND filters | Each filter independently selective OR combined selective |
| OR filters | Every filter must be selective (one bad filter ruins it) |
| NOT / != operators | Generally non-selective — avoid on LDV |
| NULL checks | Non-selective — WHERE Field = null almost never uses index |
| LIKE with leading wildcard | Non-selective — LIKE '%text' cannot use index |
Board technique
State specific numbers: “The custom index on External_ID__c will be selective because it targets fewer than 10% of the first million records, well under the 333,333-row maximum.” This demonstrates practical depth.
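The selectivity thresholds from the table can be turned into a quick calculator — useful for producing exactly the kind of specific numbers the board technique above calls for. A sketch (Python as illustration; the percentages and caps are the documented optimizer thresholds):

```python
def selectivity_limit(total_records: int, custom_index: bool) -> int:
    """Max rows a filter may target and still be considered selective."""
    first = min(total_records, 1_000_000)       # records up to the first 1M
    beyond = max(total_records - 1_000_000, 0)  # records beyond 1M
    if custom_index:
        limit, cap = 0.10 * first + 0.05 * beyond, 333_333
    else:
        limit, cap = 0.30 * first + 0.15 * beyond, 1_000_000
    return int(min(limit, cap))

def is_selective(expected_rows: int, total_records: int,
                 custom_index: bool = False) -> bool:
    """True if the filter stays under the optimizer's threshold."""
    return expected_rows < selectivity_limit(total_records, custom_index)
```

For example, on a 2M-record object a custom index stays selective up to 150,000 targeted rows (10% of the first 1M plus 5% of the next 1M), well under the 333,333-row cap.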
Data Skew Quick Reference
Types and Thresholds
| Skew Type | Trigger | Caution Zone | Danger Zone |
|---|---|---|---|
| Account skew | Many children on one Account | > 10,000 children | > 100,000 children |
| Ownership skew | Many records owned by one user/queue | > 10,000 records | > 100,000 records |
| Lookup skew | Many records referencing same parent via lookup | > 10,000 references | > 100,000 references |
Skew Mitigation Decision Tree
```mermaid
graph TD
A[Skew Detected] --> B{Type?}
B --> C[Account Skew]
B --> D[Ownership Skew]
B --> E[Lookup Skew]
C --> C1[Split mega-accounts<br/>by region/division]
C --> C2[Switch to lookup to<br/>avoid sharing cascade]
C --> C3[Archive old child records]
D --> D1[Distribute across<br/>multiple users/queues]
D --> D2[Ensure skewed owner<br/>has no role in hierarchy]
D --> D3[Use system integration<br/>user for API records]
E --> E1[Eliminate catch-all<br/>parent records]
E --> E2[Remove sparse<br/>optional lookups]
E --> E3[Use deferred sharing<br/>calculation for batch ops]
```
Skew affects the entire org
A single mega-account with 200K children causes sharing recalculation delays and record locking that degrades performance for ALL users, not just those accessing the skewed records.
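All three skew types share the same caution/danger thresholds, so a single check covers them. A sketch of that classification (Python as illustration):

```python
def skew_zone(related_count: int) -> str:
    """Classify parent/owner record concentration per the skew table:
    >100K related records = danger, >10K = caution, else ok."""
    if related_count > 100_000:
        return "danger"
    if related_count > 10_000:
        return "caution"
    return "ok"
```

The 200K-child mega-account above lands squarely in the danger zone; the fix is to apply the mitigation branch for its skew type, not just to monitor it.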
Archival Strategy Quick Reference
| Storage Option | Best For | Query Method | Limits |
|---|---|---|---|
| Big Objects | Audit trails, historical transactions, IoT | Standard SOQL on indexed fields (Async SOQL retired Summer '21) | 100 per org, immutable indexes |
| External Storage + Connect | Massive datasets needing SF visibility | OData via Salesforce Connect | 2,000 rows per query, callout limits |
| Data Cloud | Unified analytics, identity resolution | SQL-like on data lake | Separate license |
| Hard delete + export | Data with no retention requirement | N/A (gone) | Irreversible |
Big Object Index Rules
- Index defined at creation — immutable after
- Max 5 fields in the index
- First field is most significant (leftmost in composite key)
- Queries can filter ONLY on index fields, in leftmost-prefix order (no gaps)
- Wrong index = delete Big Object + recreate (all data lost)
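Because the index is immutable and queries must follow it, it is worth sanity-checking a proposed query shape before committing to an index design. A hedged sketch of the leftmost-prefix rule (Python as illustration; field names are placeholders):

```python
def query_uses_index(index_fields: list, filter_fields: list) -> bool:
    """A Big Object SOQL filter must cover a gap-free leftmost prefix
    of the composite index (filter order in the WHERE clause is free,
    but the set of filtered fields must match a prefix)."""
    n = len(filter_fields)
    if n == 0 or n > len(index_fields):
        return False
    return set(filter_fields) == set(index_fields[:n])
```

So with an index of `(Account_ID, Transaction_Date)`, filtering by `Account_ID` alone works, but filtering by `Transaction_Date` alone does not — which is exactly why the first field must be chosen as the most significant.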
Batch Apex for LDV
| Parameter | Guidance |
|---|---|
| Default scope | 200 records per execute |
| Complex processing | Reduce to 50-100 |
| Simple operations | Increase to 1,000-2,000 |
| Max QueryLocator | 50 million records |
| Concurrent batch jobs | Up to 5 actively processing + up to 100 in flex queue |
| Chaining | Start next job in finish() method |
| Stateful | Use Database.Stateful for cross-batch state (increases memory) |
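Scope size is a direct trade-off between per-execute governor headroom and total job duration. A back-of-the-envelope sketch of the trade-off (Python as illustration):

```python
import math

def batch_executions(total_records: int, scope_size: int = 200) -> int:
    """Number of execute() invocations a Batch Apex job makes:
    total records divided by scope size, rounded up."""
    return math.ceil(total_records / scope_size)
```

For 1M records, the default scope of 200 means 5,000 execute() calls; raising the scope to 2,000 for a simple field update cuts that to 500, while dropping to 50 for heavy processing quadruples it — each execute() still gets a fresh set of governor limits.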
Platform Cache for LDV
| Cache Type | Scope | Max Size | Use For |
|---|---|---|---|
| Org Cache | All users | 10 MB default (EE), 30 MB (UE); additional capacity purchasable | Config data, reference data, exchange rates |
| Session Cache | Per user | Partition-based, drawn from org total | User prefs, wizard state, shopping cart |
Pattern: cache-aside — check the cache first; on a miss, query the database and populate the cache for subsequent reads.
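The cache-aside pattern is the same in any runtime; in Apex the store would be `Cache.Org`/`Cache.Session` and the loader a SOQL query. A language-neutral sketch (Python; the TTL and loader are placeholders):

```python
import time

class CacheAside:
    """Generic cache-aside: check the cache, on a miss call the
    loader (e.g. a SOQL query in Apex) and store the result."""
    def __init__(self, loader, ttl_seconds: int = 300):
        self.loader = loader
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        hit = self._store.get(key)
        if hit is not None:
            value, expires = hit
            if time.time() < expires:
                return value          # cache hit
        value = self.loader(key)      # cache miss: hit the database
        self._store[key] = (value, time.time() + self.ttl)
        return value
```

On LDV objects the win is that repeated reads of stable reference data (exchange rates, category lists) never touch the large table after the first load.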
Reverse-Engineered Use Cases
Scenario 1: Financial Services — Transaction History
Situation: Bank with 50M+ transaction records growing by 2M/month. Users need to see recent transactions (last 90 days) on Account pages. Compliance requires 7-year retention.
What you’d do: Keep last 90 days in a custom Transaction__c object with custom indexes on Account__c (lookup) + Transaction_Date__c (two-column composite index). Archive records older than 90 days to a Big Object (Transaction_Archive__b) with an index on Account_ID, Transaction_Date. Set up nightly Batch Apex job to move aged records. Surface archived data via a custom LWC that queries the Big Object via standard SOQL on indexed fields.
Why: 90 days of data stays under 6M records (manageable LDV). The Big Object absorbs the 400M+ historical records without consuming standard data storage. The two-column index ensures selective queries by account + date range.
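The heart of the nightly archival batch in this scenario is a cutoff test. A sketch of that routing decision (Python as illustration; in practice this is the WHERE clause of the Batch Apex start() query):

```python
from datetime import date, timedelta

def is_archivable(txn_date: date, today: date, hot_days: int = 90) -> bool:
    """True if a transaction falls outside the hot window and should
    move from Transaction__c to the Transaction_Archive__b Big Object."""
    return txn_date < today - timedelta(days=hot_days)
```

Running this nightly keeps the hot object's size roughly constant, so the selectivity math for its indexes never drifts as volume grows.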
Scenario 2: Telecom — Case Volume with Ownership Skew
Situation: Telecom company with 15M Cases. All inbound cases initially assigned to a single “Triage” queue, causing ownership skew and record locking during batch assignment.
What you’d do: Replace single Triage queue with 5 regional triage queues. Use round-robin assignment rule to distribute incoming cases. For batch reassignment, use serial mode Bulk API with batch size of 100 to avoid lock contention. Implement custom index on Case Status + Region to ensure selective list views.
Why: Splitting the queue eliminates ownership skew. Serial mode prevents lock contention during batch operations. Regional index enables selective queries for agents filtering by their region.
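The round-robin distribution that replaces the single triage queue is simple to express. A sketch (Python as illustration; in the org this would be an assignment rule or trigger logic, and the queue names are placeholders):

```python
from itertools import cycle

def assign_round_robin(case_ids: list, queues: list) -> dict:
    """Distribute incoming cases evenly across regional triage queues,
    so no single owner accumulates skew."""
    rr = cycle(queues)
    return {case_id: next(rr) for case_id in case_ids}
```

Even distribution is the point: with five queues, each owner's share of 15M cases drops from 100% to ~20%, pulling each queue out of the ownership-skew danger zone far sooner.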
Scenario 3: Retail — Product Catalog with LDV Reporting
Situation: Retailer with 8M Product records and 200M Order Line Items. Reports timing out when aggregating sales by product category.
What you’d do: Request skinny table on Order Line Item with only the fields used in reports (Product_Category__c, Amount__c, Order_Date__c, Status__c). Add custom index on Product_Category__c. For historical analytics beyond 2 years, push data to Data Cloud for SQL-based analytics. Use Platform Cache for product category reference data to avoid repeated SOQL.
Why: Skinny table reduces I/O by eliminating the 60+ unused fields on each query. Custom index ensures selective filtering by category. Data Cloud handles the analytical workload that would crush on-platform reporting.