LDV: Quick Reference
Cheat sheet for Large Data Volume strategy at the CTA review board. For the full deep dive, see Large Data Volumes and External Data.
LDV Threshold Table
| Records per Object | Classification | Action Required |
|---|---|---|
| < 500K | Normal | Standard best practices |
| 500K — 1M | Approaching LDV | Monitor query plans, plan indexes |
| 1M — 10M | LDV | Custom indexes, selective queries mandatory |
| 10M — 100M | High LDV | Skinny tables, archival strategy, batch optimization |
| 100M+ | Extreme LDV | Big Objects, external storage, or Data Cloud |
Volume is relative
A 500K-record object with unfiltered list views performs worse than a 5M-record object with highly selective indexed queries. Always evaluate volume alongside access patterns.
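The threshold table above is easy to encode. A minimal sketch (Python used here as a language-neutral illustration; the bands are exactly those in the table):

```python
def classify_ldv(record_count: int) -> str:
    """Map an object's record count to the LDV classification table."""
    if record_count < 500_000:
        return "Normal"
    if record_count < 1_000_000:
        return "Approaching LDV"
    if record_count < 10_000_000:
        return "LDV"
    if record_count < 100_000_000:
        return "High LDV"
    return "Extreme LDV"
```

Remember the caveat above: the band is a starting point, not a verdict — access patterns can make a "Normal" object slow and an "LDV" object fast.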
Indexing Cheat Sheet
Auto-Indexed Fields (Every Object)
Id, Name, OwnerId, CreatedDate, LastModifiedDate, SystemModstamp, RecordTypeId, all lookup/master-detail relationship fields, and all unique/External ID fields.
Custom Index Types
| Type | Description | Request Via |
|---|---|---|
| Single-column | One custom field | Salesforce Support case |
| Two-column (composite) | Two fields together | Salesforce Support case |
| Skinny table | Denormalized subset of fields in a narrow table | Salesforce Support case |
Skinny Tables — Key Facts
- Created by Salesforce Support — you cannot self-provision
- Subset of fields from one object in a separate narrow table
- Query optimizer uses them transparently (invisible to developer)
- Exclude soft-deleted records (IsDeleted = true)
- Must contact Support when adding/removing fields
- Work on both standard and custom objects
- Best ROI on objects with 100+ fields where queries use a small subset
Skinny table governance
Schema changes require a Support case. Factor this operational dependency into your governance model. Do not promise skinny tables unless you have a Support escalation path.
Query Selectivity Thresholds
```mermaid
graph TD
A[SOQL Query] --> B{Filter field<br/>indexed?}
B -->|No| C[Full Table Scan]
B -->|Yes| D{Threshold check}
D --> E{Standard Index}
D --> F{Custom Index}
E -->|Returns < 30% of first 1M,<br/>15% beyond. Max 1M rows| G[INDEX USED]
F -->|Returns < 10% of first 1M,<br/>5% beyond. Max 333,333 rows| G
E -->|Above threshold| C
F -->|Above threshold| C
C --> H{Object size?}
H -->|Small < 500K| I[OK performance]
H -->|Large LDV| J[SLOW / TIMEOUT]
style G fill:#51cf66,color:#fff
style J fill:#ff6b6b,color:#fff
```
Selectivity Rules Summary
| Condition | Threshold |
|---|---|
| Standard index selective | 30% of first 1M records, then 15% beyond 1M. Max 1,000,000 rows |
| Custom index selective | 10% of first 1M records, then 5% beyond 1M. Max 333,333 rows |
| AND filters | Each filter independently selective OR combined selective |
| OR filters | Every filter must be selective (one bad filter ruins it) |
| NOT / != operators | Generally non-selective — avoid on LDV |
| NULL checks | Non-selective — WHERE Field = null almost never uses index |
| LIKE with leading wildcard | Non-selective — LIKE '%text' cannot use index |
Board technique
State specific numbers: “The custom index on External_ID__c will be selective because it targets fewer than 10% of the first million records, well under the 333,333-row maximum.” This demonstrates practical depth.
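The selectivity thresholds from the table can be turned into a quick calculator — useful for producing exactly the kind of specific numbers the board technique above calls for. A sketch (Python as illustration; the percentages and caps are the documented optimizer thresholds):

```python
def selectivity_limit(total_records: int, custom_index: bool) -> int:
    """Max rows a filter may target and still be considered selective."""
    first = min(total_records, 1_000_000)       # records up to the first 1M
    beyond = max(total_records - 1_000_000, 0)  # records beyond 1M
    if custom_index:
        limit, cap = 0.10 * first + 0.05 * beyond, 333_333
    else:
        limit, cap = 0.30 * first + 0.15 * beyond, 1_000_000
    return int(min(limit, cap))

def is_selective(expected_rows: int, total_records: int,
                 custom_index: bool = False) -> bool:
    """True if the filter stays under the optimizer's threshold."""
    return expected_rows < selectivity_limit(total_records, custom_index)
```

For example, on a 2M-record object a custom index stays selective up to 150,000 targeted rows (10% of the first 1M plus 5% of the next 1M), well under the 333,333-row cap.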
Data Skew Quick Reference
Types and Thresholds
| Skew Type | Trigger | Caution Zone | Danger Zone |
|---|---|---|---|
| Account skew | Many children on one Account | > 10,000 children | > 100,000 children |
| Ownership skew | Many records owned by one user/queue | > 10,000 records | > 100,000 records |
| Lookup skew | Many records referencing same parent via lookup | > 10,000 references | > 100,000 references |
Skew Mitigation Decision Tree
```mermaid
graph TD
A[Skew Detected] --> B{Type?}
B --> C[Account Skew]
B --> D[Ownership Skew]
B --> E[Lookup Skew]
C --> C1[Split mega-accounts<br/>by region/division]
C --> C2[Switch to lookup to<br/>avoid sharing cascade]
C --> C3[Archive old child records]
D --> D1[Distribute across<br/>multiple users/queues]
D --> D2[Ensure skewed owner<br/>has no role in hierarchy]
D --> D3[Use system integration<br/>user for API records]
E --> E1[Eliminate catch-all<br/>parent records]
E --> E2[Remove sparse<br/>optional lookups]
E --> E3[Use deferred sharing<br/>calculation for batch ops]
```
Skew affects the entire org
A single mega-account with 200K children causes sharing recalculation delays and record locking that degrades performance for ALL users, not just those accessing the skewed records.
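All three skew types share the same caution/danger thresholds, so a single check covers them. A sketch of that classification (Python as illustration):

```python
def skew_zone(related_count: int) -> str:
    """Classify parent/owner record concentration per the skew table:
    >100K related records = danger, >10K = caution, else ok."""
    if related_count > 100_000:
        return "danger"
    if related_count > 10_000:
        return "caution"
    return "ok"
```

The 200K-child mega-account above lands squarely in the danger zone; the fix is to apply the mitigation branch for its skew type, not just to monitor it.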
Archival Strategy Quick Reference
| Storage Option | Best For | Query Method | Limits |
|---|---|---|---|
| Big Objects | Audit trails, historical transactions, IoT | Standard SOQL on indexed fields (Async SOQL retired Summer '21) | 100 per org, immutable indexes |
| External Storage + Connect | Massive datasets needing SF visibility | OData via Salesforce Connect | 2,000 rows per query, callout limits |
| Data Cloud | Unified analytics, identity resolution | SQL-like on data lake | Separate license |
| Hard delete + export | Data with no retention requirement | N/A (gone) | Irreversible |
Big Object Index Rules
- Index defined at creation — immutable after
- Max 5 fields in the index
- First field is most significant (leftmost in composite key)
- Queries can filter ONLY on index fields, in leftmost-prefix order (no gaps)
- Wrong index = delete Big Object + recreate (all data lost)
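Because the index is immutable and queries must follow it, it is worth sanity-checking a proposed query shape before committing to an index design. A hedged sketch of the leftmost-prefix rule (Python as illustration; field names are placeholders):

```python
def query_uses_index(index_fields: list, filter_fields: list) -> bool:
    """A Big Object SOQL filter must cover a gap-free leftmost prefix
    of the composite index (filter order in the WHERE clause is free,
    but the set of filtered fields must match a prefix)."""
    n = len(filter_fields)
    if n == 0 or n > len(index_fields):
        return False
    return set(filter_fields) == set(index_fields[:n])
```

So with an index of `(Account_ID, Transaction_Date)`, filtering by `Account_ID` alone works, but filtering by `Transaction_Date` alone does not — which is exactly why the first field must be chosen as the most significant.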
Batch Apex for LDV
| Parameter | Guidance |
|---|---|
| Default scope | 200 records per execute |
| Complex processing | Reduce to 50-100 |
| Simple operations | Increase to 1,000-2,000 |
| Max QueryLocator | 50 million records |
| Concurrent batch jobs | Up to 5 actively processing + up to 100 in flex queue |
| Chaining | Start next job in finish() method |
| Stateful | Use Database.Stateful for cross-batch state (increases memory) |
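Scope size is a direct trade-off between per-execute governor headroom and total job duration. A back-of-the-envelope sketch of the trade-off (Python as illustration):

```python
import math

def batch_executions(total_records: int, scope_size: int = 200) -> int:
    """Number of execute() invocations a Batch Apex job makes:
    total records divided by scope size, rounded up."""
    return math.ceil(total_records / scope_size)
```

For 1M records, the default scope of 200 means 5,000 execute() calls; raising the scope to 2,000 for a simple field update cuts that to 500, while dropping to 50 for heavy processing quadruples it — each execute() still gets a fresh set of governor limits.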
Platform Cache for LDV
| Cache Type | Scope | Max Size | Use For |
|---|---|---|---|
| Org Cache | All users | 10 MB default (EE), 30 MB (UE); additional capacity purchasable | Config data, reference data, exchange rates |
| Session Cache | Per user | Partition-based, drawn from org total | User prefs, wizard state, shopping cart |
Pattern: cache-aside — check the cache first; on a miss, query the database and populate the cache for subsequent reads.
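The cache-aside pattern is the same in any runtime; in Apex the store would be `Cache.Org`/`Cache.Session` and the loader a SOQL query. A language-neutral sketch (Python; the TTL and loader are placeholders):

```python
import time

class CacheAside:
    """Generic cache-aside: check the cache, on a miss call the
    loader (e.g. a SOQL query in Apex) and store the result."""
    def __init__(self, loader, ttl_seconds: int = 300):
        self.loader = loader
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        hit = self._store.get(key)
        if hit is not None:
            value, expires = hit
            if time.time() < expires:
                return value          # cache hit
        value = self.loader(key)      # cache miss: hit the database
        self._store[key] = (value, time.time() + self.ttl)
        return value
```

On LDV objects the win is that repeated reads of stable reference data (exchange rates, category lists) never touch the large table after the first load.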
Reverse-Engineered Use Cases
Scenario 1: Financial Services — Transaction History
Situation: Bank with 50M+ transaction records growing by 2M/month. Users need to see recent transactions (last 90 days) on Account pages. Compliance requires 7-year retention.
What you’d do: Keep last 90 days in a custom Transaction__c object with custom indexes on Account__c (lookup) + Transaction_Date__c (two-column composite index). Archive records older than 90 days to a Big Object (Transaction_Archive__b) with an index on Account_ID, Transaction_Date. Set up nightly Batch Apex job to move aged records. Surface archived data via a custom LWC that queries the Big Object via standard SOQL on indexed fields.
Why: 90 days of data stays under 6M records (manageable LDV). The Big Object absorbs the 400M+ historical records without consuming standard data storage. The two-column index ensures selective queries by account + date range.
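The heart of the nightly archival batch in this scenario is a cutoff test. A sketch of that routing decision (Python as illustration; in practice this is the WHERE clause of the Batch Apex start() query):

```python
from datetime import date, timedelta

def is_archivable(txn_date: date, today: date, hot_days: int = 90) -> bool:
    """True if a transaction falls outside the hot window and should
    move from Transaction__c to the Transaction_Archive__b Big Object."""
    return txn_date < today - timedelta(days=hot_days)
```

Running this nightly keeps the hot object's size roughly constant, so the selectivity math for its indexes never drifts as volume grows.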
Scenario 2: Telecom — Case Volume with Ownership Skew
Situation: Telecom company with 15M Cases. All inbound cases initially assigned to a single “Triage” queue, causing ownership skew and record locking during batch assignment.
What you’d do: Replace single Triage queue with 5 regional triage queues. Use round-robin assignment rule to distribute incoming cases. For batch reassignment, use serial mode Bulk API with batch size of 100 to avoid lock contention. Implement custom index on Case Status + Region to ensure selective list views.
Why: Splitting the queue eliminates ownership skew. Serial mode prevents lock contention during batch operations. Regional index enables selective queries for agents filtering by their region.
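The round-robin distribution that replaces the single triage queue is simple to express. A sketch (Python as illustration; in the org this would be an assignment rule or trigger logic, and the queue names are placeholders):

```python
from itertools import cycle

def assign_round_robin(case_ids: list, queues: list) -> dict:
    """Distribute incoming cases evenly across regional triage queues,
    so no single owner accumulates skew."""
    rr = cycle(queues)
    return {case_id: next(rr) for case_id in case_ids}
```

Even distribution is the point: with five queues, each owner's share of 15M cases drops from 100% to ~20%, pulling each queue out of the ownership-skew danger zone far sooner.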
Scenario 3: Retail — Product Catalog with LDV Reporting
Situation: Retailer with 8M Product records and 200M Order Line Items. Reports timing out when aggregating sales by product category.
What you’d do: Request skinny table on Order Line Item with only the fields used in reports (Product_Category__c, Amount__c, Order_Date__c, Status__c). Add custom index on Product_Category__c. For historical analytics beyond 2 years, push data to Data Cloud for SQL-based analytics. Use Platform Cache for product category reference data to avoid repeated SOQL.
Why: Skinny table reduces I/O by eliminating the 60+ unused fields on each query. Custom index ensures selective filtering by category. Data Cloud handles the analytical workload that would crush on-platform reporting.