LDV: Quick Reference
Cheat sheet for Large Data Volume strategy at the CTA review board. For the full deep dive, see Large Data Volumes and External Data.
LDV Threshold Table
| Records per Object | Classification | Action Required |
|---|---|---|
| < 500K | Normal | Standard best practices |
| 500K - 1M | Approaching LDV | Monitor query plans, plan indexes |
| 1M - 10M | LDV | Custom indexes, selective queries mandatory |
| 10M - 100M | High LDV | Skinny tables, archival strategy, batch optimization |
| 100M+ | Extreme LDV | Big Objects, external storage, or Data Cloud |
Volume is relative
A 500K-record object with unfiltered list views performs worse than a 5M-record object with highly selective indexed queries. Always evaluate volume alongside access patterns.
Indexing Cheat Sheet
Auto-Indexed Fields (Every Object)
Id, Name, OwnerId, CreatedDate, LastModifiedDate, SystemModstamp, RecordTypeId, all lookup/MD fields, all unique/External ID fields.
Custom Index Types
| Type | Description | Request Via |
|---|---|---|
| Single-column | One custom field | Salesforce Support case |
| Two-column (composite) | Two fields together | Salesforce Support case |
| Skinny table | Denormalized subset of fields in a narrow table | Salesforce Support case |
Skinny Tables - Key Facts
- Created by Salesforce Support - you cannot self-provision
- Subset of fields from one object in a separate narrow table
- Query optimizer uses them transparently (invisible to developer)
- Exclude soft-deleted records (IsDeleted = true)
- Must contact Support when adding/removing fields
- Work on both standard and custom objects
- Best ROI on objects with 100+ fields where queries use a small subset
Skinny table governance
Schema changes require a Support case. Factor this operational dependency into your governance model. Do not promise skinny tables unless you have a Support escalation path.
Query Selectivity Thresholds
Selectivity Rules Summary
| Condition | Threshold |
|---|---|
| Standard index selective | 30% of first 1M records, then 15% beyond 1M. Max 1,000,000 rows |
| Custom index selective | 10% of first 1M records, then 5% beyond 1M. Max 333,333 rows |
| AND filters | Each filter independently selective OR combined selective |
| OR filters | Every filter must be selective (one bad filter ruins it) |
| NOT / != operators | Generally non-selective - avoid on LDV |
| NULL checks | Non-selective - WHERE Field = null almost never uses index |
| LIKE with leading wildcard | Non-selective - LIKE '%text' cannot use index |
Board technique
State specific numbers: “The custom index on External_ID__c will be selective because it targets fewer than 10% of the first million records, well under the 333,333-row maximum.” This demonstrates practical depth.
Data Skew Quick Reference
Types and Thresholds
| Skew Type | Trigger | Caution Zone | Danger Zone |
|---|---|---|---|
| Account skew | Many children on one Account | > 10,000 children | > 100,000 children |
| Ownership skew | Many records owned by one user/queue | > 10,000 records | > 100,000 records |
| Lookup skew | Many records referencing same parent via lookup | > 10,000 references | > 100,000 references |
Skew Mitigation Decision Tree
Skew affects the entire org
A single mega-account with 200K children causes sharing recalculation delays and record locking that hurts performance for ALL users, not just those accessing the skewed records.
Archival Strategy Quick Reference
| Storage Option | Best For | Query Method | Limits |
|---|---|---|---|
| Big Objects | Audit trails, historical transactions, IoT | Standard SOQL on indexed fields only; Batch Apex or Bulk API 2.0 for complex queries (Async SOQL retired Summer 2023) | 100 per org, immutable indexes |
| External Storage + Connect | Massive datasets needing SF visibility | OData via Salesforce Connect | 2,000 rows per query, callout limits |
| Data Cloud | Unified analytics, identity resolution | SQL-like on data lake | Separate license |
| Hard delete + export | Data with no retention requirement | N/A (gone) | Irreversible |
Big Object Index Rules
- Index defined at creation, immutable after
- Max 5 fields in the index
- First field is most significant (leftmost in composite key)
- You can ONLY query by index fields
- Wrong index means deleting the Big Object and recreating it (all data lost)
Batch Apex for LDV
| Parameter | Guidance |
|---|---|
| Default scope | 200 records per execute |
| Complex processing | Reduce to 50-100 |
| Simple operations | Increase to 1,000-2,000 |
| Max QueryLocator | 50 million records |
| Concurrent batch jobs | Up to 5 actively processing + up to 100 in flex queue |
| Chaining | Start next job in finish() method |
| Stateful | Use Database.Stateful for cross-batch state (increases memory) |
Platform Cache for LDV
| Cache Type | Scope | Max Size | Use For |
|---|---|---|---|
| Org Cache | All users | 30 MB default (EE/UE), purchasable up to 3 GB | Config data, reference data, exchange rates |
| Session Cache | Per user | Partition-based, drawn from org total | User prefs, wizard state, shopping cart |
Pattern: Cache-aside - check cache first, query on miss, populate cache on miss.
Reverse-Engineered Use Cases
Scenario 1: Financial Services - Transaction History
Situation: Bank with 50M+ transaction records growing by 2M/month. Users need to see recent transactions (last 90 days) on Account pages. Compliance requires 7-year retention.
What you’d do: Keep last 90 days in a custom Transaction__c object with custom indexes on Account__c (lookup) + Transaction_Date__c (two-column composite index). Archive records older than 90 days to a Big Object (Transaction_Archive__b) with an index on Account_ID, Transaction_Date. Set up nightly Batch Apex job to move aged records. Surface archived data via a custom LWC that queries the Big Object via standard SOQL on indexed fields.
Why: 90 days of data stays under 6M records (manageable LDV). Big Object handles the 400M+ historical records without storage cost. Two-column index ensures selective queries by account + date range.
Scenario 2: Telecom - Case Volume with Ownership Skew
Situation: Telecom company with 15M Cases. All inbound cases initially assigned to a single “Triage” queue, causing ownership skew and record locking during batch assignment.
What you’d do: Replace single Triage queue with 5 regional triage queues. Use round-robin assignment rule to distribute incoming cases. For batch reassignment, use serial mode Bulk API with batch size of 100 to avoid lock contention. Implement custom index on Case Status + Region to ensure selective list views.
Why: Splitting the queue eliminates ownership skew. Serial mode prevents lock contention during batch operations. Regional index enables selective queries for agents filtering by their region.
Scenario 3: Retail - Product Catalog with LDV Reporting
Situation: Retailer with 8M Product records and 200M Order Line Items. Reports timing out when aggregating sales by product category.
What you’d do: Request skinny table on Order Line Item with only the fields used in reports (Product_Category__c, Amount__c, Order_Date__c, Status__c). Add custom index on Product_Category__c. For historical analytics beyond 2 years, push data to Data Cloud for SQL-based analytics. Use Platform Cache for product category reference data to avoid repeated SOQL.
Why: Skinny table reduces I/O by eliminating the 60+ unused fields on each query. Custom index ensures selective filtering by category. Data Cloud handles the analytical workload that would crush on-platform reporting.
Sources
Personal study notes for the Salesforce CTA exam. Content compiled from VJ's study notes, official Salesforce documentation, community sources, and online publicly available content, then organized and presented with AI assistance. Not affiliated with Salesforce. © 2025–2026 VJ Srivastava.