Data

Key Takeaways

Data architecture decisions cascade through security, integration, and performance. Know the data modeling trade-offs (lookup vs master-detail, normalization vs denormalization), LDV strategies (indexing, skinny tables, data skew mitigation), and migration planning. Every modeling choice has downstream consequences for sharing, query performance, and API payloads.

This domain covers data architecture, large data volume strategies, data modeling, and migration planning. Data decisions affect security (sharing model), integration (API payloads), and performance (query selectivity).

Topics

Core Concepts

Data Modeling: Standard vs custom objects, relationships (lookup vs master-detail), junction objects, polymorphic lookups, external objects, Big Objects, record types, Person Accounts, formula fields, roll-up summaries, ERD patterns
Large Data Volumes: LDV thresholds, indexing (standard, custom), skinny tables, query selectivity, data skew (account, ownership, lookup), archival strategies, Batch Apex, Platform Cache
Data Migration: Migration phases, tools (Data Loader, Bulk API 2.0, Informatica, MuleSoft), load sequencing, External IDs, cutover strategies (big bang, phased, parallel), trial migrations, validation

Governance & Quality

Data Quality & Governance: Data profiling, deduplication (matching rules, duplicate rules), master data management, data lifecycle, retention policies, data stewardship, compliance (GDPR, data residency, Shield)
External Data: Salesforce Connect (OData, cross-org adapter), External Objects, Big Objects, Data Cloud, data virtualization vs replication, hybrid patterns
Data Cloud Architecture: Data Cloud (Data 360) deep dive: DSO/DLO/DMO hierarchy, identity resolution, calculated insights, segments, activation, zero-copy partner network, credit consumption model

Decision Frameworks

Decision Guides: Mermaid decision flowcharts for lookup vs master-detail, standard vs custom objects, archival strategy, migration approach, Person Accounts, normalized vs denormalized, virtualization vs ETL
Trade-offs: Normalization vs denormalization, on-platform vs external data, big bang vs phased migration, standard vs custom objects, lookup vs master-detail
Best Practices & Anti-Patterns: Organized by modeling, LDV, migration, quality, and governance with paired best practice and anti-pattern for each area

Objectives

Platform architecture considerations and optimization for large data volumes (LDV)
Data modeling concepts and database design implications
Data migration strategy, considerations, and appropriate tools
Data quality, governance, and compliance
External data access patterns and virtualization

Practice

Domain Grilling: D3 Data Q&A

Key Exam Focus Areas

The CTA review board probes Domain 3 at the decision and trade-off layer, not at the feature recall layer. These are the areas where candidates are most often challenged:

LDV identification and threshold reasoning. Candidates must identify which objects will exceed volume thresholds and proactively recommend mitigation: custom indexes for selectivity gaps, skinny tables for wide objects with high scan overhead, and archival for sustained growth. Simply naming “skinny tables” without explaining when a custom index would not suffice is not enough — the board pushes on the distinction.
Data skew in real-world scenarios. Account skew (mega-accounts with 10K+ children) and ownership skew (bulk integrations creating a single-owner bottleneck) are common failure modes candidates miss because the data model looks clean in the abstract. Raise skew proactively in any B2C or high-volume integration scenario.
Relationship type cascade consequences. Choosing master-detail has far-reaching implications for cascade delete, sharing inheritance, and the hard limit of two master-detail relationships per object. Candidates frequently propose master-detail without addressing the cascade delete or sharing implications, then cannot defend the choice under Q&A.
Migration strategy selection with justification. The board expects a cutover recommendation tied to specific scenario constraints: downtime tolerance, data volume, and risk appetite. “Phased migration” without explaining which phases, in what sequence, and why big bang is not viable scores poorly.
Person Accounts irreversibility. Recommending Person Accounts without flagging that enablement is org-wide and permanent is a significant red flag for the board. In B2B2C scenarios, candidates must address how both Business Accounts and Person Accounts coexist and what that means for integrations filtering on IsPersonAccount.
MDM pattern selection. When a scenario involves customer data across multiple systems, the board expects a pattern recommendation (centralized, coexistence, registry, or Data Cloud-based identity resolution) with criteria for why that pattern fits. Defaulting to “Salesforce is the master” without evaluating the other systems is too shallow.
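Encryption and query trade-offs. Shield Platform Encryption restricts sorting, filtering, and formula field references on encrypted fields; the exact limits depend on the encryption scheme (for example, deterministic encryption permits exact-match filtering, while probabilistic encryption does not). Recommending broad encryption without addressing these query limitations is a technical error the board will probe.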
Trial migration discipline. Any migration discussion that does not include trial migrations (minimum three full runs) and a rollback plan will be challenged. The three-run rule is a concrete framework the board recognizes: the first run surfaces structural issues, the second validates the fixes, and the third proves repeatability.
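
The skew checks described above can be run against an export before the data model is finalized. The following is a minimal Python sketch, not a Salesforce API: it assumes child records have been exported to a plain list of parent (or owner) IDs, and the function name find_skewed_parents and the sample IDs in the usage note are hypothetical. The 10,000 limit mirrors the 10K+ figure cited above for account skew.

```python
from collections import Counter

# Children-per-parent guideline taken from the "10K+ children" figure in the text.
SKEW_LIMIT = 10_000


def find_skewed_parents(parent_ids, limit=SKEW_LIMIT):
    """Return {parent_id: child_count} for parents at or above the limit.

    parent_ids is an iterable with one entry per child record (for example
    the AccountId column of a Contact export). None entries, meaning the
    child has no parent, are ignored.
    """
    counts = Counter(pid for pid in parent_ids if pid is not None)
    return {pid: n for pid, n in counts.items() if n >= limit}
```

Calling find_skewed_parents(["001A"] * 10_000 + ["001B"] * 50) would flag only "001A". The same function covers ownership skew if the list holds OwnerId values instead of AccountId values.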

Data architecture decisions ripple across the entire solution. These domains are most tightly coupled:

System Architecture: Data volume and LDV constraints directly influence org design decisions. Multi-org architectures are sometimes driven by data partitioning needs (geographic data residency, volume distribution across orgs) rather than purely by business structure.
Security: The relationship type chosen in the data model determines whether child records inherit the parent’s sharing model (master-detail) or maintain independent sharing (lookup). Shield Platform Encryption decisions must account for query limitations on encrypted fields: sorting, filtering, and formula references are restricted.
Integration: Bulk API 2.0 is both a migration tool and an integration pattern for high-volume data exchange. External Object access via Salesforce Connect is an integration architecture concern as much as a data concern. ETL tool selection for migration uses the same evaluation criteria as middleware selection for ongoing integration.
Development Lifecycle: Data migration is a deployable workstream with its own environment strategy. Trial migrations require full-copy or partial-copy sandboxes with production-equivalent volumes. Data governance programs involve change management and training, which are organizational change topics that appear in Domain 6.

Frequently Asked Questions

What data architecture topics does the CTA exam cover?

The CTA exam covers data modeling (object relationships, junction objects, polymorphic lookups, ERD patterns), large data volume strategies (indexing, skinny tables, data skew mitigation), data migration planning (tool selection, load sequencing, cutover strategies), data quality and governance (deduplication, MDM, retention policies), and external data access (Salesforce Connect, Data Cloud).
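
Load sequencing, mentioned above, comes down to loading every parent object before the children that reference it. One way to derive that order is a topological sort over the object dependency graph. The sketch below uses Python's standard graphlib module (Python 3.9+); the dependency map is a hypothetical example, not taken from any particular scenario, and the function name load_order is an assumption for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each object maps to the set of objects
# it references through lookup or master-detail fields.
DEPENDENCIES = {
    "Account": set(),
    "Contact": {"Account"},
    "Opportunity": {"Account"},
    "OpportunityContactRole": {"Opportunity", "Contact"},
}


def load_order(dependencies):
    """Return the objects in an order where every parent precedes its children.

    Raises graphlib.CycleError if the objects reference each other in a loop.
    """
    return list(TopologicalSorter(dependencies).static_order())
```

A cycle (or a self-reference such as an account hierarchy) cannot be resolved by ordering alone; a common workaround is a two-pass load that inserts the records first and populates the circular reference afterwards, using External IDs to match records.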

How is Data Architecture scored in the CTA review board?

Judges evaluate whether the data model supports the required sharing model, whether LDV concerns are addressed with concrete strategies, whether the migration approach includes trial migrations and rollback plans, and whether relationship type choices (lookup vs master-detail) are defended with clear reasoning about cascading impacts on security and deletion behavior.

What are the most common mistakes in Data Architecture during the CTA exam?

Candidates frequently fail by using master-detail relationships without considering cascade delete and sharing implications, ignoring data skew in high-volume scenarios (account skew, ownership skew), proposing big-bang migration without a phased alternative, neglecting data archival for growing datasets, and not considering the query selectivity impact of their indexing strategy.

How should I handle large data volume scenarios in the CTA exam?

Start by identifying the data volumes mentioned in the scenario and mapping them against platform thresholds. Address indexing strategy (standard and custom indexes, plus skinny tables), query selectivity for SOQL performance, data skew mitigation (especially account and ownership skew), archival approach (Big Objects, external storage, Data Cloud), and Batch Apex patterns for processing. Show that you understand the commonly cited 2M+ record threshold where LDV strategies become critical.
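
Query selectivity can be reasoned about numerically. The sketch below is a rough Python estimate, assuming the selectivity thresholds as commonly documented by Salesforce: a filter on a standard index is selective below 30% of the first million records plus 15% of the remainder (capped at 1,000,000 rows), and a filter on a custom index below 10% of the first million plus 5% of the remainder (capped at 333,333 rows). Treat these figures as assumptions to verify against current documentation; the function names are hypothetical.

```python
# (percent of first million, percent of remaining records, row cap)
# Assumed thresholds; confirm against current Salesforce documentation.
THRESHOLDS = {
    "standard": (30, 15, 1_000_000),
    "custom": (10, 5, 333_333),
}

FIRST_TIER = 1_000_000


def selectivity_threshold(total_records, index_type="custom"):
    """Return the row count a filter must stay below to count as selective."""
    first_pct, rest_pct, cap = THRESHOLDS[index_type]
    first = min(total_records, FIRST_TIER)
    rest = max(total_records - FIRST_TIER, 0)
    # Integer arithmetic avoids floating-point rounding on the percentages.
    return min(first * first_pct // 100 + rest * rest_pct // 100, cap)


def is_selective(matching_rows, total_records, index_type="custom"):
    """True if a filter returning matching_rows is under the threshold."""
    return matching_rows < selectivity_threshold(total_records, index_type)
```

For a 3,000,000-record object, this gives a custom-index threshold of 200,000 rows, so a filter matching 250,000 rows would not be treated as selective even with the index in place. That is the kind of reasoning that separates "add an index" from a defensible LDV strategy.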

When should I recommend Data Cloud versus Salesforce Connect?

Recommend Data Cloud when the scenario requires identity resolution across systems, customer segmentation, calculated insights, or activation to marketing channels. Recommend Salesforce Connect (OData, cross-org adapter) when you need real-time access to external data without replication, the data volumes are moderate, and the external system has a stable API. Consider hybrid patterns when both real-time access and analytics are needed.

Personal study notes for the Salesforce CTA exam. Content compiled from VJ's study notes, official Salesforce documentation, community sources, and online publicly available content, then organized and presented with AI assistance. Not affiliated with Salesforce. © 2025–2026 VJ Srivastava.