Data Quality & Governance

Data quality and governance are ongoing disciplines, not one-time activities. Poor data quality undermines every downstream system: reports lie, integrations fail, and users lose trust in the platform.

Data Profiling

Profiling is the first step in understanding the data. It applies to both migration scenarios and ongoing data health monitoring.

Profiling Dimensions

Dimension	What to Measure	Red Flags
Completeness	% of required fields populated	Key fields < 80% populated
Accuracy	Values reflect the real world	Stale addresses, wrong phone formats
Consistency	Same data represented the same way	“US” vs “USA” vs “United States”
Uniqueness	No unintended duplicates	Duplicate accounts, contacts
Timeliness	Data is current enough for its use	Last modified > 2 years ago
Validity	Values conform to expected formats	Dates in text fields, invalid emails
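Several of these dimensions can be checked directly on a Data Loader export analyzed in Python, which is one of the profiling options in this section. The following is a minimal sketch. It assumes the export has already been loaded into a list of dicts. The field names, the sample rows, and the deliberately loose email pattern are illustrative assumptions, not Salesforce APIs.

```python
import re
from collections import Counter

# Hypothetical rows shaped like a Data Loader CSV export; field names are illustrative.
records = [
    {"Id": "001A", "Name": "Acme", "Email": "ops@acme.com", "Country": "US"},
    {"Id": "001B", "Name": "Acme Corp", "Email": "OPS@acme.com", "Country": "USA"},
    {"Id": "001C", "Name": "Globex", "Email": "not-an-email", "Country": "United States"},
    {"Id": "001D", "Name": "Initech", "Email": "", "Country": "DE"},
]

KEY_FIELD_THRESHOLD = 0.80  # red flag from the table: key fields < 80% populated
# Deliberately loose pattern for profiling; not full RFC 5322 validation.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def completeness(rows, field):
    """Completeness: share of rows where the field is non-blank."""
    filled = sum(1 for r in rows if (r.get(field) or "").strip())
    return filled / len(rows)


def invalid_emails(rows):
    """Validity: Ids of rows whose populated Email fails the pattern."""
    return [r["Id"] for r in rows if r.get("Email") and not EMAIL_RE.match(r["Email"])]


def duplicate_values(rows, field):
    """Uniqueness: case-insensitive values that appear more than once (blanks ignored)."""
    counts = Counter((r.get(field) or "").strip().lower() for r in rows)
    return {value: n for value, n in counts.items() if value and n > 1}


def variants(rows, field):
    """Consistency: raw spellings in use; several spellings of one concept is a red flag."""
    return sorted({r[field] for r in rows if r.get(field)})


email_fill = completeness(records, "Email")
print(f"Email completeness: {email_fill:.0%}", "RED FLAG" if email_fill < KEY_FIELD_THRESHOLD else "ok")
print("Invalid emails:", invalid_emails(records))        # ['001C']
print("Duplicate emails:", duplicate_values(records, "Email"))  # {'ops@acme.com': 2}
print("Country variants:", variants(records, "Country"))  # ['DE', 'US', 'USA', 'United States']
```

Email completeness here is 75%, below the 80% red-flag line. The four spellings of country are the kind of inconsistency the table describes.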

Profiling Tools

Salesforce Reports: Record counts, field completeness via formula fields
Data Loader exports: Export and analyze in Excel/Python for pattern detection
Third-party tools: Informatica Data Quality, DemandTools (Validity), RingLead
CRM Analytics (formerly Einstein Analytics and Tableau CRM): Dashboard-based data quality monitoring
Apex scripts: Custom profiling for complex business rules

Deduplication

Duplicates are the most visible data quality problem. Salesforce provides native deduplication tools, but a CTA must design a broader strategy.

Native Salesforce Dedup

Matching Rules

Matching rules define when two records are considered potential duplicates.

Component	Description
Matching method	Exact or Fuzzy
Matching criteria	Fields to compare (Name, Email, Phone, Address)
Match key	Combination of fields that trigger comparison
Blank fields	How to handle nulls (match or skip)

Standard matching rules exist for Account, Contact, and Lead. Custom matching rules can be created for custom objects and for the standard objects that support duplicate management.
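The matching-rule components above can be modeled in a few lines to show how exact matching, fuzzy matching, and blank-field handling combine. This is a hypothetical Python sketch and does not describe how Salesforce implements matching. Salesforce uses its own matching algorithms; here `difflib.SequenceMatcher` with an assumed 0.85 threshold stands in for them. The sketch also assumes that all criteria must match (AND logic), whereas real matching rules can use custom filter logic.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Criterion:
    field: str
    method: str                # "exact" or "fuzzy"
    threshold: float = 0.85    # fuzzy similarity cut-off (assumed value)
    match_blank: bool = False  # if True, two blank values count as a match


def _norm(value):
    """Trim and lowercase so formatting differences do not block a match."""
    return (value or "").strip().lower()


def criterion_matches(c, a, b):
    va, vb = _norm(a.get(c.field)), _norm(b.get(c.field))
    if not va or not vb:
        # Blank handling: only "both blank" can match, and only when match_blank is set.
        return c.match_blank and not va and not vb
    if c.method == "exact":
        return va == vb
    # Fuzzy: difflib ratio is a stand-in for Salesforce's own fuzzy algorithms.
    return SequenceMatcher(None, va, vb).ratio() >= c.threshold


def is_potential_duplicate(rule, a, b):
    """A rule is a list of criteria; every criterion must match (AND logic)."""
    return all(criterion_matches(c, a, b) for c in rule)


rule = [Criterion("Name", "fuzzy"), Criterion("Email", "exact")]
a = {"Name": "Jon Smith", "Email": "jon@example.com"}
b = {"Name": "John Smith", "Email": "JON@example.com"}
c = {"Name": "Jane Smith", "Email": ""}
print(is_potential_duplicate(rule, a, b))  # True: names are close, emails equal after normalizing
print(is_potential_duplicate(rule, a, c))  # False: blank Email is skipped, not matched
```

Setting `match_blank=True` on the Email criterion would make two records that both lack an email satisfy that criterion. This shows why the blank-field setting matters for false positives.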

Duplicate Rules

Duplicate rules define what happens when a match is found:

Action	Effect
Alert	Warn the user but allow save
Block	Prevent the record from being saved
Report	Log the duplicate for later review

Alert vs Block trade-off

Blocking duplicates protects data quality but frustrates users and can block legitimate records (false positives). Alerting preserves user productivity but relies on users making good decisions. A common CTA recommendation is to start with alert plus reporting, then tighten to block once the matching rules have proven accurate.
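The three actions and the alert-first rollout can be modeled together. This is a hypothetical Python model and not the Salesforce duplicate-rule engine. The `ready_to_block` gate and its thresholds are illustrative assumptions for what "matching rules have proven accurate" might mean in practice: at least 200 steward-reviewed matches and at most 2% false positives.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    ALERT = "alert"    # warn the user, allow save
    BLOCK = "block"    # prevent the save
    REPORT = "report"  # allow save, log the match for review


@dataclass
class SaveOutcome:
    saved: bool
    warnings: list = field(default_factory=list)
    logged: list = field(default_factory=list)


def apply_duplicate_rule(actions, record_id, matched_ids):
    """Combine the configured actions for one save attempt."""
    outcome = SaveOutcome(saved=True)
    if not matched_ids:
        return outcome
    if Action.BLOCK in actions:
        outcome.saved = False
        outcome.warnings.append(f"Blocked: {record_id} matches {matched_ids}")
        return outcome
    if Action.ALERT in actions:
        outcome.warnings.append(f"Possible duplicate of {matched_ids}")
    if Action.REPORT in actions:
        outcome.logged.append((record_id, tuple(matched_ids)))
    return outcome


def ready_to_block(reviewed, false_positives, max_fp_rate=0.02, min_reviewed=200):
    """Hypothetical gate: tighten Alert to Block once steward review shows few false positives."""
    if reviewed < min_reviewed:
        return False
    return false_positives / reviewed <= max_fp_rate


start = {Action.ALERT, Action.REPORT}
result = apply_duplicate_rule(start, "00Q1", ["00Q9"])
print(result.saved, result.warnings, result.logged)
print(ready_to_block(reviewed=500, false_positives=4))
```

Requiring a minimum review count keeps a handful of lucky early matches from triggering a premature switch to Block.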

Third-Party Deduplication

For enterprise-scale deduplication, native tools may fall short:

Tool	Capability
DemandTools (Validity)	Mass dedup, merge, standardization
Cloudingo	Automated dedup with scheduling
RingLead	Real-time and batch dedup
Informatica	Enterprise MDM with fuzzy matching
DupeCatcher	Free AppExchange duplicate prevention

Dedup Strategy Layers

Figure 1. Effective deduplication requires three coordinated layers: real-time prevention at the point of data entry, scheduled batch scans to catch duplicates that slip through, and pre-load checks at integration boundaries to upsert rather than insert when a match exists. No single layer catches everything on its own.
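The integration-boundary layer amounts to a lookup before the write. This is a minimal Python sketch. It assumes the target records have already been queried into a dict keyed by a normalized external ID. The `External_Id__c` field name and the upper-casing normalization are hypothetical. In Salesforce itself, the equivalent is an upsert against an external ID field, which performs the match-then-update-or-insert step on the server side.

```python
def upsert_batch(existing, incoming, key="External_Id__c"):
    """Pre-load check: update when the external key already exists, insert otherwise.

    existing: dict of normalized key -> record (stand-in for records queried from the org).
    Returns (inserted_keys, updated_keys, rejected_records).
    """
    inserted, updated, rejected = [], [], []
    for rec in incoming:
        raw = rec.get(key)
        if not raw or not str(raw).strip():
            rejected.append(rec)  # no key: cannot dedup safely, route to review
            continue
        k = str(raw).strip().upper()  # hypothetical normalization rule
        if k in existing:
            existing[k].update(rec)  # match found: update instead of creating a duplicate
            updated.append(k)
        else:
            existing[k] = dict(rec)
            inserted.append(k)
    return inserted, updated, rejected


existing = {"ERP-100": {"External_Id__c": "ERP-100", "Name": "Acme"}}
incoming = [
    {"External_Id__c": "erp-100", "Name": "Acme Inc"},
    {"External_Id__c": "ERP-200", "Name": "Globex"},
    {"External_Id__c": "ERP-200", "Phone": "555-0100"},
    {"Name": "No key"},
]
print(upsert_batch(existing, incoming))
```

Because new keys are added to `existing` as they are inserted, a second occurrence of the same key later in the batch becomes an update rather than a second insert.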

Master Data Management

MDM ensures that critical business entities (customers, products, employees) have a single, authoritative source of truth across all systems.

MDM Approaches

Approach	Description	When to Use
Registry	Each system maintains its own copy; a central registry maps IDs	Low integration maturity, many legacy systems
Consolidation	Data is copied to a master hub for reporting, not written back	Read-only analytics, data warehouse model
Coexistence	Multiple systems share and synchronize master data	Multiple systems of record per entity
Centralized	One system is the master; others are consumers	Clear system of record exists (e.g., Salesforce for customers)
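Of the four approaches, the registry style is the easiest to show in code, because the hub stores only identifier mappings. This is a hypothetical Python sketch. The golden-ID format, system names, and sample IDs are assumptions. The decision about which local records belong together (the matching step) is taken as given here.

```python
import itertools


class MasterRegistry:
    """Registry-style MDM: source systems keep their own records; the hub only maps IDs."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._golden_by_local = {}   # (system, local_id) -> golden id
        self._locals_by_golden = {}  # golden id -> {system: local_id}

    def register(self, system, local_id, golden_id=None):
        """Map a local record to a golden ID, minting a new one if none is given."""
        key = (system, local_id)
        if key in self._golden_by_local:
            return self._golden_by_local[key]  # idempotent for known records
        if golden_id is None:
            golden_id = f"G{next(self._counter):06d}"
        self._golden_by_local[key] = golden_id
        self._locals_by_golden.setdefault(golden_id, {})[system] = local_id
        return golden_id

    def cross_reference(self, system, local_id, target_system):
        """Translate a local ID in one system to the same entity's ID in another."""
        golden = self._golden_by_local.get((system, local_id))
        if golden is None:
            return None
        return self._locals_by_golden[golden].get(target_system)


reg = MasterRegistry()
g = reg.register("Salesforce", "001xx0001")
reg.register("SAP", "0000100234", golden_id=g)  # same customer, linked by matching
print(reg.cross_reference("SAP", "0000100234", "Salesforce"))
```

No customer attributes live in the registry. This is why the approach suits environments with low integration maturity: each system remains the owner of its own copy.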

Salesforce as MDM Hub

Salesforce can serve as the master for customer data (Account, Contact) but is rarely the right choice for all entity types:

Entity	Salesforce as Master?	Notes
Customer (B2B)	Often yes	Account/Contact is natural fit
Customer (B2C)	Sometimes	Person Accounts or Data Cloud
Product	Sometimes	CPQ scenarios; otherwise ERP
Employee	Rarely	HR systems (Workday, SAP HCM) are better fit
Financial data	No	ERP is the master
Inventory	No	ERP/WMS is the master

Data Lifecycle Management

Every record has a lifecycle. Design for the full journey, not just creation.

Figure 2. Data lifecycle management defines what happens to every record as it ages. Records that bypass the Archive stage and go directly from Maintain to Delete require careful governance, because hard deletes are irreversible and must comply with any applicable retention obligations before execution.

Lifecycle Stages

Create

Define data entry standards (required fields, validation rules, dependent picklists)
Integration-created records need quality controls (field mapping validation, dedup)
Bulk imports need pre-load quality checks

Maintain

Ongoing enrichment (address verification, firmographic data)
Periodic deduplication scans
Data steward reviews and corrections
Automation to flag stale records (e.g., Account not modified in 12 months)
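The stale-record rule is a date comparison against a cutoff. The following Python sketch applies it to exported records. It assumes that `LastModifiedDate` values use the `2023-01-15T09:30:00.000+0000` shape and that "12 months" is approximated as 365 days. Both are illustrative choices.

```python
from datetime import datetime, timedelta, timezone

# "Not modified in 12 months", approximated as 365 days (an assumption).
STALE_AFTER = timedelta(days=365)


def parse_sf_datetime(value):
    """Parse a timestamp such as '2023-01-15T09:30:00.000+0000' from a data export."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%f%z")


def stale_ids(records, as_of, stale_after=STALE_AFTER):
    """Return Ids of records whose LastModifiedDate is older than the cutoff."""
    cutoff = as_of - stale_after
    return [r["Id"] for r in records if parse_sf_datetime(r["LastModifiedDate"]) < cutoff]


accounts = [
    {"Id": "001A", "LastModifiedDate": "2023-01-15T09:30:00.000+0000"},
    {"Id": "001B", "LastModifiedDate": "2024-11-02T14:00:00.000+0000"},
]
print(stale_ids(accounts, as_of=datetime(2025, 1, 1, tzinfo=timezone.utc)))  # ['001A']
```

Passing `as_of` explicitly keeps the check deterministic and testable. Inside the org, a scheduled flow or batch job would apply the same filter, which in SOQL can be written roughly as `LastModifiedDate < LAST_N_DAYS:365`.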

Delete

Soft delete: Records go to Recycle Bin (recoverable for 15 days)
Hard delete: Permanent removal that bypasses the Recycle Bin (Bulk API hardDelete operation)
GDPR right to erasure: Must be able to permanently delete all personal data for a data subject
Document deletion policies and audit trails
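Together, the erasure requirement and the audit-trail bullet imply a repeatable procedure: find every record tied to the data subject, delete it permanently, and record what was done. This is a minimal Python sketch of the planning and audit steps. It assumes the subject is stored as a Contact and that related record IDs have already been queried. The data shapes are hypothetical. Executing the batches (for example, with the Bulk API hardDelete operation) is out of scope.

```python
from datetime import datetime, timezone


def erasure_plan(subject_id, related, subject_object="Contact"):
    """Order hard-delete batches: children first, the data subject's own record last.

    related: {object_name: [(record_id, parent_id), ...]}, a hypothetical shape built from
    queries such as Case.ContactId or Task.WhoId. Listing children explicitly avoids
    relying on cascade-delete behavior, which differs between relationship types.
    """
    batches = []
    for obj, rows in related.items():
        ids = [rid for rid, parent in rows if parent == subject_id]
        if ids:
            batches.append((obj, ids))
    batches.append((subject_object, [subject_id]))
    return batches


def audit_entry(subject_id, batches, requested_by, executed_at):
    """Deletion audit record: record IDs and counts only, no personal field values."""
    return {
        "subject_id": subject_id,
        "requested_by": requested_by,
        "executed_at": executed_at.isoformat(),
        "deleted_counts": {obj: len(ids) for obj, ids in batches},
    }


related = {"Case": [("500A", "003X"), ("500B", "003Y")], "Task": [("00TA", "003X")]}
plan = erasure_plan("003X", related)
print(plan)
print(audit_entry("003X", plan, "privacy-team", datetime(2025, 3, 1, tzinfo=timezone.utc)))
```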

Data Retention Policies

Retention policies define how long different data types must be kept. Business requirements, legal obligations, and compliance mandates drive these decisions.

Designing Retention Policies

Data Category	Typical Retention	Driving Factor
Active customer records	Indefinite while customer active	Business need
Closed opportunities (won)	5-7 years	Financial audit
Closed opportunities (lost)	1-2 years	Sales analytics
Support cases	3-5 years	Service quality, legal
Email messages	1-3 years	Communication audit
Audit trail (field history)	18-24 months on-platform	Compliance
Task/Event activities	1-2 years	Business need
Debug/error logs	30-90 days	Operational

Retention vs archival

Retention defines how long data must exist somewhere. Archival defines where it lives after leaving the active database. A record can be archived (for example, moved to a Big Object) while still meeting its retention requirement.
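The retention/archival distinction can be expressed as a small decision function. A record past its archive threshold moves out of the active database, but it becomes eligible for deletion only once its retention period ends. This is a Python sketch. Its retention periods take the upper end of the ranges in the table above, and its archive thresholds are hypothetical. Legal holds, which would override deletion, are not modeled.

```python
from datetime import date

# retain_years: upper end of the table's ranges; archive_after_years: hypothetical.
POLICIES = {
    "closed_won_opportunity": {"archive_after_years": 2, "retain_years": 7},
    "closed_lost_opportunity": {"archive_after_years": 1, "retain_years": 2},
    "support_case": {"archive_after_years": 2, "retain_years": 5},
}


def years_between(start, end):
    """Whole years elapsed from start to end."""
    years = end.year - start.year
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


def lifecycle_action(category, closed_on, today):
    """Return 'keep_active', 'archive', or 'eligible_for_deletion'."""
    policy = POLICIES[category]
    age = years_between(closed_on, today)
    if age >= policy["retain_years"]:
        return "eligible_for_deletion"
    if age >= policy["archive_after_years"]:
        return "archive"  # still retained, just no longer in the active database
    return "keep_active"


print(lifecycle_action("closed_won_opportunity", date(2021, 3, 1), date(2025, 6, 1)))  # archive
```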

Data Classification Framework

Data classification drives encryption, access control, retention, and compliance decisions. Establish classification tiers during data model design, not as a retrofit.

Figure 3. Data classification drives encryption, access control, and retention decisions. Establishing classification tiers during data model design prevents costly retrofits. Adding Shield Platform Encryption to a field after data exists requires a separate background process to encrypt the existing records, and it disables certain query and formula capabilities.

Tier	Examples	Security Controls
Restricted	SSN, credit card, health records	Shield Encryption, FLS, audit trail, right-to-erasure
Confidential	Salary, revenue, pricing strategy	FLS, sharing rules, data masking in sandboxes
Internal	Employee IDs, internal notes	Role-based access, standard sharing model
Public	Product names, company address	Portal/community visible, no restrictions
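Recording the tier alongside each field makes the table enforceable: the required controls become a lookup, and gaps can be reported during design reviews. This is a hypothetical Python sketch. The field names (`SSN__c`, `Salary__c`) and the control labels are illustrative assumptions, not Salesforce metadata names.

```python
from enum import Enum


class Tier(Enum):
    RESTRICTED = "restricted"
    CONFIDENTIAL = "confidential"
    INTERNAL = "internal"
    PUBLIC = "public"


# Paraphrase of the "Security Controls" column; control names are illustrative labels.
REQUIRED_CONTROLS = {
    Tier.RESTRICTED: {"shield_encryption", "fls", "audit_trail", "erasure_supported"},
    Tier.CONFIDENTIAL: {"fls", "sharing_rules", "sandbox_masking"},
    Tier.INTERNAL: {"role_based_access"},
    Tier.PUBLIC: set(),
}


def missing_controls(tier, applied):
    """Controls the tier requires that the field does not yet have."""
    return sorted(REQUIRED_CONTROLS[tier] - set(applied))


def fields_to_encrypt(catalog):
    """Select encryption candidates by classification rather than encrypting everything."""
    return [name for name, tier in catalog.items() if tier is Tier.RESTRICTED]


catalog = {"SSN__c": Tier.RESTRICTED, "Salary__c": Tier.CONFIDENTIAL, "Name": Tier.PUBLIC}
print(fields_to_encrypt(catalog))                                 # ['SSN__c']
print(missing_controls(Tier.RESTRICTED, {"fls", "audit_trail"}))  # ['erasure_supported', 'shield_encryption']
```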

Data Governance Process Flow

Governance is not a one-time setup. It is an ongoing operational process with defined roles, cadences, and escalation paths.

Figure 4. Every data quality issue should close with a prevention step, not just a fix. Updating validation rules, tightening matching criteria, or adding a monitoring metric closes the loop so the same issue does not recur. Governance without prevention is just repeated remediation.
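The "close with a prevention step" rule can be enforced mechanically in whatever tool tracks data quality issues. This is a hypothetical Python sketch of such a workflow. The state names and transitions are assumptions. The only rule taken from the text is that an issue cannot close until a prevention step is recorded.

```python
from dataclasses import dataclass, field

# Hypothetical linear workflow; real processes may add escalation states.
TRANSITIONS = {
    "reported": {"triaged"},
    "triaged": {"fixed"},
    "fixed": {"closed"},
}


@dataclass
class QualityIssue:
    title: str
    state: str = "reported"
    fix: str = ""
    prevention: str = ""  # e.g. new validation rule, tighter matching rule, new metric
    history: list = field(default_factory=list)

    def advance(self, new_state):
        """Move to new_state, refusing illegal jumps and closes without prevention."""
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        if new_state == "closed" and not self.prevention.strip():
            raise ValueError("closing requires a prevention step, not just a fix")
        self.history.append((self.state, new_state))
        self.state = new_state


issue = QualityIssue("Country values inconsistent")
issue.advance("triaged")
issue.fix = "Normalized existing country values"
issue.advance("fixed")
issue.prevention = "Replaced free-text Country with a restricted picklist"
issue.advance("closed")
print(issue.state, issue.history)
```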

Data Stewardship Model

Data stewardship assigns accountability for data quality to specific people or roles.

Stewardship Roles

Role	Responsibility
Data Owner	Business executive accountable for data quality decisions
Data Steward	Hands-on responsibility for monitoring and correcting data
Data Custodian	Technical team managing data storage, security, and access
Data Consumer	End users who rely on data quality for their work

Stewardship Processes

Regular data quality reviews: Monthly or quarterly steward reviews of quality dashboards
Issue resolution workflow: Process for reporting and fixing data quality issues
Change management: Stewards approve changes to data standards, picklist values, record types
Training: Ongoing user training on data entry standards

Compliance

The General Data Protection Regulation (and similar privacy laws) imposes specific requirements on Salesforce data architecture:

GDPR Right	Salesforce Implementation
Right to access	Data export, reports, customer portals
Right to rectification	Standard edit capabilities, community self-service
Right to erasure	Hard delete, field-level encryption with key destruction
Right to portability	Data export in machine-readable format (CSV, JSON)
Right to restriction	Record-level flags, process exclusion logic
Consent management	Custom objects or Salesforce Privacy Center
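For the portability right, one pragmatic approach is to produce both formats named in the table from the same query results. This is a minimal Python sketch that uses only the standard library. It assumes the subject's records have already been retrieved and grouped by object. The grouping shape and the long-format CSV layout are assumptions, not a Salesforce export format.

```python
import csv
import io
import json


def export_subject_data(subject_records):
    """Build JSON and CSV exports for one data subject.

    subject_records: {object_name: [record_dict, ...]}, already filtered to the subject.
    The CSV uses a long format (object, record_id, field, value) so records from
    different objects with different fields fit in one file.
    """
    as_json = json.dumps(subject_records, indent=2, sort_keys=True, default=str)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["object", "record_id", "field", "value"])
    for obj, rows in sorted(subject_records.items()):
        for row in rows:
            for fld, value in sorted(row.items()):
                writer.writerow([obj, row.get("Id", ""), fld, value])
    return as_json, buf.getvalue()


data = {
    "Contact": [{"Id": "003A", "FirstName": "Ana", "Email": "ana@example.com"}],
    "Case": [{"Id": "500X", "Subject": "Refund request"}],
}
json_text, csv_text = export_subject_data(data)
print(csv_text)
```

`default=str` lets date and datetime values serialize without a custom encoder.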

Data Residency

Some regulations require data to remain within specific geographic boundaries:

Salesforce data residency: Data stored in the instance region (NA, EU, AP)
Hyperforce: Enables deployment in specific public cloud regions
Encryption: Salesforce Shield Platform Encryption with Bring Your Own Key (BYOK)
Cross-border transfers: Documented in Salesforce’s Data Processing Addendum

Salesforce Shield

Shield provides compliance-focused features:

Feature	Purpose
Platform Encryption	Encrypt data at rest for sensitive fields
Event Monitoring	Track user behavior and API activity
Field Audit Trail	Retain field history beyond standard 18-month limit (up to 10 years)

Encryption trade-offs

Encrypting fields with Shield Platform Encryption disables certain features: sorting, filtering in some contexts (probabilistic encryption prevents filtering, while deterministic encryption supports only exact-match filtering), and formula field references. Select which fields to encrypt based on sensitivity classification rather than a blanket “encrypt everything” approach.

Data Quality Metrics Dashboard

Design a data quality dashboard that stewards review regularly:

Metric	Measurement	Target
Duplicate rate	% of records flagged as duplicates	< 5%
Completeness score	Avg % of required fields populated	> 90%
Stale records	Records not modified in 12+ months	< 20% of active records
Orphan records	Child records with broken lookups	0%
Invalid values	Records failing validation logic	< 2%
Integration errors	Failed integration record creates/updates	< 1%
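The targets in the table lend themselves to an automated pass/fail check that stewards can review alongside the dashboard. This is a Python sketch. It assumes each metric has already been computed as a fraction, for example by profiling checks on exported data. The metric keys and the sample month's values are hypothetical. Reporting missing metrics separately keeps a broken measurement from looking like a pass.

```python
import operator

# Targets from the dashboard table, as (comparison, threshold) with rates as fractions.
TARGETS = {
    "duplicate_rate": (operator.lt, 0.05),
    "completeness": (operator.gt, 0.90),
    "stale_share": (operator.lt, 0.20),
    "orphan_rate": (operator.le, 0.0),  # target is 0%
    "invalid_rate": (operator.lt, 0.02),
    "integration_error_rate": (operator.lt, 0.01),
}


def scorecard(measured):
    """Return {metric: 'PASS' | 'FAIL' | 'MISSING'} against each target."""
    result = {}
    for metric, (compare, threshold) in TARGETS.items():
        if metric not in measured:
            result[metric] = "MISSING"
        else:
            result[metric] = "PASS" if compare(measured[metric], threshold) else "FAIL"
    return result


this_month = {
    "duplicate_rate": 0.031,
    "completeness": 0.87,
    "stale_share": 0.18,
    "orphan_rate": 0.0,
    "invalid_rate": 0.015,
}
for metric, status in scorecard(this_month).items():
    print(f"{metric:24} {status}")
```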

Related Topics

Shield Encryption: data classification drives encryption decisions and field-level security
Sharing Model: data sensitivity tiers influence OWD settings and sharing rules
Integration Patterns: data quality at integration boundaries affects reliability and error rates
Data Migration: pre-migration profiling is a data quality exercise
Large Data Volumes: archival is both an LDV strategy and a governance activity
Development Lifecycle: data governance is part of organizational change management
Declarative vs Programmatic: validation rules and duplicate rules are declarative quality controls

Sources

Personal study notes for the Salesforce CTA exam. Content compiled from VJ's study notes, official Salesforce documentation, community sources, and online publicly available content, then organized and presented with AI assistance. Not affiliated with Salesforce. © 2025–2026 VJ Srivastava.