Data Quality & Governance
Data quality and governance are ongoing disciplines, not one-time activities. Poor data quality undermines every downstream system: reports lie, integrations fail, and users lose trust in the platform.
Data Profiling
Profiling is the first step in understanding the data. It applies to both migration scenarios and ongoing data health monitoring.
Profiling Dimensions
| Dimension | What to Measure | Red Flags |
|---|---|---|
| Completeness | % of required fields populated | Key fields < 80% populated |
| Accuracy | Values match the real world | Stale addresses, wrong phone numbers |
| Consistency | Same data represented the same way | “US” vs “USA” vs “United States” |
| Uniqueness | No unintended duplicates | Duplicate accounts, contacts |
| Timeliness | Data is current enough for its use | Last modified > 2 years ago |
| Validity | Values conform to expected formats | Dates in text fields, invalid emails |
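As a rough illustration of how three of these dimensions could be measured on an exported file, here is a minimal Python sketch. The sample rows, the field names, and the loose email pattern are all assumptions made for the example; a real profile would run against a full export and use the organization's own validity rules.

```python
import re

# Hypothetical sample rows, shaped like a CSV export of Contact records.
records = [
    {"Id": "003A", "Email": "ann@example.com", "Phone": "555-0100", "Country": "US"},
    {"Id": "003B", "Email": "not-an-email", "Phone": "", "Country": "USA"},
    {"Id": "003C", "Email": "", "Phone": "555-0102", "Country": "United States"},
    {"Id": "003D", "Email": "dee@example.com", "Phone": "555-0103", "Country": "US"},
]

# Deliberately loose pattern (an assumption): one "@" and a dot in the domain part.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def non_blank(rows, field):
    """Stripped, non-empty values of one field."""
    return [(r.get(field) or "").strip() for r in rows if (r.get(field) or "").strip()]


def completeness(rows, field):
    """Percentage of rows where the field is populated."""
    if not rows:
        return 0.0
    return 100.0 * len(non_blank(rows, field)) / len(rows)


def validity(rows, field, pattern):
    """Percentage of populated values that match the expected pattern."""
    values = non_blank(rows, field)
    if not values:
        return 0.0
    return 100.0 * sum(1 for v in values if pattern.match(v)) / len(values)


def distinct_values(rows, field):
    """Distinct populated values; several spellings of one thing signal inconsistency."""
    return sorted(set(non_blank(rows, field)))


email_completeness = completeness(records, "Email")
email_validity = validity(records, "Email", EMAIL_RE)
country_variants = distinct_values(records, "Country")
```

Comparing `email_completeness` against the 80% red-flag threshold, and reviewing `country_variants` by eye, is usually enough to decide where cleanup effort should go first.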
Profiling Tools
- Salesforce Reports: Record counts, field completeness via formula fields
- Data Loader exports: Export and analyze in Excel/Python for pattern detection
- Third-party tools: Informatica Data Quality, DemandTools (Validity), RingLead
- CRM Analytics (formerly Einstein Analytics, then Tableau CRM): Dashboard-based data quality monitoring
- Apex scripts: Custom profiling for complex business rules
Deduplication
Duplicates are the most visible data quality problem. Salesforce provides native deduplication tools, but a CTA must design a broader strategy.
Native Salesforce Dedup
Matching Rules
Matching rules define when two records are considered potential duplicates.
| Component | Description |
|---|---|
| Matching method | Exact or Fuzzy |
| Matching criteria | Fields to compare (Name, Email, Phone, Address) |
| Match key | Combination of fields that trigger comparison |
| Blank fields | How to handle nulls (match or skip) |
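Standard matching rules exist for Account, Contact, and Lead. Custom matching rules can be created for those objects and for custom objects; check current Salesforce documentation for support on other standard objects.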
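The components in the table can be made concrete with a small Python sketch. This is not Salesforce's matching algorithm: the normalization step, the use of `difflib.SequenceMatcher` for the fuzzy method, and the 0.85 threshold are assumptions chosen only to show how method, criteria, and blank handling interact.

```python
from difflib import SequenceMatcher


def normalize(value):
    """Lowercase and collapse whitespace so formatting differences do not block a match."""
    return " ".join((value or "").lower().split())


def field_matches(a, b, method="exact", threshold=0.85, match_blanks=False):
    """Compare one pair of field values with an exact or fuzzy method."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        # Blank handling: two blanks count as a match only when explicitly allowed.
        return match_blanks and a == b
    if method == "exact":
        return a == b
    # Fuzzy: similarity ratio in [0, 1]; the threshold is a hypothetical value.
    return SequenceMatcher(None, a, b).ratio() >= threshold


def is_potential_duplicate(rec1, rec2, criteria):
    """All criteria must match (an AND rule). criteria is a list of (field, method)."""
    return all(field_matches(rec1.get(f), rec2.get(f), method=m) for f, m in criteria)


criteria = [("Name", "fuzzy"), ("Email", "exact")]
acme_a = {"Name": "Acme Corporation", "Email": "info@acme.com"}
acme_b = {"Name": "ACME  Corporaton", "Email": "INFO@acme.com"}
zenith = {"Name": "Zenith Ltd", "Email": "info@acme.com"}
```

Here `is_potential_duplicate(acme_a, acme_b, criteria)` is true despite the typo and casing differences, while `zenith` is rejected because the name criterion fails even though the email matches.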
Duplicate Rules
Duplicate rules define what happens when a match is found:
| Action | Effect |
|---|---|
| Alert | Warn the user but allow save |
| Block | Prevent the record from being saved |
| Report | Log the duplicate for later review |
Alert vs Block trade-off
Blocking duplicates protects data quality but frustrates users and can block legitimate records (false positives). Alerting preserves user productivity but relies on users making good decisions. Most CTA solutions recommend starting with alert plus reporting, then gradually tightening to block as matching rules prove accurate.
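One way to decide when a rule has "proved accurate" is to measure how many reviewed alerts were real duplicates. The sketch below assumes stewards record a `confirmed_duplicate` flag for each alert they review; that field name and both thresholds (200 reviewed alerts, 98% precision) are hypothetical values, not Salesforce settings.

```python
def alert_precision(reviewed_alerts):
    """Share of steward-reviewed alerts that turned out to be true duplicates."""
    if not reviewed_alerts:
        return 0.0
    confirmed = sum(1 for a in reviewed_alerts if a["confirmed_duplicate"])
    return confirmed / len(reviewed_alerts)


def recommended_action(reviewed_alerts, min_sample=200, min_precision=0.98):
    """Stay on 'alert' until enough alerts were reviewed and nearly all were real."""
    if len(reviewed_alerts) < min_sample:
        return "alert"  # not enough evidence yet
    if alert_precision(reviewed_alerts) >= min_precision:
        return "block"
    return "alert"


# Hypothetical review histories for three matching rules.
too_few = [{"confirmed_duplicate": True}] * 50
accurate = [{"confirmed_duplicate": True}] * 199 + [{"confirmed_duplicate": False}]
noisy = [{"confirmed_duplicate": True}] * 180 + [{"confirmed_duplicate": False}] * 20
```

A rule like `noisy`, where one alert in ten is a false positive, would frustrate users if switched to block, so it stays on alert until the matching criteria are refined.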
Third-Party Deduplication
For enterprise-scale deduplication, native tools may fall short:
| Tool | Capability |
|---|---|
| DemandTools (Validity) | Mass dedup, merge, standardization |
| Cloudingo | Automated dedup with scheduling |
| RingLead | Real-time and batch dedup |
| Informatica | Enterprise MDM with fuzzy matching |
| DupeCatcher | Free AppExchange duplicate prevention |
Dedup Strategy Layers
Master Data Management
MDM ensures that critical business entities (customers, products, employees) have a single, authoritative source of truth across all systems.
MDM Approaches
| Approach | Description | When to Use |
|---|---|---|
| Registry | Each system maintains its own copy; a central registry maps IDs | Low integration maturity, many legacy systems |
| Consolidation | Data is copied to a master hub for reporting, not written back | Read-only analytics, data warehouse model |
| Coexistence | Multiple systems share and synchronize master data | Multiple systems of record per entity |
| Centralized | One system is the master; others are consumers | Clear system of record exists (e.g., Salesforce for customers) |
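To make the registry approach more tangible, the following Python sketch shows a cross-reference that maps one master ID to each system's local ID. The class, the system names, and the ID values are invented for illustration; a real registry would live in an MDM product or integration layer rather than in memory.

```python
class IdRegistry:
    """Maps one master ID to the local IDs each system uses for the same entity."""

    def __init__(self):
        self._by_master = {}  # master_id -> {system_name: local_id}
        self._by_local = {}   # (system_name, local_id) -> master_id

    def link(self, master_id, system, local_id):
        """Record that local_id in system refers to master_id."""
        self._by_master.setdefault(master_id, {})[system] = local_id
        self._by_local[(system, local_id)] = master_id

    def master_for(self, system, local_id):
        """Return the master ID for a local record, or None if unregistered."""
        return self._by_local.get((system, local_id))

    def translate(self, from_system, local_id, to_system):
        """Find the matching record ID in another system, or None."""
        master_id = self.master_for(from_system, local_id)
        if master_id is None:
            return None
        return self._by_master[master_id].get(to_system)


# Hypothetical identifiers for the same customer in two systems.
registry = IdRegistry()
registry.link("CUST-1", "salesforce", "001xx0000001")
registry.link("CUST-1", "erp", "C-88213")
```

Note that the registry stores only identifiers. Each system keeps its own copy of the attributes, which is why this approach suits landscapes with low integration maturity.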
Salesforce as MDM Hub
Salesforce can serve as the master for customer data (Account, Contact) but is rarely the right choice for all entity types:
| Entity | Salesforce as Master? | Notes |
|---|---|---|
| Customer (B2B) | Often yes | Account/Contact is natural fit |
| Customer (B2C) | Sometimes | Person Accounts or Data Cloud |
| Product | Sometimes | CPQ scenarios; otherwise ERP |
| Employee | Rarely | HR systems (Workday, SAP HCM) are better fit |
| Financial data | No | ERP is the master |
| Inventory | No | ERP/WMS is the master |
Data Lifecycle Management
Every record has a lifecycle. Design for the full journey, not just creation.
Lifecycle Stages
Create
- Define data entry standards (required fields, validation rules, dependent picklists)
- Integration-created records need quality controls (field mapping validation, dedup)
- Bulk imports need pre-load quality checks
Maintain
- Ongoing enrichment (address verification, firmographic data)
- Periodic deduplication scans
- Data steward reviews and corrections
- Automation to flag stale records (e.g., Account not modified in 12 months)
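The stale-record check in the last bullet could be sketched in Python as below. It assumes the records were exported with a `LastModifiedDate` already parsed into `date` objects and treats 12 months as 365 days; on the platform itself this would more likely be a scheduled flow or report.

```python
from datetime import date, timedelta


def stale_ids(rows, today, max_age_days=365):
    """Return Ids of rows whose last modification is older than the cutoff."""
    cutoff = today - timedelta(days=max_age_days)
    return [r["Id"] for r in rows if r["LastModifiedDate"] < cutoff]


# Hypothetical exported Account rows.
accounts = [
    {"Id": "001A", "LastModifiedDate": date(2023, 1, 10)},
    {"Id": "001B", "LastModifiedDate": date(2024, 11, 2)},
    {"Id": "001C", "LastModifiedDate": date(2024, 1, 5)},
]
flagged = stale_ids(accounts, today=date(2025, 1, 15))
```

Passing `today` explicitly, instead of calling `date.today()` inside the function, keeps the check repeatable when it is rerun for an audit.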
Archive
- Move aged data to Big Objects, external storage, or Data Cloud
- Maintain reference access for compliance
- See Large Data Volumes for archival strategies
Delete
- Soft delete: Records go to Recycle Bin (recoverable for 15 days)
- Hard delete: Permanent removal (Bulk API with hardDelete option)
- GDPR right to erasure: Must be able to permanently delete all personal data for a data subject
- Document deletion policies and audit trails
Data Retention Policies
Retention policies define how long different data types must be kept. Business requirements, legal obligations, and compliance mandates drive these decisions.
Designing Retention Policies
| Data Category | Typical Retention | Driving Factor |
|---|---|---|
| Active customer records | Indefinite while customer active | Business need |
| Closed opportunities (won) | 5-7 years | Financial audit |
| Closed opportunities (lost) | 1-2 years | Sales analytics |
| Support cases | 3-5 years | Service quality, legal |
| Email messages | 1-3 years | Communication audit |
| Audit trail (field history) | 18-24 months on-platform | Compliance |
| Task/Event activities | 1-2 years | Business need |
| Debug/error logs | 30-90 days | Operational |
Retention vs archival
Retention defines how long data must exist somewhere. Archival defines where it lives after leaving the active database. A record can be archived (for example, moved to a Big Object) while still meeting its retention requirement.
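The difference between the two clocks can be shown with a short Python sketch. The retention periods use the upper end of the ranges in the table; the archive-after periods are hypothetical, since the table does not define them.

```python
from datetime import date

# category: (archive after N years, retain for N years in total)
# Retention values come from the table above; archive values are assumptions.
POLICY = {
    "opportunity_won": (2, 7),
    "opportunity_lost": (1, 2),
    "support_case": (2, 5),
}


def add_years(d, years):
    """Shift a date by whole years; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def lifecycle_state(category, closed_on, today):
    """Classify a closed record as active, archivable, or past retention."""
    archive_after, retain_for = POLICY[category]
    if today >= add_years(closed_on, retain_for):
        return "eligible_for_deletion"
    if today >= add_years(closed_on, archive_after):
        return "archive"
    return "keep_active"
```

For a lost opportunity closed on 1 March 2023, the record becomes archivable on 1 March 2024 but must still exist somewhere until 1 March 2025, which is the window where archival and retention overlap.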
Data Classification Framework
Data classification drives encryption, access control, retention, and compliance decisions. Establish classification tiers during data model design, not as a retrofit.
| Tier | Examples | Security Controls |
|---|---|---|
| Restricted | SSN, credit card, health records | Shield Platform Encryption, field-level security (FLS), audit trail, right-to-erasure |
| Confidential | Salary, revenue, pricing strategy | FLS, sharing rules, data masking in sandboxes |
| Internal | Employee IDs, internal notes | Role-based access, standard sharing model |
| Public | Product names, company address | Portal/community visible, no restrictions |
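A classification scheme is easier to enforce when it can be checked mechanically. The Python sketch below compares a field inventory against the controls each tier requires. The control labels, the inventory format, and the field API names are assumptions for illustration; the tier-to-control mapping is a simplified reading of the table.

```python
# Minimum controls per tier, simplified from the table above.
REQUIRED_CONTROLS = {
    "Restricted": {"encryption", "fls", "audit_trail"},
    "Confidential": {"fls", "sandbox_masking"},
    "Internal": set(),
    "Public": set(),
}


def control_gaps(field_inventory):
    """Return {field: [missing controls]} for fields that fall short of their tier."""
    gaps = {}
    for field, info in field_inventory.items():
        missing = REQUIRED_CONTROLS[info["tier"]] - set(info["controls"])
        if missing:
            gaps[field] = sorted(missing)
    return gaps


# Hypothetical inventory: each field's tier and the controls already applied.
field_inventory = {
    "Contact.SSN__c": {"tier": "Restricted", "controls": ["fls", "encryption"]},
    "Employee__c.Salary__c": {"tier": "Confidential", "controls": ["fls", "sandbox_masking"]},
    "Product2.Name": {"tier": "Public", "controls": []},
}
gaps = control_gaps(field_inventory)
```

Running a check like this during data model design, before fields are populated, is what makes classification a design input instead of a retrofit.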
Data Governance Process Flow
Governance is not a one-time setup. It is an ongoing operational process with defined roles, cadences, and escalation paths.
Data Stewardship Model
Data stewardship assigns accountability for data quality to specific people or roles.
Stewardship Roles
| Role | Responsibility |
|---|---|
| Data Owner | Business executive accountable for data quality decisions |
| Data Steward | Hands-on responsibility for monitoring and correcting data |
| Data Custodian | Technical team managing data storage, security, and access |
| Data Consumer | End users who rely on data quality for their work |
Stewardship Processes
- Regular data quality reviews: Monthly or quarterly steward reviews of quality dashboards
- Issue resolution workflow: Process for reporting and fixing data quality issues
- Change management: Stewards approve changes to data standards, picklist values, record types
- Training: Ongoing user training on data entry standards
Compliance
GDPR and Data Privacy
The General Data Protection Regulation (and similar privacy laws) imposes specific requirements on Salesforce data architecture:
| GDPR Right | Salesforce Implementation |
|---|---|
| Right to access | Data export, reports, customer portals |
| Right to rectification | Standard edit capabilities, community self-service |
| Right to erasure | Hard delete, field-level encryption with key destruction |
| Right to portability | Data export in machine-readable format (CSV, JSON) |
| Right to restriction | Record-level flags, process exclusion logic |
| Consent management | Custom objects or Salesforce Privacy Center |
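Two of these rights, portability and restriction, reduce to simple data operations once the records for a data subject can be identified. The Python sketch below assumes rows carry a `SubjectId` key and a custom flag named `Processing_Restricted__c`; both names are hypothetical and would differ in a real org.

```python
import json


def export_subject_data(rows, subject_id):
    """Right to portability: return one data subject's rows as a JSON string."""
    owned = [r for r in rows if r["SubjectId"] == subject_id]
    return json.dumps(owned, indent=2, sort_keys=True)


def processable(rows):
    """Right to restriction: drop flagged rows before any automated processing."""
    return [r for r in rows if not r.get("Processing_Restricted__c", False)]


# Hypothetical rows gathered from several objects for two data subjects.
rows = [
    {"SubjectId": "S1", "Object": "Contact", "Email": "ann@example.com"},
    {"SubjectId": "S1", "Object": "Case", "Subject": "Billing question"},
    {"SubjectId": "S2", "Object": "Contact", "Email": "bob@example.com",
     "Processing_Restricted__c": True},
]
export = export_subject_data(rows, "S1")
campaign_targets = processable(rows)
```

The hard part in practice is the step this sketch skips: finding every object and external system that holds data for the subject, which is why the data model should link personal data back to a single identifier.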
Data Residency
Some regulations require data to remain within specific geographic boundaries:
- Salesforce data residency: Data stored in the instance region (NA, EU, AP)
- Hyperforce: Enables deployment in specific public cloud regions
- Encryption: Salesforce Shield Platform Encryption with Bring Your Own Key (BYOK)
- Cross-border transfers: Documented in Salesforce’s Data Processing Addendum
Salesforce Shield
Shield provides compliance-focused features:
| Feature | Purpose |
|---|---|
| Platform Encryption | Encrypt data at rest for sensitive fields |
| Event Monitoring | Track user behavior and API activity |
| Field Audit Trail | Retain field history beyond standard 18-month limit (up to 10 years) |
Encryption trade-offs
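Shield Platform Encryption limits what encrypted fields can do. Under the default probabilistic scheme, encrypted fields cannot be used for sorting or in most filter conditions; the deterministic scheme restores exact-match filtering but offers weaker protection. Formula field references and some other features are also restricted. Select which fields to encrypt based on sensitivity classification, not on a blanket “encrypt everything” approach.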
Data Quality Metrics Dashboard
Design a data quality dashboard that stewards review regularly:
| Metric | Measurement | Target |
|---|---|---|
| Duplicate rate | % of records flagged as duplicates | < 5% |
| Completeness score | Avg % of required fields populated | > 90% |
| Stale records | Records not modified in 12+ months | < 20% of active records |
| Orphan records | Child records with broken lookups | 0% |
| Invalid values | Records failing validation logic | < 2% |
| Integration errors | Failed integration record creates/updates | < 1% |
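The targets in the table can be encoded so that each dashboard refresh reports only the metrics that need attention. This Python sketch assumes the measurements arrive as percentages in a dictionary; the metric keys and the sample snapshot are invented for the example.

```python
import operator

# (comparison, threshold) per metric, taken from the targets table; values are percentages.
TARGETS = {
    "duplicate_rate": (operator.lt, 5.0),
    "completeness_score": (operator.gt, 90.0),
    "stale_records": (operator.lt, 20.0),
    "orphan_records": (operator.eq, 0.0),
    "invalid_values": (operator.lt, 2.0),
    "integration_errors": (operator.lt, 1.0),
}


def breaches(measured):
    """Return the names of metrics that miss their target, in sorted order."""
    return sorted(
        name
        for name, (meets, threshold) in TARGETS.items()
        if not meets(measured[name], threshold)
    )


# Hypothetical monthly snapshot.
snapshot = {
    "duplicate_rate": 6.2,
    "completeness_score": 93.5,
    "stale_records": 18.0,
    "orphan_records": 0.4,
    "invalid_values": 1.1,
    "integration_errors": 0.3,
}
needs_attention = breaches(snapshot)
```

Keeping the thresholds in one structure means the steward review and any automated alerting read from the same definition of "healthy".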
Related Topics
- Shield Encryption: data classification drives encryption decisions and field-level security
- Sharing Model: data sensitivity tiers influence OWD settings and sharing rules
- Integration Patterns: data quality at integration boundaries affects reliability and error rates
- Data Migration: pre-migration profiling is a data quality exercise
- Large Data Volumes: archival is both an LDV strategy and a governance activity
- Development Lifecycle: data governance is part of organizational change management
- Declarative vs Programmatic: validation rules and duplicate rules are declarative quality controls
Personal study notes for the Salesforce CTA exam. Content compiled from VJ's study notes, official Salesforce documentation, community sources, and online publicly available content, then organized and presented with AI assistance. Not affiliated with Salesforce. © 2025–2026 VJ Srivastava.