Data Quality & Governance
Data quality and governance are ongoing disciplines, not one-time activities. A CTA must design solutions with built-in quality controls and governance frameworks because poor data quality undermines every downstream system — reports lie, integrations fail, and users lose trust in the platform.
Data Profiling
Data profiling is the first step in understanding what you are working with. It applies to both migration scenarios and ongoing data health.
Profiling Dimensions
| Dimension | What to Measure | Red Flags |
|---|---|---|
| Completeness | % of required fields populated | Key fields < 80% populated |
| Accuracy | Values match real-world reality | Stale addresses, wrong phone formats |
| Consistency | Same data represented the same way | “US” vs “USA” vs “United States” |
| Uniqueness | No unintended duplicates | Duplicate accounts, contacts |
| Timeliness | Data is current enough for its use | Last modified > 2 years ago |
| Validity | Values conform to expected formats | Dates in text fields, invalid emails |
Profiling Tools
- Salesforce Reports — Record counts, field completeness via formula fields
- Data Loader exports — Export and analyze in Excel/Python for pattern detection
- Third-party tools — Informatica Data Quality, DemandTools, Validity (RingLead)
- Einstein Analytics / CRM Analytics — Dashboard-based data quality monitoring
- Apex scripts — Custom profiling for complex business rules
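The export-and-analyze approach above can be sketched in Python against a Data Loader export. The field names, sample records, and regex below are illustrative assumptions, not a Salesforce standard:

```python
import re

# Illustrative exported records (in practice, rows from a Data Loader CSV).
records = [
    {"Name": "Acme Corp", "Email": "info@acme.com", "Phone": "555-0100"},
    {"Name": "Globex",    "Email": "",              "Phone": "555-0101"},
    {"Name": "Acme Corp", "Email": "bad-email",     "Phone": ""},
]

# Simplistic email shape check for the Validity dimension.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def completeness(recs, field):
    """Percentage of records with a non-blank value for `field`."""
    filled = sum(1 for r in recs if r.get(field, "").strip())
    return 100.0 * filled / len(recs)

def duplicate_rate(recs, field):
    """Percentage of records sharing a value for `field` with another record."""
    values = [r[field] for r in recs]
    dupes = sum(1 for v in values if values.count(v) > 1)
    return 100.0 * dupes / len(recs)

def validity(recs, field, pattern):
    """Percentage of populated values matching the expected format."""
    populated = [r[field] for r in recs if r.get(field, "").strip()]
    if not populated:
        return 0.0
    valid = sum(1 for v in populated if pattern.match(v))
    return 100.0 * valid / len(populated)

print(f"Email completeness: {completeness(records, 'Email'):.0f}%")
print(f"Name duplicate rate: {duplicate_rate(records, 'Name'):.0f}%")
print(f"Email validity: {validity(records, 'Email', EMAIL_RE):.0f}%")
```

The same loop structure extends to the other dimensions (timeliness via `LastModifiedDate`, consistency via a value-frequency table per field).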
Deduplication
Duplicates are the most visible data quality problem. Salesforce provides native deduplication tools, but a CTA must design a comprehensive strategy.
Native Salesforce Dedup
Matching Rules
Matching rules define when two records are considered potential duplicates.
| Component | Description |
|---|---|
| Matching method | Exact or Fuzzy |
| Matching criteria | Fields to compare (Name, Email, Phone, Address) |
| Match key | Combination of fields that trigger comparison |
| Blank fields | How to handle nulls (match or skip) |
Standard matching rules exist for Account, Contact, and Lead. Custom matching rules can be created for any object.
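A minimal Python sketch of the match-key idea: normalize the compared fields, then build a composite key that triggers comparison only when the fields are populated. The normalization rules and field names are illustrative assumptions, not Salesforce's internal matching algorithm:

```python
import re

def normalize(value):
    """Lowercase and strip non-alphanumerics so 'Acme, Inc.' and 'ACME Inc' compare equal."""
    return re.sub(r"[^a-z0-9]", "", (value or "").lower())

def match_key(record, fields, skip_blanks=True):
    """Build a composite match key from the given fields.

    Returns None when a field is blank and blanks are configured to skip,
    mirroring a matching rule that does not match on null values.
    """
    parts = []
    for f in fields:
        norm = normalize(record.get(f))
        if not norm and skip_blanks:
            return None
        parts.append(norm)
    return "|".join(parts)

a = {"Name": "Acme, Inc.", "Email": "SALES@acme.com"}
b = {"Name": "ACME Inc",   "Email": "sales@acme.com"}
c = {"Name": "Acme, Inc.", "Email": ""}

print(match_key(a, ["Name", "Email"]))                                     # acmeinc|salesacmecom
print(match_key(a, ["Name", "Email"]) == match_key(b, ["Name", "Email"]))  # True
print(match_key(c, ["Name", "Email"]))                                     # None: blank email, no match attempted
```

Exact matching compares keys directly; fuzzy matching would compare the normalized values with a similarity measure instead of strict equality.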
Duplicate Rules
Duplicate rules define what happens when a match is found:
| Action | Effect |
|---|---|
| Alert | Warn the user but allow save |
| Block | Prevent the record from being saved |
| Report | Log the duplicate for later review |
Alert vs Block trade-off
Blocking duplicates protects data quality but frustrates users and can block legitimate records (false positives). Alerting preserves user productivity but relies on users making good decisions. Most CTA solutions recommend alert with reporting, then gradually tighten to block as matching rules prove accurate.
Third-Party Deduplication
For enterprise-scale deduplication, native tools may not suffice:
| Tool | Capability |
|---|---|
| DemandTools (Validity) | Mass dedup, merge, standardization |
| Cloudingo | Automated dedup with scheduling |
| RingLead | Real-time and batch dedup |
| Informatica | Enterprise MDM with fuzzy matching |
| DupeCatcher | Free AppExchange duplicate prevention |
Dedup Strategy Layers
```mermaid
graph TD
A[Data Entry] --> B[Real-time Prevention]
B --> C{Duplicate found?}
C -->|Yes| D[Alert or Block user]
C -->|No| E[Record saved]
F[Batch Process] --> G[Scheduled Dedup Scan]
G --> H[Review duplicate sets]
H --> I[Merge or dismiss]
J[Integration] --> K[Pre-load dedup check]
K --> L{Match found?}
L -->|Yes| M[Update existing record]
L -->|No| N[Insert new record]
```
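The integration layer's pre-load dedup check reduces to an update-or-insert routing decision. A hedged Python sketch, where the indexed lookup and field names stand in for a real query against the target org by external ID or match key:

```python
def preload_dedup(incoming, existing_by_key, key_field="Email"):
    """Route incoming records: update when a match exists, insert otherwise.

    `existing_by_key` simulates an indexed lookup against the target org;
    a real implementation would query by an external ID or match key.
    """
    updates, inserts = [], []
    for rec in incoming:
        key = (rec.get(key_field) or "").strip().lower()
        if key and key in existing_by_key:
            # Merge: incoming values win over existing ones.
            merged = {**existing_by_key[key], **rec}
            updates.append(merged)
        else:
            inserts.append(rec)
    return updates, inserts

existing = {"ann@example.com": {"Id": "003A", "Email": "ann@example.com", "Phone": ""}}
incoming = [
    {"Email": "ann@example.com", "Phone": "555-0100"},  # matches -> update
    {"Email": "bob@example.com", "Phone": "555-0101"},  # new -> insert
]

updates, inserts = preload_dedup(incoming, existing)
print(len(updates), len(inserts))             # 1 1
print(updates[0]["Id"], updates[0]["Phone"])  # 003A 555-0100
```

On-platform, the same routing is what an upsert against an external ID field gives you for free; the explicit check matters when matching is fuzzier than a single key.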
Master Data Management
MDM ensures that critical business entities (customers, products, employees) have a single, authoritative source of truth across all systems.
MDM Approaches
| Approach | Description | When to Use |
|---|---|---|
| Registry | Each system maintains its own copy; a central registry maps IDs | Low integration maturity, many legacy systems |
| Consolidation | Data is copied to a master hub for reporting, not written back | Read-only analytics, data warehouse model |
| Coexistence | Multiple systems share and synchronize master data | Multiple systems of record per entity |
| Centralized | One system is the master; others are consumers | Clear system of record exists (e.g., Salesforce for customers) |
Salesforce as MDM Hub
Salesforce can serve as the master for customer data (Account, Contact) but is rarely the right choice for all entity types:
| Entity | Salesforce as Master? | Notes |
|---|---|---|
| Customer (B2B) | Often yes | Account/Contact is natural fit |
| Customer (B2C) | Sometimes | Person Accounts or Data Cloud |
| Product | Sometimes | CPQ scenarios; otherwise ERP |
| Employee | Rarely | HR systems (Workday, SAP HCM) are better fit |
| Financial data | No | ERP is the master |
| Inventory | No | ERP/WMS is the master |
Data Lifecycle Management
Every record has a lifecycle. A CTA must design for the full journey, not just creation.
```mermaid
graph LR
A[Create] --> B[Maintain]
B --> C[Archive]
C --> D[Delete]
B --> D
A -->|Data entry, import,<br/>integration| A
B -->|Update, enrich,<br/>deduplicate| B
C -->|Move to Big Object,<br/>external storage| C
D -->|Soft delete, hard delete,<br/>data destruction| D
```
Lifecycle Stages
Create
- Define data entry standards (required fields, validation rules, dependent picklists)
- Integration-created records need quality controls (field mapping validation, dedup)
- Bulk imports need pre-load quality checks
Maintain
- Ongoing enrichment (address verification, firmographic data)
- Periodic deduplication scans
- Data steward reviews and corrections
- Automation to flag stale records (e.g., Account not modified in 12 months)
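The stale-record flagging above can be sketched as a scheduled check. The 12-month threshold and field names mirror the example in the list; in an org this would typically be a report filter or scheduled flow rather than offline code:

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=365)  # illustrative 12-month threshold

def flag_stale(accounts, today=None):
    """Return names of accounts not modified within the staleness window."""
    today = today or date.today()
    return [a["Name"] for a in accounts
            if today - a["LastModifiedDate"] > STALE_AFTER]

accounts = [
    {"Name": "Acme",   "LastModifiedDate": date(2024, 1, 10)},
    {"Name": "Globex", "LastModifiedDate": date(2025, 6, 1)},
]
print(flag_stale(accounts, today=date(2025, 7, 1)))  # ['Acme']
```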
Archive
- Move aged data to Big Objects, external storage, or Data Cloud
- Maintain reference access for compliance
- See Large Data Volumes for archival strategies
Delete
- Soft delete — Records go to Recycle Bin (recoverable for 15 days)
- Hard delete — Permanent removal (Bulk API with hardDelete option)
- GDPR right to erasure — Must be able to permanently delete all personal data for a data subject
- Document deletion policies and audit trails
Data Retention Policies
Retention policies define how long different data types must be kept. These are driven by business requirements, legal obligations, and compliance mandates.
Designing Retention Policies
| Data Category | Typical Retention | Driving Factor |
|---|---|---|
| Active customer records | Indefinite while customer active | Business need |
| Closed opportunities (won) | 5-7 years | Financial audit |
| Closed opportunities (lost) | 1-2 years | Sales analytics |
| Support cases | 3-5 years | Service quality, legal |
| Email messages | 1-3 years | Communication audit |
| Audit trail (field history) | 18-24 months on-platform | Compliance |
| Task/Event activities | 1-2 years | Business need |
| Debug/error logs | 30-90 days | Operational |
Retention vs archival
Retention defines how long data must exist somewhere. Archival defines where it lives after leaving the active database. A record can be archived (moved to Big Object) while still meeting its retention requirement.
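The retention/archival distinction can be expressed as a three-way lifecycle decision. The categories and year figures below are illustrative, loosely following the table above; actual values come from legal and business requirements:

```python
# Illustrative policy: years in the active database before archival,
# and total years before purge. Assumed figures, not a standard.
POLICY = {
    "closed_won_opportunity": {"archive_after": 2, "purge_after": 7},
    "support_case":           {"archive_after": 2, "purge_after": 5},
    "debug_log":              {"archive_after": 0, "purge_after": 0.25},
}

def lifecycle_action(category, age_years):
    """Decide whether a record stays active, moves to archive, or is purged."""
    rule = POLICY[category]
    if age_years >= rule["purge_after"]:
        return "purge"
    if age_years >= rule["archive_after"]:
        return "archive"   # retained (meets the retention clock), but outside the active database
    return "active"

print(lifecycle_action("closed_won_opportunity", 1))  # active
print(lifecycle_action("closed_won_opportunity", 4))  # archive
print(lifecycle_action("support_case", 6))            # purge
```

Note that "archive" satisfies retention: the record still exists somewhere, just not in the active database.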
Data Classification Framework
Data classification drives encryption, access control, retention, and compliance decisions. A CTA must establish classification tiers during data model design, not retrofit them later.
```mermaid
graph TD
A[New Data Element<br/>Identified] --> B{Contains personally<br/>identifiable info PII?}
B -->|Yes| C{Regulated by<br/>GDPR/CCPA/HIPAA?}
B -->|No| D{Business-sensitive<br/>financial, strategic?}
C -->|Yes| E[RESTRICTED<br/>Shield Encryption, FLS,<br/>audit trail, erasure support]
C -->|No| F[CONFIDENTIAL<br/>FLS restrictions,<br/>sharing rules, masking]
D -->|Yes| F
D -->|No| G{Internal only<br/>or publicly shareable?}
G -->|Internal| H[INTERNAL<br/>Standard security,<br/>role-based access]
G -->|Public| I[PUBLIC<br/>No restrictions,<br/>communities/portals OK]
style E fill:#ff6b6b,color:#fff
style F fill:#ffd43b,color:#333
style H fill:#4c6ef5,color:#fff
style I fill:#51cf66,color:#fff
```
| Tier | Examples | Security Controls |
|---|---|---|
| Restricted | SSN, credit card, health records | Shield Encryption, FLS, audit trail, right-to-erasure |
| Confidential | Salary, revenue, pricing strategy | FLS, sharing rules, data masking in sandboxes |
| Internal | Employee IDs, internal notes | Role-based access, standard sharing model |
| Public | Product names, company address | Portal/community visible, no restrictions |
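The classification flowchart reduces to a small decision function. This is a direct transcription of the branches above, useful as a checklist during data model design:

```python
def classify(has_pii, regulated=False, business_sensitive=False, public=False):
    """Mirror the classification flow: PII and regulation first,
    then business sensitivity, then internal vs public."""
    if has_pii:
        return "RESTRICTED" if regulated else "CONFIDENTIAL"
    if business_sensitive:
        return "CONFIDENTIAL"
    return "PUBLIC" if public else "INTERNAL"

print(classify(has_pii=True, regulated=True))            # RESTRICTED
print(classify(has_pii=True))                            # CONFIDENTIAL
print(classify(has_pii=False, business_sensitive=True))  # CONFIDENTIAL
print(classify(has_pii=False, public=True))              # PUBLIC
print(classify(has_pii=False))                           # INTERNAL
```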
Data Governance Process Flow
Governance is not a one-time setup — it is an ongoing operational process with defined roles, cadences, and escalation paths.
```mermaid
graph TD
subgraph Identify["1. Identify"]
I1[Data quality issue<br/>detected]
I2[Automated alert<br/>dashboard threshold]
I3[User-reported<br/>data problem]
end
subgraph Assess["2. Assess"]
A1[Data steward<br/>evaluates severity]
A2{Impact level?}
end
subgraph Resolve["3. Resolve"]
R1[Low: Steward<br/>corrects directly]
R2[Medium: Assign to<br/>data custodian team]
R3[High: Escalate to<br/>data owner for decision]
end
subgraph Prevent["4. Prevent"]
P1[Update validation rules]
P2[Adjust matching rules]
P3[Update training materials]
P4[Add monitoring metric]
end
I1 --> A1
I2 --> A1
I3 --> A1
A1 --> A2
A2 -->|Low| R1
A2 -->|Medium| R2
A2 -->|High| R3
R1 --> Prevent
R2 --> Prevent
R3 --> Prevent
```
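The assess-and-resolve stages amount to a triage function. The severity heuristic below (record count thresholds, escalation on regulated data) is an invented example of how a steward might codify the decision, not a prescribed rule:

```python
def assess_severity(affected_records, classification):
    """Illustrative severity heuristic: regulated data or a large
    blast radius escalates. Thresholds are assumptions, not a standard."""
    if classification == "RESTRICTED" or affected_records > 10_000:
        return "high"
    if affected_records > 500:
        return "medium"
    return "low"

def route_issue(affected_records, classification):
    """Route a data quality issue per the resolve stage of the flow."""
    severity = assess_severity(affected_records, classification)
    routes = {
        "low":    "steward corrects directly",
        "medium": "assign to data custodian team",
        "high":   "escalate to data owner for decision",
    }
    return severity, routes[severity]

print(route_issue(50, "INTERNAL"))     # low: steward handles it
print(route_issue(2_000, "INTERNAL"))  # medium: custodian team
print(route_issue(10, "RESTRICTED"))   # high: regulated data always escalates
```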
Data Stewardship Model
Data stewardship assigns accountability for data quality to specific people or roles.
Stewardship Roles
| Role | Responsibility |
|---|---|
| Data Owner | Business executive accountable for data quality decisions |
| Data Steward | Hands-on responsibility for monitoring and correcting data |
| Data Custodian | Technical team managing data storage, security, and access |
| Data Consumer | End users who rely on data quality for their work |
Stewardship Processes
- Regular data quality reviews — Monthly or quarterly steward reviews of quality dashboards
- Issue resolution workflow — Process for reporting and fixing data quality issues
- Change management — Stewards approve changes to data standards, picklist values, record types
- Training — Ongoing user training on data entry standards
Compliance
GDPR and Data Privacy
The General Data Protection Regulation (and similar privacy laws) imposes specific requirements on Salesforce data architecture:
| GDPR Right | Salesforce Implementation |
|---|---|
| Right to access | Data export, reports, customer portals |
| Right to rectification | Standard edit capabilities, community self-service |
| Right to erasure | Hard delete, field-level encryption with key destruction |
| Right to portability | Data export in machine-readable format (CSV, JSON) |
| Right to restriction | Record-level flags, process exclusion logic |
| Consent management | Custom objects or Salesforce Privacy Center |
Data Residency
Some regulations require data to remain within specific geographic boundaries:
- Salesforce data residency — Data stored in the instance region (NA, EU, AP)
- Hyperforce — Enables deployment in specific public cloud regions
- Encryption — Salesforce Shield Platform Encryption with Bring Your Own Key (BYOK)
- Cross-border transfers — Documented in Salesforce’s Data Processing Addendum
Salesforce Shield
Shield provides compliance-focused features:
| Feature | Purpose |
|---|---|
| Platform Encryption | Encrypt data at rest for sensitive fields |
| Event Monitoring | Track user behavior and API activity |
| Field Audit Trail | Retain field history beyond standard 18-month limit (up to 10 years) |
Encryption trade-offs
Encrypting a field with Shield Platform Encryption disables certain features on that field: sorting, filtering in many contexts (probabilistic encryption blocks filtering entirely; deterministic encryption permits exact-match filters only), and referencing the field in formulas and some SOQL clauses. A CTA must select which fields to encrypt based on sensitivity classification, not encrypt everything “just in case.”
Data Quality Metrics Dashboard
Design a data quality dashboard that stewards review regularly:
| Metric | Measurement | Target |
|---|---|---|
| Duplicate rate | % of records flagged as duplicates | < 5% |
| Completeness score | Avg % of required fields populated | > 90% |
| Stale records | Records not modified in 12+ months | < 20% of active records |
| Orphan records | Child records with broken lookups | 0% |
| Invalid values | Records failing validation logic | < 2% |
| Integration errors | Failed integration record creates/updates | < 1% |
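A steward dashboard like this is ultimately a set of measurements compared against target thresholds. A small Python sketch using the targets from the table (the metric keys and direction flags are illustrative):

```python
# Targets from the table above; `higher_is_better` controls the comparison direction.
TARGETS = {
    "duplicate_rate": {"target": 5.0,  "higher_is_better": False},
    "completeness":   {"target": 90.0, "higher_is_better": True},
    "stale_records":  {"target": 20.0, "higher_is_better": False},
    "invalid_values": {"target": 2.0,  "higher_is_better": False},
}

def evaluate(measurements):
    """Return PASS/FAIL per metric for a steward review."""
    results = {}
    for metric, value in measurements.items():
        rule = TARGETS[metric]
        ok = value >= rule["target"] if rule["higher_is_better"] else value <= rule["target"]
        results[metric] = "PASS" if ok else "FAIL"
    return results

print(evaluate({"duplicate_rate": 3.2, "completeness": 87.5}))
# {'duplicate_rate': 'PASS', 'completeness': 'FAIL'}
```

A failing metric feeds the governance process flow above: identify, assess, resolve, prevent.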
Cross-Domain Impact
- Security — Data classification drives encryption and access control (Security)
- Integration — Data quality affects integration reliability (Integration)
- Migration — Pre-migration profiling is a data quality exercise (Data Migration)
- LDV — Archival is both an LDV strategy and a governance activity (Large Data Volumes)
- Dev Lifecycle — Data governance is part of organizational change management (Development Lifecycle)
Sources
- Salesforce Architects: Data 360 Architecture
- Salesforce Architect: Well-Architected Framework — Compliant
- Salesforce Help: Duplicate Rules Overview
- Salesforce Help: Standard Matching Rules
- Salesforce Help: Salesforce Shield
- Salesforce Help: Privacy Center
- GDPR Official Text: Chapter III (Rights of the Data Subject)
- CTA Study Guide: Data Domain — Governance
- DAMA DMBOK: Data Quality Management