Data Quality & Governance

Data quality and governance are ongoing disciplines, not one-time activities. Poor data quality undermines every downstream system: reports lie, integrations fail, and users lose trust in the platform.

Data Profiling

Profiling is the first step in understanding the data. It applies to both migration scenarios and ongoing data health monitoring.

Profiling Dimensions

Dimension | What to Measure | Red Flags
Completeness | % of required fields populated | Key fields < 80% populated
Accuracy | Values match the real world | Stale addresses, wrong phone formats
Consistency | Same data represented the same way | “US” vs “USA” vs “United States”
Uniqueness | No unintended duplicates | Duplicate accounts, contacts
Timeliness | Data is current enough for its use | Last modified > 2 years ago
Validity | Values conform to expected formats | Dates in text fields, invalid emails

Profiling Tools

  • Salesforce Reports: Record counts, field completeness via formula fields
  • Data Loader exports: Export and analyze in Excel/Python for pattern detection
  • Third-party tools: Informatica Data Quality, DemandTools (Validity), RingLead (ZoomInfo)
  • CRM Analytics (formerly Einstein Analytics, then Tableau CRM): Dashboard-based data quality monitoring
  • Apex scripts: Custom profiling for complex business rules
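
As a minimal sketch of the export-and-analyze approach, the Python below measures completeness and a simple validity check over a CSV export. The required field list, the email pattern, and the inline sample data are assumptions made for illustration; a real profile would read the Data Loader export file and use the organization's own rules.

```python
import csv
import io
import re

# Hypothetical required fields and a deliberately loose email pattern.
REQUIRED_FIELDS = ["Name", "Email", "BillingCountry"]
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def profile_rows(rows, required_fields=REQUIRED_FIELDS):
    """Return per-field completeness (%) and the count of invalid emails."""
    rows = list(rows)
    total = len(rows)
    completeness = {}
    for field in required_fields:
        # A field counts as populated when it is non-empty after trimming.
        populated = sum(1 for row in rows if (row.get(field) or "").strip())
        completeness[field] = round(100 * populated / total, 1) if total else 0.0
    # Validity: only non-blank emails are tested against the pattern.
    invalid_emails = sum(
        1
        for row in rows
        if (row.get("Email") or "").strip()
        and not EMAIL_PATTERN.match(row["Email"].strip())
    )
    return {
        "total": total,
        "completeness": completeness,
        "invalid_emails": invalid_emails,
    }


# Inline sample standing in for a Data Loader CSV export.
SAMPLE_EXPORT = """Name,Email,BillingCountry
Acme,info@acme.example,US
Globex,not-an-email,
Initech,,USA
"""

if __name__ == "__main__":
    print(profile_rows(csv.DictReader(io.StringIO(SAMPLE_EXPORT))))
```

Comparing each completeness figure with the 80% red-flag threshold from the table above shows which key fields need attention first.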

Deduplication

Duplicates are the most visible data quality problem. Salesforce provides native deduplication tools, but a CTA must design a broader strategy.

Native Salesforce Dedup

Matching Rules

Matching rules define when two records are considered potential duplicates.

Component | Description
Matching method | Exact or Fuzzy
Matching criteria | Fields to compare (Name, Email, Phone, Address)
Match key | Combination of fields that trigger comparison
Blank fields | How to handle nulls (match or skip)

Standard matching rules exist for Account, Contact, and Lead. Custom matching rules can be created for those objects and for custom objects; support for other standard objects is limited.
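
To make the components concrete, here is a rough Python sketch of how exact and fuzzy methods, field criteria, and blank handling combine into a match decision. It is not Salesforce's matching algorithm: the similarity measure comes from Python's `difflib`, and the 0.85 threshold and the "fuzzy Name AND exact Email" rule are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(value):
    """Lowercase and collapse whitespace; None becomes an empty string."""
    return " ".join((value or "").lower().split())


def field_matches(a, b, method="exact", threshold=0.85, match_blanks=False):
    """Compare one pair of field values using an exact or fuzzy method."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        # Blank handling: two blanks match only when explicitly configured.
        return match_blanks and a == b
    if method == "exact":
        return a == b
    # Fuzzy: similarity ratio in [0, 1] from the standard library.
    return SequenceMatcher(None, a, b).ratio() >= threshold


def is_potential_duplicate(rec_a, rec_b):
    """Hypothetical rule: fuzzy match on Name AND exact match on Email."""
    return field_matches(
        rec_a.get("Name"), rec_b.get("Name"), method="fuzzy"
    ) and field_matches(rec_a.get("Email"), rec_b.get("Email"), method="exact")
```

With this rule, "Acme Corporation" and "Acme Corporaton" sharing one email address are flagged, while a pair with a blank email on either side is skipped rather than matched.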

Duplicate Rules

Duplicate rules define what happens when a match is found:

Action | Effect
Alert | Warn the user but allow save
Block | Prevent the record from being saved
Report | Log the duplicate for later review

Alert vs Block trade-off

Blocking duplicates protects data quality but frustrates users and can block legitimate records (false positives). Alerting preserves user productivity but relies on users making good decisions. Most CTA solutions recommend starting with alert plus reporting, then gradually tightening to block as matching rules prove accurate.
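
One way to decide when a rule has "proven accurate" is to track how often stewards reject its alerts. The sketch below assumes each reviewed alert is recorded as a boolean (True when the steward confirmed a real duplicate); the 2% threshold and the 200-review minimum are hypothetical values, not Salesforce guidance.

```python
def false_positive_rate(reviews):
    """Share of reviewed alerts that were not real duplicates, or None if empty."""
    if not reviews:
        return None
    rejected = sum(1 for confirmed in reviews if not confirmed)
    return rejected / len(reviews)


def ready_to_block(reviews, max_false_positive_rate=0.02, min_reviews=200):
    """Suggest switching from alert to block only with enough evidence."""
    rate = false_positive_rate(reviews)
    return (
        rate is not None
        and len(reviews) >= min_reviews
        and rate <= max_false_positive_rate
    )
```

Requiring a minimum number of reviews keeps a handful of early confirmations from triggering a switch to blocking too soon.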

Third-Party Deduplication

For enterprise-scale deduplication, native tools may fall short:

Tool | Capability
DemandTools (Validity) | Mass dedup, merge, standardization
Cloudingo | Automated dedup with scheduling
RingLead | Real-time and batch dedup
Informatica | Enterprise MDM with fuzzy matching
DupeCatcher | Free AppExchange duplicate prevention

Dedup Strategy Layers

Diagram showing real-time prevention at data entry, scheduled batch deduplication scans, and pre-load dedup checks at integration boundaries as complementary layers of a dedup strategy.
Figure 1. Effective deduplication requires three coordinated layers: real-time prevention at the point of data entry, scheduled batch scans to catch duplicates that slip through, and pre-load checks at integration boundaries to upsert rather than insert when a match exists. No single layer catches everything on its own.
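
The pre-load layer can be sketched as a routing step that runs before any record is written. The Python below assumes a hypothetical external ID field named `External_Id__c` and a prebuilt lookup from that key to the existing Salesforce record Id; rows that cannot be matched safely go to a review list instead of being inserted blindly.

```python
def plan_load(incoming, existing_by_key, key_field="External_Id__c"):
    """Split incoming rows into inserts, updates, and rows needing review."""
    inserts, updates, review = [], [], []
    seen = set()
    for row in incoming:
        key = (row.get(key_field) or "").strip().lower()
        if not key or key in seen:
            # No match key, or a duplicate inside the batch itself.
            review.append(row)
            continue
        seen.add(key)
        if key in existing_by_key:
            # A match exists: update the existing record instead of inserting.
            updates.append({**row, "Id": existing_by_key[key]})
        else:
            inserts.append(row)
    return inserts, updates, review
```

An upsert keyed on an external ID field makes the same insert-or-update decision on the platform side; running the check beforehand also surfaces keyless rows and in-batch duplicates, which an upsert alone would not route to review.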

Master Data Management

MDM ensures that critical business entities (customers, products, employees) have a single, authoritative source of truth across all systems.

MDM Approaches

Approach | Description | When to Use
Registry | Each system maintains its own copy; a central registry maps IDs | Low integration maturity, many legacy systems
Consolidation | Data is copied to a master hub for reporting, not written back | Read-only analytics, data warehouse model
Coexistence | Multiple systems share and synchronize master data | Multiple systems of record per entity
Centralized | One system is the master; others are consumers | Clear system of record exists (e.g., Salesforce for customers)
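
The Registry approach is the easiest to illustrate in code, because the registry stores only identifiers and no business data. The class below is a minimal in-memory sketch; the class name, the `M-000001` master ID format, and the system names are all hypothetical.

```python
from collections import defaultdict


class IdRegistry:
    """Maps (system, local_id) pairs to a shared master ID."""

    def __init__(self):
        self._to_master = {}                    # (system, local_id) -> master_id
        self._from_master = defaultdict(dict)   # master_id -> {system: local_id}
        self._next = 1

    def register(self, system, local_id, master_id=None):
        """Link a local record to a master ID, minting one if none is given."""
        key = (system, local_id)
        if key in self._to_master:
            return self._to_master[key]
        if master_id is None:
            master_id = f"M-{self._next:06d}"
            self._next += 1
        self._to_master[key] = master_id
        self._from_master[master_id][system] = local_id
        return master_id

    def translate(self, system, local_id, target_system):
        """Return the matching ID in another system, or None if unknown."""
        master_id = self._to_master.get((system, local_id))
        if master_id is None:
            return None
        return self._from_master[master_id].get(target_system)
```

For example, registering Salesforce Account `001A` and then ERP customer `C-42` under the same master ID lets an integration translate between the two without either system storing the other's key.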

Salesforce as MDM Hub

Salesforce can serve as the master for customer data (Account, Contact) but is rarely the right choice for all entity types:

Entity | Salesforce as Master? | Notes
Customer (B2B) | Often yes | Account/Contact is natural fit
Customer (B2C) | Sometimes | Person Accounts or Data Cloud
Product | Sometimes | CPQ scenarios; otherwise ERP
Employee | Rarely | HR systems (Workday, SAP HCM) are better fit
Financial data | No | ERP is the master
Inventory | No | ERP/WMS is the master

Data Lifecycle Management

Every record has a lifecycle. Design for the full journey, not just creation.

Linear flow of the data lifecycle from creation through maintenance, archival, and deletion, with annotated activities at each stage including enrichment, Big Object archival, and hard delete.
Figure 2. Data lifecycle management defines what happens to every record as it ages. Records that bypass the Archive stage and go directly from Maintain to Delete require careful governance, because hard deletes are irreversible and must comply with any applicable retention obligations before execution.

Lifecycle Stages

Create

  • Define data entry standards (required fields, validation rules, dependent picklists)
  • Integration-created records need quality controls (field mapping validation, dedup)
  • Bulk imports need pre-load quality checks

Maintain

  • Ongoing enrichment (address verification, firmographic data)
  • Periodic deduplication scans
  • Data steward reviews and corrections
  • Automation to flag stale records (e.g., Account not modified in 12 months)
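
The stale-record check in the last bullet reduces to a date comparison. The sketch below works on exported records outside the platform and assumes timestamps in ISO 8601 form with an explicit `+00:00` offset; the `LastModifiedDate` key and the 365-day cutoff mirror the example above but are otherwise illustrative.

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_modified_iso, now, max_age_days=365):
    """True when the record was last modified more than max_age_days ago."""
    last_modified = datetime.fromisoformat(last_modified_iso)
    return now - last_modified > timedelta(days=max_age_days)


def flag_stale(records, now, max_age_days=365):
    """Return the Ids of records whose LastModifiedDate is too old."""
    return [
        rec["Id"]
        for rec in records
        if is_stale(rec["LastModifiedDate"], now, max_age_days)
    ]


if __name__ == "__main__":
    sample = [
        {"Id": "001A", "LastModifiedDate": "2023-06-01T00:00:00+00:00"},
        {"Id": "001B", "LastModifiedDate": "2024-06-01T00:00:00+00:00"},
    ]
    print(flag_stale(sample, datetime(2025, 1, 1, tzinfo=timezone.utc)))
```

Passing `now` explicitly rather than reading the clock inside the function keeps the check repeatable in tests and backfills.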

Archive

  • Move aged data to Big Objects, external storage, or Data Cloud
  • Maintain reference access for compliance
  • See Large Data Volumes for archival strategies

Delete

  • Soft delete: Records go to Recycle Bin (recoverable for 15 days)
  • Hard delete: Permanent removal (Bulk API with hardDelete option)
  • GDPR right to erasure: Must be able to permanently delete all personal data for a data subject
  • Document deletion policies and audit trails

Data Retention Policies

Retention policies define how long different data types must be kept. Business requirements, legal obligations, and compliance mandates drive these decisions.

Designing Retention Policies

Data Category | Typical Retention | Driving Factor
Active customer records | Indefinite while customer active | Business need
Closed opportunities (won) | 5-7 years | Financial audit
Closed opportunities (lost) | 1-2 years | Sales analytics
Support cases | 3-5 years | Service quality, legal
Email messages | 1-3 years | Communication audit
Audit trail (field history) | 18-24 months on-platform | Compliance
Task/Event activities | 1-2 years | Business need
Debug/error logs | 30-90 days | Operational

Retention vs archival

Retention defines how long data must exist somewhere. Archival defines where it lives after leaving the active database. A record can be archived (moved to Big Object) while still meeting its retention requirement.
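
The distinction can be modeled as two separate thresholds per data category: one for when a record leaves the active database and one for when it may be deleted. In the sketch below the deletion thresholds use the upper bounds from the table above, while the archive thresholds and category names are assumptions for illustration.

```python
from datetime import date

# category -> (archive_after_years, delete_after_years); archive values are hypothetical.
POLICIES = {
    "closed_won_opportunity": (2, 7),
    "closed_lost_opportunity": (1, 2),
    "support_case": (1, 5),
}


def add_years(d, years):
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def lifecycle_action(category, closed_on, today):
    """Decide whether a record stays active, is archived, or may be deleted."""
    archive_after, delete_after = POLICIES[category]
    if today >= add_years(closed_on, delete_after):
        return "eligible_for_deletion"
    if today >= add_years(closed_on, archive_after):
        return "archive"
    return "keep_active"
```

A won opportunity closed three years ago returns `archive`: it has left the active database but still sits inside its retention window.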


Data Classification Framework

Data classification drives encryption, access control, retention, and compliance decisions. Establish classification tiers during data model design, not as a retrofit.

Decision tree classifying new data elements into Restricted, Confidential, Internal, or Public tiers based on PII content, regulatory scope, and business sensitivity.
Figure 3. Data classification drives encryption, access control, and retention decisions. Establishing classification tiers during data model design prevents costly retrofits. Adding Shield Platform Encryption to a field after data exists requires re-encryption of all existing records and disables certain query and formula capabilities.

Tier | Examples | Security Controls
Restricted | SSN, credit card, health records | Shield Encryption, FLS, audit trail, right-to-erasure
Confidential | Salary, revenue, pricing strategy | FLS, sharing rules, data masking in sandboxes
Internal | Employee IDs, internal notes | Role-based access, standard sharing model
Public | Product names, company address | Portal/community visible, no restrictions
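
A classification decision tree of this kind can be written as a short function. The order of the checks below is an assumption, since the original diagram is not reproduced here; the tier names and controls come from the table above, and the parameter names are hypothetical.

```python
# Controls per tier, taken from the table above.
CONTROLS = {
    "Restricted": ["Shield Encryption", "FLS", "audit trail", "right-to-erasure"],
    "Confidential": ["FLS", "sharing rules", "data masking in sandboxes"],
    "Internal": ["Role-based access", "standard sharing model"],
    "Public": ["Portal/community visible", "no restrictions"],
}


def classify(contains_pii, regulated, business_sensitive, approved_for_public=False):
    """Assign a tier; the most restrictive applicable tier wins."""
    if regulated:
        # Regulated data (e.g. SSN, card, health data) always gets the top tier.
        return "Restricted"
    if contains_pii or business_sensitive:
        return "Confidential"
    if approved_for_public:
        return "Public"
    # Default to Internal so nothing becomes public by omission.
    return "Internal"


def controls_for(**attributes):
    """Look up the security controls for a data element's attributes."""
    return CONTROLS[classify(**attributes)]
```

Defaulting to Internal means a new field is exposed publicly only after someone explicitly approves it.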

Data Governance Process Flow

Governance is not a one-time setup. It is an ongoing operational process with defined roles, cadences, and escalation paths.

Four-stage governance workflow routing detected data quality issues through steward assessment, impact-based resolution tiers, and systematic prevention steps to reduce recurrence.
Figure 4. Every data quality issue should close with a prevention step, not just a fix. Updating validation rules, tightening matching criteria, or adding a monitoring metric closes the loop so the same issue does not recur. Governance without prevention is just repeated remediation.

Data Stewardship Model

Data stewardship assigns accountability for data quality to specific people or roles.

Stewardship Roles

Role | Responsibility
Data Owner | Business executive accountable for data quality decisions
Data Steward | Hands-on responsibility for monitoring and correcting data
Data Custodian | Technical team managing data storage, security, and access
Data Consumer | End users who rely on data quality for their work

Stewardship Processes

  • Regular data quality reviews: Monthly or quarterly steward reviews of quality dashboards
  • Issue resolution workflow: Process for reporting and fixing data quality issues
  • Change management: Stewards approve changes to data standards, picklist values, record types
  • Training: Ongoing user training on data entry standards

Compliance

GDPR and Data Privacy

The General Data Protection Regulation (and similar privacy laws) imposes specific requirements on Salesforce data architecture:

GDPR Right | Salesforce Implementation
Right to access | Data export, reports, customer portals
Right to rectification | Standard edit capabilities, community self-service
Right to erasure | Hard delete, field-level encryption with key destruction
Right to portability | Data export in machine-readable format (CSV, JSON)
Right to restriction | Record-level flags, process exclusion logic
Consent management | Custom objects or Salesforce Privacy Center

Data Residency

Some regulations require data to remain within specific geographic boundaries:

  • Salesforce data residency: Data stored in the instance region (NA, EU, AP)
  • Hyperforce: Enables deployment in specific public cloud regions
  • Encryption: Salesforce Shield Platform Encryption with Bring Your Own Key (BYOK)
  • Cross-border transfers: Documented in Salesforce’s Data Processing Addendum

Salesforce Shield

Shield provides compliance-focused features:

Feature | Purpose
Platform Encryption | Encrypt data at rest for sensitive fields
Event Monitoring | Track user behavior and API activity
Field Audit Trail | Retain field history beyond standard 18-month limit (up to 10 years)

Encryption trade-offs

Encrypting fields with Shield Platform Encryption restricts certain features: sorting, filtering (not supported under probabilistic encryption; deterministic encryption allows exact-match filters only), and some formula field references. Select which fields to encrypt based on sensitivity classification, not on a blanket “encrypt everything” approach.


Data Quality Metrics Dashboard

Design a data quality dashboard that stewards review regularly:

Metric | Measurement | Target
Duplicate rate | % of records flagged as duplicates | < 5%
Completeness score | Avg % of required fields populated | > 90%
Stale records | Records not modified in 12+ months | < 20% of active records
Orphan records | Child records with broken lookups | 0%
Invalid values | Records failing validation logic | < 2%
Integration errors | Failed integration record creates/updates | < 1%
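
A dashboard review comes down to comparing each measured value with its target. The sketch below encodes the targets from the table above as comparison operators; the metric keys and the sample measurements are hypothetical, and all values are percentages.

```python
import operator

# metric -> (comparison that must hold, target value in percent)
TARGETS = {
    "duplicate_rate": (operator.lt, 5.0),
    "completeness_score": (operator.gt, 90.0),
    "stale_records": (operator.lt, 20.0),
    "orphan_records": (operator.eq, 0.0),
    "invalid_values": (operator.lt, 2.0),
    "integration_errors": (operator.lt, 1.0),
}


def evaluate(measured):
    """Return {metric: True/False} for every measured metric with a target."""
    return {
        name: compare(measured[name], target)
        for name, (compare, target) in TARGETS.items()
        if name in measured
    }


def breaches(measured):
    """List the metrics that miss their target, sorted by name."""
    return sorted(name for name, ok in evaluate(measured).items() if not ok)


if __name__ == "__main__":
    sample = {
        "duplicate_rate": 6.2,
        "completeness_score": 93.0,
        "stale_records": 20.0,
        "orphan_records": 0.0,
        "invalid_values": 1.1,
        "integration_errors": 0.4,
    }
    print(breaches(sample))
```

The targets are strict inequalities, so a stale-record share of exactly 20% counts as a breach.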


Sources

Personal study notes for the Salesforce CTA exam. Content compiled from VJ's study notes, official Salesforce documentation, community sources, and online publicly available content, then organized and presented with AI assistance. Not affiliated with Salesforce. © 2025–2026 VJ Srivastava.