Data Quality & Governance

Data quality and governance are ongoing disciplines, not one-time activities. A CTA must design solutions with built-in quality controls and governance frameworks because poor data quality undermines every downstream system — reports lie, integrations fail, and users lose trust in the platform.

Data Profiling

Data profiling is the first step in understanding what you are working with. It applies to both migration scenarios and ongoing data health.

Profiling Dimensions

| Dimension | What to Measure | Red Flags |
| --- | --- | --- |
| Completeness | % of required fields populated | Key fields < 80% populated |
| Accuracy | Values match real-world reality | Stale addresses, wrong phone formats |
| Consistency | Same data represented the same way | “US” vs “USA” vs “United States” |
| Uniqueness | No unintended duplicates | Duplicate accounts, contacts |
| Timeliness | Data is current enough for its use | Last modified > 2 years ago |
| Validity | Values conform to expected formats | Dates in text fields, invalid emails |

Profiling Tools

  • Salesforce Reports — Record counts, field completeness via formula fields
  • Data Loader exports — Export and analyze in Excel/Python for pattern detection
  • Third-party tools — Informatica Data Quality, DemandTools, Validity (RingLead)
  • Einstein Analytics / CRM Analytics — Dashboard-based data quality monitoring
  • Apex scripts — Custom profiling for complex business rules
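The Excel/Python analysis mentioned above can be scripted. A minimal sketch that profiles a Data Loader CSV export for completeness and email validity — the column names (`Name`, `Email`, `BillingCountry`) and the loose email pattern are illustrative assumptions, not a fixed schema:

```python
import csv
import io
import re

# Deliberately loose email shape check; a real profiler would use stricter rules.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def profile(rows, required_fields):
    """Profile a list of record dicts (e.g. rows from a Data Loader export)."""
    total = len(rows)
    report = {}
    for field in required_fields:
        populated = sum(1 for r in rows if (r.get(field) or "").strip())
        report[field] = round(100 * populated / total, 1)  # completeness %
    # Validity: flag malformed emails (assumes a column named "Email").
    report["invalid_emails"] = [
        r["Email"] for r in rows if r.get("Email") and not EMAIL_RE.match(r["Email"])
    ]
    return report

# Usage against a tiny inline export; note the "USA" vs "United States"
# consistency problem a profiler would also surface.
sample = io.StringIO(
    "Name,Email,BillingCountry\n"
    "Acme,info@acme.com,USA\n"
    "Globex,,United States\n"
    "Initech,not-an-email,USA\n"
)
report = profile(list(csv.DictReader(sample)), ["Name", "Email", "BillingCountry"])
```

The same loop structure extends to the other dimensions in the table above (e.g. a timeliness check comparing a `LastModifiedDate` column against a cutoff).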

Deduplication

Duplicates are the most visible data quality problem. Salesforce provides native deduplication tools, but a CTA must design a comprehensive strategy.

Native Salesforce Dedup

Matching Rules

Matching rules define when two records are considered potential duplicates.

| Component | Description |
| --- | --- |
| Matching method | Exact or Fuzzy |
| Matching criteria | Fields to compare (Name, Email, Phone, Address) |
| Match key | Combination of fields that triggers comparison |
| Blank fields | How to handle nulls (match or skip) |

Standard matching rules exist for Account, Contact, and Lead. Custom matching rules can be created for any object.

Duplicate Rules

Duplicate rules define what happens when a match is found:

| Action | Effect |
| --- | --- |
| Alert | Warn the user but allow save |
| Block | Prevent the record from being saved |
| Report | Log the duplicate for later review |

Alert vs Block trade-off

Blocking duplicates protects data quality but frustrates users and can block legitimate records (false positives). Alerting preserves user productivity but relies on users making good decisions. Most CTA solutions recommend alert with reporting, then gradually tighten to block as matching rules prove accurate.

Third-Party Deduplication

For enterprise-scale deduplication, native tools may not suffice:

| Tool | Capability |
| --- | --- |
| DemandTools (Validity) | Mass dedup, merge, standardization |
| Cloudingo | Automated dedup with scheduling |
| RingLead | Real-time and batch dedup |
| Informatica | Enterprise MDM with fuzzy matching |
| DupeCatcher | Free AppExchange duplicate prevention |

Dedup Strategy Layers

```mermaid
graph TD
    A[Data Entry] --> B[Real-time Prevention]
    B --> C{Duplicate found?}
    C -->|Yes| D[Alert or Block user]
    C -->|No| E[Record saved]

    F[Batch Process] --> G[Scheduled Dedup Scan]
    G --> H[Review duplicate sets]
    H --> I[Merge or dismiss]

    J[Integration] --> K[Pre-load dedup check]
    K --> L{Match found?}
    L -->|Yes| M[Update existing record]
    L -->|No| N[Insert new record]
```
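The integration layer of the diagram (pre-load dedup check, then update or insert) can be sketched as a routing function. The match-key logic here is a hypothetical stand-in for a real matching rule, and within-batch duplicates are deliberately out of scope:

```python
def match_key(record):
    """Hypothetical match key: lowercased account name + email domain."""
    name = (record.get("Name") or "").strip().lower()
    email = (record.get("Email") or "").strip().lower()
    domain = email.split("@")[-1] if "@" in email else ""
    return (name, domain)

def pre_load_dedup(incoming, existing_index):
    """Route incoming records: match found -> update, no match -> insert.

    existing_index maps match keys to existing Salesforce record Ids,
    built from a prior export of the target object.
    """
    updates, inserts = [], []
    for rec in incoming:
        key = match_key(rec)
        if key in existing_index:
            rec["Id"] = existing_index[key]  # target Id for the update call
            updates.append(rec)
        else:
            inserts.append(rec)
    return updates, inserts

# Usage: one record matches an existing account, one is new.
existing = {("acme", "acme.com"): "001xx0000001"}
updates, inserts = pre_load_dedup(
    [{"Name": "Acme", "Email": "sales@acme.com"},
     {"Name": "Globex", "Email": "info@globex.com"}],
    existing,
)
```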

Master Data Management

MDM ensures that critical business entities (customers, products, employees) have a single, authoritative source of truth across all systems.

MDM Approaches

| Approach | Description | When to Use |
| --- | --- | --- |
| Registry | Each system maintains its own copy; a central registry maps IDs | Low integration maturity, many legacy systems |
| Consolidation | Data is copied to a master hub for reporting, not written back | Read-only analytics, data warehouse model |
| Coexistence | Multiple systems share and synchronize master data | Multiple systems of record per entity |
| Centralized | One system is the master; others are consumers | Clear system of record exists (e.g., Salesforce for customers) |

Salesforce as MDM Hub

Salesforce can serve as the master for customer data (Account, Contact) but is rarely the right choice for all entity types:

| Entity | Salesforce as Master? | Notes |
| --- | --- | --- |
| Customer (B2B) | Often yes | Account/Contact is a natural fit |
| Customer (B2C) | Sometimes | Person Accounts or Data Cloud |
| Product | Sometimes | CPQ scenarios; otherwise ERP |
| Employee | Rarely | HR systems (Workday, SAP HCM) are a better fit |
| Financial data | No | ERP is the master |
| Inventory | No | ERP/WMS is the master |

Data Lifecycle Management

Every record has a lifecycle. A CTA must design for the full journey, not just creation.

```mermaid
graph LR
    A[Create] --> B[Maintain]
    B --> C[Archive]
    C --> D[Delete]
    B --> D

    A -->|Data entry, import,<br/>integration| A
    B -->|Update, enrich,<br/>deduplicate| B
    C -->|Move to Big Object,<br/>external storage| C
    D -->|Soft delete, hard delete,<br/>data destruction| D
```

Lifecycle Stages

Create

  • Define data entry standards (required fields, validation rules, dependent picklists)
  • Integration-created records need quality controls (field mapping validation, dedup)
  • Bulk imports need pre-load quality checks

Maintain

  • Ongoing enrichment (address verification, firmographic data)
  • Periodic deduplication scans
  • Data steward reviews and corrections
  • Automation to flag stale records (e.g., Account not modified in 12 months)

Archive

  • Move aged data to Big Objects, external storage, or Data Cloud
  • Maintain reference access for compliance
  • See Large Data Volumes for archival strategies

Delete

  • Soft delete — Records go to Recycle Bin (recoverable for 15 days)
  • Hard delete — Permanent removal (Bulk API with hardDelete option)
  • GDPR right to erasure — Must be able to permanently delete all personal data for a data subject
  • Document deletion policies and audit trails

Data Retention Policies

Retention policies define how long different data types must be kept. These are driven by business requirements, legal obligations, and compliance mandates.

Designing Retention Policies

| Data Category | Typical Retention | Driving Factor |
| --- | --- | --- |
| Active customer records | Indefinite while customer active | Business need |
| Closed opportunities (won) | 5-7 years | Financial audit |
| Closed opportunities (lost) | 1-2 years | Sales analytics |
| Support cases | 3-5 years | Service quality, legal |
| Email messages | 1-3 years | Communication audit |
| Audit trail (field history) | 18-24 months on-platform | Compliance |
| Task/Event activities | 1-2 years | Business need |
| Debug/error logs | 30-90 days | Operational |

Retention vs archival

Retention defines how long data must exist somewhere. Archival defines where it lives after leaving the active database. A record can be archived (moved to Big Object) while still meeting its retention requirement.
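That distinction can be made concrete in a small lifecycle classifier. The retention windows below are hypothetical values loosely following the table above, and the one-year "active" window is an assumption, not a Salesforce default:

```python
from datetime import date

# Hypothetical retention windows in days, loosely following the policy table.
RETENTION_DAYS = {
    "opportunity_won": 7 * 365,
    "opportunity_lost": 2 * 365,
    "case": 5 * 365,
    "debug_log": 90,
}

def lifecycle_action(category, last_activity, today, active_days=365):
    """'active' while recent, 'archive' after the active window elapses,
    'delete' only once the retention window has also elapsed.

    An archived record (e.g. moved to a Big Object) still counts toward
    retention -- retention says how long data must exist somewhere,
    archival only says where it lives.
    """
    age = (today - last_activity).days
    retention = RETENTION_DAYS.get(category)
    if retention is not None and age > retention:
        return "delete"
    return "archive" if age > active_days else "active"

# Usage with a fixed reference date for reproducibility.
today = date(2024, 1, 1)
```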


Data Classification Framework

Data classification drives encryption, access control, retention, and compliance decisions. A CTA must establish classification tiers during data model design, not retrofit them later.

```mermaid
graph TD
    A[New Data Element<br/>Identified] --> B{Contains personally<br/>identifiable info PII?}
    B -->|Yes| C{Regulated by<br/>GDPR/CCPA/HIPAA?}
    B -->|No| D{Business-sensitive<br/>financial, strategic?}
    C -->|Yes| E[RESTRICTED<br/>Shield Encryption, FLS,<br/>audit trail, erasure support]
    C -->|No| F[CONFIDENTIAL<br/>FLS restrictions,<br/>sharing rules, masking]
    D -->|Yes| F
    D -->|No| G{Internal only<br/>or publicly shareable?}
    G -->|Internal| H[INTERNAL<br/>Standard security,<br/>role-based access]
    G -->|Public| I[PUBLIC<br/>No restrictions,<br/>communities/portals OK]

    style E fill:#ff6b6b,color:#fff
    style F fill:#ffd43b,color:#333
    style H fill:#4c6ef5,color:#fff
    style I fill:#51cf66,color:#fff
```
| Tier | Examples | Security Controls |
| --- | --- | --- |
| Restricted | SSN, credit card, health records | Shield Encryption, FLS, audit trail, right-to-erasure |
| Confidential | Salary, revenue, pricing strategy | FLS, sharing rules, data masking in sandboxes |
| Internal | Employee IDs, internal notes | Role-based access, standard sharing model |
| Public | Product names, company address | Portal/community visible, no restrictions |
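The decision flow reduces to a small function, which is a useful way to keep classification consistent across a data dictionary. This sketch mirrors the flow above; the boolean inputs are assumptions about how each data element has been assessed:

```python
def classify(has_pii, regulated=False, business_sensitive=False, public=False):
    """Walk the classification decision tree for one data element."""
    if has_pii:
        # Regulated PII (GDPR/CCPA/HIPAA) demands the strictest controls.
        return "RESTRICTED" if regulated else "CONFIDENTIAL"
    if business_sensitive:
        return "CONFIDENTIAL"
    return "PUBLIC" if public else "INTERNAL"

# Usage: a health record field vs. an internal note vs. a product name.
tier_ssn = classify(has_pii=True, regulated=True)        # RESTRICTED
tier_note = classify(has_pii=False)                      # INTERNAL
tier_product = classify(has_pii=False, public=True)      # PUBLIC
```

Each returned tier then maps to the security controls in the table above.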

Data Governance Process Flow

Governance is not a one-time setup — it is an ongoing operational process with defined roles, cadences, and escalation paths.

```mermaid
graph TD
    subgraph Identify["1. Identify"]
        I1[Data quality issue<br/>detected]
        I2[Automated alert<br/>dashboard threshold]
        I3[User-reported<br/>data problem]
    end

    subgraph Assess["2. Assess"]
        A1[Data steward<br/>evaluates severity]
        A2{Impact level?}
    end

    subgraph Resolve["3. Resolve"]
        R1[Low: Steward<br/>corrects directly]
        R2[Medium: Assign to<br/>data custodian team]
        R3[High: Escalate to<br/>data owner for decision]
    end

    subgraph Prevent["4. Prevent"]
        P1[Update validation rules]
        P2[Adjust matching rules]
        P3[Update training materials]
        P4[Add monitoring metric]
    end

    I1 --> A1
    I2 --> A1
    I3 --> A1
    A1 --> A2
    A2 -->|Low| R1
    A2 -->|Medium| R2
    A2 -->|High| R3
    R1 --> Prevent
    R2 --> Prevent
    R3 --> Prevent
```

Data Stewardship Model

Data stewardship assigns accountability for data quality to specific people or roles.

Stewardship Roles

| Role | Responsibility |
| --- | --- |
| Data Owner | Business executive accountable for data quality decisions |
| Data Steward | Hands-on responsibility for monitoring and correcting data |
| Data Custodian | Technical team managing data storage, security, and access |
| Data Consumer | End users who rely on data quality for their work |

Stewardship Processes

  • Regular data quality reviews — Monthly or quarterly steward reviews of quality dashboards
  • Issue resolution workflow — Process for reporting and fixing data quality issues
  • Change management — Stewards approve changes to data standards, picklist values, record types
  • Training — Ongoing user training on data entry standards

Compliance

GDPR and Data Privacy

The General Data Protection Regulation (and similar privacy laws) imposes specific requirements on Salesforce data architecture:

| GDPR Right | Salesforce Implementation |
| --- | --- |
| Right to access | Data export, reports, customer portals |
| Right to rectification | Standard edit capabilities, community self-service |
| Right to erasure | Hard delete, field-level encryption with key destruction |
| Right to portability | Data export in machine-readable format (CSV, JSON) |
| Right to restriction | Record-level flags, process exclusion logic |
| Consent management | Custom objects or Salesforce Privacy Center |

Data Residency

Some regulations require data to remain within specific geographic boundaries:

  • Salesforce data residency — Data stored in the instance region (NA, EU, AP)
  • Hyperforce — Enables deployment in specific public cloud regions
  • Encryption — Salesforce Shield Platform Encryption with Bring Your Own Key (BYOK)
  • Cross-border transfers — Documented in Salesforce’s Data Processing Addendum

Salesforce Shield

Shield provides compliance-focused features:

| Feature | Purpose |
| --- | --- |
| Platform Encryption | Encrypt data at rest for sensitive fields |
| Event Monitoring | Track user behavior and API activity |
| Field Audit Trail | Retain field history beyond the standard 18-month limit (up to 10 years) |

Encryption trade-offs

Encrypting fields with Shield Platform Encryption disables certain features: sorting on encrypted fields, filtering in most contexts (deterministic encryption restores exact-match filtering only), and some formula field references. A CTA must carefully select which fields to encrypt based on sensitivity classification, not encrypt everything “just in case.”


Data Quality Metrics Dashboard

Design a data quality dashboard that stewards review regularly:

| Metric | Measurement | Target |
| --- | --- | --- |
| Duplicate rate | % of records flagged as duplicates | < 5% |
| Completeness score | Avg % of required fields populated | > 90% |
| Stale records | Records not modified in 12+ months | < 20% of active records |
| Orphan records | Child records with broken lookups | 0% |
| Invalid values | Records failing validation logic | < 2% |
| Integration errors | Failed integration record creates/updates | < 1% |
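The threshold checks such a dashboard automates can be sketched directly. The metric names and targets below restate the table (as fractions of records) and are illustrative, not a standard schema:

```python
# Targets restating the dashboard table above, expressed as fractions.
TARGETS = {
    "duplicate_rate": ("max", 0.05),
    "completeness": ("min", 0.90),
    "stale_rate": ("max", 0.20),
    "orphan_count": ("max", 0),
    "invalid_rate": ("max", 0.02),
    "integration_error_rate": ("max", 0.01),
}

def breaches(metrics):
    """Return names of measured metrics that violate their target."""
    failed = []
    for name, (kind, threshold) in TARGETS.items():
        value = metrics.get(name)
        if value is None:
            continue  # metric not measured this period
        if (kind == "max" and value > threshold) or \
           (kind == "min" and value < threshold):
            failed.append(name)
    return failed

# Usage: duplicate rate over target, everything else measured is healthy.
result = breaches({"duplicate_rate": 0.07, "completeness": 0.95, "orphan_count": 0})
```

Breached metrics would then feed the "Identify" stage of the governance process flow as automated alerts.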

Cross-Domain Impact

  • Security — Data classification drives encryption and access control (Security)
  • Integration — Data quality affects integration reliability (Integration)
  • Migration — Pre-migration profiling is a data quality exercise (Data Migration)
  • LDV — Archival is both an LDV strategy and a governance activity (Large Data Volumes)
  • Dev Lifecycle — Data governance is part of organizational change management (Development Lifecycle)
