@microsoft/duplicate-detective — Agent Skill

---
name: duplicate-detective
description: Identifies potential duplicate Accounts, Contacts, or Leads in Dataverse using intelligent fuzzy matching that catches nicknames, abbreviations, phone format variations, and address similarities. Use when user says "find duplicates", "check for duplicate accounts", "are there any duplicate contacts", "duplicate detection", "clean up duplicates", "merge duplicates", or "data quality check".
metadata:
  author: Dataverse
  version: 1.0.0
  category: data-quality
---

# Duplicate Detective

Sales and marketing teams often accumulate duplicate records due to data entry variations, imports from multiple sources, and inconsistent naming conventions. Standard Dataverse duplicate detection rules are limited (max 5 rules per table) and miss subtle duplicates. This skill provides intelligent duplicate detection to maintain data quality and prevent wasted effort on duplicate outreach.

## Instructions

### Step 1: Determine Scope of Duplicate Check
Ask the user or infer from context:
1. **Target Table:** Account, Contact, or Lead (or all three)
2. **Scope:** 
   - Full scan of all records
   - Specific subset (e.g., records created in last 30 days)
   - Check against a specific record (e.g., "Find duplicates of Contoso")
3. **Matching Threshold:** Strict (high confidence only) or Relaxed (include possible matches)

#### Step 2: Query Source Records
Based on scope, retrieve records to analyze:

**For Accounts:**
```
SELECT accountid, name, telephone1, telephone2, emailaddress1, 
       address1_line1, address1_city, address1_stateorprovince, address1_postalcode,
       websiteurl, numberofemployees, industrycode
FROM account 
WHERE statecode = 0
```

**For Contacts:**
```
SELECT contactid, firstname, lastname, fullname, emailaddress1, emailaddress2,
       telephone1, mobilephone, jobtitle, accountid,
       address1_line1, address1_city, address1_stateorprovince, address1_postalcode
FROM contact
WHERE statecode = 0
```

**For Leads:**
```
SELECT leadid, firstname, lastname, fullname, companyname, emailaddress1,
       telephone1, mobilephone, jobtitle, address1_line1, address1_city,
       address1_stateorprovince, address1_postalcode, websiteurl
FROM lead
WHERE statecode = 0
```

#### Step 3: Apply Fuzzy Matching Algorithms

**3.1 Company/Account Name Matching**
Detect duplicates even with variations:
- **Abbreviation Handling:** "International Business Machines" ↔ "IBM"
- **Legal Suffix Normalization:** "Contoso Inc." ↔ "Contoso Corporation" ↔ "Contoso LLC"
- **Common Abbreviations:** "Mfg" = "Manufacturing", "Intl" = "International"
- **Word Order Variations:** "Microsoft Corporation" ↔ "Corporation Microsoft"
- **Punctuation & Spacing:** "A.B.C. Company" ↔ "ABC Company"
- **The/A Prefix Handling:** "The Contoso Group" ↔ "Contoso Group"

**Matching Logic:**
1. Normalize both names (lowercase, remove punctuation, expand abbreviations)
2. Calculate string similarity using:
   - Levenshtein distance (edit distance)
   - Jaro-Winkler similarity (handles transpositions)
   - Token-based matching (matching words regardless of order)
3. Score: 90%+ = High confidence match, 75-89% = Possible match

**3.2 Person Name Matching**
Detect duplicate contacts/leads:
- **Nickname Resolution:** "Robert" ↔ "Bob" ↔ "Rob" ↔ "Bobby"
- **Name Variations:** "William" ↔ "Will" ↔ "Bill" ↔ "Billy"
- **Initial Matching:** "J. Smith" ↔ "John Smith"
- **Name Order:** "John Smith" ↔ "Smith, John"
- **Hyphenated Names:** "Mary Smith-Jones" ↔ "Mary Jones"
- **Middle Name Handling:** "John Michael Smith" ↔ "John Smith"

**Common Nickname Mappings:**
| Formal Name | Nicknames |
|-------------|-----------|
| Robert | Bob, Rob, Bobby, Robbie |
| William | Will, Bill, Billy, Willy |
| Richard | Rick, Rich, Dick |
| Michael | Mike, Mikey |
| James | Jim, Jimmy, Jamie |
| Elizabeth | Liz, Beth, Betty, Lizzy |
| Jennifer | Jen, Jenny |
| Katherine | Kate, Katie, Kathy, Cathy |

**3.3 Phone Number Matching**
Normalize and compare phone numbers:
- Strip all formatting: "(555) 123-4567" → "5551234567"
- Handle country code variations: "+1-555-123-4567" ↔ "555-123-4567"
- Match across fields: telephone1 vs mobilephone vs telephone2
- Consider area code changes for same base number

**Phone Normalization Steps:**
1. Remove all non-numeric characters
2. Remove leading country code if present (1 for US)
3. Compare last 10 digits for US numbers
4. Flag as match if any phone field matches

**3.4 Email Address Matching**
Detect email-based duplicates:
- **Domain Variations:** Gmail.com aliases (+ addressing)
- **Dot Variations:** "john.smith@gmail.com" ↔ "johnsmith@gmail.com"
- **Case Normalization:** Always compare lowercase
- **Typo Detection:** "john@gmial.com" likely = "john@gmail.com"
- **Corporate Email Patterns:** Match same domain = same company

**3.5 Address Matching**
Detect address duplicates with variations:
- **Street Abbreviations:** "Street" ↔ "St" ↔ "St."
- **Direction Abbreviations:** "North" ↔ "N" ↔ "N."
- **Building/Suite Handling:** "123 Main St Suite 100" ↔ "123 Main St"
- **Postal Code Normalization:** "12345-6789" ↔ "12345"
- **State Variations:** "California" ↔ "CA"

**Address Normalization:**
1. Standardize street type abbreviations
2. Standardize direction prefixes/suffixes
3. Remove suite/apt/unit information for base comparison
4. Normalize postal code to 5 digits
5. Standardize state to 2-letter code

#### Step 4: Score and Rank Potential Duplicates

**Composite Scoring Model:**
Calculate overall duplicate probability using weighted scores:

| Field Type | Weight (Accounts) | Weight (Contacts) | Weight (Leads) |
|------------|------------------|-------------------|----------------|
| Name | 35% | 30% | 30% |
| Email | 25% | 30% | 30% |
| Phone | 20% | 20% | 20% |
| Address | 15% | 15% | 15% |
| Website | 5% | - | 5% |
| Company (for contacts) | - | 5% | - |

**Score Thresholds:**
- **High Confidence (90-100%):** Almost certainly duplicates
- **Medium Confidence (75-89%):** Likely duplicates, review recommended
- **Low Confidence (60-74%):** Possible duplicates, investigate

#### Step 5: Group and Present Duplicate Sets

**Organize Results:**
```
Duplicate Set 1 (High Confidence - 95% match):
├── Record A: Contoso Inc. (accountid: xxx)
│   ├── Phone: (555) 123-4567
│   ├── Email: info@contoso.com
│   └── Address: 123 Main Street, Seattle, WA
└── Record B: Contoso Corporation (accountid: yyy)
    ├── Phone: 555.123.4567
    ├── Email: sales@contoso.com
    └── Address: 123 Main St, Seattle, WA 98101

Matching Signals:
- Company name: 92% similar (legal suffix variation)
- Phone: 100% match (format difference only)
- Address: 95% match (abbreviation variation)
```

#### Step 6: Provide Actionable Recommendations

**For Each Duplicate Set, Recommend:**

1. **Merge Recommendation:**
   - Identify the "surviving" record (more complete, older, or more activity)
   - List fields to preserve from each record
   - Identify child records that would need reassignment

2. **Activity Analysis:**
   - Count activities on each record
   - Identify which record has more recent engagement
   - Flag if merging would consolidate significant history

3. **Relationship Impact:**
   - List opportunities linked to each account
   - Count contacts under each account
   - Identify potential data loss risks

**Output Format:**
```
DUPLICATE DETECTION SUMMARY
===========================

Scan Parameters:
- Table: Account
- Records Scanned: 5,432
- Date Range: All active records

Results:
- High Confidence Duplicates: 23 sets (46 records)
- Medium Confidence Duplicates: 45 sets (98 records)
- Low Confidence Duplicates: 12 sets (26 records)

TOP PRIORITY DUPLICATES (High Confidence):

1. [Account] Contoso Inc. ↔ Contoso Corporation
   Match Score: 95%
   Recommendation: Merge into "Contoso Inc." (more activity)
   Impact: 3 opportunities, 12 contacts would be consolidated
   
2. [Contact] Bob Smith ↔ Robert Smith (both at Contoso)
   Match Score: 92%
   Recommendation: Merge into "Robert Smith" (more complete record)
   Impact: 8 activities would be consolidated

3. [Lead] ABC Manufacturing ↔ A.B.C. Mfg Company
   Match Score: 91%
   Recommendation: Review - may be parent/subsidiary
   Impact: Different addresses - verify relationship first
```

#### Step 7: Execute Merge (If Requested)

If user wants to proceed with merge:

1. **Document Current State:**
   - Export both records' data
   - List all child records

2. **Update Child Records:**
   - Reassign contacts to surviving account
   - Update opportunity regardingobjectid
   - Move activities to surviving record

3. **Merge Field Values:**
   - Keep most complete values
   - Concatenate notes/descriptions if both have content
   - Preserve all email addresses and phone numbers

4. **Deactivate Duplicate:**
   - Set statecode = 1 (Inactive) on duplicate record
   - Add note explaining it was merged

### Dataverse Tables Used
| Table | Purpose |
|-------|---------|
| `account` | Primary entity for company duplicates |
| `contact` | Primary entity for person duplicates |
| `lead` | Primary entity for lead duplicates |
| `opportunity` | Check for linked opportunities |
| `activitypointer` | Count activities on each record |
| `annotation` | Document merge decisions |

### Key Fields Reference
**account:**
- `name` (NVARCHAR 160) - Company name for matching
- `accountnumber` (NVARCHAR 20) - Account number for exact matching
- `telephone1`, `telephone2` (PHONE) - Phone matching
- `emailaddress1` (EMAIL) - Email matching
- `address1_line1`, `address1_city`, `address1_stateorprovince`, `address1_postalcode` - Address matching
- `websiteurl` (URL) - Website matching
- `parentaccountid` (LOOKUP → account) - Parent company relationship
- `statecode` (STATE) - Active(0), Inactive(1) - Filter active only

**contact:**
- `firstname`, `lastname` (NVARCHAR) - Name components
- `fullname` (NVARCHAR) - Calculated full name
- `emailaddress1`, `emailaddress2` (EMAIL) - Email matching
- `telephone1`, `mobilephone` (PHONE) - Phone matching
- `accountid` (LOOKUP → account) - Parent account
- `jobtitle` (NVARCHAR) - Additional matching signal
- `department` (NVARCHAR) - Department for disambiguation
- `statecode` (STATE) - Active(0), Inactive(1)

**lead:**
- `companyname` (NVARCHAR 100) - Company matching
- `firstname`, `lastname`, `fullname` (NVARCHAR) - Name matching
- `emailaddress1` (EMAIL) - Email matching
- `telephone1`, `mobilephone` (PHONE) - Phone matching
- `address1_line1`, `address1_city`, `address1_stateorprovince` - Address matching
- `websiteurl` (URL) - Website matching
- `statecode` (STATE) - Open(0), Qualified(1), Disqualified(2)
- `statuscode` (STATUS) - New(1), Contacted(2) [Open]; Qualified(3) [Qualified]; Lost(4), Cannot Contact(5), No Longer Interested(6), Canceled(7) [Disqualified]

### Fuzzy Matching Best Practices

1. **Always normalize before comparing:**
   - Lowercase all text
   - Remove punctuation
   - Standardize abbreviations

2. **Use multiple matching signals:**
   - Never flag as duplicate based on single field match
   - Require 2+ strong signals for high confidence

3. **Consider business context:**
   - Same company name in different cities may be branches
   - Same person name at different companies = different people
   - Parent/subsidiary relationships are not duplicates

4. **Handle false positives:**
   - "John Smith" is common - require additional signals
   - Generic company names need more verification
   - Allow user to mark "Not a Duplicate"
## Examples

### Example 1: Find Account Duplicates

**User says:** "Check for duplicate accounts in my CRM"

**Actions:**
1. Query all active accounts from Dataverse
2. Apply fuzzy matching on name, phone, email, address
3. Group potential duplicates by confidence score
4. Present high-confidence matches first

**Result:**
```
DUPLICATE DETECTION SUMMARY
- Records Scanned: 5,432 accounts
- High Confidence Duplicates: 23 sets
- Recommendation: Start with "Contoso Inc." ↔ "Contoso Corporation" (95% match)
```

### Example 2: Check Specific Record for Duplicates

**User says:** "Find duplicates of Fabrikam Inc."

**Actions:**
1. Retrieve Fabrikam Inc. record details
2. Compare against all other active accounts
3. Apply weighted scoring across all matching fields
4. Return ranked list of potential matches

**Result:**
```
Potential duplicates of "Fabrikam Inc.":
1. Fabrikam Incorporated (87% match) - Same phone, similar address
2. The Fabrikam Group (72% match) - Same website domain
```

### Example 3: Contact Duplicate with Nickname

**User says:** "Are there duplicate contacts named Bob at Contoso?"

**Actions:**
1. Search contacts at Contoso account
2. Expand "Bob" to include Robert, Rob, Bobby
3. Compare phone and email across matches
4. Present findings with merge recommendation

**Result:**
```
Found potential duplicate:
- "Bob Smith" (created 2024-01-15) - 3 activities
- "Robert Smith" (created 2023-06-01) - 12 activities
Recommendation: Merge into "Robert Smith" (more complete record)
```

## Troubleshooting

### Error: Too many potential duplicates returned
**Cause:** Matching threshold too relaxed or common names in dataset
**Solution:**
- Increase confidence threshold to 85%+
- Require 2+ matching signals for flagging
- Filter by specific criteria (industry, geography)

### Error: Known duplicates not detected
**Cause:** Significant spelling variations or missing data in fields
**Solution:**
- Check if key fields (email, phone) are populated
- Review nickname mappings for person names
- Consider adding custom abbreviation rules

### Error: False positives flagged as duplicates
**Cause:** Common names or parent/subsidiary relationships
**Solution:**
- Review address and account hierarchy
- Check if records are intentionally separate (branches, divisions)
- Mark as "Not a Duplicate" to exclude from future scans

### Fuzzy Matching Best Practices