Microsoft Excel 2024 Comprehensive Course — Beginner to Intermediate
X
Excel 2024 Microsoft 365
📗 34 Modules Foundations Data Entry Formatting Formulas Functions Charts Pivot Tables Data Analysis Protection Automation

❖ Module 24: Removing Duplicates & Conditional Formatting

Data quality issues cost South African organisations significant time and money. Duplicate records cause double-counting in financial reports, inflate headcounts, and corrupt customer databases. Conditional Formatting is the primary tool for spotting data quality problems visually — highlighting duplicates, flagging values outside acceptable ranges, marking overdue items, and building heat maps that make patterns jump off the screen. This module covers every method for finding and removing duplicates, the full power of Excel's Conditional Formatting system including formula-based rules, and practical data quality workflows for SA office environments.

24.1 Finding Duplicate Values

Before removing duplicates, you first need to find and review them. Excel provides several ways to identify duplicate values.

Method 1 — Conditional Formatting Duplicate Highlight

  1. Select the column(s) to check for duplicates (e.g., A2:A200 for an ID column)
  2. Home → Conditional Formatting → Highlight Cells RulesDuplicate Values…
  3. The dialog opens:
    • First dropdown: choose Duplicate (to highlight all repeated values) or Unique (to highlight values that appear only once)
    • Second dropdown: choose the formatting (Light Red Fill, Yellow Fill, Green Fill, or Custom Format…)
  4. Click OK — all duplicate values are immediately highlighted
  5. To review them: Data → Sort the column — duplicates cluster together for easy identification

Method 2 — COUNTIF Formula to Flag Duplicates

In a helper column (e.g., B2), enter:
=COUNTIF($A$2:$A$200,A2)

→ Returns 1 for unique values, 2+ for duplicates
→ Filter the helper column to show only values > 1 to isolate all duplicate rows

Flag as text:
=IF(COUNTIF($A$2:$A$200,A2)>1,"DUPLICATE","Unique")

Count total duplicate entries:
=COUNTIF(B2:B200,"DUPLICATE")

Method 3 — COUNTIFS for Multi-Column Duplicates

When a record is a duplicate only if multiple columns match (e.g., same First Name AND Last Name AND Date of Birth):

=COUNTIFS($A$2:$A$200,A2,$B$2:$B$200,B2,$C$2:$C$200,C2)

→ Returns the count of rows where all three columns match the current row
→ Values > 1 indicate duplicate records

Method 4 — Sorting to Surface Duplicates Visually

  1. Sort the data by the column(s) that should be unique (e.g., Employee ID)
  2. Duplicate values cluster together making them immediately visible
  3. Add a formula in a helper column: =IF(A2=A3,"DUPLICATE","") to flag when adjacent rows have the same value after sorting

24.2 Removing Duplicates — Excel's Built-in Tool

Excel's Remove Duplicates tool permanently deletes duplicate rows based on matching values in columns you select. It keeps the first occurrence and removes all subsequent duplicates.

Using Remove Duplicates

  1. Click any cell inside the data range (or inside an Excel Table)
  2. Data tab → Data Tools group → Remove Duplicates
  3. — or — Table Design tab → Tools group → Remove Duplicates (when inside a Table)
  4. The Remove Duplicates dialog opens showing all columns as checkboxes
  5. Select which columns to use to identify duplicates:
    • Tick all columns — a row is only a duplicate if every column matches another row exactly
    • Tick specific columns — rows are duplicates if only those columns match (e.g., Employee ID only)
  6. Ensure "My data has headers" is ticked if row 1 contains column headers
  7. Click OK
  8. A message confirms: "X duplicate values found and removed; Y unique values remain"
Critical Warning: Remove Duplicates is permanent and cannot be undone with Ctrl+Z after the confirmation dialog closes and you perform another action. Always:
  1. Save a copy of the original data before running Remove Duplicates
  2. Use Conditional Formatting to review duplicates first
  3. Verify which columns to check — wrong column selection silently deletes data you intended to keep

Remove Duplicates vs Keep Duplicates for Analysis

ScenarioBest Approach
Clean a list by removing exact duplicate rows permanentlyRemove Duplicates (Data → Remove Duplicates)
Find and review duplicates before deciding which to keepConditional Formatting → Duplicate Values highlight + manual review
Extract a unique list to a separate locationAdvanced Filter → Copy to another location → Unique records only
Count distinct values without deleting=SUMPRODUCT(1/COUNTIF(A2:A100,A2:A100)) or a PivotTable

24.3 Conditional Formatting — Complete Overview

Conditional Formatting (CF) automatically applies formatting (fill colour, font colour, borders, icons, data bars) to cells based on rules you define. Rules can be based on cell values, formulas, or relationships between cells. When data changes, the formatting updates automatically.

All Conditional Formatting Rule Types

Rule CategoryWhat It DoesBest For
Highlight Cells RulesHighlights cells based on value comparisons: Greater Than, Less Than, Between, Equal To, Text Contains, Date Occurring, Duplicate ValuesFlagging outliers, errors, overdue items, specific values
Top/Bottom RulesHighlights the top or bottom N items or percentage: Top 10, Bottom 10, Top 10%, Bottom 10%, Above Average, Below AveragePerformance ranking, identifying best/worst performers
Data BarsIn-cell horizontal bars proportional to cell value (Module 23)Visual magnitude comparison within a column
Colour ScalesGradient colour fill from low to high value (Module 23)Heat maps, distribution visualisation
Icon SetsSmall icons (traffic lights, arrows, stars) based on value thresholds (Module 23)KPI status indicators, performance flags
New Rule — FormulaApply formatting based on any Excel formula that returns TRUE or FALSE. The most powerful and flexible CF option.Complex conditions, cross-column rules, entire row highlighting

24.4 Highlight Cells Rules — In Depth

Applying Highlight Rules

  1. Select the cells to apply the rule to
  2. Home → Conditional Formatting → Highlight Cells Rules
  3. Choose the rule type
  4. Fill in the threshold value(s) and choose the format
  5. Click OK

All Highlight Cell Rule Types

RuleCriteriaSA Use Case
Greater ThanCell value > a specified valueHighlight invoices above R100,000 in green; flag salaries over the band maximum
Less ThanCell value < a specified valueFlag stock levels below reorder point in red
BetweenCell value is between two valuesHighlight scores between 50% and 59% in orange (borderline pass)
Equal ToCell value exactly matchesHighlight all "Pending" status cells; flag zero-value invoices
Text That ContainsCell text contains a substringHighlight all cells containing "Overdue"; flag any cell with "Error"
A Date OccurringDate falls in a period relative to today (Yesterday, Today, Tomorrow, This Week, Last Month, Next Quarter, etc.)Highlight contract renewals due this week; flag items with today's date
Duplicate ValuesValue appears more than once (or Unique: appears only once)Find duplicate employee IDs, invoice numbers, product codes

Using a Cell Reference as the Threshold

Instead of typing a fixed value, you can reference a cell for the threshold — changing the cell automatically updates the rule:

Example: highlight all salaries above the target in cell $F$1
Greater Than → in the value box, type: =$F$1

Now if you change F1 from 50000 to 60000,
the conditional formatting updates automatically.

24.5 Top/Bottom Rules

Top/Bottom Rules automatically identify the highest or lowest values in a range and apply formatting to them. Unlike Greater Than rules with a fixed threshold, these rules adapt dynamically to whatever values are in the range.

Available Top/Bottom Rules

RuleWhat It Highlights
Top 10 ItemsThe N highest values (you set N). Default is 10 but any number can be entered.
Top 10%The top N% of values (you set the percentage)
Bottom 10 ItemsThe N lowest values
Bottom 10%The bottom N% of values
Above AverageAll values greater than the arithmetic mean of the range
Below AverageAll values less than the arithmetic mean of the range

Top/Bottom Rule Examples

Salesperson leaderboard — top 3 performers highlighted gold:
Select C2:C51 (revenue column) → Top/Bottom Rules → Top 10 Items → change 10 to 3 → Custom Format → gold fill

Below-average stock alert:
Select B2:B200 (stock levels) → Top/Bottom Rules → Below Average → Red Fill

Top 20% of invoices highlighted for priority processing:
Select C2:C500 (invoice amounts) → Top/Bottom Rules → Top 10% → change to 20 → Yellow Fill

24.6 Formula-Based Conditional Formatting — The Most Powerful Option

When the built-in rules are not enough, you can write any Excel formula as a conditional formatting rule. If the formula returns TRUE (or any non-zero value), the formatting is applied. This enables powerful scenarios like highlighting entire rows based on a condition in one column, comparing values across columns, or using complex logical expressions.

Creating a Formula-Based Rule

  1. Select the range to apply formatting to (e.g., A2:F51 to highlight entire rows)
  2. Home → Conditional Formatting → New Rule…
  3. Select "Use a formula to determine which cells to format"
  4. In the formula box, type a formula that evaluates to TRUE or FALSE
  5. Click Format… to set the fill colour, font, or border
  6. Click OK twice
The Reference Rule for Formula-Based CF:
The formula is written for the top-left cell of the selected range, and it automatically applies relatively to all other cells. Use mixed references to control which parts shift and which stay fixed:
  • To check a column (same column, row shifts): $A2>0 — column A locked, row 2 shifts
  • To check a row (same row, column shifts): A$2>0 — row 2 locked, column A shifts
  • To check a fixed cell: $A$2>0 — both locked

Essential Formula-Based CF Examples

GoalSelect RangeFormulaFormat
Highlight entire row where Status = "Overdue" A2:F200 =$E2="Overdue" Red fill, white text
Highlight entire row where salary > R50,000 A2:G100 =$C2>50000 Green fill
Alternating row shading (zebra stripes) A2:F200 =MOD(ROW(),2)=0 Light grey fill
Highlight rows where invoice is overdue (past due date AND unpaid) A2:F500 =AND($D2<TODAY(),$E2="Unpaid") Orange fill
Highlight cells where the value is a duplicate in its column A2:A200 =COUNTIF($A$2:$A$200,A2)>1 Red fill
Highlight cells that are blank (missing data) B2:B100 =ISBLANK(B2) Yellow fill
Highlight weekends in a date column A2:A365 =WEEKDAY(A2,2)>5 Light blue fill
Highlight if value is above the average of the column C2:C100 =C2>AVERAGE($C$2:$C$100) Green fill
Highlight rows where SA ID number is the wrong length A2:F200 =LEN($B2)<>13 Red fill
Highlight contract end dates within 30 days A2:D100 =AND($C2>=TODAY(),$C2<=TODAY()+30) Orange fill

Row Highlighting — The Most Important Pattern

Highlighting an entire row based on a value in one column is the most requested CF technique in SA offices. The key is using a mixed reference that locks the column:

Goal: Highlight every row where column E = "Cancelled"

1. Select the entire data range: A2:H200
2. New Rule → Use a formula
3. Formula: =$E2="Cancelled"
   • $E locks the column (always check column E)
   • 2 is relative (shifts down: E3, E4, E5... for each row)
4. Format: Red fill, white bold text
5. OK

Every row where column E contains "Cancelled" turns red across all 8 columns.

24.7 Managing Multiple Conditional Formatting Rules

A range can have multiple CF rules applied to it. Understanding how they interact is critical to getting the formatting you intend.

Rule Priority (Order)

  • Rules are evaluated from top to bottom in the Manage Rules list
  • The rule at the top has the highest priority
  • When two rules both apply to the same cell, the higher-priority rule's formatting wins (unless "Stop If True" is ticked)
  • Use the ▲/▼ arrows in Manage Rules to reorder rules

Stop If True

  • When ticked on a rule, Excel stops evaluating lower-priority rules for that cell if the current rule is TRUE
  • This prevents a lower rule from overriding or modifying the formatting of a higher rule
  • Example: Rule 1 (Overdue → Red). Rule 2 (Value > R10,000 → Green). Without Stop If True on Rule 1, an overdue invoice over R10,000 could get green formatting from Rule 2. With Stop If True on Rule 1, the overdue rule wins and Rule 2 never runs for that cell.

Practical Multi-Rule Setup — Invoice Status Colour Coding

Range: A2:F500 (entire invoice table)

Rule 1 (highest priority): =$E2="Overdue" → Red fill → Stop If True: YES
Rule 2: =$E2="Paid" → Green fill → Stop If True: YES
Rule 3: AND($D2<TODAY()+7,$E2="Unpaid") → Orange fill (due soon) → Stop If True: YES
Rule 4 (lowest): =$E2="Unpaid" → Yellow fill

Overdue rows = Red (Rule 1 stops; Rules 2-4 don't run)
Paid rows = Green
Due within 7 days and unpaid = Orange
All other unpaid = Yellow

24.8 SA Data Quality Workflow

A systematic approach to identifying and resolving data quality issues in SA business spreadsheets:

Step 1: Check for Duplicates

Apply CF Duplicate Values highlight to the ID column (Employee ID, Invoice No, Product Code)
Sort by the ID column to cluster duplicates
Review: is this a true duplicate or a legitimate repeat (e.g., same customer, multiple orders)?
If true duplicate: Data → Remove Duplicates
If legitimate repeat: remove the CF highlight and continue

Step 2: Check for Missing Data

For each required column, apply CF formula rule: =ISBLANK(A2) → Yellow highlight
Or use COUNTBLANK(A2:A100) to count missing entries per column
Filter each column for blanks: AutoFilter → Text Filters → Custom → Equals (leave blank)
For SA IDs: =LEN($B2)<>13 → Red highlight (wrong length)

Step 3: Check for Out-of-Range Values

Salary out of band: =OR($C2<MinWage,$C2>MaxSalary) → Red highlight
Date in future (unexpected): =$D2>TODAY() → Orange highlight
Negative quantity (invalid): =$E2<0 → Red highlight
Score out of range: =OR($B2<0,$B2>100) → Red highlight

Step 4: Check for Inconsistent Text

Extra spaces: apply TRIM to a helper column, compare with original
Mixed case: PROPER(TRIM(A2)) in helper column → compare
Common misspellings in a category column: use COUNTIF with a valid values list
=COUNTIF(ValidList,A2)=0 → Red highlight (value not in the approved list)

24.9 Quick Self-Check

Q1: An employee database has 500 rows. You suspect there are duplicate Employee IDs in column A. Describe two methods to identify them before deciding whether to delete them.

✓ Method 1 (Conditional Formatting): Select A2:A500 → Home → Conditional Formatting → Highlight Cells Rules → Duplicate Values → choose Red fill → OK. All IDs appearing more than once are immediately highlighted red. Sort column A to cluster duplicates together for visual review. Method 2 (COUNTIF formula): in a helper column B, enter =COUNTIF($A$2:$A$500,A2) and copy down. Values of 2 or higher indicate duplicate IDs. Add a further formula =IF(B2>1,"DUPLICATE","") then filter column B to show only "DUPLICATE" rows for review. Using COUNTIF lets you see which IDs appear how many times, giving more information than CF alone before you decide what to delete.

Q2: You have an invoice table with columns: Invoice# (A), Date (B), Customer (C), Amount (D), Status (E). You want to highlight the entire row in orange for all invoices that are both Unpaid AND have a date more than 30 days in the past. Write the formula for the conditional formatting rule and describe the range to apply it to.

✓ Select the entire table range: A2:E500 (or however many rows). New Rule → Use a formula. Formula: =AND($E2="Unpaid",$B2<TODAY()-30). $E2 (column E locked, row shifts) checks if Status is "Unpaid"; $B2 (column B locked, row shifts) checks if the date is more than 30 days ago. AND ensures both conditions must be true. Format: orange fill. Click OK. Every row where Status = "Unpaid" AND Date is older than 30 days turns orange across all 5 columns. The column-locked references ($E2, $B2) ensure that as the rule applies to each row, it always checks the same columns E and B for that row.

Q3: What is the "Stop If True" option in Conditional Formatting Rules Manager, and why would you use it?

✓ "Stop If True" tells Excel to stop evaluating lower-priority rules for a cell if the current rule's condition is TRUE and its formatting has been applied. Without it, multiple rules can fire on the same cell, and lower-priority rules may override or partially overwrite the higher-priority formatting. Example: Rule 1 (Overdue → Red, Stop If True = YES) and Rule 2 (Amount > 50000 → Green). An overdue invoice for R80,000 would normally trigger both rules. Without Stop If True on Rule 1, Rule 2 would also run and might change the fill to green, overriding the red overdue warning. With Stop If True on Rule 1, Excel applies red and immediately stops — Rule 2 never runs for that cell. Use Stop If True when a higher-priority condition should "win" completely and prevent any other formatting from applying.

Q4: You want alternating row shading (grey on even rows, white on odd rows) for a data range A2:G200 using Conditional Formatting. Write the formula and explain how it works.

✓ Select A2:G200 → New Rule → Use a formula → =MOD(ROW(),2)=0 → Format: light grey fill. ROW() returns the current row number for each cell being evaluated. MOD(ROW(),2) calculates the remainder when the row number is divided by 2: even rows return 0, odd rows return 1. So MOD(ROW(),2)=0 is TRUE for even rows (2, 4, 6, 8...) and FALSE for odd rows (1, 3, 5, 7...). Even rows get the grey fill; odd rows get no special formatting (they stay white/unformatted). There is no cell reference in this formula at all — it evaluates independently for each cell based purely on which row it is in.

Q5: A column of product codes (column B) should each be 8 characters long (e.g., "PROD0001"). How do you use Conditional Formatting to highlight any codes that are the wrong length?

✓ Select B2:B200 (the product code column) → Home → Conditional Formatting → New Rule → Use a formula → enter: =LEN(B2)<>8. Format: Red fill → OK. LEN(B2) counts the number of characters in the cell. <>8 means "not equal to 8". The formula returns TRUE for any cell where the product code is not exactly 8 characters — shorter codes, longer codes, or blank cells all get the red highlight. Note that B2 in the formula is a relative reference (no $ signs) because we want it to shift to B3, B4, etc. as the rule applies to each row in the selection.

Q6: You use Remove Duplicates on a 1,000-row dataset and Excel reports "150 duplicate values found and removed; 850 unique values remain." Immediately afterwards, you realise you should have checked the "Customer Name" column in addition to the "Invoice Number" column in the Remove Duplicates dialog. Can you undo this? What should you do next time?

✓ If you pressed Ctrl+Z immediately (before doing anything else) and the undo is still available in the undo history, you can undo. However, if you closed the dialog, clicked elsewhere, or performed any action, the removal may be permanent. Remove Duplicates warns that it cannot be undone once the confirmation dialog is dismissed and subsequent actions are performed. Correct recovery: restore from the backup copy of the file you should have made before running the operation. This is why the critical workflow is: (1) Save a copy of the original file (File → Save As with a new name, or copy the sheet to a new workbook) BEFORE running Remove Duplicates. (2) Use Conditional Formatting to review duplicates first. (3) Only run Remove Duplicates after carefully reviewing which rows are genuine duplicates vs legitimate repeats. Next time: in the Remove Duplicates dialog, tick all columns that together define a unique record — in this case both "Invoice Number" AND "Customer Name" to prevent removing an invoice number that appears for two different customers.

✓ Module 24 Complete — You Have Learned:

  • Finding duplicates — 4 methods: CF Highlight → Duplicate Values; COUNTIF helper column (=COUNTIF($A$2:$A$200,A2) > 1 = duplicate); COUNTIFS for multi-column duplicates; sorting to cluster and =IF(A2=A3,"DUPLICATE","") to flag adjacent matches
  • Remove Duplicates tool — Data → Remove Duplicates or Table Design → Remove Duplicates; column selection (all = exact row duplicate; specific = key column match); My data has headers; confirmation message; critical warning (permanent, cannot undo — always save a copy first)
  • Remove Duplicates vs alternatives — comparison of Remove Duplicates, CF highlight for review, Advanced Filter for unique list extraction, SUMPRODUCT for distinct count
  • Conditional Formatting overview — all 6 rule categories (Highlight Cells, Top/Bottom, Data Bars, Colour Scales, Icon Sets, Formula-based)
  • Highlight Cells Rules — all 7 types (Greater Than, Less Than, Between, Equal To, Text Contains, Date Occurring, Duplicate/Unique); using a cell reference as threshold (=$F$1 for dynamic threshold)
  • Top/Bottom Rules — all 6 options (Top 10 Items, Top 10%, Bottom 10 Items, Bottom 10%, Above Average, Below Average); dynamic — adapts when data changes
  • Formula-based CF — New Rule → Use a formula; reference rule (formula for top-left cell, applies relatively to all cells); mixed references for row/column locking ($E2 for column check, A$2 for row check); 10-formula example table including entire row highlight, zebra stripes (MOD(ROW(),2)=0), duplicate detection (COUNTIF), blank detection (ISBLANK), weekend highlight (WEEKDAY), SA ID length check, contract expiry warning, overdue AND unpaid, above average
  • Row highlighting pattern — select entire row range (A2:H200), formula with locked column ($E2="Cancelled"), row number shifts per row
  • Multiple CF rules — priority order (top = highest); conflict resolution; Stop If True (prevents lower rules from running); 4-level invoice colour-coding example
  • SA data quality workflow — 4-step process: duplicate check, missing data (ISBLANK + COUNTBLANK), out-of-range values (OR with min/max), inconsistent text (TRIM/PROPER + valid list check)

← Back to All Modules