Data quality issues cost South African organisations significant time and money. Duplicate records cause double-counting in financial reports, inflate headcounts, and corrupt customer databases. Conditional Formatting is the primary tool for spotting data quality problems visually — highlighting duplicates, flagging values outside acceptable ranges, marking overdue items, and building heat maps that make patterns jump off the screen. This module covers every method for finding and removing duplicates, the full power of Excel's Conditional Formatting system including formula-based rules, and practical data quality workflows for SA office environments.
Before removing duplicates, you first need to find and review them. Excel provides several ways to identify duplicate values.
When a record is a duplicate only if multiple columns match (e.g., same First Name AND Last Name AND Date of Birth):
=IF(A2=A3,"DUPLICATE","") to flag when adjacent rows have the same value after sortingExcel's Remove Duplicates tool permanently deletes duplicate rows based on matching values in columns you select. It keeps the first occurrence and removes all subsequent duplicates.
| Scenario | Best Approach |
|---|---|
| Clean a list by removing exact duplicate rows permanently | Remove Duplicates (Data → Remove Duplicates) |
| Find and review duplicates before deciding which to keep | Conditional Formatting → Duplicate Values highlight + manual review |
| Extract a unique list to a separate location | Advanced Filter → Copy to another location → Unique records only |
| Count distinct values without deleting | =SUMPRODUCT(1/COUNTIF(A2:A100,A2:A100)) or a PivotTable |
Conditional Formatting (CF) automatically applies formatting (fill colour, font colour, borders, icons, data bars) to cells based on rules you define. Rules can be based on cell values, formulas, or relationships between cells. When data changes, the formatting updates automatically.
| Rule Category | What It Does | Best For |
|---|---|---|
| Highlight Cells Rules | Highlights cells based on value comparisons: Greater Than, Less Than, Between, Equal To, Text Contains, Date Occurring, Duplicate Values | Flagging outliers, errors, overdue items, specific values |
| Top/Bottom Rules | Highlights the top or bottom N items or percentage: Top 10, Bottom 10, Top 10%, Bottom 10%, Above Average, Below Average | Performance ranking, identifying best/worst performers |
| Data Bars | In-cell horizontal bars proportional to cell value (Module 23) | Visual magnitude comparison within a column |
| Colour Scales | Gradient colour fill from low to high value (Module 23) | Heat maps, distribution visualisation |
| Icon Sets | Small icons (traffic lights, arrows, stars) based on value thresholds (Module 23) | KPI status indicators, performance flags |
| New Rule — Formula | Apply formatting based on any Excel formula that returns TRUE or FALSE. The most powerful and flexible CF option. | Complex conditions, cross-column rules, entire row highlighting |
| Rule | Criteria | SA Use Case |
|---|---|---|
| Greater Than | Cell value > a specified value | Highlight invoices above R100,000 in green; flag salaries over the band maximum |
| Less Than | Cell value < a specified value | Flag stock levels below reorder point in red |
| Between | Cell value is between two values | Highlight scores between 50% and 59% in orange (borderline pass) |
| Equal To | Cell value exactly matches | Highlight all "Pending" status cells; flag zero-value invoices |
| Text That Contains | Cell text contains a substring | Highlight all cells containing "Overdue"; flag any cell with "Error" |
| A Date Occurring | Date falls in a period relative to today (Yesterday, Today, Tomorrow, This Week, Last Month, Next Quarter, etc.) | Highlight contract renewals due this week; flag items with today's date |
| Duplicate Values | Value appears more than once (or Unique: appears only once) | Find duplicate employee IDs, invoice numbers, product codes |
Instead of typing a fixed value, you can reference a cell for the threshold — changing the cell automatically updates the rule:
Top/Bottom Rules automatically identify the highest or lowest values in a range and apply formatting to them. Unlike Greater Than rules with a fixed threshold, these rules adapt dynamically to whatever values are in the range.
| Rule | What It Highlights |
|---|---|
| Top 10 Items | The N highest values (you set N). Default is 10 but any number can be entered. |
| Top 10% | The top N% of values (you set the percentage) |
| Bottom 10 Items | The N lowest values |
| Bottom 10% | The bottom N% of values |
| Above Average | All values greater than the arithmetic mean of the range |
| Below Average | All values less than the arithmetic mean of the range |
When the built-in rules are not enough, you can write any Excel formula as a conditional formatting rule. If the formula returns TRUE (or any non-zero value), the formatting is applied. This enables powerful scenarios like highlighting entire rows based on a condition in one column, comparing values across columns, or using complex logical expressions.
$A2>0 — column A locked, row 2 shiftsA$2>0 — row 2 locked, column A shifts$A$2>0 — both locked| Goal | Select Range | Formula | Format |
|---|---|---|---|
| Highlight entire row where Status = "Overdue" | A2:F200 | =$E2="Overdue" |
Red fill, white text |
| Highlight entire row where salary > R50,000 | A2:G100 | =$C2>50000 |
Green fill |
| Alternating row shading (zebra stripes) | A2:F200 | =MOD(ROW(),2)=0 |
Light grey fill |
| Highlight rows where invoice is overdue (past due date AND unpaid) | A2:F500 | =AND($D2<TODAY(),$E2="Unpaid") |
Orange fill |
| Highlight cells where the value is a duplicate in its column | A2:A200 | =COUNTIF($A$2:$A$200,A2)>1 |
Red fill |
| Highlight cells that are blank (missing data) | B2:B100 | =ISBLANK(B2) |
Yellow fill |
| Highlight weekends in a date column | A2:A365 | =WEEKDAY(A2,2)>5 |
Light blue fill |
| Highlight if value is above the average of the column | C2:C100 | =C2>AVERAGE($C$2:$C$100) |
Green fill |
| Highlight rows where SA ID number is the wrong length | A2:F200 | =LEN($B2)<>13 |
Red fill |
| Highlight contract end dates within 30 days | A2:D100 | =AND($C2>=TODAY(),$C2<=TODAY()+30) |
Orange fill |
Highlighting an entire row based on a value in one column is the most requested CF technique in SA offices. The key is using a mixed reference that locks the column:
A range can have multiple CF rules applied to it. Understanding how they interact is critical to getting the formatting you intend.
A systematic approach to identifying and resolving data quality issues in SA business spreadsheets:
Q1: An employee database has 500 rows. You suspect there are duplicate Employee IDs in column A. Describe two methods to identify them before deciding whether to delete them.
✓ Method 1 (Conditional Formatting): Select A2:A500 → Home → Conditional Formatting → Highlight Cells Rules → Duplicate Values → choose Red fill → OK. All IDs appearing more than once are immediately highlighted red. Sort column A to cluster duplicates together for visual review. Method 2 (COUNTIF formula): in a helper column B, enter =COUNTIF($A$2:$A$500,A2) and copy down. Values of 2 or higher indicate duplicate IDs. Add a further formula =IF(B2>1,"DUPLICATE","") then filter column B to show only "DUPLICATE" rows for review. Using COUNTIF lets you see which IDs appear how many times, giving more information than CF alone before you decide what to delete.
Q2: You have an invoice table with columns: Invoice# (A), Date (B), Customer (C), Amount (D), Status (E). You want to highlight the entire row in orange for all invoices that are both Unpaid AND have a date more than 30 days in the past. Write the formula for the conditional formatting rule and describe the range to apply it to.
✓ Select the entire table range: A2:E500 (or however many rows). New Rule → Use a formula. Formula: =AND($E2="Unpaid",$B2<TODAY()-30). $E2 (column E locked, row shifts) checks if Status is "Unpaid"; $B2 (column B locked, row shifts) checks if the date is more than 30 days ago. AND ensures both conditions must be true. Format: orange fill. Click OK. Every row where Status = "Unpaid" AND Date is older than 30 days turns orange across all 5 columns. The column-locked references ($E2, $B2) ensure that as the rule applies to each row, it always checks the same columns E and B for that row.
Q3: What is the "Stop If True" option in Conditional Formatting Rules Manager, and why would you use it?
✓ "Stop If True" tells Excel to stop evaluating lower-priority rules for a cell if the current rule's condition is TRUE and its formatting has been applied. Without it, multiple rules can fire on the same cell, and lower-priority rules may override or partially overwrite the higher-priority formatting. Example: Rule 1 (Overdue → Red, Stop If True = YES) and Rule 2 (Amount > 50000 → Green). An overdue invoice for R80,000 would normally trigger both rules. Without Stop If True on Rule 1, Rule 2 would also run and might change the fill to green, overriding the red overdue warning. With Stop If True on Rule 1, Excel applies red and immediately stops — Rule 2 never runs for that cell. Use Stop If True when a higher-priority condition should "win" completely and prevent any other formatting from applying.
Q4: You want alternating row shading (grey on even rows, white on odd rows) for a data range A2:G200 using Conditional Formatting. Write the formula and explain how it works.
✓ Select A2:G200 → New Rule → Use a formula → =MOD(ROW(),2)=0 → Format: light grey fill. ROW() returns the current row number for each cell being evaluated. MOD(ROW(),2) calculates the remainder when the row number is divided by 2: even rows return 0, odd rows return 1. So MOD(ROW(),2)=0 is TRUE for even rows (2, 4, 6, 8...) and FALSE for odd rows (1, 3, 5, 7...). Even rows get the grey fill; odd rows get no special formatting (they stay white/unformatted). There is no cell reference in this formula at all — it evaluates independently for each cell based purely on which row it is in.
Q5: A column of product codes (column B) should each be 8 characters long (e.g., "PROD0001"). How do you use Conditional Formatting to highlight any codes that are the wrong length?
✓ Select B2:B200 (the product code column) → Home → Conditional Formatting → New Rule → Use a formula → enter: =LEN(B2)<>8. Format: Red fill → OK. LEN(B2) counts the number of characters in the cell. <>8 means "not equal to 8". The formula returns TRUE for any cell where the product code is not exactly 8 characters — shorter codes, longer codes, or blank cells all get the red highlight. Note that B2 in the formula is a relative reference (no $ signs) because we want it to shift to B3, B4, etc. as the rule applies to each row in the selection.
Q6: You use Remove Duplicates on a 1,000-row dataset and Excel reports "150 duplicate values found and removed; 850 unique values remain." Immediately afterwards, you realise you should have checked the "Customer Name" column in addition to the "Invoice Number" column in the Remove Duplicates dialog. Can you undo this? What should you do next time?
✓ If you pressed Ctrl+Z immediately (before doing anything else) and the undo is still available in the undo history, you can undo. However, if you closed the dialog, clicked elsewhere, or performed any action, the removal may be permanent. Remove Duplicates warns that it cannot be undone once the confirmation dialog is dismissed and subsequent actions are performed. Correct recovery: restore from the backup copy of the file you should have made before running the operation. This is why the critical workflow is: (1) Save a copy of the original file (File → Save As with a new name, or copy the sheet to a new workbook) BEFORE running Remove Duplicates. (2) Use Conditional Formatting to review duplicates first. (3) Only run Remove Duplicates after carefully reviewing which rows are genuine duplicates vs legitimate repeats. Next time: in the Remove Duplicates dialog, tick all columns that together define a unique record — in this case both "Invoice Number" AND "Customer Name" to prevent removing an invoice number that appears for two different customers.