fbpx

How to Find and Remove Duplicates in Excel

In working with data from any source, you need to make sure you don't have duplicate values. In many systems, this can be particularly cumbersome. In my favorite manual data processing tool, Excel, there are a few ways to simplify this process. I've used this technique to remove duplicates from Excel for years for its simple nature and easy execution. If you have another way I'd love to hear it in the comments below. Enjoy!

Step 1 – Sort Data

Sort data in ascending order based on the key field. This is necessary for step 2 in which we count things up. Having a key field also ensures we have something that should be unique and we can use to identify our duplicates. It's important that a duplicate value in your key field is truly a duplicate. Sometimes the key field might not be truly unique, or your data may have additional attributes that you need to include in a new compound-key.

Step 2 – Count Keys

  • In a blank column adjacent to the key field (eg. OrderID in Col B) enter the following formula in cell A2.
    =IF(COUNTIF($B$1:B1,B2)>0,1,0)

The idea here is that we count how many time a key shows up. Make sure when you copy the formula down that the first cell reference in the CountIF() formula is locked and doesn't change. This will show you where the duplicates exist. If a duplicate of your key does NOT exist then you will see zero, if not you'll see 1 indicating you have a duplicate value in your data.

Step 3 – Add a Filter

Add a filter to your data, and look for any values of 1 in the column indicating duplicates. From here you can now selet these rows and delete them. At this point your data should be free of duplicates. If you're unsure sort your data again and check the formula from Step 2.

Complete Steps

  • Sort data in ascending order based on the key field
  • In a blank column adjacent to the key field (eg. OrderID in Col B) enter the following formula in cell A2.
    =IF(COUNTIF($B$1:B1,B2)>0,1,0)
  • Add a filter to your data, and look for any values of 1 in the column indicating duplicates
  • Select all the rows you've filtered here (eg. the duplicates) and then delete them
  • Revel in your accomplishment

This method works for identifying a simple ID field that is duplicated. If you don't have a unique ID or you need to create one complete this simple step then follow the previous steps described

Create a Compound Key

  • Add a column for the key (typically to the left of the entire data set)
  • In that column enter a formula to create the key. In this example, I'm combining the Order Date (Column D) and Customer Name (Column L).
    =D2&L2
  • Follow the above steps to add a column to identify the duplicate values and then remove them
  • Revel in your accomplishment

Download and View Solution



1 Comment

Comments are closed.