Filter unique values or remove duplicate values. Removing duplicate rows in Excel

Excel offers several ways to filter for unique values and to remove duplicate values.

Learn about filtering unique values and removing duplicate values

Filtering for unique values and removing duplicate values are two similar tasks, because the goal of both is a list of unique values. The difference is that filtering for unique values only hides duplicate values temporarily, whereas removing duplicate values deletes them permanently.

A duplicate value is one where all values in one row are identical to all values in another row. The comparison is based on what is displayed in the cell, not on the value stored in the cell. For example, if the same date value appears in different cells, one formatted as "3/8/2006" and the other as "Mar 8, 2006", the values are treated as unique.
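The difference between a stored value and its displayed text can be sketched in a few lines of Python. This is only an illustration of the comparison rule described above, not of how Excel is implemented; the `display` helper and its "short" and "long" style names are made up for the example.

```python
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

def display(value, style):
    """Render a stored date as text; the style names are hypothetical."""
    if style == "short":
        return f"{value.month}/{value.day}/{value.year}"
    return f"{MONTHS[value.month - 1]} {value.day}, {value.year}"

# Two cells holding the same stored date, shown with different formats.
cell_1 = (date(2006, 3, 8), "short")
cell_2 = (date(2006, 3, 8), "long")

stored_equal = cell_1[0] == cell_2[0]                    # same underlying value
displayed_equal = display(*cell_1) == display(*cell_2)   # different text

print(display(*cell_1), "|", display(*cell_2))
print(stored_equal, displayed_equal)
```

A comparison on displayed text treats these two cells as different, which is why applying one consistent format before looking for duplicates is a sensible precaution.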

Check before removing duplicates: Before removing duplicate values, it is recommended that you first filter for unique values (or conditionally format them) to confirm that you get the results you expect.

Filtering unique values

To filter a range of cells or a table in place, follow these steps:

    Select the option Filter the list in place.

To copy the filter results to another location, follow these steps:

The unique values from the range will be copied to the new location.

Removing Duplicate Values

Because you are permanently deleting data, it is recommended that you copy the original range of cells or table to another sheet or workbook before deleting duplicate values.

Follow the steps below.

Issues encountered when removing duplicates from outlined or subtotaled data

It is not possible to remove duplicate values from data that is outlined or that has subtotals. To remove duplicates, you must first remove both the outline and the subtotals. For more information, see Outlining a list of data in a worksheet and Removing subtotals.

Conditional formatting of unique and duplicate values

Note: You cannot conditionally format fields in the Values area of a PivotTable report by unique or duplicate values.

Quick formatting

Follow the steps below.


Advanced Formatting

Follow the steps below.

In Excel for the web, you can remove duplicate values.

Removing Duplicate Values

When you remove duplicate values, only the values in the range of cells or table are affected. Values outside the range of cells or table are not changed or moved. Removing duplicates keeps the first occurrence of a value in the list and deletes the other identical values.

Important: You can always click Undo to get your data back after removing duplicates. Even so, before deleting duplicate values, it is recommended that you copy the original range of cells or table to another worksheet or workbook.

Follow the steps below.

    Select a range of cells or make sure the active cell is in the table.

    On the Data tab, click Remove Duplicates.

    In the dialog box, clear the check box for columns where you do not want to remove duplicate values.

    Note: Data will be removed from all columns, even if you do not select all the columns at this stage. For example, if you select Column1 and Column2 (but not Column3), then the "key" used to find duplicates is the combined value of Column1 and Column2. If a duplicate is found in Column1 and Column2, the entire row is deleted, including the data in Column3.
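The key-column behavior described in the note can be sketched in Python. This is a minimal model of the rule (the first row for each key is kept, later rows with the same key are dropped whole), not Excel's own code; `remove_duplicates` and the sample rows are made up for the illustration.

```python
def remove_duplicates(rows, key_columns):
    """Keep the first row for each key; drop later rows with the same key."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept

rows = [
    {"Column1": "A", "Column2": 1, "Column3": "x"},
    {"Column1": "A", "Column2": 1, "Column3": "y"},  # same key, other Column3
    {"Column1": "B", "Column2": 1, "Column3": "z"},
]

# Only Column1 and Column2 form the key, as in the note above.
result = remove_duplicates(rows, ["Column1", "Column2"])
removed = len(rows) - len(result)
print(removed, "duplicate rows removed;", len(result), "unique rows remain")
```

The second row disappears even though its Column3 value ("y") exists nowhere else, which is exactly the data-loss risk the note warns about.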

When you work with a large amount of data in Excel, it is very easy to make a mistake and enter the same data twice. This is how duplicates arise. They inflate the summary figures and, in some cases, distort the formulas that are supposed to total and calculate various values. You can look for duplicates manually, but there is no need to, because Excel has several ways to remove duplicate rows automatically.

Method 1: Standard removal of duplicates

The simplest way to remove duplicate rows is to use the dedicated tool on the ribbon.

To do this:

  1. Holding down the left mouse button, select the range of cells in which to search for and automatically delete duplicate rows.
  2. Go to the "Data" tab.
  3. Click the "Remove Duplicates" tool, which is located in the "Data Tools" group.
  4. In the window that appears, check the "My data has headers" box if the range has a header row.
  5. Check the boxes next to the names of the columns to be compared. If all the boxes are checked, only rows that repeat in every column count as duplicates. To remove duplicates based on a single column, leave only the checkbox next to its name checked.
  6. Click "OK".

As soon as you click the button, Excel searches the selected area for duplicates and deletes them. A window then appears with a report on the result. That is the first way to remove duplicate rows in Excel; the second follows.

Method 2: Using a Smart Table

Duplicates can also be removed in a way similar to the one described above. This time a "smart table" (an Excel table) is used.

To remove duplicates in Excel this way, do the following:

  1. As in the previous method, first select the range of cells where you need to remove duplicates.
  2. Click the "Format as Table" button, which is located on the "Home" tab in the "Styles" group.
  3. Select any style you like from the drop-down menu.
  4. In the window that appears, confirm the previously selected range of cells, and correct it if it does not match. Check the "My table has headers" box if the range has a header row, then click "OK".
  5. The smart table has been created, but that's not all. Now select any cell in the table so that the "Design" tab appears on the ribbon, and go to that tab.
  6. On the ribbon, click the "Remove Duplicates" button.

After this, the window for removing duplicate rows will appear. It is the same as in the first method, so carry out all subsequent actions according to the first set of instructions.

Conclusion

So we have looked at two ways to delete rows with duplicate values in Excel. As you can see, there is nothing complicated about it, and with these instructions the operation takes a few seconds. The examples use Excel 2016, but you can delete duplicate rows in Excel 2010 and other versions in the same way.

Finding and manually removing duplicate values in an Excel spreadsheet, especially in large documents, is risky and impractical. When you check the cells visually, you can easily miss duplicates, and deleting them cell by cell takes an enormous amount of time.

In this article, we will look at how you can remove duplicates in Excel using various automated standard functions.

Tip: While you try out the ways of removing duplicate rows, columns, and cell values, work in a draft template or a copy of your project, so that a mistake during editing does not cost you valuable data or break the formatting of the original.

Method No. 1

1. While holding down the left mouse button, drag to select the table area (individual rows or columns) from which repetitions need to be removed. Alternatively, select the entire worksheet.

2. In the editor menu, go to the “Data” tab.

3. In the “Data Tools” group, click the “Remove Duplicates” button.

4. In the panel that appears, check the boxes of the columns in which you want to remove identical values. If there are many columns, use the “Select All” and “Unselect All” options to configure the deletion quickly. Click OK.

5. Once the duplicates have been removed from the table, a message will appear indicating how many unique values are left.

Tip: You can restore the removed repetitions immediately after deletion by clicking the Undo (left arrow) icon in the upper-left corner of Excel.

Method No. 2

1. Click on the table you are editing.

2. On the “Data” tab, in the “Sort & Filter” group, click “Advanced”.

If you need to create a new table containing only the unique rows of the source table:

1. In the “Advanced Filter” panel, select the “Copy to another location” radio button.

2. Click the button on the right side of the “Copy to” field.

3. Click an empty area of the worksheet where the filtered table should be placed. After you click, the cell reference appears in the field. Collapse the field and return to the filter options.

4. Check the “Unique records only” box and click “OK”.

5. After filtering, a version of the original table without duplicates will appear in the specified location.

To filter the table in place without making a copy:

  • in the “Advanced Filter” panel, set the mode to “Filter the list, in-place”;
  • check the “Unique records only” box;
  • click “OK”.
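The two Advanced Filter modes differ only in where the unique rows end up. The Python sketch below models that difference under simple assumptions: rows are compared as whole tuples, the first occurrence is the one kept, and the sample data is invented.

```python
def unique_row_mask(rows):
    """True for the first occurrence of each complete row, False for repeats."""
    seen = set()
    mask = []
    for row in rows:
        mask.append(row not in seen)
        seen.add(row)
    return mask

rows = [("A", 10), ("B", 7), ("A", 10), ("A", 12)]
visible = unique_row_mask(rows)

# "Filter the list, in-place": the source is untouched, repeats are only hidden.
shown_in_place = [row for row, keep in zip(rows, visible) if keep]

# "Copy to another location": a separate list receives the unique rows.
copied = list(shown_in_place)

print(visible)
print(len(rows), "source rows,", len(copied), "unique rows")
```

In both modes the source still holds all four rows; nothing is deleted, which is what distinguishes filtering from the "Remove Duplicates" command.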

When a database has to be optimized or its structure changes, a related task sometimes arises: tidying up the data that has already accumulated. Ideally the tables were normalized during development, and the whole system is organized so that it does not accumulate redundant duplicate information. If this is not the case, then when reworking such a system you want to get rid of all the redundant data and do it cleanly.

In this article we will consider the task of removing duplicate rows from a database table. Note right away that we are talking only about rows that genuinely need to be removed. For example, records in an order table with the fields “order code”, “product code”, “customer code”, and “order date” may legitimately differ only in the order code, since one customer can order the same product several times on the same day. The main sign that such data is correct is the presence of a key field.

If we see a table full of duplicate fields, with no clear need for each entry, then this is exactly what needs to be fixed.

An example of a clearly redundant table:

Now let's look at how we can solve this problem. Several methods can be used here.


1. You can write a function to compare and iterate through all the data. It takes a long time, and you don’t always want to write code for one-time use.


2. Another solution is to create a select query that groups the data so that only unique rows are returned:

SELECT country_id, city_name
FROM mytable
GROUP BY country_id, city_name

We get the following result set:

Then we write the resulting data set into another table.
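As a rough illustration, the grouping query and the copy into a second table can be tried with Python's built-in `sqlite3` module. The sketch assumes an in-memory SQLite database rather than the article's database server; the sample rows and the table name `mytable_unique` are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "id INTEGER PRIMARY KEY, country_id INTEGER, city_name TEXT)"
)
conn.executemany(
    "INSERT INTO mytable (country_id, city_name) VALUES (?, ?)",
    [(1, "Paris"), (1, "Paris"), (1, "Lyon"), (2, "Rome"), (2, "Rome")],
)

# Write the grouped (unique) rows into a second table.
# mytable_unique is a hypothetical name chosen for this sketch.
conn.execute(
    "CREATE TABLE mytable_unique AS "
    "SELECT country_id, city_name FROM mytable "
    "GROUP BY country_id, city_name"
)

unique_rows = conn.execute(
    "SELECT country_id, city_name FROM mytable_unique "
    "ORDER BY country_id, city_name"
).fetchall()
print(unique_rows)
```

The original table still holds all five rows, so this approach needs a follow-up step (swapping or reloading the tables) before the duplicates are really gone.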


3. These solutions use additional program code or additional tables. However, it would be more convenient to do everything using only SQL queries without additional tables. And here is an example of such a solution:

DELETE a.* FROM mytable a,
(SELECT
b.country_id, b.city_name, MIN(b.id) mid
FROM mytable b
GROUP BY b.country_id, b.city_name
)c
WHERE
a.country_id = c.country_id
AND a.city_name = c.city_name
AND a.id > c.mid

After executing such a query, only unique records will remain in the table:

Now let's take a closer look at how it all works. A delete query needs a condition that says which data should be deleted and which should be kept. We need to remove all non-unique entries. That is, if there are several identical records (records are identical if they have equal country_id and city_name values), we take one of the rows, remember its code, and delete all records with the same country_id and city_name values but a different code (id).

SQL query string:

DELETE a.* FROM mytable a,

indicates that the deletion will be performed from the mytable table.

The select query then generates an auxiliary table in which the records are grouped so that each one is unique:

(SELECT
b.country_id, b.city_name, MIN(b.id) mid
FROM mytable b
GROUP BY b.country_id, b.city_name
)c

MIN(b.id) mid – forms the column mid (abbreviation min id), which contains the minimum id value in each subgroup.

The result is a table containing unique records and the first row id for each group of duplicate records.

Now we have two tables. The first is the full table containing all records, from which the extra rows will be removed. The second contains information about the rows that need to be kept.

All that remains is to write a condition that says: delete every row where the country_id and city_name fields match but the id does not. Because the minimum id is the one selected for each group, all records whose id is greater than the one selected in the temporary table are deleted.
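The same keep-the-minimum-id rule can be tried out in Python with the built-in `sqlite3` module. Note the swap: SQLite does not accept the multi-table `DELETE a.* FROM ...` form used above, so this sketch uses an equivalent `NOT IN (SELECT MIN(id) ...)` condition instead. The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable ("
    "id INTEGER PRIMARY KEY, country_id INTEGER, city_name TEXT)"
)
conn.executemany(
    "INSERT INTO mytable (country_id, city_name) VALUES (?, ?)",
    [(1, "Paris"), (1, "Paris"), (1, "Lyon"), (2, "Rome"), (2, "Rome")],
)

# Keep the row with the smallest id in each (country_id, city_name) group
# and delete every other row of that group.
conn.execute(
    "DELETE FROM mytable WHERE id NOT IN ("
    "SELECT MIN(id) FROM mytable GROUP BY country_id, city_name)"
)

remaining = conn.execute(
    "SELECT id, country_id, city_name FROM mytable ORDER BY id"
).fetchall()
print(remaining)
```

Rows 2 and 5 are removed because another row in their group has a smaller id, which matches the `a.id > c.mid` condition of the original query.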


It is also worth noting that the described operation works only if the table has a key field. If you come across a table without a unique identifier, simply add one:

ALTER TABLE `mytable` ADD `id` INT(11) NOT NULL AUTO_INCREMENT , ADD PRIMARY KEY (`id`)

By running such a query, we get an additional column filled with unique numeric values for each row of the table.

We carry out all the necessary actions. After the operation to clear the table of duplicate records is completed, this field can also be deleted.

Duplicate data in Excel can cause many problems. It doesn't matter whether you imported the data from a database or received it from a colleague or a friend: the more data there is in your file, the harder it is to find and remove duplicates.

In this article, we will take a closer look at effective practices for finding and removing duplicates.

Finding and highlighting duplicates with color in Excel

Duplicates in tables can occur in different forms. These can be duplicate values in one column or in several, as well as in one or more rows.

Finding and highlighting duplicates in one column in Excel

The easiest way to find and highlight duplicates in Excel is to use conditional formatting.

How to do it:

  • Select the area with the data in which you need to find duplicates:
  • On the “Home” tab of the ribbon, click “Conditional Formatting” -> “Highlight Cells Rules” -> “Duplicate Values”:
  • In the dialog box that appears, select “Duplicate” in the left drop-down list, and in the right drop-down list choose the color used to highlight duplicate values. Click “OK”:
  • After this, duplicates in the selected column will be highlighted in color:

Tip: Don't forget to check your table data for extra spaces. The TRIM function is a good way to remove them.
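To see why stray spaces matter, here is a small Python sketch. The `excel_like_trim` helper is a made-up stand-in that imitates what TRIM does to ordinary space characters (strips them from both ends and collapses runs between words to a single space); it is not Excel's implementation, and the sample values are invented.

```python
def excel_like_trim(text):
    """Strip leading/trailing spaces and collapse inner runs of spaces."""
    return " ".join(part for part in text.split(" ") if part)

values = ["Widget", "Widget ", " Widget", "Blue  Widget", "Blue Widget"]

# Without cleaning, every entry looks different to a duplicate check.
raw_unique = len(set(values))

# After cleaning, only two distinct values remain.
trimmed_unique = len({excel_like_trim(value) for value in values})

print(raw_unique, trimmed_unique)
```

Entries that look identical on screen can therefore escape duplicate highlighting until the spaces are cleaned up.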

Finding and highlighting duplicates in multiple columns in Excel

If you need to find duplicates in multiple columns, the process is the same as in the example above. The only difference is that you select several columns instead of one:

  • Select the columns with data in which you need to find duplicates;
  • On the “Home” tab of the ribbon, click “Conditional Formatting” -> “Highlight Cells Rules” -> “Duplicate Values”;
  • In the dialog box that appears, select “Duplicate” in the left drop-down list, and in the right drop-down list choose the color used to highlight duplicate values. Click “OK”:
  • After this, duplicates in the selected columns will be highlighted in color:

Find and highlight duplicate rows in Excel

Finding duplicate cells and finding entire duplicate rows are different tasks. Compare the two tables below:

The tables above contain the same data. The difference is that in the example on the left we looked for duplicate cells, and in the example on the right we found entire repeating rows of data.

Let's look at how to find duplicate rows:

  • To the right of the data table, create an auxiliary column and, next to each data row, enter a formula that combines all the values of that table row into one cell:

=A2&B2&C2&D2

In the auxiliary column you will see the combined table data:

Now, to identify duplicate rows in the table, follow these steps:

  • Select the area with data in the auxiliary column (in our example this is the range E2:E15);
  • On the “Home” tab of the ribbon, click “Conditional Formatting” -> “Highlight Cells Rules” -> “Duplicate Values”;
  • In the dialog box that appears, select “Duplicate” in the left drop-down list, and in the right drop-down list choose the color used to highlight duplicate values. Click “OK”:
  • After this, the duplicate rows will be highlighted in the auxiliary column:

In the example above, we highlighted the lines in the created auxiliary column.
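One caveat about helper keys built with plain concatenation such as =A2&B2&C2&D2: two different rows can produce the same combined text. The Python sketch below shows the collision and one possible workaround, a separator between the parts. The sample rows and the "|" separator are assumptions made for the illustration; the separator only helps if it never occurs in the data itself.

```python
row_1 = ("ab", "c", "", "")
row_2 = ("a", "bc", "", "")

# Plain concatenation, like =A2&B2&C2&D2: both rows become "abc".
plain_1 = "".join(row_1)
plain_2 = "".join(row_2)
plain_keys_collide = plain_1 == plain_2

# With a separator the keys stay distinct: "ab|c||" versus "a|bc||".
SEP = "|"
delimited_1 = SEP.join(row_1)
delimited_2 = SEP.join(row_2)
delimited_keys_collide = delimited_1 == delimited_2

print(plain_1, plain_2, plain_keys_collide)
print(delimited_1, delimited_2, delimited_keys_collide)
```

The corresponding worksheet formula would be =A2&"|"&B2&"|"&C2&"|"&D2.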

But what if we need to highlight the rows not in the auxiliary column, but the rows themselves in the data table?

To do this, let's do the following:

  • Just like in the example above, we will create an auxiliary column, in each row of which we will enter the following formula:

=A2&B2&C2&D2

Thus, we will receive the collected data of the entire table row in one cell:

  • Now select all the table data (except for the auxiliary column). In our case this is the range A2:D15;
  • Then, on the “Home” tab of the ribbon, click “Conditional Formatting” -> “New Rule”:
  • In the “New Formatting Rule” dialog box, click the “Use a formula to determine which cells to format” option and, in the “Format values where this formula is true” field, enter the formula:

=COUNTIF($E$2:$E$15,$E2)>1

  • Don't forget to set the format of the duplicate lines found.

This formula checks the range of data in the auxiliary column and, if there are duplicate rows, highlights them in color in the table:
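The logic of the COUNTIF rule can be mirrored in Python with `collections.Counter`. This is a sketch of the counting idea only: the helper values are invented, and the comparison here is case-sensitive, whereas COUNTIF ignores case.

```python
from collections import Counter

# Values of the auxiliary column, one per table row (made-up sample data).
helper = ["a1", "b2", "a1", "c3", "b2", "a1"]

# Like =COUNTIF($E$2:$E$15,$E2)>1: flag every row whose key occurs
# more than once, including the first occurrence.
counts = Counter(helper)
is_duplicate = [counts[key] > 1 for key in helper]

# Variant that leaves the first occurrence unflagged by counting
# only the rows seen so far.
seen = Counter()
is_repeat = []
for key in helper:
    seen[key] += 1
    is_repeat.append(seen[key] > 1)

print(is_duplicate)
print(is_repeat)
```

The second variant corresponds to a formula with an expanding range, =COUNTIF($E$2:$E2,$E2)>1, which is useful when the first copy of each row should stay unhighlighted.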

How to remove duplicates in Excel

Above we learned how to find duplicates and how to highlight them with color. Below you will learn how to remove them.

How to remove duplicates in one Excel column

If your data is located in one column and you want to remove all duplicates, then do the following:

  • Select the data;
  • On the ribbon, go to the “Data” tab -> “Data Tools” group -> “Remove Duplicates”:
  • In the Remove Duplicates dialog box, check the “My data has headers” box if the range you selected has a header. Also make sure that the column you need is selected in the “Columns” list:
  • Click “OK”

After this, the system will remove all duplicates in the column, leaving only unique values.

Tip: Be sure to back up your data before any de-duplication operation. You can also remove duplicates on a copy of the data on a separate sheet to avoid accidentally deleting data.

How to Remove Duplicates in Multiple Columns in Excel

Let's imagine that we have sales data like the table below.